Sunday, June 18, 2006

What is Data Mining

C.R. Rao (of the famous Cramér-Rao Inequality) is certainly the greatest living statistician and probably an all-time great. Here is what he has said about "data mining":

Much of current statistical methodology is model based, without any guidelines for the choice of the underlying stochastic model for data. The methodology developed was geared to the analysis of small data sets (samples). However, with modern technology and available resources, it is now possible to generate large data sets in any investigation. This raises new problems in computing and the possibility of extracting information from data without using a stochastic model. A new methodology is being forged for this purpose under the name of "data mining" or "computational statistics."

-Interview with Anil Bera in Econometric Theory, 19, 2003, 331-400
