a. What is data mining?
b. What is specification searching?
c. Why is engaging in such behavior when conducting empirical research generally viewed negatively?
a) Data Mining:
Data mining is an analytical process for exploring data. Its central task is to extract non-trivial, useful patterns from large amounts of data. The main steps are: select the target data from the full data set by cleaning and filtering it; pre-process that data by sorting or arranging it in a consistent form; then apply pattern-finding methods or models to the transformed data to gain knowledge or insight.
A common confusion is to treat data mining as a simple query, such as pulling all customers who belong to the same city. It is instead the process of discovering something not known in advance, such as the number of customers who share similar tastes or preferences, so that a business decision can be made to improve profits. Organizations use data mining to identify associations between internal factors, such as cost and employee skills, and external factors, such as the economic climate and competition.
So data mining has two distinct capabilities. One is descriptive: finding interesting, human-interpretable patterns that describe the data. The other is predictive: using some features to predict unknown or future values of the same or another feature.
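The steps described above can be sketched in a few lines of Python. This is only an illustration under assumed data: the customer records, the field layout and the function names (select_and_clean, preprocess, mine_patterns) are hypothetical and are not part of the original question.

```python
from collections import Counter

# Hypothetical raw records: (customer_id, city, preferred_category).
# None marks a missing value; the last row is an exact duplicate.
raw_records = [
    (1, "Pune", "books"),
    (2, "Delhi", "music"),
    (3, "Pune", None),
    (4, "Delhi", "books"),
    (5, "Mumbai", "books"),
    (2, "Delhi", "music"),
]


def select_and_clean(records):
    """Step 1: keep only complete rows and drop exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        if None in rec or rec in seen:
            continue
        seen.add(rec)
        cleaned.append(rec)
    return cleaned


def preprocess(records):
    """Step 2: arrange the data consistently (here, sorted by customer id)."""
    return sorted(records, key=lambda rec: rec[0])


def mine_patterns(records):
    """Step 3: summarise the prepared data (here, customers per preferred category)."""
    return Counter(category for _, _, category in records)


prepared = preprocess(select_and_clean(raw_records))
preference_counts = mine_patterns(prepared)
print(preference_counts.most_common())
```

Counting a shared preference is a deliberately simple stand-in for the pattern-finding step; a real project would use a clustering or association-rule method at that point.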
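The difference between the descriptive and the predictive use can be shown with a small sketch. The numbers below are invented for illustration, and the variable names (ad_spend, sales) are assumptions rather than anything taken from the question. The descriptive part summarises the data already observed; the predictive part fits a least-squares line and uses it on a value that has not been observed.

```python
# Hypothetical observations: advertising spend and resulting sales (arbitrary units).
ad_spend = [1.0, 2.0, 3.0, 4.0, 5.0]
sales = [2.1, 3.9, 6.2, 7.8, 10.1]


def mean(values):
    return sum(values) / len(values)


def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


# Descriptive: a human-readable summary of the data we already have.
average_sales = mean(sales)

# Predictive: use one feature to estimate an unknown value of another.
slope, intercept = fit_line(ad_spend, sales)
predicted_sales_at_6 = slope * 6.0 + intercept

print("average sales:", average_sales)
print("predicted sales at spend 6.0:", predicted_sales_at_6)
```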
b) Specification Searching:
Specification searching is any process that inflates the statistical significance of reported results, such as running multiple regression specifications and disproportionately reporting those that are significant, or collecting more data until the results become significant. In my view, the specification is created with respect to the business need, and in layman's terms, searching the data according to your requirements is what we can call specification searching. For example, to find the growth of a stock price or how a stock is likely to behave, we have to study the trend of that stock; searching and organizing the data to match those requirements is what we can call specification searching.
c) The question is not very clear about what is implied by "such behavior". Nevertheless, I will answer it to the best of my ability.
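To see why trying many specifications inflates significance, a small simulation may help. It is a sketch under stated assumptions, not a definitive test: the outcome and every candidate predictor are pure random noise, so any "significant" result is a false positive. Significance is judged with the Fisher z approximation for a correlation coefficient, and all function names and parameter values are hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def p_value(r, n):
    """Two-sided p-value; atanh(r) * sqrt(n - 3) is roughly N(0, 1) when there is no true relation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def search_finds_significance(rng, n_obs=100, n_candidates=20, alpha=0.05):
    """Try candidate predictors one by one and stop at the first 'significant' one."""
    y = [rng.gauss(0, 1) for _ in range(n_obs)]
    for _ in range(n_candidates):
        x = [rng.gauss(0, 1) for _ in range(n_obs)]  # unrelated to y by construction
        if p_value(pearson_r(x, y), n_obs) < alpha:
            return True
    return False


def false_positive_rate(n_trials=500, n_candidates=20, seed=0):
    """Share of simulated studies that can report at least one significant predictor."""
    rng = random.Random(seed)
    hits = sum(
        search_finds_significance(rng, n_candidates=n_candidates)
        for _ in range(n_trials)
    )
    return hits / n_trials


if __name__ == "__main__":
    print("one specification tried:", false_positive_rate(n_candidates=1))
    print("twenty specifications tried:", false_positive_rate(n_candidates=20))
```

With a single specification the rate should sit near the nominal 5%. With twenty independent tries, the chance of at least one false positive is about 1 - 0.95^20, or roughly 64%, which is why reporting only the significant run is misleading.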
Empirical research
Empirical research is the collection and analysis of primary data based on direct observation or experience in the 'field'. The data collection is based on actual experience rather than on belief, and so reflects the most practical information on the subject being studied. The knowledge gained from this data describes the population, behaviour or phenomenon being studied.
The organisation or group conducting the research should be impartial while acquiring the data, which plays a vital role in increasing internal validity. Researcher bias while obtaining the data is viewed negatively because it affects the data collected and therefore distorts the results or predictions drawn from it. Behaviours of this kind should therefore be avoided.