Question

In: Statistics and Probability

a. What is data mining?

b. What is specification searching?

c. Engaging in such behavior when conducting empirical research is generally viewed negatively. Why?

Solutions

Expert Solution

a) Data Mining:

Data mining is an analytical process for exploring data. Its central task is to extract non-trivial, useful patterns from large amounts of data. The main steps are: select the target data from the full data set by cleaning and filtering it; pre-process the selected data by sorting or arranging it into a suitable form; and apply pattern-finding methods or models to the transformed data to gain knowledge or insight.
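The three steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the customer records, the item names, and the helper names (clean, preprocess, frequent_pairs) are all hypothetical, and counting item pairs that are bought together is just one simple kind of pattern search.

```python
from collections import Counter
from itertools import combinations

# Hypothetical raw records: (customer id, city, items bought).
raw_records = [
    ("c1", "Pune", ["Tea", "biscuits "]),
    ("c2", "Delhi", ["tea", "Biscuits", "milk"]),
    ("c3", None, ["coffee"]),        # missing city, dropped during cleaning
    ("c4", "Pune", ["tea", "milk"]),
    ("c5", "Delhi", []),             # empty basket, dropped during cleaning
    ("c6", "Mumbai", ["biscuits", "tea"]),
]

def clean(records):
    """Step 1: keep only complete records (cleaning/filtering)."""
    return [r for r in records if r[1] is not None and r[2]]

def preprocess(records):
    """Step 2: normalise item names and sort each basket."""
    return [sorted({item.strip().lower() for item in items})
            for _, _, items in records]

def frequent_pairs(baskets, min_count=2):
    """Step 3: count item pairs bought together and keep the frequent ones."""
    counts = Counter(pair for basket in baskets
                     for pair in combinations(basket, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}

baskets = preprocess(clean(raw_records))
patterns = frequent_pairs(baskets)
print(patterns)
```

The result groups customers by what they buy together rather than by a stored attribute such as city.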

A common confusion is to treat data mining as simple retrieval, such as pulling all customers who belong to the same city. In fact, it is the process of discovering, for example, how many customers share similar tastes or preferences, so that a business decision can be made to improve profits. Organizations use data mining to identify associations among internal factors, such as cost and employee skills, and external factors, such as the economic climate and competition.

So data mining has two distinct uses. One is descriptive: finding interesting, human-interpretable patterns that describe the data. The other is predictive: using some features to predict unknown or future values of the same or another feature.
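As a rough illustration of the difference, the sketch below first summarises a small data set (descriptive) and then fits a straight line to predict a value that was not observed (predictive). The numbers and the names visits, spend and predict_spend are made up for the example, and ordinary least squares is only one of many possible predictive models.

```python
from statistics import mean

# Hypothetical data: monthly store visits and monthly spend for five customers.
visits = [1, 2, 3, 4, 5]
spend = [12.0, 19.0, 31.0, 42.0, 51.0]

# Descriptive use: summarise what the observed data look like.
summary = {"mean_visits": mean(visits), "mean_spend": mean(spend)}

# Predictive use: fit spend = intercept + slope * visits by least squares.
x_bar, y_bar = mean(visits), mean(spend)
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(visits, spend))
         / sum((x - x_bar) ** 2 for x in visits))
intercept = y_bar - slope * x_bar

def predict_spend(n_visits):
    """Predict spend for a visit count that may not be in the data."""
    return intercept + slope * n_visits

print(summary)
print(predict_spend(6))
```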

b) Specification Searching:

Specification searching is any process that inflates the statistical significance of reported results, such as running multiple regressions and disproportionately reporting those that are significant, or collecting more data until the results become significant. In layman's terms, the researcher keeps changing the model specification (which variables, functional form, or sample to use) until the data give the desired result. For example, to explain the growth of a stock price, an analyst might regress it on many different sets of candidate predictors and report only the specification in which a predictor appears significant.
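A small simulation can make the inflation visible. In the sketch below, the outcome and every candidate regressor are pure noise, so any "significant" result is a false positive. The setup is an assumption chosen for illustration: 20 candidate specifications, 100 observations, and a two-sided 5% critical value of about 1.984 for the t statistic. The function names t_stat and simulate are hypothetical.

```python
import math
import random

def t_stat(x, y):
    """t statistic for the slope in a simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    return r * math.sqrt((n - 2) / (1 - r * r))

def simulate(trials=300, n=100, n_specs=20, t_crit=1.984, seed=0):
    """Return false-positive rates for one fixed test and for a searched test."""
    rng = random.Random(seed)
    honest_hits = 0
    searched_hits = 0
    for _ in range(trials):
        y = [rng.gauss(0, 1) for _ in range(n)]
        # Every candidate regressor is noise that is unrelated to y.
        ts = [abs(t_stat([rng.gauss(0, 1) for _ in range(n)], y))
              for _ in range(n_specs)]
        honest_hits += int(ts[0] > t_crit)       # one pre-specified regression
        searched_hits += int(max(ts) > t_crit)   # report the best of n_specs
    return honest_hits / trials, searched_hits / trials

honest_rate, searched_rate = simulate()
print(honest_rate, searched_rate)
```

With a single pre-specified regression the false-positive rate should sit near the nominal 5%, while reporting the best of 20 independent tries should give a rate near 1 - 0.95^20, roughly 64%.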

c) The question does not make clear what "such behavior" refers to. Nevertheless, I will answer it as best I can.

Empirical research

Empirical research is the collection and analysis of primary data based on direct observation or experience in the 'field'. Because the data come from actual experience rather than from theory or belief, they reflect the most practical information on the subject being studied. The knowledge gained from these data describes the population, behaviour or phenomenon being studied.

The organisation or group conducting the research should be impartial while acquiring the data, which plays a vital role in increasing internal validity. Researcher bias during data collection is viewed negatively because it affects the data collected and therefore distorts the results or predictions. Searching through the data or through model specifications until a significant result appears is a form of such bias, since the reported significance is inflated. These kinds of behaviour should therefore be avoided.

