Compare and contrast SEMMA and CRISP-DM. Discuss when application of SEMMA or CRISP-DM might be most appropriate by providing a specific example related to each of the two processes.
Answer:-----------
SEMMA:----------------
SEMMA is the data mining methodology proposed by the SAS Institute,
one of the most important developers of statistical software
applications, alongside its software package Enterprise Miner.
In SEMMA, SAS offers a data mining process that consists of five
steps: sample, explore, modify, model, and assess.
The methodology begins by sampling: a small, representative portion
of a large data set is extracted for analysis.
The next step is to explore the sampled data by looking for trends
and anomalies in order to gain an understanding of it.
In the third phase, data is modified to create, select, and
transform the variables for the study.
A valid model is then created using the software tools, which
search automatically for combinations of rules and patterns that
reliably predict the observed results.
Finally, the last step of the SEMMA methodology consists of
evaluating the usefulness and reliability of the findings.
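The five steps above can be sketched in a few lines of Python. This is only an illustration of the order of the steps, not how SAS Enterprise Miner implements them: the data set is synthetic, the function names are hypothetical, and a simple least-squares line stands in for whatever model the software tools would actually search for.

```python
import random
import statistics

def sample(rows, fraction, seed=0):
    """Sample: draw a random subset; the remainder is kept for assessment."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    k = max(2, int(len(shuffled) * fraction))
    return shuffled[:k], shuffled[k:]

def explore(rows):
    """Explore: summarize the input variable to spot trends and anomalies."""
    xs = [x for x, _ in rows]
    return {"mean": statistics.mean(xs), "stdev": statistics.pstdev(xs),
            "min": min(xs), "max": max(xs)}

def modify(rows, stats):
    """Modify: transform the input variable (here, standardize it)."""
    scale = stats["stdev"] or 1.0
    return [((x - stats["mean"]) / scale, y) for x, y in rows]

def model(rows):
    """Model: fit y = intercept + slope * z by ordinary least squares."""
    zs = [z for z, _ in rows]
    ys = [y for _, y in rows]
    mz, my = statistics.mean(zs), statistics.mean(ys)
    slope = (sum((z - mz) * (y - my) for z, y in rows)
             / sum((z - mz) ** 2 for z in zs))
    return my - slope * mz, slope

def assess(rows, params):
    """Assess: mean squared error of the fitted line on unseen rows."""
    intercept, slope = params
    return statistics.mean((y - (intercept + slope * z)) ** 2 for z, y in rows)

# Synthetic data (an assumption for illustration): y = 2x + 1 plus small noise.
rng = random.Random(42)
data = []
for _ in range(1000):
    x = rng.uniform(0.0, 10.0)
    data.append((x, 2.0 * x + 1.0 + rng.gauss(0.0, 0.1)))

sampled, held_out = sample(data, fraction=0.2)   # Sample
stats = explore(sampled)                         # Explore
params = model(modify(sampled, stats))           # Modify, then Model
mse = assess(modify(held_out, stats), params)    # Assess
print(f"held-out mean squared error: {mse:.4f}")
```

The held-out rows are transformed with the statistics computed from the sample, so the assessment step measures how well the findings carry over to data the model has not seen.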
CRISP-DM:------------------
Another data mining methodology is CRISP-DM (cross-industry
standard process for data mining).
CRISP-DM was originally conceived in late 1996, but it was not
completed until 1999; it is intended to be industry-, tool-, and
application-neutral. It was developed by a consortium of data mining
vendors and companies through an effort funded by the European
Commission.
The four partners of this project were NCR, DaimlerChrysler, OHRA,
and Integral Solutions Limited (ISL), which became part of SPSS in
1998. The CRISP-DM 1.0 methodology comprises a hierarchical
breakdown in which the data mining process is divided into four
levels of abstraction: phases, generic tasks, specialized tasks,
and process instances.
CRISP-DM 1.0 also recognizes four different dimensions of data
mining context that drive the mapping between the generic and
specialized levels of the CRISP-DM model.
The four dimensions are:
1) application domain
2) problem type
3) technical aspect
4) tools and techniques
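One way to picture how the four levels and the four context dimensions fit together is as nested records. The sketch below is only an illustration: the class names are hypothetical, and the example phase, tasks, and context values are assumptions chosen for the demonstration, not entries quoted from the CRISP-DM 1.0 document.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Context:
    """The four dimensions of data mining context."""
    application_domain: str
    problem_type: str
    technical_aspect: str
    tools_and_techniques: str

@dataclass
class ProcessInstance:
    """Level 4: a record of what was actually done in one engagement."""
    description: str

@dataclass
class SpecializedTask:
    """Level 3: a generic task as carried out in one specific context."""
    name: str
    context: Context
    instances: List[ProcessInstance] = field(default_factory=list)

@dataclass
class GenericTask:
    """Level 2: a task stated generally enough to cover any situation."""
    name: str
    specialized: List[SpecializedTask] = field(default_factory=list)

@dataclass
class Phase:
    """Level 1: a top-level phase of the process."""
    name: str
    generic_tasks: List[GenericTask] = field(default_factory=list)

def describe(phase):
    """Return one indented line per level, from phase down to instance."""
    lines = [phase.name]
    for task in phase.generic_tasks:
        lines.append("  " + task.name)
        for spec in task.specialized:
            lines.append("    " + spec.name)
            for inst in spec.instances:
                lines.append("      " + inst.description)
    return lines

# Hypothetical example values, chosen only to show the shape of the hierarchy.
ctx = Context(
    application_domain="customer churn",
    problem_type="classification",
    technical_aspect="missing values",
    tools_and_techniques="decision trees",
)
phase = Phase("data preparation", [
    GenericTask("clean data", [
        SpecializedTask("impute missing numeric values", ctx, [
            ProcessInstance("filled missing income with the median"),
        ]),
    ]),
])
print("\n".join(describe(phase)))
```

The context is attached at the specialized level because that is where a generic task becomes specific to a domain, problem type, technical aspect, and tool.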