Question

Suppose you have been building a model using the k-means clustering algorithm and you keep finding that a certain variable is essentially ignored by the model (in other words, the variable is very similarly distributed across all clusters). Describe a method that can be used to exaggerate or minimize the impact of a variable when using k-means clustering. Why does this method work?

no additional info available, predictive analysis

Solutions

Expert Solution

Method?

A method that can be used to exaggerate or minimize the impact of a variable is projected clustering, in particular its "soft" variant, which weights each variable in the distance function.

What is projected clustering, and why does it work?

Projected clustering is often used to cluster high-dimensional data in which the variables differ in variability and are measured on different scales.

Projected clustering assigns each point in the dataset to exactly one cluster, but different clusters may exist in different subspaces. The general approach is to combine a special distance function (which may be user-designed) with a regular clustering algorithm.

For example, the PreDeCon algorithm checks, for each point, which attributes seem to support a clustering, and then adjusts the distance function so that the dimensions with low variance are amplified.
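The idea of amplifying low-variance dimensions can be sketched in a few lines of Python. This is a simplified illustration in the spirit of PreDeCon, not the full algorithm: the function name `preference_weights` and the values chosen for `eps`, `delta`, and `kappa` are assumptions made for this example.

```python
def preference_weights(points, p, eps, delta, kappa):
    """Return one weight per attribute for point p (simplified sketch).

    An attribute gets the large weight kappa when the neighbours of p
    vary little along it (variance <= delta), and weight 1.0 otherwise.
    """
    # Neighbours of p within Euclidean distance eps (p itself included).
    neighbours = [
        q for q in points
        if sum((a - b) ** 2 for a, b in zip(p, q)) <= eps ** 2
    ]
    weights = []
    for j in range(len(p)):
        # Variance of the neighbourhood along attribute j, measured around p.
        var_j = sum((q[j] - p[j]) ** 2 for q in neighbours) / len(neighbours)
        weights.append(kappa if var_j <= delta else 1.0)
    return weights


# Illustrative data: attribute 0 spreads out, attribute 1 is almost constant.
points = [(0.0, 1.0), (1.0, 1.01), (2.0, 0.99), (3.0, 1.0), (4.0, 1.02)]
weights = preference_weights(points, p=(2.0, 0.99), eps=2.5, delta=0.1, kappa=100.0)
print(weights)  # [1.0, 100.0]
```

Here the nearly constant attribute receives the large weight, so differences along it count far more in any distance computed with these weights.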

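If the distance function weights the attributes differently but never assigns a weight of 0 (and hence never drops the irrelevant attributes), the algorithm is called a "soft" projected clustering algorithm, signifying that the number of variables never decreases; only their relevance changes. This works for k-means because k-means assigns each point to its nearest centroid under a (squared) Euclidean distance: increasing a variable's weight makes differences in that variable count for more in the cluster assignments, and decreasing the weight makes them count for less.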
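A minimal sketch of this soft weighting applied to k-means is given below, assuming a per-feature weighted squared Euclidean distance. The function `weighted_kmeans`, the toy data, and the weight values are hypothetical choices for illustration. Weighting feature j by w_j in this distance is equivalent to multiplying that column by sqrt(w_j) and then running ordinary k-means.

```python
def weighted_kmeans(points, weights, init_centroids, max_iter=100):
    """Lloyd's k-means using the distance sum_j w_j * (x_j - c_j)^2."""
    centroids = [list(c) for c in init_centroids]
    labels = None
    for _ in range(max_iter):
        # Assignment step: nearest centroid under the weighted distance.
        new_labels = []
        for p in points:
            dists = [
                sum(w * (x - c) ** 2 for w, x, c in zip(weights, p, cen))
                for cen in centroids
            ]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: with per-feature weights the best centroid is still the mean.
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Toy data: feature a has a large spread; feature b carries the group
# structure (values near 0 versus values near 1) on a much smaller scale.
points = [(a, b) for a in (0.0, 10.0, 20.0, 30.0) for b in (0.0, 0.1, 1.0, 1.1)]
init = [points[0], points[-1]]

# Equal weights: feature a dominates and feature b is effectively ignored.
labels_plain, _ = weighted_kmeans(points, (1.0, 1.0), init)
# Large weight on b: the clusters now follow feature b.
labels_weighted, _ = weighted_kmeans(points, (1.0, 10000.0), init)

print(labels_plain)     # [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
print(labels_weighted)  # [0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0, 1, 1]
```

With equal weights both clusters contain the same mix of b values, which is the "ignored variable" situation from the question. Raising the weight on b changes the split so that it follows b, and a weight below 1 would minimize a variable's influence in the same way.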

