Drop any missing values using drop_na(). Then select all variables except name, mfr, type, weight, shelf, cups, and rating to create a subset of the features we will use to cluster the different cereals.
I just need to know how to code the drop_na() step and how to create the subset of features (the variables).
In case it helps, here are all the variables in the data:
name, mfr, type, calories, protein, fat, sodium, fiber, carbo, sugars, potass, vitamins, shelf, weight, cups, rating
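Assuming the data set is a list of rows, where each row is a map keyed by variable name and a missing value is stored as null. The drop_na() in the question looks like the R tidyr function; the Java below only mirrors the same two steps. It drops every row that has a missing value and returns a map from each required feature (variable) to the list of its values (the subset).

import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CerealSubset {
    // Variables that must not appear in the clustering subset.
    static final Set<String> NOT_REQUIRED = new HashSet<>(Arrays.asList(
            "name", "mfr", "type", "weight", "shelf", "cups", "rating"));

    // Skips rows with a missing (null) value, then collects the remaining
    // rows column by column, leaving out the excluded variables.
    static Map<String, List<Object>> drop_na(List<Map<String, Object>> dataSet) {
        Map<String, List<Object>> finalData = new LinkedHashMap<>();
        for (Map<String, Object> data : dataSet) {
            // drop_na step: ignore the whole row if any value is missing.
            if (data.containsValue(null)) {
                continue;
            }
            for (Map.Entry<String, Object> entry : data.entrySet()) {
                // select step: keep only the variables that are not excluded.
                if (!NOT_REQUIRED.contains(entry.getKey())) {
                    finalData.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                            .add(entry.getValue());
                }
            }
        }
        return finalData;
    }
}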
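
If you would rather keep the result as rows (closer to what you get back from drop_na() followed by a select), here is a minimal self-contained sketch. It assumes missing values are stored as null in a HashMap-style row. The class name CerealRows, the helper row(...), and the three sample cereals with their numbers are all made up for illustration; they are not taken from the real data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

class CerealRows {
    // Variables the question asks to exclude from the subset.
    static final Set<String> EXCLUDED = new HashSet<>(Arrays.asList(
            "name", "mfr", "type", "weight", "shelf", "cups", "rating"));

    // Keeps only complete rows (no null value), then removes the excluded keys.
    static List<Map<String, Object>> dropNaThenSelect(List<Map<String, Object>> rows) {
        return rows.stream()
                .filter(r -> !r.containsValue(null))      // drop_na step
                .map(r -> {
                    Map<String, Object> kept = new LinkedHashMap<>(r);
                    kept.keySet().removeAll(EXCLUDED);    // select step
                    return kept;
                })
                .collect(Collectors.toList());
    }

    // Hypothetical helper that builds a tiny row with only four of the variables.
    static Map<String, Object> row(String name, Integer calories, Integer sugars, Double rating) {
        Map<String, Object> r = new LinkedHashMap<>();
        r.put("name", name);
        r.put("calories", calories);
        r.put("sugars", sugars);
        r.put("rating", rating);
        return r;
    }

    public static void main(String[] args) {
        // Made-up sample values, for illustration only.
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(row("cereal A", 70, 6, 68.4));
        rows.add(row("cereal B", 120, null, 33.9)); // sugars is missing
        rows.add(row("cereal C", 110, 8, 40.1));

        // prints [{calories=70, sugars=6}, {calories=110, sugars=8}]
        System.out.println(dropNaThenSelect(rows));
    }
}
```

The row for "cereal B" is dropped because one of its values is null, and name and rating disappear from the remaining rows because they are in the excluded set. If your data marks missing values some other way (an empty string, for example), the filter condition would need to change to match.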