Question

In: Computer Science

During class we used R for our practical data mining applications. (a) We often used objects...

During class we used R for our practical data mining applications.

(a) We often used objects of class data.frame as data structures. Explain the key features of the data.frame objects. What are some of its strengths and weaknesses?

(b) Which elements (i.e., rows and columns) of the mtcars data.frame does the following command return?

mtcars[mtcars$mpg < mean (mtcars$mpg), c ("mpg" , "hp" )]

(c) What is an R package?

(e) What command would you use to fit a logistic regression model?

(f) Write the first six lines of a hypothetical external data file named data.csv that will be read without error by the following R command:

mydata <- read.table ("data.csv" , skip=3 , header=FALSE , sep="," ,

colClasses=c ("character" , rep ("numeric" , 3 ) ) )

Lines of data.csv:

1 :

2 :

3 :

4 :

5 :

6 :

Solutions

Expert Solution

Ans (a):- A data frame is a table or a two-dimensional array-like structure in which each column conatins values of one variable and each row conatins one set of values from each column.

Key features of dataframe are:-

  • The column names should be non-empty.
  • The row names should be unique.
  • The data stored in a data frame can be of numeric, factor ot character type.
  • Each column should conatin same number of data items.

Ans (b):- R package are collections of fucntions and data sets developed by the community. they increase the power of R by improving existing base R functionalities, or by adding new ones. For example, if you are usually working with data frames, probably you will have heard about dplur or data.

Ans (e):- This way, you tell glm() to put fit a logistic regression model instead of one of the many other models that can be fit to the glm. As you can see, summary() returns the estimate, standard errors, z-score, and p-values on each of the coefficients.

Just as ordinary least square regression is the method used to estimate coefficients for the best fit line in linear regression, logistic regression uses meximum likelihood estimate (MLE) to obtain the model coefficient that relate predictors to the target.


Related Solutions

We often need to shuffle data in many applications. Consider a case of a game that...
We often need to shuffle data in many applications. Consider a case of a game that plays cards for example or a system that does a draw for lottery. The purpose of this lab is to write functions that will help us shuffle elements of an array. Create a function that takes an array of integers and its size as parameters and populates the array with the values: 0, 1, 2, ..., size-1, where size is the size of the...
write reflection about data mining theory and applications course
write reflection about data mining theory and applications course
There are many practical applications of biological knowledge that we mostly accept and take for granted....
There are many practical applications of biological knowledge that we mostly accept and take for granted. Many (though not all) modern techniques in medicine, agriculture and sanitation are examples of widely accepted practices. However, our knowledge about biology has enabled us to do things that raise ethical and social issues. Cloning is one example of such a technology, but there are many others. Choose one such example, then: Describe the technology. Describe the ethical and social issues that are raised...
How do you the practical properties and visual applications of painting media influence our the selection...
How do you the practical properties and visual applications of painting media influence our the selection of process and medium? Might one media better suit the message an artist wants to communicate when they create their work then another? Explain. Printmaking is a process oriented medium. Many of the artist who work in this medium enjoy the step-by-step character of the process, and the stark graphic quality of prints. Why would an artist who wants an unlimited number of prints...
Cyanide is used in several industrial applications, such as mining gold and electroplating silver. A danger...
Cyanide is used in several industrial applications, such as mining gold and electroplating silver. A danger of working with cyanide solutions is the potential hazard of releasing highly toxic hydrogen cyanide gas. What conditions are necessary to ensure that less than 1% of the cyanide in a particular solution is chemically present as HCN rather than CN-? (HCN pKa= 9.2) (A) pH > 11.2 (B) pH > 10.2 (C) pH > 9.2 (D) pH > 8.2 (E) pH > 7.2...
11) When we solved problems dealing with objects in free-fall, we often exploited a symmetry that...
11) When we solved problems dealing with objects in free-fall, we often exploited a symmetry that led to the conclusion that the time it takes an object to go from a starting height to a maximum height (where the vertical (y-direction) velocity is zero) is the same as the time that it takes for the object to fall from that maximum height back to the starting height. If we introduce air resistance as a force similar to a friction force,...
We will be exploring decay reactions used in radioactive dating of objects. This will be a...
We will be exploring decay reactions used in radioactive dating of objects. This will be a short lab to leave you time to prepare for the Unit 3 project. 1. Explore the PhET simulation Radioactive Dating Game . Try all the tabs to figure out why there is more than one element used to estimate how old things might be. 2. What elements’ isotopes are used to estimate how old something is? Why do are more than one type used?...
R problem 1. Student records. Create an S3 class studentRecord for objects that are a list...
R problem 1. Student records. Create an S3 class studentRecord for objects that are a list with the named elements ‘name’, ‘subjects completed’, ‘grades’, and ‘credit’. Write a studentRecord method for the generic function mean, which returns a weighted GPA, with subjects weighted by credit. Also write a studentRecord method for print, which employs some nice formatting, perhaps arranging subjects by year code. Finally create a further class for a cohort of students, and write methods for mean and print...
Data Science for Data Mining Why is it often better to perform reductions using operators rather...
Data Science for Data Mining Why is it often better to perform reductions using operators rather than excluding attributes or observations as data are imported? (Write Minimum 100 words)
We see that art is often a form of social disobedience. Whether it is on our...
We see that art is often a form of social disobedience. Whether it is on our bodies in the form of a tattoo, on a canvas, or on the wall in the form of graffiti. in many cultures, body art is associated with enhancing beauty, and thus related to gender So, can we see the same expression of a social goal in regards to political art? Does creating political art actually help a society to address social problems and ills...
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT