Question


Can anyone explain the R code below?

spam is a dataset.

sample <- sample( c(TRUE, FALSE), nrow(spam), replace=TRUE)

train <- spam[sample,]
test <- spam[!sample,]

When I run train and test, I get two different datasets that are split from the original spam dataset. But I don't understand the first line of code, and I'm not sure why it can split the data into two sets.

Solutions

Expert Solution

sample() is a built-in R function that draws a random sample from the elements passed as its first argument; the second argument sets how many values to draw. With replace=TRUE the values are drawn with replacement, so the result can be longer than the input vector.

sample <- sample( c(TRUE, FALSE), nrow(spam), replace=TRUE)

Here sample() returns a logical vector with one element per row of the spam dataset, where each element is randomly TRUE or FALSE (with equal probability by default).
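To see what this call produces, here is a minimal sketch on a small scale. The names n and mask are made up for illustration, n stands in for nrow(spam), and the seed is set only so that repeated runs give the same vector.

```r
# Assumption: n = 10 stands in for nrow(spam); mask is a hypothetical name.
set.seed(42)                     # only to make the example reproducible
n <- 10

# Draw n values from c(TRUE, FALSE), with replacement.
mask <- sample(c(TRUE, FALSE), n, replace = TRUE)

print(mask)                      # a random mix of TRUE and FALSE
print(length(mask))              # one value per "row"
print(is.logical(mask))          # the result is a logical vector
```

replace=TRUE is required here: without it, sample() cannot draw more values than the two elements it was given and stops with an error.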

[Screenshot: output of sample() on an example dataset]

train <- spam[sample,]

This line returns the rows whose corresponding value in the sample vector is TRUE,

i.e. it returns the ith row if the ith value in the sample vector is TRUE.
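Indexing rows with a logical vector can be tried on a tiny data frame. This is only a sketch: df, its columns, and keep are hypothetical stand-ins for spam and the sample vector.

```r
# Hypothetical five-row stand-in for the spam dataset.
df <- data.frame(id = 1:5, score = c(0.1, 0.0, 0.4, 0.2, 0.9))

# A hand-written logical vector, one value per row of df.
keep <- c(TRUE, FALSE, TRUE, TRUE, FALSE)

# Rows 1, 3 and 4 are kept because keep is TRUE at those positions.
kept <- df[keep, ]
print(kept$id)   # [1] 1 3 4
```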


test <- spam[!sample,]

The ! operator flips every TRUE to FALSE and every FALSE to TRUE, so this line returns the rows whose corresponding value in the sample vector is FALSE,

i.e. it returns the ith row if the ith value in the sample vector is FALSE.

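That is why train and test contain different rows: every row has exactly one value in the sample vector, so it ends up in either train or test, never in both. Using the random TRUE/FALSE vector as a row index is what splits your dataset.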
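The whole split can be checked end to end with a short sketch. Here spam_demo is a made-up 100-row data frame standing in for spam, and mask is used instead of sample as the variable name so that it does not share a name with the function.

```r
# Assumption: spam_demo is a hypothetical stand-in for the real spam dataset.
set.seed(1)                                  # only for reproducibility
spam_demo <- data.frame(id = 1:100, x = rnorm(100))

mask  <- sample(c(TRUE, FALSE), nrow(spam_demo), replace = TRUE)
train <- spam_demo[mask, ]                   # rows where mask is TRUE
test  <- spam_demo[!mask, ]                  # rows where mask is FALSE

# Every row lands in exactly one of the two sets.
print(nrow(train) + nrow(test) == nrow(spam_demo))   # TRUE
print(length(intersect(train$id, test$id)))          # 0

# Optional: the prob argument shifts the expected proportions,
# here towards roughly 70% TRUE and 30% FALSE.
mask70 <- sample(c(TRUE, FALSE), nrow(spam_demo), replace = TRUE,
                 prob = c(0.7, 0.3))
```

Because each row is assigned independently, the two sets are only about equal in size, and the exact counts change from run to run unless a seed is set.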
