Question

Describe how and why you should partition your data when using classification techniques like k-nearest neighbors and logistic regression.

Solutions

Expert Solution

Partitioning usually means splitting the data into a training set and a test set (a train/test split). For logistic regression or k-nearest neighbors, the data can be split as follows.

A brief description of the process is given below:

  • training set: a subset used to train the model (generally 70-80% of the data)
  • test set: a subset used to test the trained model (generally the remaining 20-30% of the data)
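
The split described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `train_test_split`, the 80/20 ratio, the seed, and the toy data are assumptions made for the example, not part of the original answer.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle a copy of `rows` and return (train, test)."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(rows)                      # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)      # fixed seed keeps the split reproducible
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Toy data set (made up): 100 rows of (feature, label).
data = [(x, x % 2) for x in range(100)]
train, test = train_test_split(data, test_fraction=0.2, seed=42)
print(len(train), len(test))
```

Shuffling before slicing matters when the rows are stored in some order (by date or by class, for example); otherwise the test set could end up containing only one kind of row.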

You could imagine slicing the single data set as follows:

[Figure: slicing a single data set into a training set and a test set.]

Make sure that your test set meets the following two conditions:

  • Is large enough to yield statistically meaningful results.
  • Is representative of the data set as a whole. In other words, don't pick a test set with different characteristics than the training set.
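
One common way to keep the test set representative for a classification problem is a stratified split, which samples the same fraction from each class. The sketch below is a minimal illustration under assumptions: the helper name `stratified_split`, the `label_of` parameter, and the 90/10 class mix are hypothetical and chosen only for the example.

```python
import random
from collections import defaultdict

def stratified_split(rows, label_of, test_fraction=0.2, seed=0):
    """Split rows so each class contributes about test_fraction of its rows to the test set."""
    by_label = defaultdict(list)
    for row in rows:
        by_label[label_of(row)].append(row)

    rng = random.Random(seed)
    train, test = [], []
    for group in by_label.values():            # one group per class label
        rng.shuffle(group)
        n_test = max(1, round(len(group) * test_fraction))
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test

# Imbalanced toy data (made up): 90 rows of class 0 and 10 rows of class 1.
rows = [(i, 0) for i in range(90)] + [(i, 1) for i in range(90, 100)]
train, test = stratified_split(rows, label_of=lambda r: r[1], test_fraction=0.2, seed=1)
print(sum(1 for r in test if r[1] == 1), "of", len(test), "test rows are class 1")
```

With a plain random split on data this imbalanced, the test set could by chance contain very few (or no) rows of the rare class; stratifying avoids that.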

Assuming that your test set meets the preceding two conditions, your goal is to create a model that generalizes well to new data. The test set serves as a proxy for new data: evaluating a model on the same data it was trained on overstates its performance, so the test set must be kept out of training. For example, consider a case where the model learned from the training data is very simple. This model doesn't do a perfect job: a few predictions are wrong. However, it does about as well on the test data as it does on the training data. In other words, this simple model does not overfit the training data.
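
To make the train-versus-test comparison concrete, here is a small k-nearest-neighbors sketch in plain Python. Everything in it is an assumption made for illustration: the synthetic two-class data, the 160/40 split, the values of k, and the helper names `knn_predict` and `accuracy`. The same comparison applies to logistic regression, which is not shown here.

```python
import math
import random
from collections import Counter

def knn_predict(train, point, k):
    """Return the majority label among the k training rows nearest to `point`."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, rows, k):
    """Fraction of `rows` whose label the k-NN model built on `train` predicts correctly."""
    hits = sum(knn_predict(train, x, k) == y for x, y in rows)
    return hits / len(rows)

rng = random.Random(0)
# Synthetic, overlapping classes (made up): class c is centred at (c, c).
data = [((rng.gauss(c, 1.0), rng.gauss(c, 1.0)), c) for c in (0, 1) for _ in range(100)]
rng.shuffle(data)
train, test = data[:160], data[160:]

for k in (1, 15):
    print(f"k={k}: train accuracy={accuracy(train, train, k):.2f}, "
          f"test accuracy={accuracy(train, test, k):.2f}")
```

With k = 1, every training point is its own nearest neighbor, so training accuracy is perfect no matter how noisy the data is. Only the held-out test accuracy says anything about how the model will do on new data, which is the reason for partitioning in the first place.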


