Question



Standardization

Goal: Perform the transformation on validation and test sets in the right way. The following code shows two ways to standardize validation and test sets (only the test set is shown here).

  • 1- Run the following code to see the values of X_test_std1 and X_test_std2
  • 2- Re-apply standardization using StandardScaler from scikit-learn
  • 3- Assuming the StandardScaler result is the correct transformation, is the following statement correct?
  • "We should re-use the parameters estimated from the training set to transform validation and test sets"

Code:

In [ ]: import pandas as pd

X_train = pd.DataFrame([10, 20, 30])
X_test = pd.DataFrame([5, 6, 7])

mu_train, sigma_train = X_train.mean(axis=0), X_train.std(axis=0)
mu_test, sigma_test = X_test.mean(axis=0), X_test.std(axis=0)

X_train_std = (X_train - mu_train) / sigma_train
X_test_std1 = (X_test - mu_test) / sigma_test
X_test_std2 = (X_test - mu_train) / sigma_train

In [ ]: # Add your code for step 3 here

Solutions

Expert Solution

Since a Jupyter notebook can't be attached here, I am adding the code as separate scripts.

1. Run the given code

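import pandas as pd

X_train = pd.DataFrame([10, 20, 30])
X_test = pd.DataFrame([5, 6, 7])

# pandas' .std() uses the sample standard deviation (ddof=1) by default
mu_train, sigma_train = X_train.mean(axis=0), X_train.std(axis=0)
mu_test, sigma_test = X_test.mean(axis=0), X_test.std(axis=0)

X_train_std = (X_train - mu_train) / sigma_train
X_test_std1 = (X_test - mu_test) / sigma_test     # test set scaled with its own statistics
X_test_std2 = (X_test - mu_train) / sigma_train   # test set scaled with training statistics

# Outside a notebook, print the results to see them
print(X_test_std1)
print(X_test_std2)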
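Working the arithmetic by hand makes the difference between the two results concrete. Note that pandas' std divides by n - 1 by default (sample standard deviation), which is what the script above uses:

```latex
\begin{aligned}
\mu_{\text{train}} &= \tfrac{10+20+30}{3} = 20, &
\sigma_{\text{train}} &= \sqrt{\tfrac{(10-20)^2+(20-20)^2+(30-20)^2}{3-1}} = 10,\\
\mu_{\text{test}} &= \tfrac{5+6+7}{3} = 6, &
\sigma_{\text{test}} &= \sqrt{\tfrac{(5-6)^2+(6-6)^2+(7-6)^2}{3-1}} = 1,\\
X_{\text{test\_std1}} &= \tfrac{X_{\text{test}}-6}{1} = (-1,\ 0,\ 1), &
X_{\text{test\_std2}} &= \tfrac{X_{\text{test}}-20}{10} = (-1.5,\ -1.4,\ -1.3).
\end{aligned}
```

X_train_std is also (-1, 0, 1), so X_test_std1 gives the test set exactly the same standardized values as the training set, even though every test value is smaller than every training value.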

2. Apply standardization using the scikit-learn library

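from sklearn.preprocessing import StandardScaler

# Fit on the training set only, then re-use its mean and std for the test set
scaler = StandardScaler()
scaler.fit(X_train)
X_test_std3 = scaler.transform(X_test)
print(X_test_std3)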
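As a check on step 2, the sketch below fits StandardScaler on the training set only and compares the result with X_test_std2. One detail matters here: StandardScaler's scale_ is the population standard deviation (ddof=0), while pandas' std defaults to ddof=1, so the two results share the same centering (the training mean) but not the same divisor. This is a sketch assuming a reasonably recent scikit-learn; the names X_test_std3, sigma_train_pop and X_test_std2_pop are illustrative and not part of the original question.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

X_train = pd.DataFrame([10, 20, 30])
X_test = pd.DataFrame([5, 6, 7])

# Manual standardization with training parameters, as in the question.
# pandas' .std() defaults to ddof=1 (sample standard deviation).
mu_train, sigma_train = X_train.mean(axis=0), X_train.std(axis=0)
X_test_std2 = (X_test - mu_train) / sigma_train

# StandardScaler fit on the training set only, then applied to the test set.
scaler = StandardScaler().fit(X_train)
X_test_std3 = scaler.transform(X_test)

# StandardScaler divides by the population standard deviation (ddof=0).
sigma_train_pop = X_train.std(axis=0, ddof=0)
X_test_std2_pop = (X_test - mu_train) / sigma_train_pop

print("mean_ and scale_ learned from X_train:", scaler.mean_, scaler.scale_)
print("X_test_std2 (ddof=1):", X_test_std2.to_numpy().ravel())
print("StandardScaler (ddof=0):", X_test_std3.ravel())

# Same centering (training mean) in both; only the divisor differs.
assert np.allclose(scaler.mean_, mu_train.to_numpy())
assert np.allclose(X_test_std3, X_test_std2_pop.to_numpy())
```

With this data the StandardScaler result is roughly [-1.84, -1.71, -1.59] versus [-1.5, -1.4, -1.3] for X_test_std2; dividing by the ddof=0 standard deviation in the manual version makes the two agree.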

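3. Yes, the statement is correct. Re-using the parameters from the training set means that the scaler is fit on the training data only, and the mean and standard deviation it learns are then used to transform both the validation and the test data. This is the approach of X_test_std2 (StandardScaler divides by the population standard deviation rather than pandas' sample standard deviation, so the numbers are not identical, but both are centered on the training mean). Validation and test sets stand in for unseen data, so their own statistics should not be used. Standardizing them separately, as in X_test_std1, maps [5, 6, 7] to [-1, 0, 1], exactly the values the training set [10, 20, 30] gets, which hides the fact that every test value is smaller than every training value. If the validation and test data are split from the same dataset as the training data, the training parameters describe them well. If instead they come from very different circumstances, the training parameters may not describe them well, and evaluation results on those sets should be interpreted with care.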
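The argument in step 3 can be illustrated with a short sketch. X_val below is a hypothetical validation set invented for this example; everything else comes from the question. It shows that standardizing each split with its own statistics maps both [10, 20, 30] and [5, 6, 7] to [-1, 0, 1], while fitting the scaler once on the training data and re-using it keeps the test values below the training values.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

X_train = pd.DataFrame([10, 20, 30])
X_test = pd.DataFrame([5, 6, 7])

# Option 1 from the question: each set standardized with its own statistics.
X_train_std = (X_train - X_train.mean()) / X_train.std()
X_test_std1 = (X_test - X_test.mean()) / X_test.std()

# Both sets collapse to [-1, 0, 1]: the fact that every test value lies
# below the smallest training value is lost.
print(X_train_std.to_numpy().ravel(), X_test_std1.to_numpy().ravel())

# Re-using training parameters: fit once on X_train, then transform the
# other splits. X_val is a HYPOTHETICAL validation set for illustration.
X_val = pd.DataFrame([15, 25, 35])
scaler = StandardScaler().fit(X_train)
X_val_std = scaler.transform(X_val)
X_test_std = scaler.transform(X_test)

# The transform is monotonic and shared, so test values stay below
# every transformed training value.
print(X_test_std.ravel())
```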


