Question

Write a single python file to perform the following tasks:

(a) Get dataset “from sklearn.datasets import load_iris”. Split the dataset into two sets: 20% of samples for training, and 80% of samples for testing.

NOTE 1: Please use “from sklearn.model_selection import train_test_split” with “random_state=N” and “test_size=0.80”.
NOTE 2: The offset/bias column is not needed here for augmenting the input features.

(b) Generate the target output using one-hot encoding for both the training set and the test set.

(c) Using the same training and test sets generated above, perform a polynomial regression (utilizing “from sklearn.preprocessing import PolynomialFeatures”) from orders 1 to 10 (adopting the weight-decay L2 regularization with regularization factor λ=0.0001) for classification (based on the one-hot encoding) and compute the number of training and test samples that are classified correctly.

NOTE 1: The offset/bias augmentation will be automatically generated by PolynomialFeatures.
NOTE 2: If the number of rows in the training polynomial matrix is less than or equal to the number of columns, then use the dual form of ridge regression (Lecture 6). If not, use the primal form (Lecture 6).
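
For reference, the rule in NOTE 2 can be sketched as below. This is a minimal sketch, assuming that the primal and dual forms from the lecture are the standard closed-form ridge solutions w = (PᵀP + λI)⁻¹PᵀY and w = Pᵀ(PPᵀ + λI)⁻¹Y. The helper name ridge_fit and its arguments are hypothetical and are not part of the assignment template.

```python
import numpy as np


def ridge_fit(P, Y, lam=0.0001):
    """Hypothetical helper: ridge weights for P (m x d) and targets Y (m x k)."""
    m, d = P.shape
    if m <= d:
        # Dual form (rows <= columns): w = P^T (P P^T + lam * I_m)^-1 Y
        return P.T @ np.linalg.solve(P @ P.T + lam * np.eye(m), Y)
    # Primal form (rows > columns): w = (P^T P + lam * I_d)^-1 P^T Y
    return np.linalg.solve(P.T @ P + lam * np.eye(d), P.T @ Y)
```

Both branches return a weight matrix of shape (d x k). The only difference is the size of the linear system being solved: m x m in the dual form and d x d in the primal form.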

Submit a single python file with filename “A2_StudentMatriculationNumber.py”. It should contain a function A2_MatricNumber that takes in an integer “N” as input and returns the following outputs in the following order:

  • X_train: training numpy feature matrix with dimensions (number_of_training_samples ⨯ 4). (1%)

  • X_test: test numpy feature matrix with dimensions (number_of_test_samples ⨯ 4). (1%)

  • y_train: training target numpy array (containing values 0, 1 and 2) of length number_of_training_samples. (1%)

  • y_test: test target numpy array (containing values 0, 1 and 2) of length number_of_test_samples. (1%)

  • Ytr: one-hot encoded training target numpy matrix (containing only values 0 and 1) with dimension (number_of_training_samples ⨯ 3). (1%)

  • Yts: one-hot encoded test target numpy matrix (containing only values 0 and 1) with dimension (number_of_test_samples ⨯ 3). (1%)

  • Ptrain_list: list of training polynomial matrices for orders 1 to 10. Ptrain_list[0] should be the polynomial matrix for order 1 (size number_of_training_samples ⨯ 5), Ptrain_list[1] should be the polynomial matrix for order 2 (size number_of_training_samples ⨯ 15), etc. (1.5%)

  • Ptest_list: list of test polynomial matrices for orders 1 to 10. Ptest_list[0] should be the polynomial matrix for order 1, Ptest_list[1] should be the polynomial matrix for order 2, etc. (1.5%)

  • w_list: list of estimated regression coefficients for orders 1 to 10. w_list[0] should be the estimated regression coefficients for order 1, w_list[1] should be the estimated regression coefficients for order 2, etc. (2%)

  • error_train_array: numpy array of training error counts (error count = number of samples classified incorrectly) for orders 1 to 10. error_train_array[0] is the error count for polynomial order 1, error_train_array[1] is the error count for polynomial order 2, etc. (2%)

  • error_test_array: numpy array of test error counts (error count = number of samples classified incorrectly) for orders 1 to 10. error_test_array[0] is the error count for polynomial order 1, error_test_array[1] is the error count for polynomial order 2, etc. (2%)

Please use the python template provided to you. Remember to rename both “A2_StudentMatriculationNumber.py” and “A2_MatricNumber” using your student matriculation number. For example, if your matriculation ID is A1234567R, then you should submit “A2_A1234567R.py” that contains the function “A2_A1234567R”.

Please do NOT zip/compress your file. Because of the large class size, points will be deducted if instructions are not followed. The way we would run your code might be something like this:

    >> import A2_A1234567R as grading
    >> N = 10
    >> X_train, X_test, y_train, y_test, Ytr, Yts, Ptrain_list, Ptest_list, w_list, error_train_array, error_test_array = grading.A2_A1234567R(N)

Solutions

Expert Solution

I have written the structure for a few of the tasks in this assignment. From it, you should be able to complete the remaining required ones.

# Importing required libraries
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
# For polynomial regression
from sklearn.preprocessing import PolynomialFeatures
# Ridge regression is used for L2 regularization
from sklearn.linear_model import Ridge
from sklearn.preprocessing import OneHotEncoder
# Import the Iris dataset from sklearn
from sklearn.datasets import load_iris

# Load the dataset
iris = load_iris()

# Make pandas DataFrames: the four measurements are the features,
# the integer species label (0, 1 or 2) is the target
df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
df1 = pd.DataFrame(data=iris.target, columns=['species'])

# Merge df and df1
df = pd.concat([df, df1], axis=1)

# Define the variables for the train/test split
X = df.drop(labels='species', axis=1).to_numpy()
y = df['species'].to_numpy()

# Divide the dataset into train and test sets after taking input N
N = int(input())
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.80, random_state=N)

# One-hot encode the targets (not the features).
# The categories are fixed so that Ytr and Yts always have 3 columns.
enc = OneHotEncoder(categories=[[0, 1, 2]])
Ytr = enc.fit_transform(y_train.reshape(-1, 1)).toarray()
Yts = enc.transform(y_test.reshape(-1, 1)).toarray()

# Build the polynomial features. Only order 2 is shown here;
# the assignment repeats this step for orders 1 to 10.
polynomial = PolynomialFeatures(degree=2)
P_train = polynomial.fit_transform(X_train)
P_test = polynomial.transform(X_test)

# Train the Ridge (L2) model on the polynomial features.
# PolynomialFeatures already adds the bias column, so no extra intercept is fitted.
# Note: sklearn's Ridge stands in here for the primal/dual forms the assignment asks for.
ridgereg = Ridge(alpha=0.0001, fit_intercept=False)
ridgereg.fit(P_train, Ytr)

# Predict the class as the column with the largest regression output
pred_train = np.argmax(ridgereg.predict(P_train), axis=1)
pred_test = np.argmax(ridgereg.predict(P_test), axis=1)

# Check the model's performance: count the misclassified samples
print('Training error count:', np.sum(pred_train != y_train))
print('Test error count:', np.sum(pred_test != y_test))
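
To go from this structure to the full set of required outputs, the function below is one possible sketch. It assumes that the primal and dual forms from Lecture 6 are the standard closed-form ridge solutions, and it builds the one-hot targets by indexing an identity matrix with np.eye rather than with OneHotEncoder. The function name A2_MatricNumber comes from the question and should be renamed as instructed; all local variable names are illustrative choices.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures


def A2_MatricNumber(N):
    # (a) Load Iris and split: 20% for training, 80% for testing
    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.80, random_state=N)

    # (b) One-hot encode the integer labels 0, 1, 2 (row i of the identity = class i)
    Ytr = np.eye(3, dtype=int)[y_train]
    Yts = np.eye(3, dtype=int)[y_test]

    lam = 0.0001  # L2 regularization factor from the question
    Ptrain_list, Ptest_list, w_list = [], [], []
    error_train, error_test = [], []

    # (c) Polynomial ridge regression for orders 1 to 10
    for order in range(1, 11):
        poly = PolynomialFeatures(order)  # adds the bias column itself
        P_train = poly.fit_transform(X_train)
        P_test = poly.transform(X_test)
        m, d = P_train.shape

        if m <= d:
            # Dual form (assumed): w = P^T (P P^T + lam I)^-1 Y
            w = P_train.T @ np.linalg.solve(
                P_train @ P_train.T + lam * np.eye(m), Ytr)
        else:
            # Primal form (assumed): w = (P^T P + lam I)^-1 P^T Y
            w = np.linalg.solve(
                P_train.T @ P_train + lam * np.eye(d), P_train.T @ Ytr)

        # Predicted class = column with the largest regression output
        pred_train = np.argmax(P_train @ w, axis=1)
        pred_test = np.argmax(P_test @ w, axis=1)

        Ptrain_list.append(P_train)
        Ptest_list.append(P_test)
        w_list.append(w)
        error_train.append(int(np.sum(pred_train != y_train)))
        error_test.append(int(np.sum(pred_test != y_test)))

    error_train_array = np.array(error_train)
    error_test_array = np.array(error_test)

    return (X_train, X_test, y_train, y_test, Ytr, Yts,
            Ptrain_list, Ptest_list, w_list,
            error_train_array, error_test_array)
```

With 30 training samples and 4 features, the polynomial matrix has 5 columns at order 1 and 15 at order 2, so those two orders take the primal branch. From order 3 onwards (35 columns or more) the dual branch is used. At high orders the linear system is poorly conditioned, so the error counts there may be sensitive to numerical precision.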
