Exercise 1 (1 point) For this step, you will load the training and test sentiment datasets "allTrainingData.tsv" and "twitdata_TEST.tsv". The data should be loaded into 4 lists of strings: X_txt_train, X_txt_test, y_test, y_train. Note: when using csv.reader, you need to pass csv.QUOTE_NONE as the "quoting" argument.
import csv
X_txt_train = ...
y_train = ...
X_txt_test = ...
y_test = ...
assert(type(X_txt_train) == type(list()))
assert(type(X_txt_train[0]) == type(str()))
assert(type(X_txt_test) == type(list()))
assert(type(X_txt_test[0]) == type(str()))
assert(type(y_test) == type(list()))
assert(type(y_train) == type(list()))
assert(len(X_txt_test) == 3199)
assert(len(y_test) == 3199)
assert(len(X_txt_train) == 8018)
assert(len(y_train) == 8018)
print("Asserts Completed Successfully!")
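The note about csv.QUOTE_NONE can be illustrated with a small sketch. Tweets often contain stray double quotes, and with the default quoting a field that starts with a quote character swallows tabs and newlines until the next quote. The three-row sample below is hypothetical, as is its column layout (id, topic, label, text); the real files may differ.

```python
import csv
import io

# Hypothetical sample: the first tweet opens a double quote that is
# only closed two lines later, at the end of the third tweet.
sample = (
    '1\ttopicA\tpositive\t"great phone\n'
    '2\ttopicB\tnegative\tbattery died again\n'
    '3\ttopicC\tneutral\tjust ok"\n'
)

def load_rows(text, **reader_kwargs):
    """Parse tab-separated text and return all rows as lists of strings."""
    return list(csv.reader(io.StringIO(text), delimiter="\t", **reader_kwargs))

# Default quoting: the quote starts a quoted field, so all three
# lines are merged into a single row.
default_rows = load_rows(sample)

# QUOTE_NONE: quote characters are ordinary text, one row per line.
raw_rows = load_rows(sample, quoting=csv.QUOTE_NONE)

print(len(default_rows), len(raw_rows))
```

With the default settings the row count comes out wrong, which is the kind of mismatch the length asserts in the exercise are meant to catch.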
ANSWER:
The code below is commented step by step and indented so it can be copied and run as-is.
If both files load correctly, every assert passes and the script prints
the final confirmation message.
CODE
import csv
# lists for the training tweet texts and their labels
X_txt_train = []
y_train = []
# open the training file; the with-block closes it automatically
with open("allTrainingData.tsv", newline="") as file_train:
    # tab-delimited reader; QUOTE_NONE keeps quote characters as literal text
    reader_train = csv.reader(file_train, delimiter="\t", quoting=csv.QUOTE_NONE)
    # loop through each row and append its label and text to the lists
    for row in reader_train:
        # the label is read from the third column (index 2)
        y = row[2]
        # join all remaining columns with a '\t' separator, in case the text contained tabs
        X_txt = "\t".join(row[3:])
        X_txt_train.append(X_txt)
        y_train.append(y)
# lists for the test tweet texts and their labels
X_txt_test = []
y_test = []
# open the test file; the with-block closes it automatically
with open("twitdata_TEST.tsv", newline="") as file_test:
    # tab-delimited reader; QUOTE_NONE keeps quote characters as literal text
    reader_test = csv.reader(file_test, delimiter="\t", quoting=csv.QUOTE_NONE)
    # loop through each row and append its label and text to the lists
    for row in reader_test:
        # the label is read from the third column (index 2)
        y = row[2]
        # join all remaining columns with a '\t' separator, in case the text contained tabs
        X_txt = "\t".join(row[3:])
        X_txt_test.append(X_txt)
        y_test.append(y)
###
assert(type(X_txt_train) == type(list()))
assert(type(X_txt_train[0]) == type(str()))
assert(type(X_txt_test) == type(list()))
assert(type(X_txt_test[0]) == type(str()))
assert(type(y_test) == type(list()))
assert(type(y_train) == type(list()))
assert(len(X_txt_test) == 3199)
assert(len(y_test) == 3199)
assert(len(X_txt_train) == 8018)
assert(len(y_train) == 8018)
print("Asserts Completed Successfully!")
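OUTPUT (image not reproduced here)
When every assert passes, the only line printed is: Asserts Completed Successfully!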
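Since the training and test files are read with the same loop, the two passes could be folded into one helper. The sketch below is only one way to do it: the function name load_sentiment_tsv and the parameters label_col and text_col are hypothetical, and the defaults simply mirror the column indices used in the answer above. The demo writes a small made-up file to a temporary directory so the block runs without the real datasets.

```python
import csv
import os
import tempfile

def load_sentiment_tsv(path, label_col=2, text_col=3):
    """Return (texts, labels) read from a tab-separated file.

    Hypothetical helper: the column indices are assumptions taken from
    the answer's code, not from a documented file format.
    """
    texts, labels = [], []
    with open(path, newline="") as f:
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            labels.append(row[label_col])
            # Rejoin trailing columns in case the text itself contained tabs.
            texts.append("\t".join(row[text_col:]))
    return texts, labels

# Demo on a made-up two-row file (not the real datasets).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.tsv")
    with open(demo_path, "w", newline="") as f:
        f.write("1\ttopicA\tpositive\tloved it\n")
        f.write("2\ttopicB\tnegative\tbad\tvery bad\n")
    demo_texts, demo_labels = load_sentiment_tsv(demo_path)

print(demo_texts, demo_labels)
```

With the real files in the working directory, the four lists from the exercise would be filled by two calls, for example X_txt_train, y_train = load_sentiment_tsv("allTrainingData.tsv").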