Write a function which takes in X and Y arrays, a column number, and a threshold. The function should return arrays X0 and Y0 containing all rows where the value in the specified column falls strictly below the threshold, as well as arrays X1 and Y1 containing all rows where the value in the specified column is above or equal to the threshold. (Use NumPy.)
def split_on_feature(X_test, Y_test, column, thresh):
    ## TYPE ANSWER HERE
This answer uses NumPy, a third-party library for fast numerical computation in Python. Install it first if it is not already available (for example with pip install numpy).
Code Snippet (Python): When calling the function, unpack the four returned arrays in the same order as the return statement.
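import numpy as np

def split_on_feature(x_test, y_test, column, thresh):
    # Build both row masks from the chosen column of X, so that
    # matching rows of X and Y always land in the same split.
    below = x_test[:, column] < thresh
    at_or_above = x_test[:, column] >= thresh
    x_0, y_0 = x_test[below], y_test[below]              # strictly below the threshold
    x_1, y_1 = x_test[at_or_above], y_test[at_or_above]  # at or above the threshold
    return x_0, y_0, x_1, y_1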
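To make the expected behaviour concrete, here is a minimal sketch that runs a boolean-mask split on a small fixed input. It assumes Y holds one label per row of X (a 1-D array here); the helper name split_rows and the sample values are illustrative only and are not part of the original question.

```python
import numpy as np

def split_rows(X, Y, column, thresh):
    # Hypothetical helper: the mask comes from the chosen column of X only.
    below = X[:, column] < thresh
    # The same mask indexes X and Y, so rows stay paired.
    # ~below is the complement, which is the ">=" side for non-NaN values.
    return X[below], Y[below], X[~below], Y[~below]

X = np.array([[1, 10],
              [2, 20],
              [3, 30],
              [4, 40]])
Y = np.array([0, 1, 0, 1])  # assumed: one label per row of X

X0, Y0, X1, Y1 = split_rows(X, Y, column=1, thresh=30)
print(X0)  # rows whose column 1 is < 30
print(Y0)  # labels for those rows
print(X1)  # rows whose column 1 is >= 30
print(Y1)  # labels for those rows
```

With a threshold of 30 on column 1, the first two rows fall in the "below" split and the last two in the "at or above" split; a row whose value equals the threshold goes to the second group.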
Sample code using split_on_feature:
import numpy as np
A = 1
B = 100
N = 4
x_test = (A + np.random.random((N,N)) * (B - A)).astype(int)
y_test = (A + np.random.random((N,N)) * (B - A)).astype(int)
print("X_test: ")
print(x_test)
print()
print("Y_test: ")
print(y_test)
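
def split_on_feature(x_test, y_test, column, thresh):
    # Build both row masks from the chosen column of X, so that
    # matching rows of X and Y always land in the same split.
    below = x_test[:, column] < thresh
    at_or_above = x_test[:, column] >= thresh
    x_0, y_0 = x_test[below], y_test[below]              # strictly below the threshold
    x_1, y_1 = x_test[at_or_above], y_test[at_or_above]  # at or above the threshold
    return x_0, y_0, x_1, y_1
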
threshold = int(np.random.random()*100)
x_0,y_0,x_1,y_1 = split_on_feature(x_test,y_test,3,threshold)
print("Threshold : " + str(threshold))
print()
print("X_0: ")
print(x_0)
print()
print("Y_0 : ")
print(y_0)
print()
print("X_1: ")
print(x_1)
print()
print("Y_1: ")
print(y_1)