Question #1 Please specify the output of the following code (2.4)
# import the pandas library, aliased as pd
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(8, 4), index=['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'], columns=['A', 'B', 'C', 'D'])
# build a boolean Series: True where the value in row 'a' is positive
print(df.loc['a'] > 0)
Question #2 Please specify the output of the following code (2.2)
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(5, 3), index=['a', 'c', 'e', 'f', 'h'], columns=['one', 'two', 'three'])
# reindexing adds the missing labels b, d and g as rows of NaN
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
print(df.dropna(axis=1))
Question #3 Given the following temp.csv (2.1)
S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
Please specify what is the output of the following code:
import pandas as pd
df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
print(df)
Question #4 Please describe what the following code is doing (2.3)
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(10, 4), index=pd.date_range('1/1/2000', periods=10), columns=['A', 'B', 'C', 'D'])
print(df)
# rolling window of 3 rows; min_periods=1 allows shorter windows at the start
r = df.rolling(window=3, min_periods=1)
print(r['A'].aggregate(np.sum))
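Solution for 2.4:
The output is a boolean Series indexed by the column labels A, B, C, D and named a. Each entry is True if the value in row a of that column is greater than 0, otherwise False. Because the data is random, the True/False values change from run to run, for example: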
========================= RUN-1 =========================
A    False
B    False
C     True
D     True
Name: a, dtype: bool
========================= RUN-2 =========================
A    False
B    False
C    False
D     True
Name: a, dtype: bool
=========================================================
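To make the behaviour reproducible, the same comparison can be sketched with fixed numbers in place of np.random.randn. The values, the two-row frame and the names mask and positive_cols are assumptions chosen for illustration and are not part of the original question.

```python
import numpy as np
import pandas as pd

# Fixed values (an assumption for illustration) instead of np.random.randn,
# so the result is the same on every run.
values = np.array([
    [-0.5, -1.2, 0.3, 2.0],   # row 'a'
    [1.1, 0.4, -0.7, -0.1],   # row 'b'
])
df = pd.DataFrame(values, index=['a', 'b'], columns=['A', 'B', 'C', 'D'])

# df.loc['a'] is a Series indexed by column label; "> 0" compares element-wise.
mask = df.loc['a'] > 0
print(mask)

# The boolean Series can then be used to keep only the columns
# where row 'a' is positive.
positive_cols = df.loc[:, mask]
print(positive_cols.columns.tolist())
```

The comparison itself does no filtering; it only yields the True/False Series, which is why the question's output contains booleans rather than numbers.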
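Solution for 2.2:
reindex adds the rows b, d and g, which are not in the original index, and fills them with NaN. Every column therefore contains at least one NaN, so dropna(axis=1) drops all three columns and only the index is left:
Empty DataFrame
Columns: []
Index: [a, b, c, d, e, f, g, h]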
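The reason every column disappears can be checked with a short sketch. It uses a seeded generator so that runs are repeatable; the seed and the names nan_counts, dropped_cols and dropped_rows are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Seeded generator (an assumption) so the example is repeatable.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((5, 3)),
                  index=['a', 'c', 'e', 'f', 'h'],
                  columns=['one', 'two', 'three'])

# Labels b, d and g are new, so reindex fills those rows with NaN.
df = df.reindex(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])

# Each column now holds three NaN values.
nan_counts = df.isna().sum()
print(nan_counts)

# axis=1 drops every column that contains any NaN: nothing is left.
dropped_cols = df.dropna(axis=1)
print(dropped_cols)

# axis=0 (the default) drops the NaN rows instead and keeps the data.
dropped_rows = df.dropna(axis=0)
print(dropped_rows.index.tolist())
```

Dropping along axis=0 is usually what is wanted after a reindex like this one, since it removes only the newly added empty rows.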
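Solution for 2.1:
header=0 tells read_csv that the first line of the file (S.No,Name,Age,City,Salary) is the header row. Because names is also given, that line is discarded and the columns are labelled a, b, c, d, e instead. The four data rows are kept unchanged under a default integer index:
   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900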
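The effect of header=0 can be tried without a file on disk. The sketch below feeds the same CSV text through io.StringIO, which is an assumption made to keep the example self-contained; the variable names are likewise illustrative.

```python
import io
import pandas as pd

# Same content as temp.csv, held in memory (an assumption for illustration).
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""
new_names = ['a', 'b', 'c', 'd', 'e']

# header=0: the first line is treated as the header and replaced by new_names.
df = pd.read_csv(io.StringIO(csv_text), names=new_names, header=0)
print(df)

# Without header=0, passing names makes pandas treat the file as headerless,
# so the original header line shows up as the first data row.
df_no_header = pd.read_csv(io.StringIO(csv_text), names=new_names)
print(df_no_header)
```

Comparing the two frames shows why header=0 is needed whenever names is used on a file that already has a header line.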
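Solution for 2.3:
The code builds a 10x4 DataFrame of random values drawn from a standard normal distribution, with columns A, B, C, D and an index of 10 consecutive dates starting at 2000-01-01, and prints it.
It then creates a rolling window of 3 rows over the DataFrame and prints the rolling sum of column A: each output value is the sum of the current row and up to two preceding rows. Because min_periods=1, the first two rows are not NaN; they hold the sum of the one and two values available so far.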
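Since random input hides the arithmetic, the rolling sum is easier to follow with known numbers. The sketch below assumes a five-row column A holding 1 to 5 and calls .sum() on the rolling object, which should give the same result as aggregating with np.sum; the names rolling_sum and strict are illustrative.

```python
import pandas as pd

# Known values (an assumption for illustration) on a daily date index.
df = pd.DataFrame({'A': [1.0, 2.0, 3.0, 4.0, 5.0]},
                  index=pd.date_range('1/1/2000', periods=5))

# Window of 3 rows; min_periods=1 allows a result from the first row on.
r = df.rolling(window=3, min_periods=1)
rolling_sum = r['A'].sum()
print(rolling_sum)   # 1, 1+2, 1+2+3, 2+3+4, 3+4+5

# With the default min_periods (equal to the window size),
# the first two rows have too few values and come out as NaN.
strict = df['A'].rolling(window=3).sum()
print(strict)
```

The first two rows are the only place where min_periods changes the result, which matches the explanation of the question's output.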