Use pd.crosstab() to count the number of regions of each cover type for each of the 40 soil types. Pass this function the Cover_Type column as its first argument and the Soil_Type column as its second argument. Store the result in a DataFrame named ct_by_st and then display this DataFrame.
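As a small illustration of what pd.crosstab() returns, here is a sketch on a made-up DataFrame. The name toy and its values are hypothetical stand-ins for fc, so the counts are not the real ones.

```python
import pandas as pd

# Hypothetical toy data standing in for the fc DataFrame (not the real dataset).
toy = pd.DataFrame({
    "Cover_Type": [5, 5, 2, 4, 4, 7, 2],
    "Soil_Type":  [29, 29, 12, 3, 3, 40, 12],
})

# The first argument becomes the row index, the second becomes the columns.
# Each cell counts how many rows of toy share that (Cover_Type, Soil_Type) pair.
toy_counts = pd.crosstab(toy["Cover_Type"], toy["Soil_Type"])
print(toy_counts)
```

Pairs that never occur together are filled with 0, which is why the full table for fc has one column per soil type that appears in the data.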
soil = np.unique(fc['Soil_Type'])
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']
Perform the following steps in a single cell:
1. Start by converting the count information into proportions. Create a DataFrame named ct_by_st_props by dividing ct_by_st by the column sums of ct_by_st. The column sums can be calculated using np.sum() or the DataFrame sum() method.
2. We will be creating a stacked bar chart, so we need to know where the bottom of each bar should be located. This can be calculated as follows: bb = np.cumsum(ct_by_st_props) - ct_by_st_props
3. Create a Matplotlib figure, setting the figure size to [8, 4].
4. Loop over the rows of ct_by_st_props. Each time this loop executes, add a bar chart to the figure according to the following specifications.
• The height of the bars should be determined by the current row of ct_by_st_props.
• The bottom position of each bar should be determined by the current row of bb.
• Each bar should have a black border, and a fill color determined by the current value of palette.
• The label for the legend should be set to the value of Cover_Type associated with the current row.
5. Set the labels for the x and y axes to be "Soil_Type" and "Cover_Type". Set the title to be "Distribution of Cover Type by Soil Type".
6. Add a legend to the plot. Set the bbox_to_anchor parameter to place the legend to the right of the plot, near the top.
7. Display the figure using plt.show().
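Steps 1 and 2 are easier to follow on a tiny table. The sketch below uses hypothetical counts (3 cover types by 2 soil types) and the DataFrame cumsum() method, which accumulates down the rows, in place of np.cumsum().

```python
import pandas as pd

# Hypothetical counts: 3 cover types (rows) by 2 soil types (columns).
counts = pd.DataFrame(
    [[2, 1], [6, 1], [2, 2]],
    index=pd.Index([1, 2, 3], name="Cover_Type"),
    columns=pd.Index([10, 20], name="Soil_Type"),
)

# Step 1: divide every column by its own total, so each column sums to 1.
props = counts / counts.sum()

# Step 2: the bottom of each segment is the sum of the segments stacked
# below it, i.e. the running total down the column minus the segment itself.
bottoms = props.cumsum(axis=0) - props

print(props)
print(bottoms)
```

The first row of bottoms is all zeros, since the first cover type sits on the x axis, and bottoms plus props in the last row equals 1, so every stacked bar has total height 1.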
Elevation  Aspect  Slope  Hori Hydrology  Vertical  Hori Roadways  Hill_9am  Hill_Noon  Hill_3pm  Hori Points  Wilderness_Area  Soil_Type  Cover_Type
2596       51      3      258             0         510            221       232        148       6279         Rawah            29         5
2590       56      2      212             -6        390            220       235        151       6225         Rawah            29         5
2804       139     9      268             65        3180           234       238        135       6121         Rawah            12         2
2327       188     15     339             144       1256           220       250        159       1101         Cache la Poudre  6          4
2298       129     21     255             115       1326           249       222        90        999          Cache la Poudre  3          4
2289       133     21     234             106       1345           248       225        95        973          Cache la Poudre  3          4
2274       142     23     201             111       1383           246       227        96        924          Cache la Poudre  3          4
2850       359     12     30              4         1585           202       218        153       1187         Comanche Peak    31         5
2888       311     14     95              9         1774           180       229        188       1418         Comanche Peak    32         5
2903       0       5      134             19        1865           212       230        156       1463         Comanche Peak    32         5
2902       7       8      170             11        1892           211       225        151       1480         Comanche Peak    32         5
3598       20      15     342             61        1848           208       207        133       1673         Neota            40         7
3318       96      12     95              -5        1224           239       222        111       1411         Neota            38         7
3433       342     14     551             204       1044           189       217        166       1442         Neota            40         7
3218       49      18     0               0         1822           225       197        100       1673         Neota            23         2
ANSWER:
The commented and indented code is given below so you can copy it and check the indentation.
The output image follows the code so you can cross-check your result.
Have a nice and healthy day!!
CODE
# import important modules
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# load the forest cover data into fc; adjust the file name and path to your copy of the dataset
fc = pd.read_csv("ct_by_st.csv")
# using pd.crosstab with input args as cover_type and soil type
ct_by_st = pd.crosstab(fc['Cover_Type'],fc['Soil_Type'])
# displaying dataframe
print("ct_by_st Dataframe is:")
print(ct_by_st)
palette = ['orchid', 'lightcoral', 'orange', 'gold', 'lightgreen', 'deepskyblue', 'cornflowerblue']
# 1. defining ct_by_st_props dataframe
ct_by_st_props = ct_by_st/ct_by_st.sum()
# 2. bottom of each bar
bb = np.cumsum(ct_by_st_props) - ct_by_st_props
# 3. create Matplotlib figure, setting the figure size to [8, 4].
fig = plt.figure(figsize = (8,4))
# 4. loop over the rows of ct_by_st_props
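for i in range(len(ct_by_st_props)):
    # fetch the i-th row of proportions and the matching bar bottoms
    row_ct = ct_by_st_props.iloc[i].values
    row_bb = bb.iloc[i].values
    Cover_Type = ct_by_st_props.index[i]
    # draw this cover type's segments: fill from palette, black border,
    # legend label set to the cover type (as a string)
    plt.bar(list(range(len(row_ct))), row_ct, bottom=row_bb,
            color=palette[i], edgecolor='black', label=str(Cover_Type))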
# 5. labeling plot
# setting xticks value
plt.xticks(list(range(len(ct_by_st_props.columns))), ct_by_st_props.columns)
# other labeling
plt.xlabel("Soil_Type")
plt.ylabel("Cover_Type")
plt.title("Distribution of Cover Type by Soil Type")
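# 6. show legend to the right of the plot, near the top
# (the legend's upper-left corner is anchored just past the right edge of the axes)
plt.legend(bbox_to_anchor=[1.02, 1], loc='upper left')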
# 7. show plot
plt.show()
OUTPUT IMAGE
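As a cross-check on the loop above, the same kind of chart can be drawn with the plotting method built into pandas. This is an alternative technique, not the one the exercise asks for. The sketch below uses hypothetical proportions; with the real data, ct_by_st_props and the full palette would take the place of props and the three colors.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical proportions: 3 cover types (rows) by 2 soil types (columns).
props = pd.DataFrame(
    [[0.2, 0.25], [0.6, 0.25], [0.2, 0.5]],
    index=pd.Index([1, 2, 3], name="Cover_Type"),
    columns=pd.Index([10, 20], name="Soil_Type"),
)

# Transpose so each soil type is one bar and each cover type one stacked segment.
ax = props.T.plot(
    kind="bar", stacked=True, figsize=(8, 4), edgecolor="black",
    color=["orchid", "lightcoral", "orange"],
)
ax.set_ylabel("Cover_Type")
ax.set_title("Distribution of Cover Type by Soil Type")
# Anchor the legend's upper-left corner just outside the right edge of the axes.
ax.legend(title="Cover_Type", bbox_to_anchor=(1.02, 1), loc="upper left")
```

If the manual loop is correct, both charts should show the same segment heights for every soil type.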