#All the code solutions should only use Pandas/Numpy and Matplotlib.
Initialize the US Zipcode dataset as shown below: usZipCodeData = pd.read_csv('http://people.bu.edu/kalathur/datasets/uszips.csv', converters={'zip': lambda x: str(x)})
Q1. Show the top 20 zip codes for Massachusetts in decreasing order of the density attribute.
Q2. Show the top 20 zip codes for Massachusetts in decreasing order of the population attribute.
Q3. Which zip codes are common to the results of Q1 and Q2? Use the numpy intersect1d method.
Q4. For Massachusetts, show a scatter plot of latitude versus longitude using color as the log of the population and size as 1/25 of the density.
Q5. For the top 75 populous zip codes in the usZipCodeData, show a pie chart with the distribution of the states and the frequencies of the zip codes in those states. (Hint: Use value_counts. Do not use any aggregate functions not yet covered).
Q6. Using 7930 as the seed, pick 10 random rows from the Massachusetts data. Show the resulting data frame. Show a horizontal bar chart of the populations with the city as the label.
# Only Pandas/Numpy and Matplotlib are allowed, so scipy and seaborn are not imported.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
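# Keep zip codes as strings so leading zeros survive.
usZipCodeData = pd.read_csv('http://people.bu.edu/kalathur/datasets/uszips.csv', converters={'zip': lambda x: str(x)})
# Massachusetts rows only; the 'state_id' column name is an assumption about this CSV.
maData = usZipCodeData[usZipCodeData['state_id'] == 'MA']
# Q1: top 20 by decreasing density
print(maData.sort_values(by='density', ascending=False).head(20))
# Q2: top 20 by decreasing population
print(maData.sort_values(by='population', ascending=False).head(20))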
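For Q3, here is a minimal sketch of finding the zip codes shared by two top-N lists with np.intersect1d. The small frame below is made-up toy data, not rows from uszips.csv, and it takes the top 3 instead of the top 20 to keep the example short.

```python
import numpy as np
import pandas as pd

# Toy data for illustration only; these are not real rows from uszips.csv.
toy = pd.DataFrame({
    'zip': ['00001', '00002', '00003', '00004', '00005'],
    'density': [900.0, 50.0, 700.0, 300.0, 10.0],
    'population': [100, 5000, 4000, 200, 3000],
})

# Top 3 by each attribute (the assignment uses 20 on the Massachusetts rows).
topDensity = toy.sort_values(by='density', ascending=False).head(3)
topPopulation = toy.sort_values(by='population', ascending=False).head(3)

# intersect1d returns the sorted, unique values present in both arrays.
commonZips = np.intersect1d(topDensity['zip'], topPopulation['zip'])
print(commonZips)
```

On the real data, the same call would presumably be applied to the 'zip' columns of the two 20-row Massachusetts results.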
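# Q4: Massachusetts rows with a positive population, since log(0) is undefined.
# The 'state_id' column name is an assumption about this CSV.
maPlot = usZipCodeData[(usZipCodeData['state_id'] == 'MA') & (usZipCodeData['population'] > 0)]
# Longitude on x and latitude on y; color = log(population), size = density / 25.
plt.scatter(maPlot['lng'], maPlot['lat'], c=np.log(maPlot['population']), s=maPlot['density'] / 25)
plt.colorbar(label='log(population)')
plt.xlabel('lng')
plt.ylabel('lat')
plt.title('Scatter plot for Massachusetts')
plt.show()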
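Q5 and Q6 are not covered by the code above. A hedged sketch of both follows, using value_counts for the pie chart and DataFrame.sample for the seeded draw. The toy frame is invented for illustration, the 'state_id' column name is an assumption, and passing the seed through random_state is only one reading of "using 7930 as the seed".

```python
import pandas as pd
import matplotlib.pyplot as plt

# Toy data for illustration only; not real rows from uszips.csv.
# 'state_id' is an assumed column name for the state abbreviation.
toy = pd.DataFrame({
    'zip': ['00001', '00002', '00003', '00004', '00005', '00006', '00007'],
    'city': ['A', 'B', 'C', 'D', 'E', 'F', 'G'],
    'state_id': ['MA', 'NY', 'MA', 'TX', 'NY', 'MA', 'MA'],
    'population': [100, 5000, 4000, 200, 3000, 6000, 50],
})

# Q5: states of the most populous zip codes (top 4 here, 75 in the assignment).
topPopulous = toy.sort_values(by='population', ascending=False).head(4)
stateCounts = topPopulous['state_id'].value_counts()
fig1, ax1 = plt.subplots()
ax1.pie(stateCounts.values, labels=stateCounts.index, autopct='%1.1f%%')
ax1.set_title('States of the most populous zip codes')

# Q6: seeded random rows (3 here, 10 in the assignment), restricted to MA.
maToy = toy[toy['state_id'] == 'MA']
sampleRows = maToy.sample(n=3, random_state=7930)
print(sampleRows)
fig2, ax2 = plt.subplots()
ax2.barh(sampleRows['city'], sampleRows['population'])
ax2.set_xlabel('population')
ax2.set_title('Population of sampled zip codes')
# Call plt.show() to display both figures in an interactive session.
```

Fixing random_state makes the draw repeatable: running the sample line again with the same seed returns the same rows.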