In: Computer Science
######################LANGUAGE PANDAS####################
#####################MATPLOTLIB###########################
#########################################################
# read ufo.csv into a DataFrame called 'ufo'
# print the head and the tail
# examine the default index, data types, and shape of ufo dataframe
# count the number of missing values in each column
# count total number of null vlaues in the dataframe
# print those rows which has null values
# fill null values,
#if any column is numerical has null value than fill this column
with mean of that column
#if any column is categorical than fill this column with most frequent value of that column
# calculate the most frequent value for each of the columns (in a single command)
# what are the four most frequent colors reported?
# for reports in VA, what's the most frequent city?
# show only the UFO reports from Arlington, VA
# show only the UFO reports in which the City is missing
# how many rows remain if you drop all rows with any missing values?
# replace any spaces in the column names with an underscore
# create a new column called 'Location' that includes both City
and State
# For example, the 'Location' for the first row would be 'Ithaca,
NY'
# map existing values to a different set of values
# like in column 'is_male', convert F value to 0 and M to 1 with
pandas
# writing generic code to replace spaces with underscores
# In other words, your code should not reference the specific
column names
# convert datatype of column 'time' to the datetime format
ufo.csv file
City | Colors Reported | Shape Reported | State | Time | |
Ithaca | TRIANGLE | NY | ######## | ||
Willingboro | OTHER | NJ | ######## | ||
Holyoke | OVAL | CO | ######## | ||
Abilene | DISK | KS | ######## | ||
New York Worlds Fair | LIGHT | NY | ######## | ||
Valley City | DISK | ND | ######## | ||
Crater Lake | CIRCLE | CA | ######## | ||
Alma | DISK | MI | ######## | ||
Eklutna | CIGAR | AK | ######## | ||
Hubbard | CYLINDER | OR | ######## | ||
Fontana | LIGHT | CA | ######## | ||
Waterloo | FIREBALL | AL | ######## | ||
Belton | RED | SPHERE | SC | ######## | |
Keokuk | OVAL | IA | ######## | ||
Ludington | DISK | MI | ######## | ||
Forest Home | CIRCLE | CA | ######## | ||
Los Angeles | CA | ######## | |||
Hapeville | GA | ######## | |||
Oneida | RECTANGLE | TN | ######## | ||
Bering Sea | RED | OTHER | AK | ######## | |
Nebraska | DISK | NE | ######## | ||
# Importting libraries
import numpy as np
import pandas as pd
import matplotlib as plt
#
# Reading csv using read_csv method from pandas library
ufo = pd.read_csv("ufo.csv")
ufo
#
# Print the head i.e top 5 rows of ufo dataset
ufo.head()
# Print the tail i.e last 5 rows of ufo dataset
ufo.tail()
#
# Examine default index
ufo.index
# Data types of the data
ufo.dtypes
# Shape of the dataframe(rows, columns)
ufo.shape
#
# count the number of missing values in each column
ufo.isnull().sum()
#
# count total number of null vlaues in the dataframe
print ufo.isnull().sum().sum()
#
# print those rows which has null values
ufo[ufo.isnull().any(axis=1)]
#
#if any column is categorical than fill this column with most frequent value of that column
ufo = ufo.apply(lambda x:x.fillna(x.value_counts().index[0]))
#
# calculate the most frequent value for each of the columns (in a single command)
for each in ufo:print ufo[each].value_counts()[ufo[each].value_counts() == ufo[each].value_counts().max()]
#
# what are the four most frequent colors reported?
ufo['Colors Reported'].value_counts()[0:4]
#
# for reports in VA, what's the most frequent city?
ufo_CA = ufo[ufo['State'] == "VA"]
ufo_CA["City"].value_counts()[ufo_CA["City"].value_counts() == ufo_CA["City"].value_counts().max()]
#
# show only the UFO reports from Arlington, VA
ufo[(ufo['State'] == "VA")&(ufo['City'] == "Arlington")]
#
# show only the UFO reports in which the City is missing
ufo[ufo['City'].isnull()]
#
# how many rows remain if you drop all rows with any missing values?
ufo = ufo.dropna()
len(ufo)
#
# replace any spaces in the column names with an underscore
ufo.columns = ufo.columns.str.replace(' ', '_')
ufo
#
# map existing values to a different set of values
values_update = {"Shape_Reported":{"Circle": 4, "RECTANGLE": 2}}
ufo.replace(values_update, inplace=True)
#
# convert datatype of column 'time' to the datetime format
ufo['Time'] = pd.to_datetime(ufo['Time'])