Question

In: Computer Science

######################LANGUAGE PANDAS####################

#####################MATPLOTLIB###########################

#########################################################

# read ufo.csv into a DataFrame called 'ufo'

# print the head and the tail

# examine the default index, data types, and shape of ufo dataframe

# count the number of missing values in each column

# count the total number of null values in the dataframe

# print the rows that have null values

# fill null values:
# if a numerical column has null values, fill it with the mean of that column

# if a categorical column has null values, fill it with the most frequent value of that column

# calculate the most frequent value for each of the columns (in a single command)

# what are the four most frequent colors reported?

# for reports in VA, what's the most frequent city?

# show only the UFO reports from Arlington, VA

# show only the UFO reports in which the City is missing

# how many rows remain if you drop all rows with any missing values?

# replace any spaces in the column names with an underscore

# create a new column called 'Location' that includes both City and State
# For example, the 'Location' for the first row would be 'Ithaca, NY'

# map existing values to a different set of values
# like in column 'is_male', convert F value to 0 and M to 1 with pandas

# writing generic code to replace spaces with underscores
# In other words, your code should not reference the specific column names

# convert the datatype of column 'Time' to the datetime format

ufo.csv file

City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,########
Willingboro,,OTHER,NJ,########
Holyoke,,OVAL,CO,########
Abilene,,DISK,KS,########
New York Worlds Fair,,LIGHT,NY,########
Valley City,,DISK,ND,########
Crater Lake,,CIRCLE,CA,########
Alma,,DISK,MI,########
Eklutna,,CIGAR,AK,########
Hubbard,,CYLINDER,OR,########
Fontana,,LIGHT,CA,########
Waterloo,,FIREBALL,AL,########
Belton,RED,SPHERE,SC,########
Keokuk,,OVAL,IA,########
Ludington,,DISK,MI,########
Forest Home,,CIRCLE,CA,########
Los Angeles,,,CA,########
Hapeville,,,GA,########
Oneida,,RECTANGLE,TN,########
Bering Sea,RED,OTHER,AK,########
Nebraska,,DISK,NE,########

Solutions

Expert Solution

# Importing libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#

# Reading csv using read_csv method from pandas library
ufo = pd.read_csv("ufo.csv")
ufo

#

# Print the head, i.e. the top 5 rows of the ufo dataset
ufo.head()
# Print the tail, i.e. the last 5 rows of the ufo dataset
ufo.tail()

#

# Examine the default index
ufo.index
# Data types of each column
ufo.dtypes
# Shape of the dataframe (rows, columns)
ufo.shape

#

# count the number of missing values in each column
ufo.isnull().sum()

#

# count the total number of null values in the dataframe
print(ufo.isnull().sum().sum())

#

# print the rows that have null values
ufo[ufo.isnull().any(axis=1)]

#

# fill null values: numerical columns with the column mean,
# categorical columns with the most frequent value of that column
ufo = ufo.apply(lambda x: x.fillna(x.mean()) if pd.api.types.is_numeric_dtype(x) else x.fillna(x.mode().iloc[0]))

#

# calculate the most frequent value for each of the columns (in a single command)
ufo.mode().iloc[0]

#

# what are the four most frequent colors reported?
ufo['Colors Reported'].value_counts()[0:4]

#

# for reports in VA, what's the most frequent city?
ufo_VA = ufo[ufo['State'] == "VA"]
ufo_VA["City"].value_counts().idxmax()

#

# show only the UFO reports from Arlington, VA
ufo[(ufo['State'] == "VA")&(ufo['City'] == "Arlington")]

#

# show only the UFO reports in which the City is missing
ufo[ufo['City'].isnull()]

#

# how many rows remain if you drop all rows with any missing values?
# (use dropna() without reassigning, so ufo itself is left intact for later steps)
len(ufo.dropna())

#

# replace any spaces in the column names with an underscore
ufo.columns = ufo.columns.str.replace(' ', '_') 
ufo
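The question also asks for a 'Location' column that combines City and State (e.g. 'Ithaca, NY' for the first row). A minimal sketch, using a small stand-in frame with the same City/State columns as ufo.csv:

```python
import pandas as pd

# small stand-in frame with the same City/State columns as ufo.csv
ufo_demo = pd.DataFrame({
    "City": ["Ithaca", "Willingboro"],
    "State": ["NY", "NJ"],
})

# create a new column 'Location' that includes both City and State
ufo_demo["Location"] = ufo_demo["City"] + ", " + ufo_demo["State"]
print(ufo_demo["Location"].tolist())  # ['Ithaca, NY', 'Willingboro, NJ']
```

The same one-liner works on the full frame: `ufo['Location'] = ufo['City'] + ', ' + ufo['State']`.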

#

# map existing values to a different set of values
# note: the shape values in the data are uppercase, e.g. 'CIRCLE'
values_update = {"Shape_Reported": {"CIRCLE": 4, "RECTANGLE": 2}}
ufo.replace(values_update, inplace=True)
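The question also asks to map an 'is_male' column, converting F to 0 and M to 1. ufo.csv has no such column, so this is a sketch on a hypothetical frame:

```python
import pandas as pd

# hypothetical frame with an 'is_male' column (not part of ufo.csv)
people = pd.DataFrame({"is_male": ["F", "M", "M", "F"]})

# map F -> 0 and M -> 1 using Series.map
people["is_male"] = people["is_male"].map({"F": 0, "M": 1})
print(people["is_male"].tolist())  # [0, 1, 1, 0]
```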

#

# convert the datatype of column 'Time' to the datetime format
ufo['Time'] = pd.to_datetime(ufo['Time'])
