Question

In: Computer Science


######################LANGUAGE PANDAS####################

#####################MATPLOTLIB###########################

#########################################################

# read ufo.csv into a DataFrame called 'ufo'

# print the head and the tail

# examine the default index, data types, and shape of ufo dataframe

# count the number of missing values in each column

# count the total number of null values in the dataframe

# print the rows that have null values

# fill null values:
# if a numerical column has null values, fill it with the mean of that column

# if a categorical column has null values, fill it with the most frequent value of that column

# calculate the most frequent value for each of the columns (in a single command)

# what are the four most frequent colors reported?

# for reports in VA, what's the most frequent city?

# show only the UFO reports from Arlington, VA

# show only the UFO reports in which the City is missing

# how many rows remain if you drop all rows with any missing values?

# replace any spaces in the column names with an underscore

# create a new column called 'Location' that includes both City and State
# For example, the 'Location' for the first row would be 'Ithaca, NY'

# map existing values to a different set of values
# for example, in the column 'is_male', convert F to 0 and M to 1 with pandas

# write generic code to replace spaces with underscores
# In other words, your code should not reference the specific column names

# convert the datatype of the column 'Time' to datetime format

ufo.csv file

City,Colors Reported,Shape Reported,State,Time
Ithaca,,TRIANGLE,NY,########
Willingboro,,OTHER,NJ,########
Holyoke,,OVAL,CO,########
Abilene,,DISK,KS,########
New York Worlds Fair,,LIGHT,NY,########
Valley City,,DISK,ND,########
Crater Lake,,CIRCLE,CA,########
Alma,,DISK,MI,########
Eklutna,,CIGAR,AK,########
Hubbard,,CYLINDER,OR,########
Fontana,,LIGHT,CA,########
Waterloo,,FIREBALL,AL,########
Belton,RED,SPHERE,SC,########
Keokuk,,OVAL,IA,########
Ludington,,DISK,MI,########
Forest Home,,CIRCLE,CA,########
Los Angeles,,,CA,########
Hapeville,,,GA,########
Oneida,,RECTANGLE,TN,########
Bering Sea,RED,OTHER,AK,########
Nebraska,,DISK,NE,########

(Excerpt only. Empty fields are missing values; the Time values appeared as ######## in the original spreadsheet view.)

Solutions

Expert Solution

STEP-BY-STEP PROCESS

(1.) read ufo.csv into a DataFrame called 'ufo'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

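# read from the URL below, or use pd.read_csv('ufo.csv') if the file is saved locally
df = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/ufo.csv')
ufo = df  # the task asks for a DataFrame named 'ufo'; both names refer to the same object, and the steps below use df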
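The download above needs network access. As a minimal offline sketch, the snippet below parses a few rows copied from the ufo.csv excerpt in the question from an in-memory buffer (the Time column is left out because its values are not visible in the excerpt), names the result `ufo` as the task asks, and prints the head and tail.

```python
import io

import pandas as pd

# A few rows copied from the ufo.csv excerpt in the question. The Time column
# is omitted because the excerpt does not show its values. Empty fields
# (such as the missing colors) are read as NaN.
csv_text = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Willingboro,,OTHER,NJ
Belton,RED,SPHERE,SC
Los Angeles,,,CA
Bering Sea,RED,OTHER,AK
"""

# io.StringIO lets read_csv treat the string like a file; with the real
# file on disk this would be pd.read_csv('ufo.csv').
ufo = pd.read_csv(io.StringIO(csv_text))

print(ufo.head(2))  # first 2 rows (head() defaults to 5)
print(ufo.tail(2))  # last 2 rows (tail() defaults to 5)
```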

(2.) print the head

df.head()

(3.) print the tail

df.tail()

(4.) examine the default index

df.index

(5.) examine the data types

df.dtypes

(6.) examine the shape of ufo dataframe

df.shape
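A minimal sketch of what steps 4 to 6 report, run on three rows copied from the excerpt (Time omitted): the default index is a RangeIndex, the text columns are not numeric, and `shape` is a (rows, columns) tuple.

```python
import io

import pandas as pd

# Three rows from the ufo.csv excerpt in the question (Time omitted).
csv_text = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Belton,RED,SPHERE,SC
Los Angeles,,,CA
"""
ufo = pd.read_csv(io.StringIO(csv_text))

# Default index: a RangeIndex labelling the rows 0, 1, 2, ...
print(ufo.index)  # RangeIndex(start=0, stop=3, step=1)

# Data types: text columns are stored as 'object' in pandas 2.x
# (newer pandas versions may report a dedicated string dtype instead).
print(ufo.dtypes)

# Shape: a (rows, columns) tuple.
print(ufo.shape)  # (3, 4)
```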

The shape is (18241, 5): 18,241 rows and 5 columns.

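(7.) count the number of missing values in each column, and in the whole dataframe

print(df.isnull().sum())        # missing values in each column
print(df.isnull().sum().sum())  # total number of missing values in the dataframe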
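Per-column counts and the overall total are two different numbers: `isnull().sum()` returns one count per column, and summing that Series again gives a single total. A minimal sketch on five rows copied from the excerpt (Time omitted):

```python
import io

import pandas as pd

# Five rows from the ufo.csv excerpt in the question (Time omitted).
csv_text = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Willingboro,,OTHER,NJ
Belton,RED,SPHERE,SC
Los Angeles,,,CA
Bering Sea,RED,OTHER,AK
"""
ufo = pd.read_csv(io.StringIO(csv_text))

# isnull() gives a True/False frame; summing counts the True values.
per_column = ufo.isnull().sum()  # one count per column (a Series)
total = int(per_column.sum())    # summing again gives a single number

print(per_column)
print("total missing:", total)   # total missing: 4
```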

(8.) print the rows that have null values

null_data = df[df.isnull().any(axis=1)]
null_data
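A minimal sketch of the row mask used above, on four rows copied from the excerpt: `any(axis=1)` checks each row for at least one missing value, and negating the mask with `~` selects the complete rows instead.

```python
import io

import pandas as pd

# Four rows from the ufo.csv excerpt in the question (Time omitted).
csv_text = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Belton,RED,SPHERE,SC
Los Angeles,,,CA
Bering Sea,RED,OTHER,AK
"""
ufo = pd.read_csv(io.StringIO(csv_text))

# any(axis=1) asks, row by row, "is at least one value missing?"
has_null = ufo.isnull().any(axis=1)

rows_with_nulls = ufo[has_null]  # Ithaca and Los Angeles
complete_rows = ufo[~has_null]   # the complement: rows with no missing values

print(rows_with_nulls)
```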

(9.) if a numerical column has null values, fill it with the mean of that column

df.select_dtypes(include='number').columns

The result is empty: no column in ufo.csv is numeric, so there is nothing to fill with a mean.

(10.) if a categorical column has null values, fill it with the most frequent value of that column

df.select_dtypes(include='category').columns

The result is empty: no column has the pandas 'category' dtype; every column is stored as 'object' (text).
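No column has the 'category' dtype, but the text columns (City, Colors Reported, Shape Reported, State) can reasonably be treated as categorical. Below is a minimal sketch that assumes every non-numeric column counts as categorical. It fills numeric columns with the mean and all other columns with the most frequent value, working on a copy. The `Witnesses` column is hypothetical, added only so the mean branch has something to do, since ufo.csv has no numeric column. The helper name `fill_missing` is also illustrative.

```python
import numpy as np
import pandas as pd


def fill_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with NaNs filled (hypothetical helper).

    Numeric columns get the column mean; every other column is treated as
    categorical (an assumption) and gets its most frequent value.
    """
    filled = df.copy()  # leave the original DataFrame untouched
    for col in filled.columns:
        if not filled[col].isnull().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(filled[col]):
            filled[col] = filled[col].fillna(filled[col].mean())
        else:
            modes = filled[col].mode()  # mode() ignores NaN
            if not modes.empty:  # an all-NaN column has no mode
                filled[col] = filled[col].fillna(modes.iloc[0])
    return filled


# Rows from the ufo.csv excerpt, plus a hypothetical numeric 'Witnesses'
# column so that the mean branch is exercised.
sample = pd.DataFrame({
    "City": ["Ithaca", "Belton", "Alma", "Ludington", "Los Angeles"],
    "Colors Reported": [None, "RED", None, None, None],
    "Shape Reported": ["TRIANGLE", "SPHERE", "DISK", "DISK", None],
    "State": ["NY", "SC", "MI", "MI", "CA"],
    "Witnesses": [2.0, np.nan, 4.0, 6.0, np.nan],
})

filled = fill_missing(sample)
print(filled)
```

Filling a copy keeps the later steps (such as finding rows where City is missing) working on the original data.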

(11.) calculate the most frequent value for each of the columns

df.mode().iloc[0]  # mode() returns one row per tied value; iloc[0] keeps the first row of modes
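`DataFrame.mode()` can return more than one row: when several values in a column tie for most frequent, each gets its own row, and the shorter columns are padded with NaN. A minimal sketch on four rows copied from the excerpt, where every City occurs once:

```python
import pandas as pd

# A few rows from the ufo.csv excerpt in the question.
sample = pd.DataFrame({
    "City": ["Ithaca", "Abilene", "Alma", "Ludington"],
    "Shape Reported": ["TRIANGLE", "DISK", "DISK", "DISK"],
    "State": ["NY", "KS", "MI", "MI"],
})

modes = sample.mode()
# Every City value occurs once, so all four are tied modes and mode()
# returns four rows; the other columns are padded with NaN.
print(modes)

# The first row holds one most-frequent value per column.
print(modes.iloc[0])
```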

(12.) what are the four most frequent colors reported?

n = 4
df['Colors Reported'].value_counts()[:n].index.tolist()
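A minimal sketch of the same idea using `value_counts().head(n)`, which reads a little more directly than slicing. The color list is hypothetical (the excerpt shows only RED) and is chosen so every color has a different count, which makes the top four unambiguous.

```python
import pandas as pd

n = 4

# Hypothetical color reports (the excerpt in the question shows only RED),
# chosen so that every color has a different count.
colors = pd.Series(
    ["RED"] * 5 + ["ORANGE"] * 4 + ["GREEN"] * 3 + ["BLUE"] * 2 + ["YELLOW"] + [None]
)

# value_counts() sorts by frequency (highest first) and skips missing values;
# head(n) keeps the n most frequent.
top_counts = colors.value_counts().head(n)
top_colors = top_counts.index.tolist()

print(top_counts)
print(top_colors)  # ['RED', 'ORANGE', 'GREEN', 'BLUE']
```

With ties at the cut-off, which of the tied values make the top n is not something this sketch controls.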

(13.) for reports in VA, what's the most frequent city?

df['City'].loc[df['State'] == 'VA'].mode()

(14.) show only the UFO reports from Arlington, VA

df[(df['City'] == 'Arlington') & (df['State'] == 'VA')]
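A minimal sketch of steps 13 and 14 on hypothetical rows (the excerpt contains no VA reports): select City for VA rows and take the most frequent value, then combine two conditions with `&` so that an Arlington in another state is excluded.

```python
import pandas as pd

# Hypothetical rows: the excerpt in the question contains no VA reports.
reports = pd.DataFrame({
    "City": ["Arlington", "Richmond", "Arlington", "Arlington", "Norfolk"],
    "State": ["VA", "VA", "VA", "TX", "VA"],
})

# Step 13: restrict City to VA rows, then take the most frequent value.
va_cities = reports.loc[reports["State"] == "VA", "City"]
most_common_va_city = va_cities.value_counts().idxmax()  # 'Arlington'

# Step 14: combine both conditions with &; each comparison needs its own
# parentheses because & binds more tightly than ==.
arlington_va = reports[(reports["City"] == "Arlington") & (reports["State"] == "VA")]
print(arlington_va)  # rows 0 and 2; the Arlington, TX row is excluded
```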

(15.) show only the UFO reports in which the City is missing

city_miss = df[df.City.isnull()]
city_miss

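(16.) how many rows remain if you drop all rows with any missing values?

df.dropna().shape

2486 rows remain after dropping the rows with missing values. dropna() returns a new DataFrame, so df itself is not modified and the later steps still use all 18,241 rows.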
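A minimal sketch on four rows copied from the excerpt, counting the rows that survive `dropna()` without overwriting the original, and contrasting the default `how="any"` with `how="all"`:

```python
import io

import pandas as pd

# Four rows from the ufo.csv excerpt in the question (Time omitted).
csv_text = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Belton,RED,SPHERE,SC
Los Angeles,,,CA
Bering Sea,RED,OTHER,AK
"""
ufo = pd.read_csv(io.StringIO(csv_text))

# dropna() returns a new DataFrame, so ufo itself keeps all its rows.
rows_left_any = len(ufo.dropna())           # drop a row if ANY value is missing
rows_left_all = len(ufo.dropna(how="all"))  # drop a row only if ALL values are missing

print(rows_left_any, rows_left_all, ufo.shape)  # 2 4 (4, 4)
```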

(17.) replace any spaces in the column names with an underscore

df.columns = df.columns.str.replace(' ', '_')
df.columns
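The line above already satisfies the "generic code" task, because it never names a specific column. A minimal sketch of two generic variants on a hypothetical set of column names (one with a double space added for illustration): `rename` with a function, and a regular expression that turns each run of whitespace into a single underscore.

```python
import pandas as pd

# Hypothetical column names; "Shape  Reported" has a double space on purpose.
original = pd.DataFrame(columns=["City", "Colors Reported", "Shape  Reported", "State"])

# Option 1: plain string replace, one underscore per space.
renamed = original.rename(columns=lambda name: name.replace(" ", "_"))

# Option 2: regex, one underscore per run of whitespace (spaces, tabs, ...).
cleaned = original.copy()
cleaned.columns = cleaned.columns.str.replace(r"\s+", "_", regex=True)

print(list(renamed.columns))  # ['City', 'Colors_Reported', 'Shape__Reported', 'State']
print(list(cleaned.columns))  # ['City', 'Colors_Reported', 'Shape_Reported', 'State']
```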

(18.) create a new column called 'Location' that includes both City and State

df['Location'] = df['City'] + ', ' + df['State']
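The question also asks to map existing values to a different set (for example 'F' to 0 and 'M' to 1 in an 'is_male' column), which the steps here do not cover. ufo.csv has no such column, so this minimal sketch uses a hypothetical DataFrame with `Series.map` and a dictionary:

```python
import pandas as pd

# Hypothetical data: ufo.csv has no 'is_male' column.
people = pd.DataFrame({"name": ["Ann", "Bob", "Cy"], "is_male": ["F", "M", "M"]})

# map() looks each value up in the dictionary; values not in it become NaN.
people["is_male"] = people["is_male"].map({"F": 0, "M": 1})
print(people)
```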

(19.) convert the datatype of the column 'Time' to datetime format

df['Time'] = pd.to_datetime(df.Time)
df.head()
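Step 19 lets pandas infer the date layout. A minimal sketch with an explicit format and `errors="coerce"`, on hypothetical timestamps: the month/day/year hour:minute layout is an assumption, because the excerpt's Time cells show only ########.

```python
import pandas as pd

# Hypothetical timestamps; the assumed month/day/year hour:minute layout is
# not visible in the excerpt (its Time cells show ########).
times = pd.Series(["06/01/1930 22:00", "07/15/1942 01:00", "not a date"])

# An explicit format avoids ambiguity between month-first and day-first
# dates; errors="coerce" turns unparseable strings into NaT instead of raising.
parsed = pd.to_datetime(times, format="%m/%d/%Y %H:%M", errors="coerce")

print(parsed)
print(parsed.dt.year)  # the .dt accessor works once the dtype is datetime
```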

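NOTE: Run the code in a Jupyter notebook. A bare expression such as df.head() on the last line of a cell is displayed automatically; a plain Python script needs print() to show it.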
