Question

######################LANGUAGE PANDAS####################

#####################MATPLOTLIB###########################

#########################################################

# read ufo.csv into a DataFrame called 'ufo'

# print the head and the tail

# examine the default index, data types, and shape of the ufo DataFrame

# count the number of missing values in each column

# count the total number of null values in the DataFrame

# print the rows that have null values

# fill null values:
# if a numerical column has null values, fill it with the mean of that column

# if a categorical column has null values, fill it with the most frequent value of that column

# calculate the most frequent value for each of the columns (in a single command)

# what are the four most frequent colors reported?

# for reports in VA, what's the most frequent city?

# show only the UFO reports from Arlington, VA

# show only the UFO reports in which the City is missing

# how many rows remain if you drop all rows with any missing values?

# replace any spaces in the column names with an underscore

# create a new column called 'Location' that includes both City and State
# For example, the 'Location' for the first row would be 'Ithaca, NY'

# map existing values to a different set of values
# e.g. in a column 'is_male', convert the value F to 0 and M to 1 with pandas

# write generic code to replace spaces with underscores in the column names
# in other words, your code should not reference the specific column names

# convert datatype of column 'time' to the datetime format

ufo.csv file (first 21 rows). The file has five columns: City, Colors Reported, Shape Reported, State, and Time. The Time values were not legible in the original listing, so that column is omitted below; a blank cell is a missing value.

City                  Colors Reported   Shape Reported   State
Ithaca                                  TRIANGLE         NY
Willingboro                             OTHER            NJ
Holyoke                                 OVAL             CO
Abilene                                 DISK             KS
New York Worlds Fair                    LIGHT            NY
Valley City                             DISK             ND
Crater Lake                             CIRCLE           CA
Alma                                    DISK             MI
Eklutna                                 CIGAR            AK
Hubbard                                 CYLINDER         OR
Fontana                                 LIGHT            CA
Waterloo                                FIREBALL         AL
Belton                RED               SPHERE           SC
Keokuk                                  OVAL             IA
Ludington                               DISK             MI
Forest Home                             CIRCLE           CA
Los Angeles                                              CA
Hapeville                                                GA
Oneida                                  RECTANGLE        TN
Bering Sea            RED               OTHER            AK
Nebraska                                DISK             NE

Expert Solution

STEP-BY-STEP PROCESS

(1.) read ufo.csv into a DataFrame called 'ufo'

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

ufo = pd.read_csv('https://raw.githubusercontent.com/justmarkham/pandas-videos/master/data/ufo.csv')
df = ufo  # short alias: the steps below refer to the same DataFrame as df
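To experiment without network access, a minimal sketch can rebuild a few rows from the sample listing in the question and read them with `pd.read_csv` through `io.StringIO`. The `sample_csv` string is a hypothetical stand-in for the real file, and the Time column is left out because its values are not legible in the listing.

```python
import io

import pandas as pd

# a few rows copied from the sample listing of ufo.csv (Time column omitted)
sample_csv = """City,Colors Reported,Shape Reported,State
Ithaca,,TRIANGLE,NY
Willingboro,,OTHER,NJ
Belton,RED,SPHERE,SC
Los Angeles,,,CA
Bering Sea,RED,OTHER,AK
"""

# empty fields between commas are read as NaN (missing values)
ufo = pd.read_csv(io.StringIO(sample_csv))
print(ufo.shape)  # (5, 4)
print(ufo.isnull().sum())
```

The remaining steps can be run on this small `ufo` unchanged, which makes it easy to check each result by eye before switching to the full file.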

(2.) print the head

df.head()

(3.) print the tail

df.tail()

(4.) examine the default index

df.index

(5.) examine the data types

df.dtypes

(6.) examine the shape of ufo dataframe

df.shape

(18241, 5): 18241 rows and 5 columns in total

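(7.) count the number of missing values in each column, and the total number of null values in the dataframe

df.isnull().sum()        # missing values in each column
df.isnull().sum().sum()  # total number of missing values in the dataframe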

(8.) print the rows that have null values

null_data = df[df.isnull().any(axis=1)]
null_data
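Steps 7 and 8 return different shapes of result: `isnull().sum()` gives one count per column, a second `.sum()` collapses that to a single total, and `isnull().any(axis=1)` flags every row holding at least one null. A toy DataFrame (made-up values, not taken from ufo.csv) makes the three easy to compare:

```python
import numpy as np
import pandas as pd

# made-up rows; only the last one has no missing value
toy = pd.DataFrame({
    "City": ["Ithaca", None, "Keokuk", "Belton"],
    "Colors Reported": [np.nan, "RED", np.nan, "RED"],
})

per_column = toy.isnull().sum()                  # Series: missing values per column
total = toy.isnull().sum().sum()                 # one number for the whole DataFrame
rows_with_nulls = toy[toy.isnull().any(axis=1)]  # rows with at least one null

print(per_column)
print(total)                 # 3
print(len(rows_with_nulls))  # 3
```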

(9.) if a numerical column has null values, fill it with the mean of that column

df.select_dtypes(include='number').columns

This returns an empty Index: no column is numeric, so there is nothing to fill with the mean.
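Although ufo.csv has no numeric column, the rule itself is easy to write generically. This sketch uses a hypothetical numeric column 'Duration' to show `select_dtypes` combined with `fillna`, passing the Series of column means so each column gets its own mean; the helper name `fill_numeric_with_mean` is also hypothetical.

```python
import numpy as np
import pandas as pd


def fill_numeric_with_mean(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy in which every numeric column's NaNs are replaced by that column's mean."""
    out = frame.copy()
    numeric_cols = out.select_dtypes(include="number").columns
    # mean() skips NaN by default, so each mean uses only the observed values;
    # fillna with a Series fills each column using the entry with the same label
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].mean())
    return out


# 'Duration' is a hypothetical numeric column; ufo.csv itself has none
demo = pd.DataFrame({"City": ["Ithaca", "Alma", "Hubbard"], "Duration": [10.0, np.nan, 20.0]})
filled = fill_numeric_with_mean(demo)
print(filled["Duration"].tolist())  # [10.0, 15.0, 20.0]
```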

(10.) if a categorical column has null values, fill it with the most frequent value of that column

df.select_dtypes(include='category').columns

This also returns an empty Index: no column has the 'category' dtype, because every column's dtype is 'object'.
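If the text ('object') columns are treated as categorical data, an assumption about what the exercise intends, each can be filled with its most frequent value. The sketch below works on a copy so that later steps, such as finding rows with a missing City, still see the original nulls; the helper name `fill_categorical_with_mode` is hypothetical.

```python
import numpy as np
import pandas as pd


def fill_categorical_with_mode(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose object/category columns have NaNs replaced by the column's mode."""
    out = frame.copy()
    for col in out.select_dtypes(include=["object", "category"]).columns:
        modes = out[col].mode()  # ignores NaN; may hold several values on a tie
        if not modes.empty:      # an all-NaN column has no mode, so leave it unchanged
            out[col] = out[col].fillna(modes.iloc[0])
    return out


# explicit object dtype, matching the 'object' columns described in step 10
demo = pd.DataFrame({
    "Colors Reported": [np.nan, "RED", "RED", "GREEN"],
    "Shape Reported": ["DISK", np.nan, "DISK", "OVAL"],
}, dtype=object)
filled = fill_categorical_with_mode(demo)
print(filled["Colors Reported"].tolist())  # ['RED', 'RED', 'RED', 'GREEN']
print(filled["Shape Reported"].tolist())   # ['DISK', 'DISK', 'DISK', 'OVAL']
```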

(11.) calculate the most frequent value for each of the columns

df.mode()
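`df.mode()` returns a DataFrame rather than one value per column, because a column can have several equally frequent values; shorter columns are padded with NaN. Taking the first row gives a single most frequent value per column. A toy example with made-up values:

```python
import pandas as pd

toy = pd.DataFrame({
    "State": ["CA", "CA", "NY", "TX"],
    "Shape Reported": ["DISK", "OVAL", "DISK", "OVAL"],
})

modes = toy.mode()              # 'Shape Reported' has a tie, so the result has two rows
top_per_column = modes.iloc[0]  # first (sorted) mode of each column
print(modes)
print(top_per_column["State"], top_per_column["Shape Reported"])  # CA DISK
```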

(12.) what are the four most frequent colors reported?

n = 4
df['Colors Reported'].value_counts()[:n].index.tolist()
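`value_counts()` drops missing values by default, so the most frequent colors are computed only from reports that name a color; passing `dropna=False` counts NaN as its own entry. A toy Series (made-up values) shows both, using `.head(4)` as an equivalent of the `[:n]` slice:

```python
import numpy as np
import pandas as pd

# made-up color reports; many are missing, as in the sample listing
colors = pd.Series(["RED"] * 4 + ["GREEN"] * 3 + ["ORANGE"] * 2 + ["BLUE"] + [np.nan] * 5)

top4 = colors.value_counts().head(4).index.tolist()  # NaN excluded
with_missing = colors.value_counts(dropna=False)     # NaN counted too

print(top4)  # ['RED', 'GREEN', 'ORANGE', 'BLUE']
print(with_missing.head(1))
```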

(13.) for reports in VA, what's the most frequent city?

df['City'].loc[df['State'] == 'VA'].mode()
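The same question can be answered with `value_counts().idxmax()`, and a `groupby` extends it to every state at once. The rows below are made up for illustration and are not taken from ufo.csv:

```python
import pandas as pd

# made-up reports, not from ufo.csv
reports = pd.DataFrame({
    "City": ["Richmond", "Virginia Beach", "Richmond", "Ithaca", "Albany", "Albany"],
    "State": ["VA", "VA", "VA", "NY", "NY", "NY"],
})

# most frequent city among VA reports, as in step 13
va_top = reports.loc[reports["State"] == "VA", "City"].value_counts().idxmax()

# most frequent city within each state
top_city_by_state = reports.groupby("State")["City"].agg(lambda s: s.value_counts().idxmax())

print(va_top)                       # Richmond
print(top_city_by_state.to_dict())  # {'NY': 'Albany', 'VA': 'Richmond'}
```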

(14.) show only the UFO reports from Arlington, VA

df[(df['City'] == 'Arlington') & (df['State'] == 'VA')]

(15.) show only the UFO reports in which the City is missing

city_miss = df[df.City.isnull()]
city_miss

(16.) how many rows remain if you drop all rows with any missing values?

df.dropna().shape

(2486, 5): 2486 rows remain. dropna() returns a new DataFrame, so df itself still holds all 18241 rows for the steps below.
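`dropna()` also has options that change which rows count as "missing": `how='all'` drops only rows where every value is missing, and `subset=` restricts the check to particular columns. A toy DataFrame (made-up values) shows the three variants side by side:

```python
import numpy as np
import pandas as pd

# made-up rows: row 2 is complete, row 3 is entirely missing
toy = pd.DataFrame({
    "City": ["Ithaca", np.nan, "Alma", np.nan],
    "Colors Reported": [np.nan, "RED", "BLUE", np.nan],
    "State": ["NY", "SC", "MI", np.nan],
})

print(toy.dropna().shape)                 # (1, 3): any missing value drops the row
print(toy.dropna(how="all").shape)        # (3, 3): only the all-missing row is dropped
print(toy.dropna(subset=["City"]).shape)  # (2, 3): only a missing City drops the row
```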

(17.) replace any spaces in the column names with an underscore

df.columns = df.columns.str.replace(' ', '_')
df.columns
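Step 17 already avoids hard-coding column names, which is what the "generic code" item of the exercise asks for. Wrapping the idea in a small function (the name `underscore_columns` is hypothetical) makes it reusable on any DataFrame; `DataFrame.rename` accepts a function that it applies to every column label:

```python
import pandas as pd


def underscore_columns(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy whose column labels have every space replaced by an underscore."""
    # no column name is hard-coded, so this works for any DataFrame with string labels
    return frame.rename(columns=lambda name: name.replace(" ", "_"))


toy = pd.DataFrame(columns=["City", "Colors Reported", "Shape Reported", "State", "Time"])
print(list(underscore_columns(toy).columns))
# ['City', 'Colors_Reported', 'Shape_Reported', 'State', 'Time']
```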

(18.) create a new column called 'Location' that includes both City and State

df['Location'] = df['City'] + ', ' + df['State']
df.head()
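The exercise also asks to map existing values to new ones, e.g. 'F' to 0 and 'M' to 1 in an 'is_male' column, which the steps above skip. ufo.csv has no such column, so this sketch uses a hypothetical `people` DataFrame. `Series.map` with a dict replaces each value by its dictionary entry; values not found in the dict become NaN.

```python
import pandas as pd

# hypothetical data: ufo.csv has no 'is_male' column
people = pd.DataFrame({"name": ["Ann", "Bob", "Cid"], "is_male": ["F", "M", "M"]})

# replace each F/M code with its mapped number
people["is_male"] = people["is_male"].map({"F": 0, "M": 1})
print(people["is_male"].tolist())  # [0, 1, 1]
```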

(19.) convert the data type of the 'Time' column to datetime

df['Time'] = pd.to_datetime(df.Time)
df.head()
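After the conversion, the `.dt` accessor exposes the parts of each timestamp, such as the year, which is a natural quantity to count or plot. This sketch assumes the Time strings are laid out as month/day/year hour:minute (an assumption to check against the file); passing that layout via `format=` makes the parsing explicit instead of inferred.

```python
import pandas as pd

# example strings in the assumed month/day/year hour:minute layout
times = pd.Series(["6/1/1930 22:00", "6/30/1930 20:00", "2/15/1931 14:00"])
parsed = pd.to_datetime(times, format="%m/%d/%Y %H:%M")

years = parsed.dt.year                                # integer year of each report
reports_per_year = years.value_counts().sort_index()  # number of reports per year
print(parsed.dtype)
print(reports_per_year)
```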

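NOTE: Run the steps in a Jupyter notebook: there, the value of the last expression in a cell (such as df.head()) is displayed automatically, whereas a plain Python script needs print(df.head()).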


venereology answered 2 years ago
