In: Computer Science
This task exercises your ability to use python to represent data and use flow control and functions to re-organize the data. You need to submit the ipynb file to Moodle.
A data scientist has collected tube information and saved the video info in multiple CSV files. Each CSV file has the following columns:
· video_id
· trending_date
· title
· channel_title
· category_id
· publish_time
· tags
· views
· likes
· dislikes
· comment_count
· thumbnail_link
· comments_disabled
· ratings_disabled
· video_error_or_removed
· description
You are asked to write python code to process CSV data files. You can only use collections, numpy, and CSV modules. The task needs you to solve 6 problems as listed below. Use the template ipynb file to complete your code. For each question, your code should output an answer CSV file, so that the results can be viewed with MS Excel. Use one cell for each question. After running the 6 cells one after another, your solution should output 6 answer CSV files. The answer files must be named as “question1.csv”, “question2.csv”, … , “question6.csv”. Write documentation in your code. Add comments to explain key steps.
2 visible test cases are provided to you. Each has an input.csv file and the corresponding answer files.
Your code will be tested with multiple hidden test cases (HTCs). For each HTC video file, your code should generate the corresponding answers in 6 answer CSV files. The HTCs are similar to the given test cases. You can assume that HTC video files have all the above-mentioned columns. For each column, HTCs have the same value type and ranges. For example, HTC video files have the column “likes” and “dislikes”, with all values being non-negative integers. You don’t have to consider missing values or non-standard values (e.g. “117394” is never saved as “117,394”). The cells in your solution will be executed one after another.
Question 6:
Show all categories with at least 10 videos. For each category, show the category name, number of videos, videoId with the highest views (if same, print the one that appears first in the CSV), average comment count, number of videos disabled comments, and a list of unique channels that published videos in the category (in ascending order alphabetically. separated by '|'). Save the results in the descending order of video count.
Convert the category id to the actual category name. Since different countries have different category id encoding, your code should allow dynamically convert the category id to the category name. That means your code must directly read the categories from the category_id.txt file. Do not hard coding the categories in your code.
The output file should have the following headers:
category_name |
video_count |
most_popular_video |
average_comment |
disable_comment_count |
channels |
I just need help with the last question, question 6.
import csv
with open('file.csv','r') as f:
reader = csv.reader(f)
data = [list(row) for row in reader]
cid=[]
cname=[]
no_0f_videos=[]
popular_video=[]
average_comment_count=[]
disabled_count=[]
unique_channels=[]
popularvideocount=0
pvideotitle=""
commentcount=0
clen=0
commentdisabled=0
for i in range(1,len(data)):
cid.append(data[i][4])
cid=list(dict.fromkeys(cid))
for id in cid:
videocount=0
uniquechannel=[]
for i in range(1,len(data)):
if(id==data[i][4]):
videocount=videocount+1
if(data[i][7]>popularvideocount):
pvideotitle=data[i][2]
commentcount=commentcount+data[i][10]
clen=clen+1
commentdisabled=commentdisabled+data[i][12]
uniquechannel.append(data[i][3])
no_0f_videos.append(videocount)
popular_video.append(pvideotitle)
average_comment_count.append(commentcount/clen)
disabled_count.append(commentdisabled)
uniquechannel=list(dict.fromkeys((uniquechannel))
.extend(uniquechannel)
file1 = open('category_id.txt', 'r')
Lines = file1.readlines()
cname=[]
fcid=[]
for line in Lines:
a,b=line.split(" ")
fcid.append(a)
cname.append(b)
categoryname=[]
for id in cid:
for i in range(1,len(fcid)):
if(id==fcid):
.append(cname[i])
break
for i in range(len(cid)):
print("category name is ",categoryname[i])
print("no of videos are ",[i])
print("video with highest views is ",popular_video[i])
print("average comment count is ",average_comment_count[i])
print("disabled comments is ",disabled_count[i])
print("unique cahannels are ",unique_channels[i].join("|"))
fields = ['category_name', 'video_count', 'popular_video']
# data rows of csv file
rows = [categoryname,no_0f_videos,popular_video]
# name of csv file
filename = "quetion6.csv"
# writing to csv file
with open(filename, 'w') as csvfile:
# creating a csv writer object
csvwriter = csv.writer(csvfile)
# writing the fields
csvwriter.writerow(fields)
# writing the data rows
csvwriter.writerows(rows)