Question

In: Computer Science

Using python: In the directory rootdir some of the files contain randomly generated content while others...

Using python: In the directory rootdir some of the files contain randomly generated content while others are written by human authors. Determine how many of the files contained in the directory are written by human authors. Store your answer in the variable number_human_authored.

Solutions

Expert Solution

The logic I am going to be using here is that human authored files have words which are in english, and randomly generated files dont have english words. So the next problem is how to identify if a word is a proper english word or not. For this purpose we can use the nltk library in python which contains many functions for this very purpose (meaning of words, checking if a word is in english etc.

If you dont have nltk installed, please install the library using

pip install nltk

remember to run the terminal/ command prompt in administrator mode to avoid any permission issues.

The code -

from nltk.corpus import words
import os

path = input("Please enter the path of the folder : ")

files = os.listdir(path)

number_human_authored = 0

treshold = 50

# Dictionary is a set that contains words from the english language.
dictionary = set(words.words())



for f in files:
    complete_path = path + "\\" + f

    File = open(complete_path, 'r')

    total_words = 0
    english_words = 0

    # Open each file, and read word by word.
    for line in File:
        word_list = line.split(" ")
        for word in word_list:
            
            word = word.lower()
            # If word is in dictionary, it is an english word.
            if(word in dictionary):
                english_words+=1
            
            total_words+=1

    # Calculate percentage of english words.
    percent_english_words = (english_words/total_words)*100

    # If percentage is greater than treshold, then the file is human authored.
    if(percent_english_words > treshold):
        number_human_authored+=1

print("The number of human authored files are :", number_human_authored)

    

The threshold is changable, but dont make it too high as the dictionary does not contain informal words or words with punctuations, so 50 percent would be a good threshold.

In the input please pass the path of the folder in which the files are kept and run to get the answer,

The code has been commented so you can understand the code better.

I would love to resolve any queries in the comments. Please consider dropping an upvote to help out a struggling college kid :)

Happy Coding !!


Related Solutions

using python In the directory rootdir some of the files contain randomly generated content while others...
using python In the directory rootdir some of the files contain randomly generated content while others are written by human authors. Determine how many of the files contained in the directory are written by human authors. Store your answer in the variable number_human_authored.
Write a program using python that loops over each file in a specified directory and checks...
Write a program using python that loops over each file in a specified directory and checks the size of each file.You should create 2-tuple with the filename and size, should append the 2-tuple to a list, and then store all the lists in a dictionary.  
essay question: some environmental forces are considered controllable while some others are seen as beyond control...
essay question: some environmental forces are considered controllable while some others are seen as beyond control of the organization discuss:
Essay Question: Some environmental forces are considered controllable while some others are seen as beyond control...
Essay Question: Some environmental forces are considered controllable while some others are seen as beyond control of the organization. Discuss.
While some of life's experiences are under our control, others are not. External factors are those...
While some of life's experiences are under our control, others are not. External factors are those in life which we cannot control (such as who raised us or where we live). Can you identify events in your life that have contributed to the person you are today? Can you identify events in your life that may have contributed to feelings of inadequacy? If faced with the same event today, would your actions and/or feelings regarding the incident be different? How...
While some of life's experiences are under our control, others are not. External factors are those...
While some of life's experiences are under our control, others are not. External factors are those in life which we cannot control (such as who raised us or where we live). Can you identify events in your life that have contributed to the person you are today? Can you identify events in your life that may have contributed to feelings of inadequacy? If faced with the same event today, would your actions and/or feelings regarding the incident be different? How...
Discuss why some fixed assets require alms to be paid while others do not.
Discuss why some fixed assets require alms to be paid while others do not.
Some attempted to explain it as a battle between good and evil, while others sought physical...
Some attempted to explain it as a battle between good and evil, while others sought physical reasons. Compare and contrast the history of the supernatural tradition and the biological tradition of mental health disorders? Also consider how these heave influenced the contemporary focus on the five influences that must be considered as causes of any psychological disorder and give examples on health and/or behavior of each of the five influences. Do you think any one influence is more responsible than...
5. Some dogs bark while trailing, others are silent. The barking trait is due to a...
5. Some dogs bark while trailing, others are silent. The barking trait is due to a dominant gene. Erect ears are dominant to drooping ears. What kind of pups would be expected from a double heterozygous erecteared, barker mated to a droopedeared, silent trailer? Gene B controls the barking ability; gene E controls ear shape. Let B be the dominant allele for the barking trait. Let b be the recessive allele for the silent trait. Let E be the dominant...
Some fats, like oil, are liquid at room temperature, while others, such as butter, are solid....
Some fats, like oil, are liquid at room temperature, while others, such as butter, are solid. A) Explain how the structure of these molecules determines whether they are liquid or solid. B) Somethings, such as sugar, easily dissolve in water while others, like oils, don’t dissolve in water. What aspect of the molecule’s structure determine whether it dissolves in water or not?
ADVERTISEMENT
ADVERTISEMENT
ADVERTISEMENT