Question

A corpus is a technical term for a collection of texts used to analyze a language and verify its linguistic properties. The first modern, computer-readable corpus was the Brown Corpus of Standard American English, compiled by Henry Kucera and W. Nelson Francis of Brown University. The Brown Corpus draws from American English texts printed in 1961 and was for many years a widely cited resource in computational linguistics.

The five most frequently occurring words in the Brown Corpus are the, of, and, to, and a. Consider a data set consisting of all occurrences of these words in the Corpus. The values of the variable named Word are and, to, of, the, and a, so Word is a nominal variable with five categories.

Frequency and relative frequency distributions are constructed to summarize the data. They are shown in the table that follows.

Table 1

Word     Frequency                     Relative Frequency
         (Thousands of occurrences)
and      28.9                          0.1566
to       26.1                          0.1415
of       36.4                          0.1973
the      70.0                          0.3794
a        23.1                          0.1252
Total    184.5                         1.0000
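The relative frequencies in the word table can be reproduced by dividing each frequency by the total. The following is a minimal sketch, not part of the original problem; the names `word_freq` and `relative_frequencies` are made up for illustration, and the numbers are copied from the table.

```python
# Frequencies in thousands of occurrences, copied from the word table.
# The variable and function names are illustrative assumptions.
word_freq = {"and": 28.9, "to": 26.1, "of": 36.4, "the": 70.0, "a": 23.1}


def relative_frequencies(freq):
    """Return each category's share of the total frequency."""
    total = sum(freq.values())
    return {category: count / total for category, count in freq.items()}


rel = relative_frequencies(word_freq)
for word, share in rel.items():
    # Print the share rounded to four decimals, as in the table.
    print(f"{word:<5}{share:.4f}")
```

Rounded to four decimals, the shares agree with the Relative Frequency column, and they sum to 1.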

A census is an enumeration of a population. The U.S. Census Bureau conducts a census every 10 years, but in addition, the Population Estimates Program of the bureau publishes population estimates for incorporated places every year. According to 2007 estimates, the five largest U.S. cities (by population) are New York City, Los Angeles, Chicago, Houston, and Phoenix.

Consider a data set consisting of all the residents of these five cities. The values of the variable named City are Phoenix, Chicago, Houston, Los Angeles, and New York City, so City is a nominal variable with five categories. Frequency and relative frequency distributions are provided in the table below.

Table 2

City             Frequency               Relative Frequency
                 (Millions of people)
Phoenix          1.55                    0.0829
Chicago          2.84                    0.1519
Houston          2.21                    0.1182
Los Angeles      3.83                    0.2048
New York City    8.27                    0.4422
Total            18.70                   1.0000

In 1935, Harvard linguist George Zipf pointed out that the frequency of the kth most frequent word in a language is roughly proportional to 1/k. This implies that the second most frequent word in a language has a frequency one-half that of the most frequent word, the third most frequent word has a frequency one-third that of the most frequent word, and so on. A distribution that follows this rule is said to obey Zipf’s Law.
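Under an idealized Zipf's Law, the kth-ranked item has a frequency equal to 1/k of the top-ranked item. A short sketch of that prediction, expressed as a percentage, is given here; the function name `zipf_ratio_percent` is hypothetical and chosen only for this example.

```python
def zipf_ratio_percent(k):
    """Predicted frequency of the rank-k item as a percentage of the rank-1 item.

    Assumes an exact 1/k relationship, which real data only approximate.
    """
    if k < 1:
        raise ValueError("rank k must be a positive integer")
    return 100.0 / k


for k in range(1, 6):
    # Show the idealized prediction for the five top ranks.
    print(k, round(zipf_ratio_percent(k), 1))
```

These idealized values (50% for rank 2, 25% for rank 4) serve as the benchmark against which the observed word and city ratios can be compared.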

Zipf’s Law has been observed not only in word distributions, but in other phenomena as well, such as the populations of cities. Fill in the blanks below, expressing each answer as a percentage.

The frequency of the second most frequent word in the Brown Corpus is ________ that of the most frequent word. The population of the second largest city in the United States is ________ that of the largest city.

The frequency of the fourth most frequent word in the Brown Corpus is ________ that of the most frequent word. The population of the fourth largest city in the United States is ________ that of the largest city.

Solutions

Expert Solution

Answers are rounded to the nearest integer percentage.

The frequency of the second most frequent word in the Brown Corpus is 52% that of the most frequent word.

Explanation

Frequency of most frequent word = 70.0

Frequency of 2nd most frequent word = 36.4

Percentage = (36.4/70)*100 = 52%

The population of the second largest city in the United States is 46% that of the largest city.

Explanation

Population of largest city = 8.27

Population of the second largest city = 3.83

Percentage = (3.83/8.27)*100 = 46.3%, which rounds to 46%

The frequency of the fourth most frequent word in the Brown Corpus is 37% that of the most frequent word.

Explanation

Frequency of most frequent word = 70.0

Frequency of 4th most frequent word = 26.1

Percentage = (26.1/70)*100 = 37.3%, which rounds to 37%

The population of the fourth largest city in the United States is 27% that of the largest city.

Explanation

Population of largest city = 8.27

Population of the fourth largest city = 2.21

Percentage = (2.21/8.27)*100 = 26.7%, which rounds to 27%
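The four percentages can be checked together with a short script. This is only a sketch under the assumption that the table values are used as given; `percent_of_top` is an illustrative helper name, not something from the original solution.

```python
# Values copied from the two tables (thousands of occurrences, millions of people).
word_freq = {"and": 28.9, "to": 26.1, "of": 36.4, "the": 70.0, "a": 23.1}
city_pop = {
    "Phoenix": 1.55,
    "Chicago": 2.84,
    "Houston": 2.21,
    "Los Angeles": 3.83,
    "New York City": 8.27,
}


def percent_of_top(data, rank):
    """Value of the rank-th largest entry as a percentage of the largest entry."""
    ordered = sorted(data.values(), reverse=True)
    return 100.0 * ordered[rank - 1] / ordered[0]


for label, data in (("words", word_freq), ("cities", city_pop)):
    for rank in (2, 4):
        observed = percent_of_top(data, rank)
        predicted = 100.0 / rank  # idealized Zipf prediction
        print(f"{label}, rank {rank}: observed {observed:.1f}%, Zipf {predicted:.1f}%")
```

Rounded to the nearest integer, the observed values are 52, 46, 37, and 27, matching the answers in the solution. They lie reasonably close to the idealized Zipf predictions of 50% and 25%, although the rank-4 word ratio is noticeably higher.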

