R code:
## 2. __Basic dplyr exercises__
## Install the package `fueleconomy` and load the dataset `vehicles`. Answer the following questions.
install.packages("fueleconomy")
library(fueleconomy)
library(dplyr)
library(tidyr)
data(vehicles)
e. Finally, for the years 1994, 1999, 2004, 2009, and 2014, find the average city mpg of midsize cars for each manufacturer for each year. Use tidyr to transform the resulting output so each manufacturer has one row, and five columns (a column for each year). I have included sample output for the first two rows.
Output should look like:
# make 1994 1999 2004 2009 2014
# 1 Acura NA 16.50000 17.33333 17.00000 20.60000
# 2 Audi NA 15.25000 16.20000 15.83333 19.08333
In: Statistics and Probability
Visit the NASDAQ historical prices weblink. First, set the date range to exactly 1 year ending on the Monday that this course started. For example, if the current term started on April 1, 2018, then use April 1, 2017 – March 31, 2018. (Do NOT use these dates. Use the dates that match up with the current term.) My class started 14 January 2019. Do this by clicking on the blue dates after “Time Period”. Next, click the “Apply” button. Next, click the link on the right side of the page that says “Download Data” to save the file to your computer.
This project will only use the Close values. Assume that the closing prices of the stock form a normally distributed data set. This means that you need to use Excel to find the mean and standard deviation. Then, use those numbers and the methods you learned in sections 6.1-6.3 of the course textbook for normal distributions to answer the questions. Do NOT count the number of data points. Complete this portion of the assignment within a single Excel file. Show your work or explain how you obtained each of your answers. Answers with no work and no explanation will receive no credit.
1. a) Submit a copy of your dataset along with a file that contains your answers to all of the following questions. b) What are the mean and standard deviation (SD) of the Close column in your data set? c) If a person bought 1 share of Google stock within the last year, what is the probability that the stock on that day closed at less than the mean for that year? Hint: You do not need to calculate the mean to answer this one. The probability would be the same for any normal distribution. (5 points)
2. If a person bought 1 share of Google stock within the last year, what is the probability that the stock on that day closed at more than $950? (5 points)
3. If a person bought 1 share of Google stock within the last year, what is the probability that the stock on that day closed within $50 of the mean for that year (between $50 below and $50 above the mean)? (5 points)
4. If a person bought 1 share of Google stock within the last year, what is the probability that the stock on that day closed at less than $800 per share? Would this be considered unusual? Use the definition of unusual from the course textbook that is measured as a number of standard deviations. (5 points)
5. At what prices would Google have to close in order for it to be considered statistically unusual? You will have a low and a high value. Use the definition of unusual from the course textbook that is measured as a number of standard deviations. (5 points)
6. What are Quartile 1, Quartile 2, and Quartile 3 in this data set? Use Excel to find these values. This is the only question that you must answer without using anything about the normal distribution. (5 points)
7. Is the normality assumption that was made at the beginning valid? Why or why not? Hint: Does this distribution have the properties of a normal distribution as described in the course textbook? Real data sets are never perfect; however, it should be close. One option would be to construct a histogram like you did in Project 1 to see if it has the right shape. Something in the range of 10 to 12 classes is a good number. (5 points)
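The normal-distribution questions above all reduce to evaluating a normal CDF once the mean and standard deviation are known. The following is only a sketch in Python rather than Excel: `mean_close` and `sd_close` are hypothetical placeholders (the real values must come from the downloaded Close column), and "unusual" is assumed to mean more than 2 standard deviations from the mean, which may differ from the course textbook's definition.

```python
from statistics import NormalDist

# Hypothetical placeholder values; replace with the mean and SD of the
# Close column from the downloaded data set.
mean_close = 1100.0
sd_close = 70.0
close = NormalDist(mu=mean_close, sigma=sd_close)

# 1c) P(Close < mean): 0.5 for any normal distribution.
p_below_mean = close.cdf(mean_close)

# 2) P(Close > 950)
p_above_950 = 1 - close.cdf(950)

# 3) P(mean - 50 < Close < mean + 50)
p_within_50 = close.cdf(mean_close + 50) - close.cdf(mean_close - 50)

# 4) P(Close < 800)
p_below_800 = close.cdf(800)

# 5) Cutoffs for "unusual", assuming the 2-standard-deviation definition.
unusual_low = mean_close - 2 * sd_close
unusual_high = mean_close + 2 * sd_close

print(p_below_mean, p_above_950, p_within_50, p_below_800)
print(unusual_low, unusual_high)
```

In Excel the same quantities would typically come from `NORM.DIST` with the cumulative flag set to TRUE.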
In: Statistics and Probability
Consider data regarding a response y and an explanatory variable x, both numeric
x | y | |
1 | 2.8 | 0.7 |
2 | 2.6 | 1.3 |
3 | 6.8 | -1.1 |
4 | 3.0 | 0.2 |
5 | 4.7 | 1.1 |
6 | 5.0 | -0.1 |
7 | 5.0 | 0.9 |
8 | 2.9 | 1.0 |
9 | 7.0 | -0.2 |
10 | 3.7 | 0.8 |
The null hypothesis that the expected value of the response is constant for all values of the explanatory variable is:
Select one:
a. Not rejected with a significance level of 5%.
b. Rejected with a significance level of 5%, but not with a significance level of 1%.
c. Rejected with a significance level of 1%, but not with a significance level of 0.1%.
d. Rejected with a significance level of 0.1%
In: Statistics and Probability
A random sample of 16 undergraduate students receiving student loans was obtained, and the amounts of their loans for the school year were recorded. Use a normal probability plot to assess whether the sample data could have come from a population that is normally distributed. 6,200 |
Using the correlation coefficient of the normal probability plot, is it reasonable to conclude that the population is normally distributed? Select the correct choice below and fill in the answer boxes within your choice.
A. No. The correlation between the expected z-scores and the observed data, _____ does not exceed the critical value, _______. Therefore, it is not reasonable to conclude that the data come from a normal population.
B. Yes. The correlation between the expected z-scores and the observed data, _______ exceeds the critical value, ______. Therefore, it is reasonable to conclude that the data come from a normal population.
C. Yes. The correlation between the expected z-scores and the observed data, _______ exceeds the critical value _____. Therefore, it is not reasonable to conclude that the data come from a normal population.
D. No. The correlation between the expected z-scores and the observed data, _______ does not exceed the critical value, ______. Therefore, it is reasonable to conclude that the data come from a normal population.
In: Statistics and Probability
29; 37; 38; 40; 58; 67; 68; 69; 76; 86; 87; 95; 96; 96; 99; 106; 112; 127; 145; 150
What is the Standard deviation (please round to two decimal places)?
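The question does not say whether the 20 values are a sample or a whole population, so the sketch below computes both versions with Python's standard library; which one is expected is an assumption left to the reader (most introductory courses mean the sample standard deviation, with divisor n − 1).

```python
from statistics import pstdev, stdev

data = [29, 37, 38, 40, 58, 67, 68, 69, 76, 86,
        87, 95, 96, 96, 99, 106, 112, 127, 145, 150]

sample_sd = stdev(data)       # divisor n - 1
population_sd = pstdev(data)  # divisor n

print(round(sample_sd, 2), round(population_sd, 2))
```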
In: Statistics and Probability
Use the following pairs of observations to construct an 80% and a 98% confidence interval for β1.
x | 1 | 5 | 3 | 0 | 4 | 2 | 6 |
---|---|---|---|---|---|---|---|
y | 1 | 6 | 4 | 2 | 3 | 3 | 7 |
The 80% confidence interval is (______) (Round to two decimal places as needed.)
The 98% confidence interval is (______) (Round to two decimal places as needed.)
In: Statistics and Probability
Action Adventures
The Adventure Toys Company manufactures a popular line of action figures and distributes them to toy stores at the wholesale price of $10 per unit. Demand for the action figures is seasonal, with the highest sales occurring before Christmas and during the spring. The lowest sales occur during the summer and winter months.
Each month the monthly "base" sales follow a normal distribution with a mean equal to the previous month's actual "base" sales and with a standard deviation of 500 units. The actual sales in any month are the monthly base sales multiplied by the seasonality factor for the month, as shown in the subsequent table. Base sales in December 2018 were 6,000, with actual sales equal to (1.18)*(6,000) = 7,080. It is now January 1, 2019.
MONTH | SEASONAL FACTOR |
JANUARY | 0.79 |
FEBRUARY | 0.88 |
MARCH | 0.95 |
APRIL | 1.05 |
MAY | 1.09 |
JUNE | 0.84 |
JULY | 0.74 |
AUGUST | 0.98 |
SEPTEMBER | 1.06 |
OCTOBER | 1.1 |
NOVEMBER | 1.16 |
DECEMBER | 1.18 |
Cash sales typically account for about 40 percent of monthly sales, but this figure has been as low as 28 percent and as high as 48 percent in some months. The remainder of the sales are made on a 30-day interest-free credit basis, with full payment received one month after delivery. In December 2018, 42 percent of sales were cash sales and 58 percent were on credit.
The production costs depend upon the labor and material costs. The plastics required to manufacture the action figures fluctuate in price from month to month, depending on market conditions. Because of these fluctuations, production costs can be anywhere from $6 to $8 per unit. In addition to these variable production costs, the company incurs a fixed cost of $15,000 per month for manufacturing the action figures. The company assembles the products to order. When a batch of a particular action figure is ordered, it is immediately manufactured and shipped within a couple of days.
The company utilizes eight molding machines to mold the action figures. These machines occasionally break down and require a $5,000 replacement part. Each machine requires a replacement part with a 10 percent probability each month.
The company has a policy of maintaining a minimum cash balance of at least $20,000 at the end of each month. The balance at the end of December 2018 (or equivalently, at the beginning of January 2019) is $25,000. If required, the company will take out a short-term (one-month) loan to cover expenses and maintain the minimum balance. The loans must be paid back the following month with interest (using the current month's loan interest rate). For example, if March's annual interest rate is 6 percent (so 0.5 percent per month) and a $1,000 loan is taken out in March, then $1,005 is due in April. However, a new loan can be taken out each month.
Any balance remaining at the end of a month (including the minimum balance) is carried forward to the following month and also earns savings interest. For example, if the ending balance in March is $20,000 and March's savings interest is 3 percent per annum (so 0.25 percent per month), then $50 of savings interest is earned in April.
Both the loan interest rate and the savings interest rate are set monthly based upon the prime rate. The loan interest rate is set at prime + 2%, while the savings interest rate is set at prime - 2%. However, the loan interest rate is capped at (can't exceed) 9 percent and the savings interest rate will never drop below 2 percent.
The prime rate in December 2018 was 5 percent per annum. This rate depends upon the whims of the Federal Reserve Board. In particular, for each month there is a 70 percent chance it will stay unchanged, a 10 percent chance it will increase by 25 basis points (0.25 percent), a 10 percent chance it will decrease by 25 basis points, a 5 percent chance it will increase by 50 basis points, and a 5 percent chance it will decrease by 50 basis points.
a. Formulate a simulation model on a spreadsheet to track the company's cash flows from month to month. Use Analytic Solver to simulate 1,000 trials for the year 2019.
b. Adventure Toys management wants information about what the company's net worth might be at the end of 2019, including the likelihood that the net worth will exceed $0. (The net worth is defined here as the ending cash balance plus savings interest and account receivables minus any loans and interest due.) Display the results of your simulation run from part a in the various forms that you think would be helpful to management in analyzing this issue.
c. Arrangements need to be made to obtain a specific credit limit from the bank for the short-term loans that might be needed during 2019. Therefore, Adventure Toys management also would like information regarding the size of the maximum short-term loan that might be needed during 2019. Display the results of your simulation run from part a in the various forms that you think would be helpful to management in analyzing this issue.
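The case asks for a spreadsheet model in Analytic Solver; the block below is only a rough Python sketch of the same monthly cash-flow logic, useful for checking the structure of a spreadsheet. Several modeling choices are assumptions not fixed by the case text: the cash-sales share is drawn from a triangular distribution (28%, 40%, 48%), the unit production cost is uniform on $6 to $8, machine failures are independent, no loan is outstanding at the end of December 2018, and interest carried into a month uses the previous month's rates.

```python
import random

SEASONAL = [0.79, 0.88, 0.95, 1.05, 1.09, 0.84,
            0.74, 0.98, 1.06, 1.10, 1.16, 1.18]   # January..December
PRICE = 10.0             # wholesale price per unit
FIXED_COST = 15_000.0    # fixed manufacturing cost per month
MIN_BALANCE = 20_000.0   # required end-of-month cash balance
PRIME_STEPS = [0.0, 0.25, -0.25, 0.50, -0.50]     # change in percentage points
PRIME_PROBS = [0.70, 0.10, 0.10, 0.05, 0.05]


def rates(prime):
    """Annual loan and savings rates (percent) implied by the prime rate."""
    return min(prime + 2.0, 9.0), max(prime - 2.0, 2.0)


def simulate_year(rng):
    """Simulate 2019 once; return (end-of-year net worth, largest loan)."""
    base, prime = 6_000.0, 5.0            # December 2018 values
    balance = 25_000.0
    receivables = 0.58 * 7_080 * PRICE    # December credit sales, paid in January
    loan = 0.0                            # assumption: no loan outstanding
    loan_rate, savings_rate = rates(prime)
    max_loan = 0.0
    for factor in SEASONAL:
        # Amounts carried over from the previous month, at that month's rates.
        interest_earned = balance * savings_rate / 1200
        loan_due = loan * (1 + loan_rate / 1200)
        collected = receivables
        # This month's random inputs.
        prime += rng.choices(PRIME_STEPS, weights=PRIME_PROBS)[0]
        loan_rate, savings_rate = rates(prime)
        base = max(0.0, rng.gauss(base, 500.0))
        units = base * factor
        cash_share = rng.triangular(0.28, 0.48, 0.40)  # assumed triangular
        unit_cost = rng.uniform(6.0, 8.0)              # assumed uniform
        repairs = 5_000.0 * sum(rng.random() < 0.10 for _ in range(8))
        revenue = PRICE * units
        receivables = (1 - cash_share) * revenue
        balance += (interest_earned + collected + cash_share * revenue
                    - loan_due - unit_cost * units - FIXED_COST - repairs)
        # Borrow just enough to restore the minimum balance.
        loan = max(0.0, MIN_BALANCE - balance)
        balance += loan
        max_loan = max(max_loan, loan)
    # Net worth: cash + accrued savings interest + receivables - loan and interest.
    net_worth = (balance + balance * savings_rate / 1200 + receivables
                 - loan * (1 + loan_rate / 1200))
    return net_worth, max_loan


def run_trials(n_trials, seed):
    """Run independent simulated years with a reproducible seed."""
    rng = random.Random(seed)
    return [simulate_year(rng) for _ in range(n_trials)]


results = run_trials(1_000, seed=2019)
net_worths = sorted(nw for nw, _ in results)
max_loans = sorted(ml for _, ml in results)
prob_positive = sum(nw > 0 for nw in net_worths) / len(net_worths)
print(f"mean net worth: {sum(net_worths) / len(net_worths):,.0f}")
print(f"P(net worth > 0): {prob_positive:.3f}")
print(f"95th percentile of largest loan: {max_loans[int(0.95 * len(max_loans))]:,.0f}")
```

For parts b and c, the sorted `net_worths` and `max_loans` lists can be turned into histograms or percentile tables, which is roughly what the Analytic Solver output charts would show.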
In: Statistics and Probability
Qualitative versus Quantitative. Determine whether the following variables are qualitative or quantitative.
In: Statistics and Probability
In a survey, 100 U.S. residents with a high school diploma as their highest educational degree (Group 1) had an average yearly income of $35,621. Another 120 U.S. residents with a GED (Group 2) had an average yearly income of $33,498. The population standard deviation for both populations is known to be $4,310. At a 0.01 level of significance, can it be concluded that U.S. residents with a high school diploma make significantly more than those with a GED?
Enter the test statistic - round to 4 decimal places.
Enter the P-Value - round to 4 decimal places.
Can you conclude that U.S. residents with a high school diploma make significantly more than those with a GED?
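Because both population standard deviations are known, this is a two-sample z-test. The sketch below assumes a right-tailed alternative (Group 1 mean greater than Group 2 mean), which is how the question is worded; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n1, mean1 = 100, 35_621   # Group 1: high school diploma
n2, mean2 = 120, 33_498   # Group 2: GED
sigma = 4_310             # known population SD for both groups
alpha = 0.01

# Standard error of the difference in sample means.
se = sigma * sqrt(1 / n1 + 1 / n2)
z = (mean1 - mean2) / se

# Right-tailed p-value under the standard normal distribution.
p_value = 1 - NormalDist().cdf(z)
reject_null = p_value < alpha

print(round(z, 4), round(p_value, 4), reject_null)
```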
In: Statistics and Probability
You flip the same coin 900 more times (1000 total flips). If half of the 900 additional flips are heads and half are tails, what is the empirical probability of getting heads for this coin? (505 heads in 1000 flips)
In: Statistics and Probability
At a used motorcycle dealership, let X be the independent variable representing the age in years of a motorcycle and Y be the dependent variable representing the selling price of the used motorcycle. The data are: X = {5, 10, 12, 14, 15}, Y = {500, 400, 300, 200, 100}
1) What is the value for S^2?
2) What is the value for s?
3) Construct a 95% confidence interval for B1. What are the upper and lower bounds?
4) Do the data provide sufficient evidence to indicate that X contributes to the prediction of Y?
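The four parts follow from the usual simple linear regression sums of squares. The sketch below works them out in plain Python; the critical value 3.182 is the tabulated t value for a two-sided 95% interval with n − 2 = 3 degrees of freedom, hard-coded here because the standard library has no t distribution.

```python
from math import sqrt

x = [5, 10, 12, 14, 15]          # age in years
y = [500, 400, 300, 200, 100]    # selling price
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

ss_xx = sum((xi - x_bar) ** 2 for xi in x)
ss_yy = sum((yi - y_bar) ** 2 for yi in y)
ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

b1 = ss_xy / ss_xx               # estimated slope
b0 = y_bar - b1 * x_bar          # estimated intercept
sse = ss_yy - b1 * ss_xy         # residual sum of squares
s2 = sse / (n - 2)               # 1) estimate of the error variance
s = sqrt(s2)                     # 2) estimated standard deviation
se_b1 = s / sqrt(ss_xx)          # standard error of the slope

T_CRIT = 3.182                   # t_{0.025} with 3 df, from a t table
lower = b1 - T_CRIT * se_b1      # 3) 95% confidence bounds for the slope
upper = b1 + T_CRIT * se_b1
t_stat = b1 / se_b1              # 4) test statistic for H0: slope = 0

print(round(s2, 2), round(s, 2), round(lower, 2), round(upper, 2), round(t_stat, 2))
```

If the interval excludes 0 (equivalently, if |t| exceeds 3.182), the data indicate at the 5% level that X contributes to the prediction of Y.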
In: Statistics and Probability
Suppose that a company's equity is currently selling for $26.00 per share and that there are 5.60 million shares outstanding. If the firm also has 46 thousand bonds outstanding, which are selling at 109.00 percent of par, what are the firm's current capital structure weights for equity and debt respectively?
Equity = 5,600,000 x $26 = $145,600,000, so .7438
Debt = 46,000 x 1.090 x 1,000 = $50,140,000, so .2562
Total = $195,740,000
74.38%, 25.62%
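The worked answer above can be checked with a few lines of arithmetic. The sketch assumes a par value of $1,000 per bond, as the worked answer does; the variable names are illustrative.

```python
share_price = 26.00
shares_outstanding = 5_600_000
bonds_outstanding = 46_000
bond_price_fraction = 1.09   # 109.00 percent of par
par_value = 1_000            # assumed par value per bond

equity_value = share_price * shares_outstanding
debt_value = bonds_outstanding * bond_price_fraction * par_value
total_value = equity_value + debt_value

equity_weight = equity_value / total_value
debt_weight = debt_value / total_value

print(f"{equity_weight:.2%}, {debt_weight:.2%}")
```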
In: Statistics and Probability
(A review on SAS data management)
The following is a data set of 12 individuals. We want to
relate the heart rate at rest (Y) to body weight in kilograms (X).
90 62
87 41
87 63
73 46
73 53
86 55
100 70
75 47
76 49
87 69
79 41
78 48
(a) Write a program in SAS which reads the given data set into a SAS library.
(b) In your program, create a new data set which is a copy of the
first, with the addition of a new variable which represents the
weight in pounds (1 kilogram ≈ 2.2 pounds). Use “proc print” to view
the data for all three variables.
(c) Create a new variable called weight_group which divides the
data set into four groups based on the estimated quartiles for the
variable weight. The estimated quartiles may be requested using the
keywords q1, median, and q3, in the “proc means”
statement.
(d) Use “proc freq” to summarize the frequency of subjects in
each of the four groups.
(e) Obtain the mean and standard deviation for the heart rate at rest for each of the four groups. Comment on the results.
(f) Produce a scatterplot of the heart rate at rest versus body
weight in kilograms. Describe what you see.
(g) Fit a straight line model for heart rate at rest versus body weight in kilograms. Comment on the fit of the line.
In: Statistics and Probability
Consider a student who is rather irregular about class attendance. If she attends class one day, the probability is .8 that she will attend class the next day. And if she misses class, the probability is .4 that she will miss again the following day.
a. Set up the transition matrix for this stochastic process.
b. If the student attends the first day of class, what is the probability she will miss the third day of class?
c. In the long run, what proportion of the time will the student attend class?
d. If the student misses class one day, what is the average number of classes that go by before she misses class again?
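This is a two-state Markov chain with states "attend" and "miss". The sketch below sets up the transition matrix and works through parts b to d in plain Python. Part d is read here as the mean recurrence time of the "miss" state (the reciprocal of its long-run proportion, counting the class on which she next misses); other readings of the wording are possible.

```python
# Rows are today's state, columns are tomorrow's: index 0 = attend, 1 = miss.
P = [[0.8, 0.2],
     [0.6, 0.4]]


def mat_mul(a, b):
    """Multiply two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


# b) Day 1 to day 3 is two transitions, so use P squared.
P2 = mat_mul(P, P)
p_miss_day3 = P2[0][1]

# c) Stationary distribution of a two-state chain:
#    pi_attend = P(miss -> attend) / (P(attend -> miss) + P(miss -> attend)).
pi_attend = P[1][0] / (P[0][1] + P[1][0])
pi_miss = 1 - pi_attend

# d) Mean recurrence time of the "miss" state.
mean_classes_between_misses = 1 / pi_miss

print(p_miss_day3, pi_attend, mean_classes_between_misses)
```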
In: Statistics and Probability
The number of syntax errors in a programming assignment for 20 randomly selected students are as follows:
21, 5, 44, 44, 22, 49, 2, 48, 17, 27, 21, 34, 24, 12, 43, 12, 30, 10, 15, 15.
a. Construct an ordered stem-and-leaf display for this data.
b. Find the sample variance, S^2, and the sample standard deviation, S, using the short-cut method.
c. Find the sample lower quartile Q1, median Q2, and upper quartile Q3.
d. Construct a box plot for this data, and find the 70th percentile and interpret it.
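Parts a and b can be sketched in a few lines of Python. The short-cut method is taken here to mean the computational formula S^2 = (Σx² − (Σx)²/n) / (n − 1); quartiles and percentiles are left out because textbooks differ on the interpolation convention.

```python
from math import sqrt
from statistics import variance

errors = [21, 5, 44, 44, 22, 49, 2, 48, 17, 27,
          21, 34, 24, 12, 43, 12, 30, 10, 15, 15]
n = len(errors)

# a) Ordered stem-and-leaf display: tens digit is the stem, units digit the leaf.
stems = {}
for value in sorted(errors):
    stems.setdefault(value // 10, []).append(value % 10)
for stem, leaves in sorted(stems.items()):
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")

# b) Short-cut (computational) formula for the sample variance.
sum_x = sum(errors)
sum_x2 = sum(value * value for value in errors)
s2 = (sum_x2 - sum_x ** 2 / n) / (n - 1)
s = sqrt(s2)

print(round(s2, 2), round(s, 2))
```

The result agrees with `statistics.variance(errors)`, which uses the definitional formula.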
In: Statistics and Probability