Question



In an effort to characterize the New Guinea crocodile (Crocodylus novaeguineae), measurements were taken of the dorsal cranial length (mm) (the length of the skull from the tip of the nose to the back of the cranial cap, denoted DCL) and the total length (cm) (denoted TL) of 50 harvested adult males. (Data on next page.)

1. Construct a histogram and a boxplot for each of the variables DCL and TL. Comment on the symmetry of the distribution of each variable.

2. Construct a scatterplot with DCL on the vertical axis and TL on the horizontal axis. Based on this plot, what can be said about the relationship between DCL and TL (i.e. if you vary DCL, what happens to TL)?

Data

TL (cm)   DCL (mm)   Observation
130       169        1
102       154        2
126       160        3
230       290        4
115       151        5
150       209        6
259       344        7
130       183        8
110       153        9
130       183        10
185       237        11
215       288        12
129       187        13
149       189        14
156       203        15
100       143        16
224       294        17
234       318        18
162       229        19
217       299        20
206       283        21
144       198        22
146       203        23
166       229        24
203       275        25
205       266        26
252       350        27
238       318        28
250       330        29
255       351        30
120       169        31
250       332        32
238       307        33
157       205        34
159       216        35
202       261        36
177       237        37
221       288        38
224       294        39
167       232        40
240       316        41
207       268        42
192       242        43
180       248        44
165       226        45
197       267        46
113       162        47
131       183        48
162       234        49
246       310        50

Solutions

Expert Solution

The raw data for the analysis, in tabular form:

TL DCL
130 169
102 154
126 160
230 290
115 151
150 209
259 344
130 183
110 153
130 183
185 237
215 288
129 187
149 189
156 203
100 143
224 294
234 318
162 229
217 299
206 283
144 198
146 203
166 229
203 275
205 266
252 350
238 318
250 330
255 351
120 169
250 332
238 307
157 205
159 216
202 261
177 237
221 288
224 294
167 232
240 316
207 268
192 242
180 248
165 226
197 267
113 162
131 183
162 234
246 310

Note: All the required plots (the histogram, the boxplot and the scatterplot) were constructed using Python's Seaborn library in order to complete the project within the time limit. The steps for constructing each plot by hand are listed below for reference:

A. Follow these steps to construct a histogram for a given data attribute:

1- Find the smallest and the largest number in the given data.

2- Find the range of the data by subtracting the smallest number from the largest number.

3- Choose the class width from the range. There is no hard-and-fast rule for this, so we do it intuitively: divide the range by 5 for small ranges and by 10 for large ranges. For example, for the attribute TL the range is 259 - 100 = 159; dividing by 10 and rounding gives a class width of 16 units.

4- Make a frequency table with these two columns:

a) The class intervals, each one class width wide, running from the minimum to the maximum, and

b) The number of data points that fall in each interval.

5- Draw the perpendicular x- and y-axes. Place the frequencies on the y-axis and the lower bound of each interval on the x-axis, and label the x-axis with the attribute name.

6- Draw the bars, each spanning from the lower bound of one interval to the lower bound of the next, with height equal to that interval's frequency.
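As a rough illustration of steps 1 to 4, the frequency table for TL can be computed with plain Python. This is only a sketch: the function name `frequency_table`, the choice of 10 classes, and the half-open intervals [lower, upper) starting at the minimum are assumptions made here, not something fixed by the steps above.

```python
# TL values (cm), copied from the data table in observation order.
TL = [130, 102, 126, 230, 115, 150, 259, 130, 110, 130,
      185, 215, 129, 149, 156, 100, 224, 234, 162, 217,
      206, 144, 146, 166, 203, 205, 252, 238, 250, 255,
      120, 250, 238, 157, 159, 202, 177, 221, 224, 167,
      240, 207, 192, 180, 165, 197, 113, 131, 162, 246]


def frequency_table(values, n_classes=10):
    """Return (class_width, rows), where rows are (lower, upper, count) tuples."""
    lo, hi = min(values), max(values)               # step 1: smallest and largest
    data_range = hi - lo                            # step 2: largest minus smallest
    width = max(1, round(data_range / n_classes))   # step 3: rounded class width
    rows = []
    lower = lo
    while lower <= hi:                              # step 4: add classes until the maximum is covered
        upper = lower + width
        # Assumed convention: half-open interval [lower, upper)
        count = sum(lower <= v < upper for v in values)
        rows.append((lower, upper, count))
        lower = upper
    return width, rows


width, table = frequency_table(TL)
print("class width:", width)
for lower, upper, count in table:
    print(f"[{lower}, {upper}): {count}")
```

The counts in the last column are the bar heights for step 6. A different starting point or class width would give a somewhat different histogram shape.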

B. Follow these steps to construct a boxplot for a given data attribute:

1- Find the min and max values of the attribute.

2- Calculate the following 3 descriptive quantities of the attribute:

a) median (Q2)

Formula:

- Sort the data in ascending order.

- If the total number of data points n is odd, the median is the {(n+1)/2}th observation.

- If the total number of data points n is even, the median is the average of the (n/2)th and the {(n/2)+1}th observations.

b) 25th percentile (Q1)

c) 75th percentile (Q3)

3- Draw a horizontal rectangular box with its left edge at Q1 and its right edge at Q3.

4- Divide the box by drawing a vertical line inside it at the median, positioned to scale between Q1 and Q3. Call it Q2.

5- Extend horizontal lines from both sides of the box, drawn to the same scale: from Q1 to the min value and from Q3 to the max value. Call their ends min and max.
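The five quantities needed for the boxplot can be sketched in plain Python for the DCL attribute. The helper names `median` and `five_number_summary` are made up for this illustration. Quartile conventions differ between textbooks and software, so the use of `statistics.quantiles` with `method="inclusive"` (linear interpolation between order statistics) is an assumption here, and other conventions may give slightly different Q1 and Q3.

```python
from statistics import quantiles

# DCL values (mm), copied from the data table in observation order.
DCL = [169, 154, 160, 290, 151, 209, 344, 183, 153, 183,
       237, 288, 187, 189, 203, 143, 294, 318, 229, 299,
       283, 198, 203, 229, 275, 266, 350, 318, 330, 351,
       169, 332, 307, 205, 216, 261, 237, 288, 294, 232,
       316, 268, 242, 248, 226, 267, 162, 183, 234, 310]


def median(values):
    """Median following the odd/even rule described in step 2a."""
    s = sorted(values)
    n = len(s)
    mid = n // 2
    if n % 2 == 1:
        return s[mid]                     # the ((n+1)/2)-th observation (1-based)
    return (s[mid - 1] + s[mid]) / 2      # average of the (n/2)-th and (n/2 + 1)-th


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max)."""
    # Assumed quartile convention: inclusive method with linear interpolation.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, median(values), q3, max(values)


summary = five_number_summary(DCL)
print("min, Q1, median, Q3, max =", summary)
```

With n = 50 (even), the median is the average of the 25th and 26th sorted observations, which is the case the even-n rule covers.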

C. Follow these steps to construct a scatterplot for a given pair of data attributes:

1- Draw the perpendicular x- and y-axes. Place the first attribute name on the y-axis and the second attribute name on the x-axis.

2- Plot one dot for each observation, at the position given by its pair of values on the two axes.

1. Construct a histogram and a boxplot for each of the variables DCL and TL. Comment on the symmetry of the distribution of each variable.

ANS)

DCL attribute univariate analysis:

a. Histogram:

b. Boxplot:

OBSERVATIONS:

1- The data is approximately normally distributed with a marginal right skew.

2- No extreme values are present, which is consistent with the minimal skew.

3- Since more than one peak is apparent, the distribution appears bimodal, hinting at the presence of two independent groups or clusters.

4- Since the distribution does not have a sharp peak, it is platykurtic (negative excess kurtosis): the values are not tightly concentrated around the mean but have a healthy spread.

5- The range of values, about 208 mm (351 - 143), is substantial, indicating high variation in length within the species.

TL attribute univariate analysis:

a. Histogram:

b. Boxplot:

OBSERVATIONS:

1- The data is approximately normally distributed with a very marginal right skew.

2- No extreme values are present in this attribute either, which is consistent with the minimal skew.

3- Since more than one peak is apparent, the distribution appears bimodal, with a high chance that two independent groups are present.

4- It is a platykurtic (negative excess kurtosis) distribution: it has a healthy spread and is not tightly concentrated around the mean.

5- The range of values, about 159 cm (259 - 100), is substantial, indicating high variation in length. It is numerically smaller than the DCL range, although TL is recorded in cm and DCL in mm, so the two ranges are not directly comparable.

The conclusion is that both attributes have roughly the same, approximately normal distribution shape, with two possible hidden groups.
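The remarks on skew and kurtosis were read off the plots; they can also be checked numerically. The following is a minimal sketch using the simple moment-based (population) formulas, which is an assumption made here: statistical packages often apply small-sample corrections, so their values may differ slightly. The function name `skew_and_excess_kurtosis` is made up for this illustration.

```python
# Data copied from the table in observation order (TL in cm, DCL in mm).
TL = [130, 102, 126, 230, 115, 150, 259, 130, 110, 130,
      185, 215, 129, 149, 156, 100, 224, 234, 162, 217,
      206, 144, 146, 166, 203, 205, 252, 238, 250, 255,
      120, 250, 238, 157, 159, 202, 177, 221, 224, 167,
      240, 207, 192, 180, 165, 197, 113, 131, 162, 246]
DCL = [169, 154, 160, 290, 151, 209, 344, 183, 153, 183,
       237, 288, 187, 189, 203, 143, 294, 318, 229, 299,
       283, 198, 203, 229, 275, 266, 350, 318, 330, 351,
       169, 332, 307, 205, 216, 261, 237, 288, 294, 232,
       316, 268, 242, 248, 226, 267, 162, 183, 234, 310]


def skew_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    m4 = sum((v - mean) ** 4 for v in values) / n   # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3               # 0 for a normal distribution
    return skew, excess_kurtosis


results = {name: skew_and_excess_kurtosis(vals)
           for name, vals in (("TL", TL), ("DCL", DCL))}
for name, (skew, kurt) in results.items():
    print(f"{name}: skewness = {skew:.3f}, excess kurtosis = {kurt:.3f}")
```

A skewness near zero supports the "roughly symmetric" reading, and a negative excess kurtosis supports the platykurtic description. Neither number by itself confirms or rules out two hidden groups.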

2. Construct a scatterplot with DCL on the vertical axis and TL on the horizontal axis. Based on this plot, what can be said about the relationship between DCL and TL (i.e. if you vary DCL, what happens to TL)?

OBSERVATIONS

The scatterplot of DCL against TL suggests a very high level of correlation, i.e., there is a strong positive dependency between the two quantities. So, as TL increases, DCL tends to increase as well, and vice versa.
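The strength of the linear relationship seen in the scatterplot can be quantified with the Pearson correlation coefficient. This is a supplementary sketch, not part of the original solution; the function name `pearson_r` is made up for this illustration.

```python
from math import sqrt

# Data copied from the table in observation order (TL in cm, DCL in mm).
TL = [130, 102, 126, 230, 115, 150, 259, 130, 110, 130,
      185, 215, 129, 149, 156, 100, 224, 234, 162, 217,
      206, 144, 146, 166, 203, 205, 252, 238, 250, 255,
      120, 250, 238, 157, 159, 202, 177, 221, 224, 167,
      240, 207, 192, 180, 165, 197, 113, 131, 162, 246]
DCL = [169, 154, 160, 290, 151, 209, 344, 183, 153, 183,
       237, 288, 187, 189, 203, 143, 294, 318, 229, 299,
       283, 198, 203, 229, 275, 266, 350, 318, 330, 351,
       169, 332, 307, 205, 216, 261, 237, 288, 294, 232,
       316, 268, 242, 248, 226, 267, 162, 183, 234, 310]


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mean_x) ** 2 for a in x)                       # variation in x
    syy = sum((b - mean_y) ** 2 for b in y)                       # variation in y
    return sxy / sqrt(sxx * syy)


r = pearson_r(TL, DCL)
print(f"Pearson r between TL and DCL: {r:.3f}")
```

A value of r close to +1 agrees with the visual impression of a tight, upward-sloping cloud of points. Correlation alone does not say which variable drives the other.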

