The data we have examined about the Titanic begs for further attention. You may have noticed in the presentation of the story that each newspaper headline listed a different number of passengers and survivors. In fact, nobody seems to know exactly how many were aboard; estimates vary by almost 200. The number we have used is at the low end. Neither is it clear how many people survived; estimates range from 705 to 711.
The data reported in this Lesson contain one correction from the data originally published. Those data indicated that all children in first class survived. In fact, Helen Loraine Allison, age 2, perished with her parents, although her younger brother Hudson Trevor Allison (11 months) survived. You can follow the links through the web sites to read her story (and see her picture!).
Even from the information given you should be skeptical about these data. This is a good point in the course to return to the ideas of the early lessons. Remember the W's! In particular, even though the variable names seem clear, What has really been reported? For example, what is the definition of a "child"? George Sweet, age 14, was travelling by himself and seems to have been coded as an adult (who did not survive). But Lucile Polk Carter, age 14, was travelling with her parents, Mr and Mrs William Ernest Carter. She (and her entire family) survived, but is she an adult female or a child?
Miss Barbara J. West and Miss Constance Mirium West appear to have survived, but we don't know their ages. Were they the daughters of Mr and Mrs Edwy Arthur West, or unmarried women who shared a common surname?
And among the third class passengers and crew, the data degrade further. All we know about W.H. Nancarrow is that this person held a third class ticket and perished. We have no information about Nancarrow's gender or age.
Discuss what warnings you would add to an analysis of these data. Do you think the principal conclusions drawn from analyses of these data are valid? Which parts of the data would you expect to be most reliable? Least reliable? (Hint: are the data on First Class passengers likely to be better or worse than the data on Third Class passengers? Why?) How could we fail to have a complete list of passengers and their rooms? Didn't the White Star Line sell them tickets and assign berths? Did the White Star Line have a payroll? Did they even know who (and how many) they had employed?
Because the actual number of people aboard is uncertain, we cannot treat figures computed from these data as exact. The overall percentage of survivors, the adult versus child survival rates, and the survival rates by travel class are all estimates and should be reported with that warning.
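To make the effect of the uncertain totals concrete, here is a minimal sketch that turns a range of survivor counts and a range of totals aboard into a range for the overall survival rate. The survivor range (705 to 711) and the spread of about 200 in the number aboard come from the question. The low-end total of 2200 is a placeholder assumed only for illustration, since the question does not state the number used, and the function name `survival_rate_bounds` is hypothetical.

```python
def survival_rate_bounds(survivors_low, survivors_high, aboard_low, aboard_high):
    """Return the (lowest, highest) survival proportion consistent with the ranges."""
    # The lowest rate pairs the fewest survivors with the most people aboard.
    lowest = survivors_low / aboard_high
    # The highest rate pairs the most survivors with the fewest people aboard.
    highest = survivors_high / aboard_low
    return lowest, highest


# Survivor estimates quoted in the question.
SURVIVORS_LOW, SURVIVORS_HIGH = 705, 711

# ASSUMPTION: placeholder low-end total; the question does not give the figure.
ABOARD_LOW = 2200
# The question says estimates of the number aboard vary by almost 200.
ABOARD_HIGH = ABOARD_LOW + 200

low, high = survival_rate_bounds(SURVIVORS_LOW, SURVIVORS_HIGH, ABOARD_LOW, ABOARD_HIGH)
print(f"Overall survival rate lies between {low:.1%} and {high:.1%}")
```

Reporting the result as an interval rather than a single percentage is one way to attach the warning to the analysis. The same idea would apply to survival rates by class or by age group, but only if a plausible range for each group's count were available, which the question suggests is hardest for third class and crew.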
In addition, the sex and the adult/child status of some passengers are not clear from the available data. The adult/child coding would matter less if exact ages were recorded, but ages are missing for some passengers.
The data on third class passengers are likely to be the least reliable, since the booking procedure for third class was probably less organized and more ad hoc than for first class. It is quite possible that some third class passengers were never recorded at all, for example if their tickets were sold over the counter.
A complete list of passengers and their rooms may be missing because the records were aboard the Titanic and sank with the ship. The payroll records may also have been aboard when the ship sank. It is also possible that the records were kept by an employee who was aboard and did not survive, so they could not be traced afterward.