Please articulate one major takeaway from your time learning statistics. Perhaps you now think about data differently, understand probability better, etc.
In my opinion, a researcher should only apply tools (measurement instruments and experimental procedures as well as mathematical and statistical tools) that he/she understands well enough to interpret the results correctly, to identify possible problems and limitations, and to judge reliability, validity and generalizability. In many fields this can't be achieved by a single person, so we have to work in teams with experts in different subjects, where every expert needs enough knowledge outside his/her main expertise to communicate effectively and successfully with all the other team members. I see a major problem in science becoming much too "economized": it is degraded to a "career option", focusing on "publication output" and "funding". This seems to lead to an increase in data generation in which most people no longer *really* understand what is going on. There is no time to learn a method thoroughly. Methods should just be applied, quickly, to generate data and publish highly significant results. I know PhD students who perform real-time PCR without having any idea (or only a very, very vague idea) of the process, the underlying biochemistry, the principle of signal generation, the principle of measurement and quantification, and the meaning of the read-out. So they know almost nothing about the method, and they don't know what the read-outs actually mean. How can one expect them to understand how the generated data should reasonably be analyzed? And (practically) they don't need to know, since they will only redo what they read in other papers, following "cookbook recipes" without understanding. I have found the same among postdocs. (I am not generalizing! I am just saying that such cases exist, although I have the feeling that they are not all that rare.)
In statistics we have an even more severe problem. Statistics is (typically) taught in a nonsensical way, focusing far too much on hypothesis testing (as if this were the showcase field of statistics). At the same time, teaching the philosophy of (empirical) science is woefully neglected: what knowledge and information mean, how data relate to information, and how information changes knowledge. Part of statistics is then a mathematical/quantitative treatment of the "representation of knowledge" and of "information" and their interdependence. Understanding this would shift the focus of research away from the utterly nonsensical question "is it significant?" toward the more sensible question "what can we learn from the available data?".
Generally, I find it embarrassing when people publish their t-tests and ANOVAs and whatnot and can't reasonably answer why they decided to do a test at all, or which error rate they wanted to control and at what level (why the test-wise error rate, TWER, or why the family-wise error rate, FWER? why alpha = 0.05? what actually is beta?), and the only answers you get are
- "because others also did it",
- "it is convention",
- "the stats software does it for such data",
- "my boss/reviewer asked me to do this", and, ultimately
- "I actually have no idea".
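To make the TWER-vs-FWER question concrete, here is a minimal Python sketch under stated assumptions: m independent tests, every null hypothesis true, and p-values uniformly distributed under the null (true for continuous test statistics). The function names are hypothetical helpers written only for this illustration. It shows that fixing the TWER at alpha says little about the FWER, which grows with the number of tests, and that the Bonferroni adjustment (alpha/m per test) is one standard way to cap the FWER.

```python
import random


def fwer_independent(alpha: float, m: int) -> float:
    """P(at least one false rejection) for m independent tests, each run at
    test-wise level alpha, assuming all null hypotheses are true."""
    return 1.0 - (1.0 - alpha) ** m


def fwer_simulated(alpha: float, m: int, n_families: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo check of the same quantity.
    Assumption: under a true null hypothesis each p-value is Uniform[0, 1)."""
    rng = random.Random(seed)
    families_with_false_rejection = 0
    for _ in range(n_families):
        # A family "fails" if any of its m null p-values falls below alpha.
        if any(rng.random() < alpha for _ in range(m)):
            families_with_false_rejection += 1
    return families_with_false_rejection / n_families


def bonferroni_level(alpha_family: float, m: int) -> float:
    """Per-test level that keeps the FWER at or below alpha_family (Bonferroni)."""
    return alpha_family / m


if __name__ == "__main__":
    for m in (1, 5, 10, 20):
        print(f"m={m:2d}  TWER=0.05  FWER={fwer_independent(0.05, m):.3f}")
```

With 10 independent tests each at alpha = 0.05, the chance of at least one false rejection is about 0.40. Neither rate is "right" by default; which one matters depends on the question being asked.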
Sometimes you get to hear the whole range of misconceptions about tests and decisions, such as
- "to prove that the null hypothesis is false / the alternative is true",
- "to demonstrate the reliability/validity of my findings",
- "to show the relevance of my results", and so on.
No, doing tests and presenting p-values and other statistics is *not*, in itself, scientific. It is especially unscientific to rely on automated decision rules and to focus on long-run error rates instead of considering the individual case in its scientific context. What is scientific is to understand what the data can tell us and what we can learn from them. This is 95% expertise and common sense and only 5% statistics, and the most important part of that "statistics" is not running tests but thinking about good summaries of the relevant information in the data, about how to visualize it, and about how to explore it smartly.