In: Statistics and Probability
Describe the process for applying multivariate analysis.
Section 1 Prologue :
It has been great being part of the Analytical Community the last few years. The excitement is everywhere about “big-data”,“data-science”,“MOOCs”. The talent being attracted into Analytics is awe inspiring.One current trend is ‘a shift from a desire to work for bigger name brand companies like Facebook or Google, to more mission-driven organizations attempting to make an impact on society. Whether it is curing cancer, conserving energy, tracking infectious disease or personalizing education, more data scientists are becoming interested in trying to make the world a better place’ (kdnuggets 2017).
My dissertation is about Data Analytics in a Business Environment. In particular how to empower colleagues to use Data Analytics in problem solving. I work in the London Insurance Markets and am certainly not in a position to help cure cancer or Global Warming! I do interact everyday with colleagues as they process, analyse and act on their data. This thesis is very much about solving the worlds business problems, however small or one-off!
Business Analytics problems are complex. Multivariate Analyis does however offer an opportunity to cut through this complexity and focus on an iterative, scientific process of evaluation. Unfortunately the potential of Multivariate Analysis is poorly understood in the business community. Outside of the Normal Distribution, there is very little understanding of methods for Data Reduction or Simplification, Sorting and Grouping, Investigation of Dependence, Prediction or Hypothesis Testing.
In this document, I aim to help correct this by first summarising key Multivariate Results and then applying them to a detailed Business Problem. My goal is to convince the reader that whenever the data involves simultaneous measurements of multiple variables, there is value in performing a Multivariate Analysis. In particular I investigate and apply techniques that do not rely on a Multivariate Normal Assumption.
The interested reader is referred to (Johnson, Wichern, and others 2014) for a concise introduction to Applied Multivariate Analysis. In this document I have tried to stay true to their approach to Statistics, which is best summarised by the quotation below:
“If the results disagree with informed opinion, do not admit a simple logical explanation, and do not show up clearly in a graphical presentation, they are probably wrong. There is no magic about numerical methods, and many ways in which they can break down. They are a valuable aid to the interpretation of data, not sausage machines automatically transforming bodies of numbers into packets of scientific fact”
— F.H.C. Marriott (Marriott 1974, 89)
It is an unfortunate truth, especially from a Mathematical Perspective, that solving Business Analytics problems requires more than a careful statistical analysis. For example data needs to be extracted from disparate Software Systems and Analytical results need to be published to the Business as re-usable Analytical Tools. In large businesses, the responsibility for this lies with the IT department.
It is my contention however that relying on a non-mathematician, Software Developer to translate Analytical Projects into code is fraught with danger. The resulting Analytics Tool is usually a significant simplification of the initial analysis. This is a consequence of time spent training the developer and accomodating additional testing.
In this document I explore the innovative R Studio technology as a solution to the Analytics Tool development problem. For me, R Studio is a 3D printer for Analytics projects. It empowers the Analyst to perform both statistically rigorous analyses but also to act as a developer and publish results in customised, interactive tools. As a demonstration of the capabilities of R Studio, this document has been written entirely from within R and publised as a website. It includes not only advanced statistical analyses and visualisations but also customised, interactive Analytics tools.
There are two key ideas in this Disseration which I consider innovatice. The first is to take the perspective of the Business Analyst when discussing Statistical Methods. The reality is that Analytical teams in Business act mainly as “Digital Controllers”. They assist in the selection and evaluation of new cutting edge technologies and leave technology development to specialist 3rd party vendors. Business Analysts do not require cutting edge Machine Learning skills. They do however need to apply certain Mutlivariate Analysis techniques to successfully perform a technology selection and evaluation role. I discuss several of these key techniques in my Disseration.
The second innovation is to identify “empowering the Analyst to build and publish his own Analytics tools” as a novel example of Mass Customisation. Mass Customisation is a new and exciting concept from Operations Strategy which is taught in leading Business Schools. It has the potential both to radically reduce the cost of delivering customised Software Tools but also increase the degree of customisation. At the time of writing, purchasing a new Analytics Tool for Insuranace often costs in excess of €500k and a data enrichment service provider may quote €100k for an initial evaluation. This is way too expensive and motivates my detailed exploration of the R Studio system as a way to achieve Mass Customisation in Business Analytics.
I would like to dedicate my thesis to Sir Walter Tyrell of England. Misadventure is a part of everyone’s life but not many people accidentally kill a king!!! The tree responsible is still alive in the New Forest in England!
References :
Johnson, Richard Arnold, Dean W Wichern, and others. 2014. Applied Multivariate Statistical Analysis. Vol. 4. Prentice-Hall New Jersey.
Marriott, F.H.C. 1974. The Interpretation of Multiple Observations. London Academic Press.