Describe the advantages of using R to perform basic statistical analysis, as compared to using Microsoft Excel's Data Analysis add-in Descriptive Statistics tool. Provide specific examples that justify the advantages you have described.
The advantages of using R for basic statistical analysis, compared with Microsoft Excel's Data Analysis add-in Descriptive Statistics tool, are:
1. R can handle very large datasets
Excel is limited to a fixed number of rows and columns per
worksheet (1,048,576 rows by 16,384 columns in Excel 2007 and
later). When you run out of rows or columns, you’re forced to move
to a new tab or a new file. Most analyses arguably never need that
many rows or columns, but some datasets grow over time until an
Excel spreadsheet can no longer contain all of the data. R has no
comparable worksheet limit; the size of a dataset is bounded mainly
by the computer’s available memory.
Bottom line: The Excel spreadsheet is finite, and
this limits the datasets you can use.
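As a rough illustration of the size point, the sketch below builds a simulated dataset of two million rows, more than the 1,048,576 rows a single Excel worksheet can hold, and summarizes it in one call. The column names and the simulated prices are made up for the example; real data would more typically be read from a file with read.csv().

```r
# Simulated data: 2 million rows, more than one Excel worksheet can hold.
# The data frame and its column names are hypothetical.
set.seed(42)                       # make the simulation repeatable
n <- 2e6
prices <- data.frame(
  product_id = sample(1:3000, n, replace = TRUE),
  price      = round(runif(n, min = 1, max = 100), 2)
)

nrow(prices)            # number of rows held in memory
summary(prices$price)   # min, quartiles, median, mean and max in one call
```

The only practical ceiling here is the machine's memory, so the same two lines at the end work unchanged on a dataset ten times larger.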
2. R can automate and calculate much faster than Excel
Point 1 brings us to Point 2: I can’t tell you the number of
times I’ve had a gigantic file crash because it contained up to 20
tabs chock-full of data, including a Pivot Table, a tab with over
6 years’ worth of pricing for 3,000+ products, and countless
formulas throughout. Excel can hold that much data, but it barely
functions when pushed to capacity, so the file crashes. This becomes
a serious problem when you start losing data because the file
will not save once you add more data to it. An R script, by
contrast, lets you rerun the whole calculation with one command
instead of repeating manual steps.
Bottom line: R can not only handle huge datasets but
also run efficiently while doing so.
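To make the automation point concrete, here is a minimal sketch that mimics the pricing workbook described above: 3,000 products with 72 monthly prices each (6 years). The data are simulated and the names pricing and per_product are hypothetical. Two tapply() calls do the work that would otherwise need a Pivot Table or thousands of cell formulas.

```r
# Hypothetical pricing data: 3,000 products x 72 monthly prices (6 years)
set.seed(1)
pricing <- data.frame(
  product_id = rep(1:3000, each = 72),
  price      = runif(3000 * 72, min = 1, max = 100)
)

# Mean and standard deviation of price for every product, in one pass each
mean_price <- tapply(pricing$price, pricing$product_id, mean)
sd_price   <- tapply(pricing$price, pricing$product_id, sd)

# Collect the results in one table, one row per product
per_product <- data.frame(
  product_id = as.integer(names(mean_price)),
  mean       = as.vector(mean_price),
  sd         = as.vector(sd_price)
)
head(per_product)
```

Because the calculation lives in a script rather than in cells, rerunning it after new prices arrive is a matter of executing the same lines again.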
3. R source code is reproducible
Read any number of R advocate blogs and you’ll find this
point is a big one. R source code can be reused repeatedly and with
very different datasets in ways that Excel formulas and VBA source
code cannot. Existing statistical scripts can be applied to a new
dataset with only a few changes to the code and the reference
data, and can then be rerun as often as needed. While VBA can run
virtually anything R can, it can be much more time consuming, and
it is subject to the same limits as Excel. R also has an advantage
in that it keeps the data and the analysis separate, while Excel
mixes them together (data within formulas). This allows the user to
view the data more clearly, correct any errors, and follow the
progression of the data.
Bottom line: R source code is much easier to
reproduce and reuse than Excel formulas or VBA.
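One way to picture this reuse: the sketch below defines a small function, describe_column() (a hypothetical name, not a built-in), that returns a subset of the figures reported by Excel's Descriptive Statistics tool, and then applies it unchanged to columns from two of R's built-in datasets, mtcars and iris. Mode, skewness and kurtosis are left out because base R has no single function for them.

```r
# Hypothetical helper: a subset of Excel's Descriptive Statistics output
describe_column <- function(x) {
  x <- x[!is.na(x)]              # drop missing values before summarizing
  n <- length(x)
  c(mean     = mean(x),
    std_err  = sd(x) / sqrt(n),  # standard error of the mean
    median   = median(x),
    sd       = sd(x),            # sample standard deviation
    variance = var(x),           # sample variance
    range    = max(x) - min(x),
    min      = min(x),
    max      = max(x),
    sum      = sum(x),
    count    = n)
}

# The same code runs on two unrelated datasets
mpg_stats   <- describe_column(mtcars$mpg)
sepal_stats <- describe_column(iris$Sepal.Length)
round(cbind(mpg = mpg_stats, sepal_length = sepal_stats), 3)
```

Only the argument changes between the two calls, which is the kind of reuse that is hard to get from formulas tied to specific cell ranges.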
4. Community libraries of R source code are available to everyone
R has been growing in usage and popularity over the past several
years, and with that, the number of users adding new functions to
the available packages and libraries has also increased. This
gives any R user access not only to basic statistical functions,
but also to a growing number of complex new functions that may be
applicable to their data. The result is a community of R users who
easily pass on their knowledge to other R users who need a similar
solution for their own data.
Bottom line: R promotes sharing of functions to
expand libraries with new and different reproducible statistical
functions.
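A minimal sketch of how a contributed package is typically used, taking the psych package from CRAN as an example. The package is chosen only for illustration; the text above names none. Its describe() function reports skewness and kurtosis alongside the basic statistics. The code checks whether the package is installed rather than installing it.

```r
# install.packages("psych")   # one-time download from CRAN, if not yet installed

# TRUE if the package is available on this machine, FALSE otherwise
has_psych <- requireNamespace("psych", quietly = TRUE)

if (has_psych) {
  # describe() returns n, mean, sd, median, skew, kurtosis, se and more
  pkg_summary <- psych::describe(mtcars$mpg)
  print(pkg_summary)
} else {
  message("Package 'psych' is not installed; run install.packages(\"psych\") first.")
}
```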
5. R provides more complex and advanced data visualization
Excel can produce several types of basic graphs once you chop up
and select the exact data you want to analyze. R is designed to
produce graphs much more easily, without all the pre-graph work,
and it offers more types of graphs than you’d ever know what to
do with. Take a look here
(http://shinyapps.stat.ubc.ca/r-graph-catalog/) to see the types
of graphs R can create. Of course, Excel is perfectly sufficient
when it comes to showing simple, straightforward data analysis, but
R can take very complicated data and turn it into a much
easier-to-understand visual representation.
Bottom line: R can provide advanced data
visualization for more complex datasets.
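As a small example of the plotting point, the sketch below draws a histogram, a grouped box plot and a scatter plot with a fitted line from the built-in mtcars dataset, using only base R graphics. Writing to a temporary PDF is an assumption made so the example also runs without a display.

```r
# Write the plots to a temporary PDF so the example runs without a display;
# drop the pdf() and dev.off() lines to draw on screen instead.
out_file <- tempfile(fileext = ".pdf")
pdf(out_file, width = 9, height = 3)

par(mfrow = c(1, 3))                            # three panels side by side
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")
boxplot(mpg ~ cyl, data = mtcars, main = "mpg by cylinders",
        xlab = "Cylinders", ylab = "Miles per gallon")
plot(mtcars$wt, mtcars$mpg, main = "mpg vs. weight",
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
abline(lm(mpg ~ wt, data = mtcars), lty = 2)    # dashed fitted regression line

invisible(dev.off())                            # close the file
```

Each plot takes the data frame directly, with no need to copy or rearrange the data first.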