Discuss the advantages and disadvantages of using R to analyze data compared to a spreadsheet tool such as Microsoft Excel or Tableau. Provide specific examples to illustrate your ideas.
Comparison of R and MS Excel for data analysis:
1. The Excel spreadsheet is finite and this limits the datasets you can use.
R can handle very large datasets (limited mainly by the computer’s available memory), whereas Excel allows only so many rows and columns per worksheet: 1,048,576 rows by 16,384 columns in current versions. When you run out of rows or columns in Excel, you are forced to move to a new tab or a new file, and the resulting chunking makes analysis of the complete dataset cumbersome. Moreover, datasets grow over time, as online data collection is increasing at a staggering pace, so eventually the spreadsheet will not be able to contain all of the data. Even before that point, a workbook approaching Excel’s limits suffers severe latency issues.
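As a rough illustration of the point above, the following base R sketch builds a table with more rows than a single Excel worksheet can hold and summarizes it in one step. The data are simulated, and the names `prices`, `product_id`, and `avg_price` are hypothetical, chosen only for this example.

```r
# Per-worksheet row limit in Excel 2007 and later.
excel_max_rows <- 1048576

# Hypothetical dataset: two million simulated price records.
set.seed(42)
n <- 2e6
prices <- data.frame(
  product_id = sample(1:3000, n, replace = TRUE),
  price      = runif(n, min = 1, max = 100)
)

# The whole table lives in one object; no splitting across tabs or files.
exceeds_excel <- nrow(prices) > excel_max_rows

# Average price per product, computed over all rows at once.
avg_price <- tapply(prices$price, prices$product_id, mean)
head(avg_price)
```

How far this scales depends on the memory of the machine running R, so the two million rows here are an example size, not a limit.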
2. R is able to not only handle huge datasets but can still run efficiently while doing so.
R can automate and calculate much faster than Excel. This follows from Point 1: an Excel workbook can become unstable well before it is full. Take, for example, a file with up to 20 tabs chock-full of data, including a Pivot Table, a tab with over 6 years’ worth of pricing for 3,000+ products, and countless formulas throughout. Such a file is prone to crashing, because although Excel can hold a certain amount of data, it can barely function when used to capacity. This becomes a serious problem when you start losing data because the file will not save once more data is added. R, by contrast, can not only handle huge datasets but still run efficiently while doing so.
3. R source code is more reproducible and easier to reuse than Excel formulas or VBA.
R scripts can be run repeatedly, and on very different datasets, in ways that Excel formulas and VBA code cannot. Statistical code is available that can be applied to any dataset with only a few changes to the code and the reference data (the same is true if you are working in Python), and it can then be reapplied many times over very easily. While VBA can do virtually anything R can, it can be much more time consuming, and it shares Excel’s limitations. R also has the advantage of keeping the data and the analysis separate, while Excel shows them together (data within formulas). This lets the user view the data more clearly, correct any errors, and follow the progression of the data.
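A minimal sketch of this kind of reuse, assuming nothing beyond base R: one function is written once and then applied unchanged to two unrelated datasets that ship with R (`mtcars` and `iris`). The function name `summarise_column` is hypothetical and used only for illustration.

```r
# Hypothetical helper: summary statistics for one numeric column
# of any data frame.
summarise_column <- function(data, column) {
  x <- data[[column]]
  stopifnot(is.numeric(x))
  c(n = length(x), mean = mean(x), sd = sd(x), median = median(x))
}

# The same code, two different datasets; only the arguments change.
cars_summary <- summarise_column(mtcars, "mpg")
iris_summary <- summarise_column(iris, "Sepal.Length")

print(cars_summary)
print(iris_summary)
```

Because the data are passed in as an argument rather than embedded in the calculation, rerunning the analysis on new data means changing one line, not rebuilding formulas cell by cell.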
4. R promotes sharing of functions to expand libraries with new and different reproducible statistical functions.
R has been growing in usage and popularity over the past several years, and with that, the number of users adding new functions to the available packages and libraries has also increased. This gives any R user access not only to basic statistical functions, but also to an increasing number of complex new functions that may be applicable to their data. The result is a community of R users who readily share their knowledge with other users who need a similar solution for their own data. Excel has community support too, but the number of contributed functions and packages for R is much larger.
5. R can provide advanced data visualization for more complex datasets.
Excel can produce several types of basic graphs once you chop up and select the exact data you want to analyze. R is designed to produce graphs much more easily, without all the pre-graph work, and it offers more types of graphs than you’d ever know what to do with. Excel is perfectly sufficient for showing simple, straightforward data analysis, but R can take very complicated data and turn it into a much easier to understand visual representation.
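To give a sense of how little preparation a graph needs in R, here is a short sketch using only base R graphics and the built-in `mtcars` dataset. The output file name is an assumption made for this example; the formula interface picks the columns directly, with no manual selection or reshaping of the data.

```r
# Write the figure to a temporary PDF (hypothetical file name).
out_file <- file.path(tempdir(), "mpg_by_cyl.pdf")
pdf(out_file, width = 8, height = 4)

# Two panels side by side.
par(mfrow = c(1, 2))

# Panel 1: distribution of fuel economy for each cylinder count.
boxplot(mpg ~ cyl, data = mtcars,
        xlab = "Cylinders", ylab = "Miles per gallon",
        main = "Fuel economy by cylinders")

# Panel 2: scatter plot with a fitted linear trend line.
plot(mpg ~ wt, data = mtcars,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon",
     main = "Fuel economy by weight")
abline(lm(mpg ~ wt, data = mtcars), col = "red")

# Close the device so the file is written to disk.
invisible(dev.off())
```

Contributed packages offer many more chart types, but even the default installation covers grouped and fitted plots like these.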
6. R is free, Excel is not.
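R can be downloaded free of charge by anyone, anywhere, and it runs on more platforms than Excel: Windows, macOS, and Linux, whereas the Excel desktop application requires a paid license and is not available for Linux.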