I have a project to do, and I have gone through Kaggle looking for a dataset. Kindly help me find a dataset on which I can carry out all of these steps: 1. Data cleaning, 2. Data exploration, 3. Principal component analysis, 4. Multiple linear regression, 5. Predictive analysis using MSE, Lasso, ridge, AIC and BIC tests, a KNN model, etc. Please help me with the dataset; I will do the work myself. The dataset can be on the economy, health, demographics, transportation, etc. It should require some amount of cleaning.
I would suggest the Walmart sales dataset available on Kaggle.
Here is the link:
https://www.kaggle.com/gcarra/walmart
Reasons why I am suggesting this dataset:
1. It is multivariable data on weekly Walmart sales across several stores and departments.
2. It has a lot of missing values, so you can practice systematic imputation, either with basic statistical methods such as imputation with a measure of central tendency (mean, median, or mode) or with more advanced algorithms such as missForest.
3. The data has a few outliers too, so you can understand the concept of outliers and when to remove them and when not to. Since it is sales data, the outliers may simply be spikes in sales during the festive season.
4. Since the dataset has 45 stores and more than 90 departments, the exploration part becomes very important, and it is interesting to see which departments behave alike and how the size and type of store affect sales.
5. The data is a time series with cross-sectional observations (panel data). You can also apply PCA here.
6. Along with the time series, the data contains independent variables, so you can try your hand at models like ARIMAX.
7. The independent variables are not just numeric; they are a mix of numeric and categorical, so you can apply all the advanced regression techniques.
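To make the cleaning points above concrete, here is a minimal sketch of median imputation and outlier flagging with pandas. It runs on synthetic data, and the column names (`store`, `weekly_sales`, `markdown`, `is_holiday`) are hypothetical stand-ins, not necessarily the names used in the Kaggle files, so adapt them after loading the real CSVs.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for a weekly sales table (hypothetical column names).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "store": rng.integers(1, 6, size=n),
    "weekly_sales": rng.normal(20000, 3000, size=n),
    "markdown": rng.normal(500, 100, size=n),
    "is_holiday": rng.random(n) < 0.1,
})

# Simulate festive-season spikes and missing promotional values.
df.loc[df["is_holiday"], "weekly_sales"] *= 2.0
missing_rows = rng.choice(n, size=40, replace=False)
df.loc[missing_rows, "markdown"] = np.nan

# Central-tendency imputation: fill each gap with the median of its store,
# falling back to the overall median if a store has no observed values.
store_median = df.groupby("store")["markdown"].transform("median")
df["markdown_imputed"] = (
    df["markdown"].fillna(store_median).fillna(df["markdown"].median())
)

# Flag outliers with the 1.5 * IQR rule instead of deleting them outright.
q1 = df["weekly_sales"].quantile(0.25)
q3 = df["weekly_sales"].quantile(0.75)
iqr = q3 - q1
df["is_outlier"] = (
    (df["weekly_sales"] < q1 - 1.5 * iqr) | (df["weekly_sales"] > q3 + 1.5 * iqr)
)

# Check whether the flagged rows coincide with holiday weeks before removing anything.
print(pd.crosstab(df["is_outlier"], df["is_holiday"]))
```

If most flagged rows fall in holiday weeks, they are probably genuine demand spikes, so keeping them and adding a holiday indicator to the model is usually a better choice than dropping them.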
This dataset is very good for deeply understanding how statistical concepts are applied, as well as the effects of different factors on sales.
I hope the dataset helps you with your project. All the best.
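As a rough sketch of the modelling steps named in the question (PCA, multiple linear regression, ridge, Lasso, MSE, AIC, and BIC), the following uses scikit-learn on synthetic data. The predictors, coefficients, and penalty strengths (`alpha`) are made up for illustration; with the real data you would substitute your cleaned feature matrix and tune the penalties by cross-validation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic predictors; two of them are made strongly correlated on purpose.
rng = np.random.default_rng(42)
n, p = 300, 6
X = rng.normal(size=(n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=n)
beta = np.array([3.0, 0.0, -2.0, 0.0, 1.5, 0.0])  # illustrative coefficients
y = X @ beta + rng.normal(scale=1.0, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Standardise using training statistics only, then reuse them on the test set.
scaler = StandardScaler().fit(X_train)
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)

# PCA: share of total variance captured by each principal component.
pca = PCA().fit(X_train_s)
explained = pca.explained_variance_ratio_
print("explained variance ratio:", np.round(explained, 3))

# Compare ordinary least squares, ridge, and Lasso by test-set MSE.
models = {
    "ols": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
}
test_mse = {}
for name, model in models.items():
    model.fit(X_train_s, y_train)
    test_mse[name] = mean_squared_error(y_test, model.predict(X_test_s))
print("test MSE:", test_mse)

# Gaussian AIC and BIC for the OLS fit, up to an additive constant.
resid = y_train - models["ols"].predict(X_train_s)
n_train = len(y_train)
k = X_train_s.shape[1] + 1  # slope coefficients plus the intercept
rss = float(resid @ resid)
aic = n_train * np.log(rss / n_train) + 2 * k
bic = n_train * np.log(rss / n_train) + k * np.log(n_train)
print("AIC:", round(aic, 2), "BIC:", round(bic, 2))
```

AIC and BIC are only meaningful when compared across candidate models fitted to the same training rows, for example OLS fits with different subsets of predictors; lower values indicate a better trade-off between fit and complexity.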