Managing Data (R language)
Reference: Chapter 4, "Managing Data", from the textbook Practical Data Science with R, 1st edition, by Nina Zumel and John Mount (Publisher: Manning, ISBN-13: 978-1-617291-56-2)
1. Explain how you would handle missing data in categorical and numerical variables.
2. Give a few data transformation techniques and cases where you would apply them.
3. Briefly explain the log transformation and when it should be used.
Ans:1
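There are several ways to handle missing values in categorical variables. If only a few rows are affected, they can be dropped; otherwise, a common approach is to treat the missing values as a new category of their own (for example, "missing").
For missing values in numeric variables, the approach depends on why the values are missing:
1. If the values are missing at random, replace them with the mean (expected value) of the variable.
2. If the values are missing systematically, convert the variable to a categorical one with an explicit "missing" level, or replace the missing values with zero and add an indicator variable that records which values were missing.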
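One way this might look in R is sketched below. The data frame `customers` and its columns are hypothetical, made up only for illustration. The sketch creates an explicit "missing" category for a categorical variable, and for a numeric variable it shows two options: filling with the mean (reasonable if values are assumed to be missing at random), or filling with zero while keeping an indicator column (one option if values may be missing systematically).

```r
# Hypothetical toy data; the column names are illustrative only.
customers <- data.frame(
  is_employed = c(TRUE, NA, FALSE, TRUE, NA),
  income      = c(52000, NA, 31000, NA, 78000)
)

# Categorical variable: turn NA into an explicit "missing" level.
customers$is_employed_fix <- factor(ifelse(
  is.na(customers$is_employed), "missing",
  ifelse(customers$is_employed, "employed", "not employed")
))

# Numeric variable, assumed missing at random:
# fill with the mean of the observed values.
mean_income <- mean(customers$income, na.rm = TRUE)
customers$income_fix <- ifelse(is.na(customers$income),
                               mean_income, customers$income)

# Numeric variable, possibly missing systematically:
# record which values were missing, then fill with zero.
customers$income_missing <- is.na(customers$income)
customers$income_zero <- ifelse(is.na(customers$income),
                                0, customers$income)
```

Calling `summary(customers)` afterwards is a quick way to confirm that the fixed columns no longer contain `NA` values.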
Ans:2
The logarithm and square root transformations are commonly used for positive data, and the multiplicative inverse (reciprocal) transformation can be used for non-zero data.
They are used in the following cases:
1. When a value of interest ranges over several orders of magnitude.
2. Transforming to normality.
3. Transforming to a uniform distribution or an arbitrary distribution.
4. Variance-stabilizing transformations.
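The transformations named above can be sketched in a few lines of base R. All vectors here are hypothetical example data, and the rank-based transform is just one simple way to map values to an approximately uniform scale, chosen for illustration.

```r
# Hypothetical values spanning several orders of magnitude.
income <- c(1200, 15000, 48000, 250000, 3100000)
log_income <- log10(income)        # compresses the range

# Square root, often used as a variance-stabilizing
# transformation for count data.
visits <- c(0, 1, 4, 9, 25)
sqrt_visits <- sqrt(visits)

# Reciprocal, only defined for non-zero data.
seconds_per_task <- c(0.5, 2, 4, 10)
tasks_per_second <- 1 / seconds_per_task

# Rank-based transform to an approximately uniform
# distribution on the interval (0, 1).
uniform_income <- rank(income) / (length(income) + 1)
```

After the log transform, the incomes that originally spanned more than three orders of magnitude sit within a few units of each other, which tends to make plots and models easier to work with.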
Ans:3
The log transformation is, arguably, the most popular among the different types of transformations used to transform skewed data to approximately conform to normality. If the original data follows a log-normal distribution or approximately so, then the log-transformed data follows a normal or near normal distribution.
The log transformation can be used to make highly skewed distributions less skewed. This can be valuable both for making patterns in the data more interpretable and for helping to meet the assumptions of inferential statistics. Because the logarithm is defined only for positive values, it should be applied to strictly positive data.
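A small simulation can illustrate the effect on skewness. The sketch below assumes simulated log-normal data (drawn with `rlnorm`) and uses a hand-written `skewness` helper, which is not part of base R and is defined here only for illustration.

```r
set.seed(42)

# Simulated, strictly positive, right-skewed data (log-normal).
x <- rlnorm(1000, meanlog = 0, sdlog = 1)
log_x <- log(x)

# Illustrative helper: sample skewness as the third
# standardized moment (not a base R function).
skewness <- function(v) {
  m <- mean(v)
  mean((v - m)^3) / (mean((v - m)^2))^1.5
}

skew_raw <- skewness(x)      # strongly positive for log-normal data
skew_log <- skewness(log_x)  # expected to be close to zero
```

Plotting `hist(x)` next to `hist(log_x)` shows the same thing visually: a long right tail before the transformation and a roughly symmetric, bell-shaped histogram after it.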