In: Computer Science
Please answer these questions Data Processing and Analysis using python. Thank you
In statistics, an outlier is a data point that differs significantly from other observations. Outliers can cause serious problems in the analysis of data. Please describe one method that can help you detect outliers in a data set.
Data Processing can be presented in different kinds of encoding such as CSV, XML, HTML, SQL, and JSON, etc. For each case the processing format is different. Python can handle various encoding processes, and different types of modules need to be imported to make these encoding techniques work.
Python is an increasingly popular tool for data analysis. In recent years, a number of libraries have reached maturity, allowing R and Stata users to take advantage of the beauty, flexibility, and performance of Python without sacrificing the functionality these older programs have accumulated over the years
Problems in the analysis of data :
1.The amount of data being collected
2.Collecting meaningful and real-time data.
3. Visual representation of data
4. Data from multiple sources
5. Inaccessible data
6. Poor quality data
7. Pressure from the top
8. Lack of support
9. Confusion or anxiety
10. Budget
11. Shortage of skills
12. Scaling data analysis
Outliers in a data set Detection Method - Z-Score or Extreme Value Analysis (parametric)
The z-score or standard score of an observation is a metric that indicates how many standard deviations a data point is from the sample’s mean, assuming a gaussian distribution. This makes z-score a parametric method. Very frequently data points are not to described by a gaussian distribution, this problem can be solved by applying transformations to data ie: scaling it.