Question

In: Computer Science


  1. In real-world data, tuples with missing values for some attributes are a common occurrence. Describe various methods for handling this problem.

Solutions

Expert Solution

If you have any questions about this answer, please ask in the comments section and I will respond there.

Real-world data tends to be incomplete, noisy, and inconsistent, so an important preprocessing task is to fill in missing values, smooth out noise, and correct inconsistencies.

  • Substitute a value such as mean.

When a large percentage of values is missing, and discarding the affected tuples would bias the modeling results, substituting a single value (e.g., the mean or median) is a common approach. However, this method can distort the attribute's distribution and understate its variance. That is where the following imputation methods come in.
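The variance problem can be seen in a minimal sketch using only Python's standard library. The income figures and the helper name `impute_with` are illustrative assumptions, not part of the original answer.

```python
from statistics import mean, median, variance

# Hypothetical incomes; None marks a missing value.
incomes = [28000, 31000, None, 45000, None, 52000, 39000]

def impute_with(values, fill):
    """Return a copy of values with every None replaced by fill."""
    return [fill if v is None else v for v in values]

observed = [v for v in incomes if v is not None]
mean_filled = impute_with(incomes, mean(observed))
median_filled = impute_with(incomes, median(observed))

# The mean is preserved, but the sample variance shrinks because the
# filled-in values sit exactly at the center of the distribution.
print("variance of observed values:", variance(observed))
print("variance after mean fill:   ", variance(mean_filled))
```

Every imputed value adds a data point with zero deviation from the mean, so the more values are filled in this way, the more the spread is understated.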

  • Predict missing values.

Depending on the type of the variable being imputed (continuous, ordinal, or nominal) and the missing-data pattern (monotone or non-monotone), a few commonly used models are listed below. If you work in SAS, you can write SAS code to identify the missing-data pattern.

  • Logistic Regression
  • Discriminant function method
  • Markov Chain Monte Carlo (MCMC)
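The models listed above are usually run through a statistics package. As a much simpler stand-in for the same idea (predict the missing value from other attributes), the sketch below fits an ordinary least-squares line to the complete tuples and uses it to fill the gap. It is not logistic regression or MCMC, and the (age, income) data and the helper `fit_line` are assumptions made for illustration.

```python
# Hypothetical (age, income) records; income is missing for one tuple.
records = [(25, 30000), (30, 35000), (35, 40000), (40, None), (45, 50000)]

def fit_line(pairs):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Fit on complete tuples only, then predict income where it is missing.
complete = [(x, y) for x, y in records if y is not None]
slope, intercept = fit_line(complete)
imputed = [(x, y if y is not None else slope * x + intercept) for x, y in records]
print(imputed)
```

Unlike mean substitution, the predicted value depends on the tuple's other attributes, so different tuples can receive different fill-in values.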

(a) Ignoring the tuple: This is usually done when the class label is missing (assuming the mining task involves classification). This method is not very effective unless the tuple contains several attributes with missing values. It is especially poor when the percentage of missing values per attribute varies considerably.
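A small sketch of what ignoring tuples costs, using hypothetical customer data (the tuples and the helper name `drop_incomplete` are illustrative assumptions):

```python
# Hypothetical customer tuples: (age, income, city); None marks a missing value.
customers = [
    (34, 28000, "Austin"),
    (None, 41000, "Boston"),
    (51, None, "Chicago"),
    (29, 36000, None),
    (45, 52000, "Denver"),
]

def drop_incomplete(rows):
    """Keep only tuples in which no attribute is missing."""
    return [row for row in rows if all(value is not None for value in row)]

kept = drop_incomplete(customers)
print(f"kept {len(kept)} of {len(customers)} tuples")
```

Here each attribute is missing in only one tuple out of five, yet three of the five tuples are discarded, along with all of their observed values.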

(b) Manually filling in the missing value: In general, this approach is time-consuming and may not be a reasonable task for large data sets with many missing values, especially when the value to be filled in is not easily determined.

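(c) Using a global constant to fill in the missing value: Replace all missing attribute values by the same constant, such as a label like "Unknown". If every missing value is replaced by, say, "Unknown", the mining program may mistakenly think that these tuples form an interesting concept, since they all share the same value. Hence, although this method is simple, it is not recommended.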
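One way to see how a shared placeholder can mislead a mining program is to count values after the fill. The occupation column below is made up for illustration.

```python
from collections import Counter

# Hypothetical occupation column; None marks a missing value.
occupations = ["engineer", None, "teacher", None, "nurse", None, "engineer"]

# Replace every missing value with the same global constant.
filled = ["Unknown" if v is None else v for v in occupations]

# The placeholder now looks like the most frequent "occupation".
most_common_value, count = Counter(filled).most_common(1)[0]
print(most_common_value, count)
```

Any frequency-based analysis of this column would treat "Unknown" as the dominant category, even though it carries no information about the customers.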

(d) Using the attribute mean for quantitative (numeric) values or attribute mode for categorical (nominal) values: For example, suppose that the average income of AllElectronics customers is $28,000. Use this value to replace any missing values for income.
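A minimal sketch of this rule with the standard library: the mean fills the numeric attribute and the mode fills the categorical one. The rows, the column names, and the helper `fill_column` are assumptions chosen so that the mean income comes out to the $28,000 used in the example.

```python
from statistics import mean, mode

# Hypothetical customer rows; None marks a missing value.
rows = [
    {"income": 25000, "segment": "retail"},
    {"income": None, "segment": "retail"},
    {"income": 31000, "segment": None},
    {"income": 28000, "segment": "online"},
]

def fill_column(rows, column, statistic):
    """Fill missing entries of one column with a statistic of its observed values."""
    observed = [r[column] for r in rows if r[column] is not None]
    fill = statistic(observed)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]

filled = fill_column(rows, "income", mean)     # numeric attribute: mean
filled = fill_column(filled, "segment", mode)  # nominal attribute: mode
print(filled)
```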

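(e) Using the most probable value to fill in the missing value: This may be determined with regression, inference-based tools using a Bayesian formalism, or decision tree induction.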
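One simple reading of "most probable value" is the value seen most often among tuples that agree with the incomplete tuple on another attribute. The sketch below is only a frequency-count stand-in under that assumption, not a full Bayesian or decision-tree model. The data and the helper `most_probable_by_group` are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical (occupation, income_band) tuples; None marks a missing band.
tuples = [
    ("engineer", "high"),
    ("engineer", "high"),
    ("engineer", "medium"),
    ("teacher", "medium"),
    ("teacher", "medium"),
    ("teacher", None),
    ("engineer", None),
]

def most_probable_by_group(pairs):
    """Map each key to the value observed most often alongside it."""
    counts = defaultdict(Counter)
    for key, value in pairs:
        if value is not None:
            counts[key][value] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

best = most_probable_by_group(tuples)
# Assumes every group has at least one observed value; otherwise best[k] raises KeyError.
filled = [(k, v if v is not None else best[k]) for k, v in tuples]
print(filled)
```

Because the fill depends on the tuple's other attribute, this uses more of the information in the data than a single global mean or mode does.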

I hope this is clear. If you have any questions, ask in the comments, and if you found the answer helpful, please give it a thumbs up. Thank you.

