Question

In: Computer Science


Data mining

I have a data column where the maximum mark is 20, but some of the values entered in the column are more than 40. How do you advise me to clean that data?

Please also state what kind of data this is (for instance, noisy data or something else), name the method used to solve it, and explain how to clean it.

Thanks

Solutions

Expert Solution

ANS:-

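To clean such data, we apply data cleaning functions. Marks above 20 are invalid values that lie outside the column's allowed range, which makes them noisy (erroneous) data.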

DATA CLEANING:-

The quality of your data is critical to the final analysis. Data that is incomplete, noisy, or inconsistent can distort your results. Data cleaning is the process of detecting and removing (or correcting) corrupt or inaccurate records from a record set, table, or database.

DATA CLEANING FUNCTIONS:-

1. Remove Invalid Values

The first and most important step is to remove data that is useless for your analysis. Data that does not fit the context of the problem you are solving is not what you need.

For example, you may only need to estimate the average age of your sales staff; in that case, their email addresses are not required. Similarly, if you are counting how many customers you contacted this month, you do not need data about the people you reached in earlier months.

However, before you delete a piece of data, make sure it is truly irrelevant, because you may need to compare its corresponding values over time (to check consistency). If you can get a second opinion from an experienced domain expert before removing data, do so.

You do not want to delete values and regret the decision later. Once you are sure the data is irrelevant, remove it.
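The "customers contacted this month" case can be sketched in pandas as a row filter. This is a minimal illustration, assuming a hypothetical contact log with `customer` and `contact_date` columns and assuming the month of interest is May 2024; rows from earlier months are simply filtered out.

```python
import pandas as pd

# Hypothetical contact log: one row per customer contact.
contacts = pd.DataFrame({
    "customer": ["c1", "c2", "c3", "c4"],
    "contact_date": pd.to_datetime(
        ["2024-04-28", "2024-05-02", "2024-05-15", "2024-05-20"]
    ),
})

# Assumed month of interest: May 2024. Contacts from earlier months are irrelevant here.
start = pd.Timestamp("2024-05-01")
end = pd.Timestamp("2024-06-01")
this_month = contacts[(contacts["contact_date"] >= start) & (contacts["contact_date"] < end)]

# Count distinct customers contacted during the month.
n_customers = this_month["customer"].nunique()
print(n_customers)
```

Filtering keeps the original log untouched, so the earlier rows are still available if you later need them for a consistency check over time.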


RANGE:

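Some numeric values must lie within a certain range. For example, the number of products you can transport per day has a minimum and a maximum. Every valid value must fall between a lower bound and an upper bound. In the question above, a mark cannot exceed 20, so any entry above 20 (such as the values over 40) is out of range and must be treated as invalid.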
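A range check on the marks column can be sketched in pandas as follows. This assumes a hypothetical DataFrame with a `mark` column and a valid range of 0 to 20 (the lower bound of 0 is an assumption; the question only gives the maximum). Out-of-range marks are first flagged, then either turned into missing values or dropped.

```python
import pandas as pd

# Assumed valid range for a mark (inclusive). Only the maximum of 20 comes from the question.
MIN_MARK, MAX_MARK = 0, 20

# Hypothetical marks column containing two invalid entries above 40.
df = pd.DataFrame({
    "student": ["A", "B", "C", "D", "E"],
    "mark": [15, 42, 18, 45, 20],
})

# True where the mark lies inside [MIN_MARK, MAX_MARK].
in_range = df["mark"].between(MIN_MARK, MAX_MARK)
print(df[~in_range])  # show the invalid rows for inspection

# Option 1: replace out-of-range marks with NaN so they can be re-entered or imputed later.
df_missing = df.copy()
df_missing["mark"] = df["mark"].where(in_range)

# Option 2: drop the rows with invalid marks entirely.
df_dropped = df[in_range].reset_index(drop=True)
```

Marking the values as missing is usually the safer choice when the correct mark can be recovered from the original records; dropping the rows is simpler but loses the rest of each student's data.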

Irrelevant columns:

If your DataFrame (a two-dimensional data structure in which data is aligned in a tabular fashion in rows and columns) contains columns that are irrelevant or that you will never use, drop them so you can focus on the columns you will be working with. Let's look at an example using a student data set stored in a pandas DataFrame.
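The student example can be sketched as follows. This is a minimal illustration, assuming a hypothetical student DataFrame in which the `email` column is not needed for an analysis of marks; `DataFrame.drop(columns=...)` returns a new DataFrame without that column.

```python
import pandas as pd

# Hypothetical student data set; the analysis only needs names and marks.
students = pd.DataFrame({
    "name": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.com"],
    "mark": [17, 12, 19],
})

# Drop the column the analysis will never use.
students = students.drop(columns=["email"])
print(students.columns.tolist())  # ['name', 'mark']
```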

