Question

In: Computer Science

1.Briefly explain the concept of "weak labels" and when such labels are useful.

 

2.Briefly explain the concept of Missing At Random (MAR).

3.Explain what the following code does (assume pandas has been imported, and aliased as pd):

df=pd.read_csv('college.csv', na_values='.')

Solutions

Expert Solution

Please find the answers below.

1.

Weak labels:

Weak labels are imperfect, inexpensive labels for training data. Instead of being assigned by hand, they are generated programmatically from heuristics, rules of thumb, existing databases, ontologies, and similar sources, which makes it feasible to label millions of examples. Weak labeling addresses the data-labeling bottleneck: it lowers the cost of human annotation and makes the remaining human effort more efficient. Such labels are useful when hand labeling is too slow or too expensive. For example, a neural network can be pretrained on a large weakly labeled dataset and then fine-tuned on a small dataset with true labels. There are three types of weak labels.

  • Imprecise Labels
  • Existing Labels (Resources)
  • Inaccurate Labels

Weak labels are individually imperfect, but they can still be used to build a strong predictive model.
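
As an illustration of generating labels programmatically from heuristics, here is a minimal sketch in Python. The spam/ham task, the three rule functions, and the majority-vote combination are all assumptions made for this example; they are not taken from any particular library or from the question itself.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion


# Hypothetical heuristic rules; each one is cheap and imperfect.
def lf_contains_free(text):
    return "spam" if "free" in text.lower() else ABSTAIN


def lf_many_exclamations(text):
    return "spam" if text.count("!") >= 3 else ABSTAIN


def lf_greeting(text):
    return "ham" if text.lower().startswith(("hi", "hello")) else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_free, lf_many_exclamations, lf_greeting]


def weak_label(text):
    """Combine the rules by majority vote; abstain if no rule fires."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        return ABSTAIN
    return Counter(votes).most_common(1)[0][0]


messages = [
    "Win a FREE prize now!!!",
    "Hello, are we still meeting tomorrow?",
    "The report is attached.",
]
weak_labels = [weak_label(m) for m in messages]
print(list(zip(messages, weak_labels)))
```

Some messages receive no label at all and some labels will be wrong, which is the trade-off weak labeling accepts in exchange for low cost.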

2.

Missing At Random (MAR):

Missing at random (MAR) is a missing-data model (also called a response model) in which the probability that a value is missing is the same only within groups defined by the observed data. In other words, there is a systematic relationship between the tendency of values to be missing and the observed data, but not the missing values themselves. MAR is more general and more realistic than missing completely at random (MCAR). An example is a sample drawn from a population in which the probability of being included depends on some known property. Most modern missing-data methods start from the MAR assumption.
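
The definition can be made concrete with a small simulation. This is only a sketch under assumed numbers: the two groups, the income distribution, and the missingness probabilities (0.5 and 0.1) are hypothetical values chosen for illustration.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

records = []
for _ in range(10_000):
    group = random.choice(["young", "old"])   # always observed
    income = random.gauss(50_000, 10_000)     # may go missing
    # MAR: the chance of being missing depends only on the observed
    # group, not on the income value itself.
    p_missing = 0.5 if group == "old" else 0.1
    observed_income = None if random.random() < p_missing else income
    records.append((group, observed_income))


def missing_rate(group_name):
    """Fraction of records in a group whose income is missing."""
    values = [inc for g, inc in records if g == group_name]
    return sum(v is None for v in values) / len(values)


print("old:", round(missing_rate("old"), 2))
print("young:", round(missing_rate("young"), 2))
```

The missing rate differs between the two groups but, within each group, does not depend on the income that went missing. Under MCAR the two rates would instead be equal.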

3.

df=pd.read_csv('college.csv', na_values='.')

The read_csv() function reads a .csv file into a DataFrame. The na_values argument lists additional strings that should be treated as missing values when the file is parsed.

The code above reads the file 'college.csv', treats every field whose entire content is '.' as NaN, and stores the resulting DataFrame in the variable df.
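
A short sketch of this behavior follows. Because the contents of 'college.csv' are not given, the example parses a small in-memory CSV instead; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for college.csv; '.' marks a missing field.
csv_text = (
    "name,enrollment,tuition\n"
    "Alpha College,1200,.\n"
    "Beta University,.,15000.5\n"
)

df = pd.read_csv(io.StringIO(csv_text), na_values=".")

print(df)
print(df.isna().sum())  # number of missing values per column
```

Only fields that consist of '.' alone become NaN. The decimal point inside 15000.5 is left untouched, and the columns are still parsed as numeric rather than as strings.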

