Question

Consider the given matrix (the top 250 rated movies according to IMDB). Consider a sample consisting of 6 movies; after pre-processing, create a data matrix of movies and genres.

  1. Identify whether each attribute is symmetric or asymmetric.
  2. Calculate the data similarity according to the attribute type (show all steps).

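| Movie | Action | Crime | Drama | Fantasy |
|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 0 |
| 2 | 1 | 0 | 1 | 1 |
| 3 | 1 | 1 | 1 | 1 |
| 4 | 0 | 1 | 1 | 0 |
| 5 | 0 | 1 | 1 | 0 |
| 6 | 0 | 0 | 1 | 0 |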

Solutions

Expert Solution

A binary attribute is symmetric if both of its values are equally important and carry the same weight, so there is no preference for which outcome is coded as 1.

A binary attribute is asymmetric if its two values are not equally important, so there is a preference for which outcome is coded as 1. Example: positive and negative sentiment.

Here all attributes are asymmetric: a 1 (the movie belongs to the genre) is more informative than a 0 (it does not), so the two values cannot be given equal importance when coding.

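For two objects i and j, count the attributes that fall into each combination of values (p = a+b+c+d is the total number of attributes):

|  | object j = 1 | object j = 0 | sum |
|---|---|---|---|
| object i = 1 | a | b | a+b |
| object i = 0 | c | d | c+d |
| sum | a+c | b+d | p |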
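The counting step behind this table can be sketched in a few lines of Python. This is a minimal illustration, not part of the original answer: the function name `contingency` and the variable names are assumptions, and the genre order (action, crime, drama, fantasy) follows the data matrix in the question.

```python
def contingency(x, y):
    """Return the counts (a, b, c, d) for two equal-length 0/1 vectors.

    a: both are 1, b: x is 1 and y is 0,
    c: x is 0 and y is 1, d: both are 0.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d


# Genre order: action, crime, drama, fantasy (rows taken from the question).
movie1 = [0, 1, 1, 0]
movie2 = [1, 0, 1, 1]

counts = contingency(movie1, movie2)
print(counts)  # (1, 1, 2, 0)
```

Here object i is movie 1 and object j is movie 2, so the single shared genre (drama) gives a = 1 and no genre is absent from both, giving d = 0.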

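For symmetric attributes, the dissimilarity is based on the simple matching coefficient (invariant similarity):

d(i,j) = (b+c) / (a+b+c+d)

For asymmetric attributes, it is based on the Jaccard coefficient (non-invariant similarity), which ignores the negative matches d:

d(i,j) = (b+c) / (a+b+c)

Both formulas give a dissimilarity (distance); the corresponding similarity is sim(i,j) = 1 - d(i,j).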
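To compare the two formulas side by side, here is a small Python sketch. The function name `dissimilarity` and its `asymmetric` flag are illustrative choices, not from the original answer, and returning 0.0 when the denominator is zero (two all-zero vectors under the asymmetric formula) is an assumption, since the formula is undefined in that case.

```python
def dissimilarity(x, y, asymmetric=True):
    """Binary dissimilarity d(i,j) between two equal-length 0/1 vectors.

    asymmetric=True  -> (b+c) / (a+b+c)     (negative matches d ignored)
    asymmetric=False -> (b+c) / (a+b+c+d)   (simple matching)
    """
    a = b = c = d = 0
    for xi, yi in zip(x, y):
        if xi == 1 and yi == 1:
            a += 1
        elif xi == 1 and yi == 0:
            b += 1
        elif xi == 0 and yi == 1:
            c += 1
        else:
            d += 1
    denom = a + b + c if asymmetric else a + b + c + d
    if denom == 0:
        # Assumption: treat two all-zero vectors as identical.
        return 0.0
    return (b + c) / denom


# Genre order: action, crime, drama, fantasy (rows taken from the question).
movie1 = [0, 1, 1, 0]
movie6 = [0, 0, 1, 0]

d_asym = dissimilarity(movie1, movie6, asymmetric=True)
d_sym = dissimilarity(movie1, movie6, asymmetric=False)
print(d_asym, d_sym)          # 0.5 0.25
print(1 - d_asym, 1 - d_sym)  # similarities: 0.5 0.75
```

The two shared zeros (action and fantasy) lower the symmetric distance to 0.25, while the asymmetric formula ignores them and gives 0.5. This is why the choice of attribute type matters for the result.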

1) Asymmetric: d(i,j) = (b+c) / (a+b+c)

d(1,2): a = 1, b = 1, c = 2, d = 0

d(1,2) = (1+2) / (1+1+2) = 3/4 = 0.75

2) Asymmetric: d(i,j) = (b+c) / (a+b+c)

d(1,3): a = 2, b = 0, c = 2, d = 0

d(1,3) = (0+2) / (2+0+2) = 2/4 = 0.5

3) Asymmetric: d(i,j) = (b+c) / (a+b+c)

d(1,4): a = 2, b = 0, c = 0, d = 2

d(1,4) = (0+0) / (2+0+0) = 0

4) Asymmetric: d(i,j) = (b+c) / (a+b+c)

d(1,5): a = 2, b = 0, c = 0, d = 2

d(1,5) = (0+0) / (2+0+0) = 0

5) Asymmetric: d(i,j) = (b+c) / (a+b+c)

d(1,6): a = 1, b = 1, c = 0, d = 2

d(1,6) = (1+0) / (1+1+0) = 1/2 = 0.5
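Since the question asks for similarity while the values above are dissimilarities, they can be converted with the standard relation sim(i,j) = 1 - d(i,j), which for asymmetric attributes equals a / (a+b+c). A worked version for the five pairs involving movie 1, using the counts listed above:

```latex
\begin{aligned}
\operatorname{sim}(i,j) &= 1 - d(i,j) = \frac{a}{a+b+c} \\
\operatorname{sim}(1,2) &= \frac{1}{1+1+2} = 0.25 \\
\operatorname{sim}(1,3) &= \frac{2}{2+0+2} = 0.5 \\
\operatorname{sim}(1,4) &= \frac{2}{2+0+0} = 1 \\
\operatorname{sim}(1,5) &= \frac{2}{2+0+0} = 1 \\
\operatorname{sim}(1,6) &= \frac{1}{1+1+0} = 0.5
\end{aligned}
```

Movies 1, 4 and 5 have identical genre vectors, which is why their similarity is 1.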

The remaining pairs are calculated in the same way, for example:

d(2,3) = (0+1) / (3+0+1) = 1/4 = 0.25

d(2,4) = (2+1) / (1+2+1) = 3/4 = 0.75

d(2,5) = (2+1) / (1+2+1) = 3/4 = 0.75

d(2,6) = (2+0) / (1+2+0) = 2/3 ≈ 0.667

and so on...
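The full set of 15 pairs can be generated with a short Python sketch rather than by hand. The names `movies`, `jaccard_distance`, and `distances` are illustrative assumptions; the data rows are taken from the matrix in the question, and the zero-denominator case is assumed to mean distance 0.

```python
from itertools import combinations

# Rows of the data matrix; genre order: action, crime, drama, fantasy.
movies = {
    1: (0, 1, 1, 0),
    2: (1, 0, 1, 1),
    3: (1, 1, 1, 1),
    4: (0, 1, 1, 0),
    5: (0, 1, 1, 0),
    6: (0, 0, 1, 0),
}


def jaccard_distance(x, y):
    """Asymmetric binary dissimilarity (b+c) / (a+b+c)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    mismatches = sum(1 for xi, yi in zip(x, y) if xi != yi)  # b + c
    if a + mismatches == 0:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0
    return mismatches / (a + mismatches)


# One entry per unordered pair (i, j) with i < j.
distances = {
    (i, j): jaccard_distance(movies[i], movies[j])
    for i, j in combinations(sorted(movies), 2)
}

for (i, j), dist in distances.items():
    print(f"d({i},{j}) = {dist:.3f}   sim({i},{j}) = {1 - dist:.3f}")
```

Under these assumptions, the pairs not listed in the answer come out as d(3,4) = d(3,5) = 0.5, d(3,6) = 0.75, d(4,5) = 0, and d(4,6) = d(5,6) = 0.5.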

