Consider the given matrix (the top 250 rated movies according to IMDb). Take a sample consisting of 6 movies and, after pre-processing, create a data matrix of movies against genres.
Movie | Action | Crime | Drama | Fantasy |
1 | 0 | 1 | 1 | 0 |
2 | 1 | 0 | 1 | 1 |
3 | 1 | 1 | 1 | 1 |
4 | 0 | 1 | 1 | 0 |
5 | 0 | 1 | 1 | 0 |
6 | 0 | 0 | 1 | 0 |
# A binary variable is symmetric if both of its values are equally valuable and carry the same weight, so there is no preference on which outcome should be coded as 1.
# A binary variable is asymmetric if its two values are not equally important, so there is a preference on which outcome should be coded as 1. Example: positive and negative sentiment.
Here all attributes are asymmetric, because 1 and 0 do not carry equal importance: a 1 says the movie belongs to the genre, and two movies sharing a genre is more informative than two movies both lacking it.
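Contingency table for two objects i and j (rows are the values of object i, columns are the values of object j):
object i \ object j | 1 | 0 | sum |
1 | a | b | a+b |
0 | c | d | c+d |
sum | a+c | b+d | p |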
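The four counts can be read off mechanically from two rows of the data matrix. A minimal Python sketch, assuming each movie is stored as a list of 0/1 genre flags in the order action, crime, drama, fantasy (the function name `contingency_counts` is illustrative and not part of the original solution):

```python
def contingency_counts(x, y):
    """Return (a, b, c, d) for two equal-length binary vectors.

    x plays the role of object i and y the role of object j.
    """
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)  # both 1
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)  # i is 1, j is 0
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)  # i is 0, j is 1
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)  # both 0
    return a, b, c, d


movie1 = [0, 1, 1, 0]
movie2 = [1, 0, 1, 1]
print(contingency_counts(movie1, movie2))  # (1, 1, 2, 0)
```

The four counts always add up to p, the number of attributes (4 genres here).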
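For symmetric binary attributes, the dissimilarity is the complement of the simple matching coefficient (invariant similarity), and all four counts are used:
d(i,j) = (b+c) / (a+b+c+d)
For asymmetric binary attributes, the negative matches (count d) are ignored (non-invariant similarity). The dissimilarity is the complement of the Jaccard coefficient sim(i,j) = a / (a+b+c):
d(i,j) = (b+c) / (a+b+c)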
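To see how the two formulas differ on the same counts, here is a small Python sketch. The function names are illustrative, and returning 0.0 when a+b+c is zero is an assumption of this sketch (the formula itself is undefined in that case):

```python
def symmetric_dissimilarity(a, b, c, d):
    """(b+c) / (a+b+c+d): every attribute counts, including 0/0 matches."""
    return (b + c) / (a + b + c + d)


def asymmetric_dissimilarity(a, b, c, d):
    """(b+c) / (a+b+c): 0/0 matches (d) are ignored."""
    denominator = a + b + c
    if denominator == 0:
        # Assumption: two objects with no 1s at all are treated as identical.
        return 0.0
    return (b + c) / denominator


# Counts for movies 1 and 6 from the data matrix: a=1, b=1, c=0, d=2
print(symmetric_dissimilarity(1, 1, 0, 2))   # 0.25
print(asymmetric_dissimilarity(1, 1, 0, 2))  # 0.5
```

Dropping d from the denominator makes the asymmetric value larger whenever the two movies share missing genres, which is why the choice of formula matters here.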
1) asymmetric -> d(i,j) = (b+c) / (a+b+c)
d(1,2) ->
{
a = 1
b = 1
c = 2
d = 0
d(1,2) = (1+2) / (1+1+2) = 3/4 = 0.75
}
2) asymmetric -> d(i,j) = (b+c) / (a+b+c)
d(1,3) ->
{
a = 2
b = 0
c = 2
d = 0
d(1,3) = (0+2) / (2+0+2) = 2/4 = 0.5
}
3) asymmetric -> d(i,j) = (b+c) / (a+b+c)
d(1,4) ->
{
a = 2
b = 0
c = 0
d = 2
d(1,4) = (0+0) / (2+0+0) = 0
}
4) asymmetric -> d(i,j) = (b+c) / (a+b+c)
d(1,5) ->
{
a = 2
b = 0
c = 0
d = 2
d(1,5) = (0+0) / (2+0+0) = 0
}
5) asymmetric -> d(i,j) = (b+c) / (a+b+c)
d(1,6) ->
{
a = 1
b = 1
c = 0
d = 2
d(1,6) = (1+0) / (1+1+0) = 0.5
}
The same calculation applies to every other pair, for example:
d(2,3) = (0+1) / (3+0+1) = 1/4 = 0.25
d(2,4) = (2+1) / (1+2+1) = 3/4 = 0.75
d(2,5) = (2+1) / (1+2+1) = 3/4 = 0.75
d(2,6) = (2+0) / (1+2+0) = 2/3 ≈ 0.667
and so on.
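The remaining pairs can be checked with a short script. This is a sketch under the assumption that the data matrix is stored as a dictionary of 0/1 lists (genre order: action, crime, drama, fantasy); the names `movies`, `jaccard_distance` and `pairwise` are illustrative only:

```python
from itertools import combinations

# Data matrix from the question: movie number -> [action, crime, drama, fantasy]
movies = {
    1: [0, 1, 1, 0],
    2: [1, 0, 1, 1],
    3: [1, 1, 1, 1],
    4: [0, 1, 1, 0],
    5: [0, 1, 1, 0],
    6: [0, 0, 1, 0],
}


def jaccard_distance(x, y):
    """Asymmetric binary dissimilarity (b+c) / (a+b+c)."""
    a = sum(xi & yi for xi, yi in zip(x, y))           # both 1
    mismatches = sum(xi ^ yi for xi, yi in zip(x, y))  # b + c
    denominator = a + mismatches
    # Assumption: two all-zero rows are treated as identical.
    return mismatches / denominator if denominator else 0.0


pairwise = {
    (i, j): jaccard_distance(movies[i], movies[j])
    for i, j in combinations(sorted(movies), 2)
}

for (i, j), distance in pairwise.items():
    print(f"d({i},{j}) = {distance:.3f}")
```

Movies 1, 4 and 5 have identical genre rows, so every distance between them is 0 and they share the same distances to all other movies.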