For binary data, the L1 distance corresponds to the Hamming distance; that is, the number of bits that are different between two binary vectors. For the following two binary vectors, compute:
X = 10011011
Y = 01011000
1 - Hamming distance
2 - Jaccard Similarity Coefficient (JSC) and Simple Matching Coefficient (SMC)
3 - Cosine Similarity
4 - L2 (Euclidean) and L∞ (Supremum) distances
5 - Correlation between X and Y
Answer (the question is lengthy, so as per the guidelines I am answering the first four parts):
Given:
X = 10011011
Y = 01011000
1.
Hamming distance (the number of bit positions in which the two binary vectors differ) = 4
X and Y differ in positions 1, 2, 7 and 8, counting from the left.
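The count above can be checked with a short Python sketch. The names x, y and hamming_distance are illustrative only and are not part of the original question; the vectors are assumed to be given as equal-length bit strings.

```python
# Bit strings from the question; the variable and function names are illustrative.
x = "10011011"
y = "01011000"

def hamming_distance(a: str, b: str) -> int:
    """Count the positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    # True counts as 1 when summed, so this adds one per mismatching position.
    return sum(bit_a != bit_b for bit_a, bit_b in zip(a, b))

print(hamming_distance(x, y))  # 4
```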
------------------------------------------------------
2.
Jaccard Similarity Coefficient (JSC) and Simple Matching Coefficient (SMC)
To compute the similarities we need the following quantities:
B01 = no. of attributes where X is 0 and Y is 1
B10 = no. of attributes where X is 1 and Y is 0
B00 = no. of attributes where X is 0 and Y is 0
B11 = no. of attributes where X is 1 and Y is 1
X | 1 | 0 | 0 | 1 | 1 | 0 | 1 | 1 |
Y | 0 | 1 | 0 | 1 | 1 | 0 | 0 | 0 |
B01 = 1, B10 = 3, B11 = 2, B00 = 2
Jaccard Similarity Coefficient (JSC) = J = no. of 11 matches / no. of attributes that are not both zero
J = (B11) / (B01 + B10 + B11) = 2/6 = 0.33
Simple Matching Coefficient (SMC)= number of matches / number of attributes
SMC = (B11 + B00) / (B01 + B10 + B11 + B00) = 4/8 = 0.5
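As a cross-check, here is a minimal Python sketch that tallies the four quantities and applies the two formulas. The helper names pair_counts, jaccard and smc are hypothetical, chosen for illustration; the sketch assumes the vectors are equal-length bit strings and that at least one position is not 00 (otherwise the Jaccard denominator is zero).

```python
from collections import Counter

# Bit strings from the question; all function names below are illustrative.
x = "10011011"
y = "01011000"

def pair_counts(a: str, b: str) -> dict:
    """Return B00, B01, B10 and B11 for two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    c = Counter(zip(a, b))  # keys are (bit of a, bit of b) pairs
    return {"B00": c[("0", "0")], "B01": c[("0", "1")],
            "B10": c[("1", "0")], "B11": c[("1", "1")]}

def jaccard(a: str, b: str) -> float:
    """11 matches divided by the positions that are not 00."""
    n = pair_counts(a, b)
    return n["B11"] / (n["B01"] + n["B10"] + n["B11"])

def smc(a: str, b: str) -> float:
    """All matches (11 and 00) divided by the number of attributes."""
    n = pair_counts(a, b)
    return (n["B11"] + n["B00"]) / sum(n.values())

print(pair_counts(x, y))        # {'B00': 2, 'B01': 1, 'B10': 3, 'B11': 2}
print(round(jaccard(x, y), 2))  # 0.33
print(smc(x, y))                # 0.5
```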
-------------------------------------------------
3.
Cosine Similarity: if X and Y are two binary vectors, then
cos(X, Y) = (X . Y) / (||X|| ||Y||), where . (dot) denotes the dot product and ||X|| denotes the length (Euclidean norm) of vector X
Here, X . Y = (1*0 + 0*1 + 0*0 + 1*1 + 1*1 + 0*0 + 1*0 + 1*0) = 2
||X|| = (1*1 + 0*0 + 0*0 + 1*1 + 1*1 + 0*0 + 1*1 + 1*1)^(1/2) = (5)^(1/2) = 2.24
||Y|| = (0*0 + 1*1 + 0*0 + 1*1 + 1*1 + 0*0 + 0*0 + 0*0)^(1/2) = (3)^(1/2) = 1.73
Cosine Similarity = 2 / (2.24 * 1.73) = 0.516
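The same arithmetic can be reproduced with a small Python sketch. The function name cosine_similarity is illustrative, and the sketch assumes neither vector is all zeros, since the norm in the denominator would then be zero.

```python
import math

# The two vectors from the question as lists of 0/1 integers (names are illustrative).
x = [1, 0, 0, 1, 1, 0, 1, 1]
y = [0, 1, 0, 1, 1, 0, 0, 0]

def cosine_similarity(a, b):
    """Dot product divided by the product of the Euclidean norms."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(round(cosine_similarity(x, y), 3))  # 0.516
```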
-----------------------------------------------
4.
L2 (Euclidean)