Question

1. In vector space models, you can use concepts or terms as basis vectors. Describe the advantages and disadvantages of these two types of vectors with respect to each other.

2. Consider the following two words: {precision, precise}. Should we cluster them together if we set the similarity threshold to 0.5? Please justify your answer. (Hint: use the Dice coefficient to compute the similarity.)

Solutions

Expert Solution

Ans: The solutions are given below.

1) Vector Space Models

a) Definition

The vector space model is an algebraic model for representing text documents (or, more generally, any objects) as vectors of identifiers. It is used in information filtering, information retrieval, indexing, and relevance ranking.

Documents and queries are represented as vectors.

Each dimension corresponds to a separate term. If a term occurs in the document, its value in the vector is non-zero. Several different ways of computing these values, also known as (term) weights, have been developed. One of the best-known schemes is tf-idf weighting.

The definition of term depends on the application. Typically terms are single words, keywords, or longer phrases. If words are chosen to be the terms, the dimensionality of the vector is the number of words in the vocabulary.
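As a rough illustration of term weighting, the sketch below builds one tf-idf vector per document. The function name `tfidf_vectors`, the two toy documents, and the specific weighting variant (raw term frequency times log(N / df)) are assumptions chosen for illustration; many other tf-idf variants exist.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocabulary, vectors): one tf-idf vector per tokenized document.

    Assumed variant: weight = raw term frequency * log(N / document frequency).
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # One dimension per vocabulary term; absent terms get weight 0.
        vectors.append([tf[t] * math.log(n_docs / df[t]) for t in vocab])
    return vocab, vectors

# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
]
vocab, vectors = tfidf_vectors(docs)
```

The dimensionality of each vector equals the vocabulary size (7 here). Note that with this particular variant a term occurring in every document, such as "the", receives weight 0 even though it is present.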

b) Advantages

1. Simple model based on linear algebra

2. Term weights not binary

3. Allows computing a continuous degree of similarity between queries and documents

4. Allows ranking documents according to their possible relevance

5. Allows partial matching
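The advantages listed above can be seen in a small sketch that ranks documents by cosine similarity to a query. The document names, the three dimensions, and the weights are made up for illustration, and cosine similarity is only one common choice of similarity measure.

```python
import math

def cosine(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)

def rank(query, docs):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(cosine(query, vec), name) for name, vec in docs.items()]
    return sorted(scored, reverse=True)

# Hypothetical dimensions: (information, retrieval, cooking)
docs = {
    "d1": [2.0, 1.0, 0.0],
    "d2": [1.0, 0.0, 0.0],
    "d3": [0.0, 0.0, 3.0],
}
query = [1.0, 1.0, 0.0]
ranking = rank(query, docs)
```

Here d2 contains only one of the two query terms yet still receives a non-zero score (partial matching), and the scores are continuous values that order the documents d1, d2, d3.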

c) Disadvantages

1. Long documents are poorly represented because they have poor similarity values (a small scalar product and a large dimensionality)

2. Search keywords must precisely match document terms; word substrings might result in a "false positive match"

3. Semantic sensitivity; documents with similar context but different term vocabulary won't be associated, resulting in a "false negative match".

4. The order in which the terms appear in the document is lost in the vector space representation.

5. Theoretically assumes terms are statistically independent.

6. Weighting is intuitive but not very formal.
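Two of the disadvantages above (loss of word order and the "false negative match") can be illustrated with a minimal bag-of-words sketch. The helper names and example phrases are hypothetical.

```python
from collections import Counter

def bag_of_words(text):
    """Term counts for a text, ignoring case and word order."""
    return Counter(text.lower().split())

def shared_terms(a, b):
    """Terms that occur in both texts."""
    return set(bag_of_words(a)) & set(bag_of_words(b))

# Word order is lost: both sentences map to the same bag of words.
same_bag = bag_of_words("dog bites man") == bag_of_words("man bites dog")

# Similar meaning, different vocabulary: no shared terms, so any
# similarity based on the dot product of term vectors is zero.
overlap = shared_terms("car repair", "automobile maintenance")
```

With term vectors, `same_bag` is true although the two sentences mean different things, and `overlap` is empty although the two phrases are closely related.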

Two Types of Vectors

1. Dot Product

The dot product of two vectors a and b (sometimes called the inner product or, since its result is a scalar, the scalar product) is denoted by a · b and is defined as:

a · b = ‖a‖ ‖b‖ cos θ

where θ is the measure of the angle between a and b. Geometrically, this means that a and b are drawn with a common start point, and then the length of a is multiplied by the length of the component of b that points in the same direction as a.

The dot product can also be defined as the sum of the products of the components of each vector:

a · b = a1 b1 + a2 b2 + ... + an bn
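As a quick check that the geometric and component-wise definitions agree, here is a small sketch. The example vectors are arbitrary, and the angle is obtained independently with `math.atan2`, which only works for the 2-D case used here.

```python
import math

def dot(a, b):
    """Algebraic definition: sum of products of matching components."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

a = (1.0, 0.0)
b = (1.0, 1.0)

# Angle between a and b, measured from each vector's direction (2-D only).
theta = math.atan2(b[1], b[0]) - math.atan2(a[1], a[0])

algebraic = dot(a, b)                            # 1*1 + 0*1
geometric = norm(a) * norm(b) * math.cos(theta)  # |a| |b| cos(theta)
```

Both values come out as 1 (up to floating-point rounding for the geometric form).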

2. Cross Product

The cross product (also called the vector product or outer product) is only meaningful in three or seven dimensions. The cross product differs from the dot product primarily in that the result of the cross product of two vectors is a vector. The cross product, denoted a × b, is a vector perpendicular to both a and b and is defined as

a × b = ‖a‖ ‖b‖ sin(θ) n

where θ is the measure of the angle between a and b, and n is a unit vector perpendicular to both a and b which completes a right-handed system. The right-handedness constraint is necessary because there exist two unit vectors that are perpendicular to both a and b, namely, n and (−n).

The cross product a × b is defined so that a, b, and a × b also form a right-handed system (although a and b are not necessarily orthogonal). This is the right-hand rule.

The length of a × b can be interpreted as the area of the parallelogram having a and b as sides.

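The cross product can be written as

a × b = (a2 b3 − a3 b2) e1 + (a3 b1 − a1 b3) e2 + (a1 b2 − a2 b1) e3

where e1, e2, and e3 are the standard basis vectors of a right-handed coordinate system.

For arbitrary choices of spatial orientation (that is, allowing for left-handed as well as right-handed coordinate systems), the cross product of two vectors is a pseudovector instead of a vector.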
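The properties described in this section can be checked with a short sketch of the 3-D cross product, assuming the standard component formula in a right-handed coordinate system. The example vectors are arbitrary.

```python
import math

def cross(a, b):
    """3-D cross product from the standard component formula."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2,
            a3 * b1 - a1 * b3,
            a1 * b2 - a2 * b1)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = (2.0, 0.0, 0.0)
b = (1.0, 3.0, 0.0)
c = cross(a, b)

# c is perpendicular to both inputs, so both dot products vanish.
perpendicular = dot(c, a) == 0.0 and dot(c, b) == 0.0

# Length of c = area of the parallelogram with sides a and b
# (base 2, height 3 in this example).
area = math.sqrt(dot(c, c))
```

For these inputs `c` is (0, 0, 6) and the area is 6. Swapping the arguments, `cross(b, a)`, flips the sign of the result, which reflects the right-handedness convention.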

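2) Clustering Precision and Precise

The Dice coefficient between two strings x and y is

S = 2nt / (nx + ny)

where nt is the number of character bigrams found in both strings, nx is the number of bigrams in string x and ny is the number of bigrams in string y. For example, to calculate the similarity between:

precision

precise

We would find the set of bigrams in each word:

{pr, re, ec, ci, is, si, io, on}

{pr, re, ec, ci, is, se}

The two sets share five bigrams, {pr, re, ec, ci, is}, so nt = 5, nx = 8 and ny = 6:

S = 2nt / (nx + ny) = (2 × 5) / (8 + 6) = 10 / 14 ≈ 0.71

Similarity threshold = 0.5

Since 0.71 > 0.5, the two words should be clustered together.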
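The Dice computation can be sketched in a few lines. This assumes bigrams are all overlapping pairs of adjacent characters, treated as sets (so repeated bigrams count once), and that a score at or above the threshold means "cluster together". The function names are illustrative.

```python
def bigrams(word):
    """Set of overlapping character bigrams, e.g. 'abc' -> {'ab', 'bc'}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}

def dice(x, y):
    """Dice coefficient S = 2*nt / (nx + ny) over character bigram sets."""
    bx, by = bigrams(x), bigrams(y)
    if not bx and not by:
        return 0.0  # assumption: strings too short to have bigrams score 0
    return 2 * len(bx & by) / (len(bx) + len(by))

THRESHOLD = 0.5
score = dice("precision", "precise")    # 2*5 / (8 + 6) = 10/14
cluster_together = score >= THRESHOLD
```

For this pair the score is 10/14 ≈ 0.71, which exceeds 0.5, so the two words fall into the same cluster.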

