Question

In: Computer Science

For a perceptron algorithm with a fixed learning rate (r), prove or disprove that if the classes are not linearly separable, the algorithm may not converge. Can you modify the learning rate (r) so that the iterations converge in expectation? Explain.

Solutions

Expert Solution

Perceptron Learning Algorithm

The input to the Perceptron Learning Algorithm is a data set of n ≥ 1 points (each d-dimensional) and their associated labels (+ or −). For mathematical convenience, we associate the + label with the number +1 and the − label with the number −1. Hence, we may take our input to be (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where for i ∈ {1, 2, ..., n}, x_i ∈ R^d and y_i ∈ {−1, +1}. To entertain any hope of our algorithm succeeding, we must assume that our input data points are linearly separable. Consistent with our choice of ignoring the bias term in the perceptron, we shall assume not only that the input data points are linearly separable, but that they can be separated by a hyperplane passing through the origin. After all, our perceptron can only represent origin-centred hyperplanes!
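To make the procedure concrete, here is a minimal sketch of the mistake-driven update loop described above. The data layout, the fixed rate r, and the epoch cap are illustrative assumptions; the update rule itself is the standard bias-free perceptron rule, w ← w + r·y_i·x_i on each misclassified point.

```python
import numpy as np

def perceptron(X, y, r=1.0, max_epochs=100):
    """Bias-free perceptron. X is an (n, d) array; y has entries in {-1, +1}.

    Cycles through the data, updating only on mistakes. Returns the weights
    once a full pass makes no mistakes, or the last iterate otherwise.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(len(y)):
            if y[i] * (w @ X[i]) <= 0:   # point i is misclassified
                w = w + r * y[i] * X[i]  # nudge the hyperplane toward x_i
                mistakes += 1
        if mistakes == 0:                # a clean pass: data are separated
            return w
    return w  # may never be reached cleanly if the data are not separable
```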

Assumption 1 (Linear Separability). There exists some w* ∈ R^d with ||w*|| = 1 and some γ > 0 such that for all i ∈ {1, 2, ..., n}, y_i(w* · x_i) > γ. Taking w* to be unit-length is not strictly necessary; it merely simplifies the subsequent analysis. Observe that the condition on each data point i essentially says that its prediction, sign(w* · x_i), matches its label y_i. Requiring y_i(w* · x_i) to be strictly positive means we can find a hyperplane on which none of the data points actually lie. If the separating hyperplane had to pass through some of the data points, the perceptron's predictions for those points would depend on whether we define sign(0) to be 0 or 1, which is an arbitrary choice. If the input points are "genuinely" linearly separable, it must not matter what convention we adopt for sign(·), or whether we interchange the labels of the + points and the − points. This is the demand Assumption 1 places on our data set. The quantity γ is introduced in the assumption as a placeholder for the minimum value of y_i(w* · x_i). Our analysis yields an upper bound on the convergence time of the Perceptron Learning Algorithm that relates inversely with γ: writing R for an upper bound on the distance ||x_i|| between the input points and the origin, the classical result is that the algorithm makes at most (R/γ)² mistaken updates before converging.
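Note that this bound presupposes Assumption 1. If the classes are not linearly separable, every weight vector misclassifies at least one point, so the mistake-driven update keeps firing forever, and with a fixed rate r the steps never shrink; the iterates can cycle indefinitely, which is exactly the non-convergence the question asks about. A standard remedy is to let the rate decay over time, e.g. r_t = r/t (any schedule with Σ r_t = ∞ and Σ r_t² < ∞, the Robbins-Monro conditions from stochastic approximation, plays the same role), so that with randomly sampled examples the updates shrink fast enough for the iterates to settle down in expectation. The numerical sketch below illustrates the contrast; the overlapping-Gaussian data set, the two schedules, and the spread statistic are all assumptions made purely for this demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# A deliberately non-separable data set: two overlapping Gaussian clouds.
X = np.vstack([rng.normal(-0.5, 1.0, (50, 2)), rng.normal(0.5, 1.0, (50, 2))])
y = np.hstack([-np.ones(50), np.ones(50)])

def run(rate, epochs=200):
    """Perceptron with a per-step rate schedule; returns per-epoch weights."""
    w, t, snapshots = np.zeros(2), 1, []
    for _ in range(epochs):
        for i in rng.permutation(len(y)):    # visit examples in random order
            if y[i] * (w @ X[i]) <= 0:       # mistake-driven update
                w = w + rate(t) * y[i] * X[i]
            t += 1
        snapshots.append(w.copy())
    return np.array(snapshots)

fixed = run(lambda t: 1.0)         # fixed rate: w keeps jumping forever
decaying = run(lambda t: 1.0 / t)  # r_t = 1/t: steps shrink, w settles

# The spread of the late iterates stays large under the fixed rate and
# collapses under the decaying one.
print("fixed-rate spread (last 50 epochs):   ", fixed[-50:].std(axis=0))
print("decaying-rate spread (last 50 epochs):", decaying[-50:].std(axis=0))
```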

In unsupervised learning, the weights and biases are modified in response to network inputs only; there are no target outputs available. At first glance this might seem impractical: how can you train a network if you don't know what it is supposed to do? Most of these algorithms perform some kind of clustering operation: they learn to categorize the input patterns into a finite number of classes. This is especially useful in applications such as vector quantization.
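To illustrate the kind of clustering operation described here, the sketch below implements simple winner-take-all competitive learning, one classical unsupervised rule used for vector quantization: the prototype nearest to each input is the only one updated, and it moves toward that input. The three-cluster data set, the number of prototypes k, the rate, and the epoch count are all hypothetical choices for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(1)

def competitive_learning(X, k=3, rate=0.1, epochs=20):
    """Winner-take-all competitive learning (a simple vector quantizer)."""
    protos = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(epochs):
        for x in X[rng.permutation(len(X))]:
            # Nearest prototype wins and moves toward the input.
            winner = np.argmin(np.linalg.norm(protos - x, axis=1))
            protos[winner] += rate * (x - protos[winner])
    return protos

# Hypothetical data: three well-separated clouds; the learned prototypes
# should end up near the three cloud centres.
X = np.vstack([rng.normal(c, 0.3, (40, 2)) for c in [(-2, 0), (0, 2), (2, 0)]])
print(competitive_learning(X))
```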

For more information, I am attaching links to the papers from which I got the above information. If you want more, comment and I will reply.

Links: (http://hagan.okstate.edu/4_Perceptron.pdf) for linear separability

(https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-1.pdf) for the modification of the learning rate

(https://page.mi.fu-berlin.de/rojas/neural/chapter/K4.pdf) for extra reading

