For a perceptron algorithm with a fixed learning rate (r), prove or disprove that if the classes are not linearly separable, it may not converge. Can you modify the learning rate (r) so that the iterations converge in expectation? Explain.
Perceptron Learning Algorithm

The input to the Perceptron Learning Algorithm is a data set of $n \ge 1$ points (each $d$-dimensional) and their associated labels ($+$ or $-$). For mathematical convenience, we associate the $+$ label with the number $+1$ and the $-$ label with the number $-1$. Hence, we may take our input to be $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, where for $i \in \{1, 2, \ldots, n\}$, $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, 1\}$.

To entertain any hope of our algorithm succeeding, we must assume that our input data points are linearly separable. Consistent with our choice of ignoring the bias term in the Perceptron, we shall assume that not only are the input data points linearly separable, they can indeed be separated by a hyperplane that passes through the origin. After all, our Perceptron can only represent origin-centred hyperplanes!
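To make the first part of the question concrete, here is a minimal sketch of the perceptron update rule with a fixed learning rate r, run on a toy data set with an XOR-style labelling that no origin-centred hyperplane can separate. The data, function name, and epoch count are my own illustrative choices, not part of the notes above; the point is only that some input is misclassified on every pass, so the weight vector keeps being corrected and never settles.

```python
import numpy as np

def perceptron_fixed_rate(X, y, r=1.0, epochs=10):
    """Perceptron update rule with a fixed learning rate r.

    No bias term (origin-centred hyperplane), matching the setting above.
    Records the weight vector after every mistake-driven update so we can
    inspect whether the iterates ever settle down.
    """
    w = np.zeros(X.shape[1])
    history = [w.copy()]
    for _ in range(epochs):
        made_mistake = False
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:   # point misclassified (or on the hyperplane)
                w = w + r * yi * xi       # standard perceptron correction
                history.append(w.copy())
                made_mistake = True
        if not made_mistake:
            break  # a full pass with no mistakes: the algorithm has converged
    return np.array(history)

# Toy data with an XOR-style labelling: no hyperplane through the origin
# separates the +1 points from the -1 points.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]])
y = np.array([1, 1, -1, -1])

history = perceptron_fixed_rate(X, y, r=1.0)
print(history[-8:])  # the weight vector keeps cycling instead of settling down
```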
Assumption 1 (Linear Separability). There exists some $w^\star \in \mathbb{R}^d$ such that $\|w^\star\| = 1$ and, for some $\gamma > 0$, for all $i \in \{1, 2, \ldots, n\}$, $y_i (w^\star \cdot x_i) > \gamma$.

Taking $w^\star$ to be unit-length is not strictly necessary; it merely simplifies our subsequent analysis. Observe that the condition on each data point $i$ essentially amounts to saying that its prediction (that is, $\mathrm{sign}(w^\star \cdot x_i)$) matches its label ($y_i$). Requiring $y_i (w^\star \cdot x_i)$ to be strictly positive implies we can find a hyperplane such that none of the data points actually lie on the hyperplane. If the separating hyperplane must necessarily pass through some of the data points, note that the Perceptron's predictions for these points would depend on whether we assign $\mathrm{sign}(0)$ to be $0$ or $1$, which seems an arbitrary choice. If our input points are "genuinely" linearly separable, it must not matter, for example, what convention we adopt to define $\mathrm{sign}(\cdot)$, or whether we interchange the labels of the $+$ points and the $-$ points. This is the demand Assumption 1 places on our data set.

The quantity $\gamma$ is introduced in the assumption as a placeholder for the minimum value of $y_i (w^\star \cdot x_i)$. Our analysis will yield an upper bound on the convergence time of the Perceptron Learning Algorithm that relates inversely with $\gamma$. The bound will also depend on the distance between the input points and the origin; we shall assume that this distance is at most $R$.
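For concreteness, the bound alluded to here can be written out explicitly. The following is the standard Novikoff-style argument in the notation above, assuming in addition that $\|x_i\| \le R$ for every $i$ and that the Perceptron starts from $w_0 = 0$; it is a sketch, not a substitute for the derivation in the linked class note.

```latex
% Novikoff-style mistake bound, under Assumption 1, with ||x_i|| <= R and w_0 = 0.
% Each mistake-driven update has the form w_{t+1} = w_t + r y_i x_i.
\begin{align*}
  w_{t+1} \cdot w^\star &\ge w_t \cdot w^\star + r\gamma,  && \text{(progress in the direction of $w^\star$)} \\
  \|w_{t+1}\|^2         &\le \|w_t\|^2 + r^2 R^2.          && \text{(the norm grows by at most $r^2 R^2$ per mistake)}
\end{align*}
% Chaining these over M updates and applying Cauchy-Schwarz:
\[
  M r \gamma \;\le\; w_M \cdot w^\star \;\le\; \|w_M\| \;\le\; \sqrt{M}\, r R
  \qquad\Longrightarrow\qquad
  M \;\le\; \frac{R^2}{\gamma^2}.
\]
```

Note that the fixed learning rate $r$ cancels from the final bound, so in the separable case a constant $r$ already suffices for the iterations to terminate; the difficulty raised in the question arises only when no separating hyperplane exists.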
In unsupervised learning, the weights and biases are modified in response to the network inputs only; there are no target outputs available. At first glance this might seem impractical: how can you train a network if you don't know what it is supposed to do? Most of these algorithms perform some kind of clustering operation. They learn to categorize the input patterns into a finite number of classes. This is especially useful in applications such as vector quantization.
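As a rough illustration of that idea (my own toy example, not taken from either linked reference), here is a minimal competitive-learning update in the spirit of vector quantization: each input pulls only its nearest prototype towards itself, so the prototypes drift towards cluster centres without any target outputs. The function name, data, and constants are illustrative choices.

```python
import numpy as np

def competitive_learning(X, k=2, lr=0.1, epochs=20, seed=0):
    """Toy unsupervised (competitive) learning: no target outputs are used.

    Each input moves only its nearest prototype towards itself, so the k
    prototypes end up as cluster representatives, as in simple vector quantization.
    """
    rng = np.random.default_rng(seed)
    prototypes = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(epochs):
        for x in X:
            winner = np.argmin(np.linalg.norm(prototypes - x, axis=1))
            prototypes[winner] += lr * (x - prototypes[winner])
    return prototypes

X = np.array([[0.0, 0.1], [0.2, -0.1], [5.0, 5.1], [4.8, 5.0]])
print(competitive_learning(X, k=2))  # roughly one prototype per cluster
```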
For more information, I am attaching links to the papers from which I got the above information. If you want more, comment and I will reply.
Links:
(http://hagan.okstate.edu/4_Perceptron.pdf) for the linearly separable case
(https://www.cse.iitb.ac.in/~shivaram/teaching/old/cs344+386-s2017/resources/classnote-1.pdf) for the modification of the learning rate
(https://page.mi.fu-berlin.de/rojas/neural/chapter/K4.pdf) extra reading