MCQ IN COMPUTER SCIENCE & ENGINEERING


MACHINE LEARNING

Question
You run gradient descent for 15 iterations with α = 0.3 and compute J(θ) after each iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this, which of the following conclusions seems most plausible?
A
Rather than using the current value of α, use a larger value of α (say α = 1.0)
B
Rather than using the current value of α, use a smaller value of α (say α = 0.1)
C
α = 0.3 is an effective choice of learning rate
D
None of the above
Explanation: The correct answer is C. A cost J(θ) that decreases quickly and then levels off indicates that gradient descent is converging smoothly, so α = 0.3 is an effective choice of learning rate.

Detailed explanation-1: The gradient descent algorithm starts with some values of m and c (usually m = 0, c = 0). We calculate the MSE (cost) at the point m = 0, c = 0; say it is 100. We then change m and c by some amount (the learning step) so as to reduce the cost, and repeat.
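To make this concrete, here is a minimal Python sketch of gradient descent for a one-variable linear model y = m*x + c, minimizing MSE. The toy data, learning rate, and iteration count are illustrative assumptions, not part of the original question.

import numpy as np

# Toy data generated from y = 2x + 1 (an assumption for illustration).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([3.0, 5.0, 7.0, 9.0])

m, c = 0.0, 0.0   # start at m = 0, c = 0, as in the explanation
alpha = 0.05      # learning rate (assumed for this toy problem)
n = len(x)

for _ in range(1000):
    error = (m * x + c) - y                  # prediction error
    mse = np.mean(error ** 2)                # cost J at the current (m, c)
    grad_m = (2.0 / n) * np.sum(error * x)   # dJ/dm
    grad_c = (2.0 / n) * np.sum(error)       # dJ/dc
    m -= alpha * grad_m                      # learning step: move against the gradient
    c -= alpha * grad_c

print(round(m, 2), round(c, 2))  # approaches m = 2, c = 1 as the cost levels off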

Detailed explanation-2: Gradient descent subtracts a step size from the current value of the intercept to get the new value of the intercept. The step size is calculated by multiplying the derivative (which is -5.7 in this example) by a small number called the learning rate. Typical learning-rate values are 0.1, 0.01, or 0.001.
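As a one-step illustration of that update (the -5.7 derivative and the 0.1 learning rate come from the explanation; the starting intercept of 0 is an assumption):

intercept = 0.0                         # assumed starting value
derivative = -5.7                       # slope of the cost at the current intercept
learning_rate = 0.1
step_size = learning_rate * derivative  # -0.57
intercept = intercept - step_size       # new intercept = 0.57, moving opposite the gradient
print(intercept)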

Detailed explanation-3: Gradient descent is used to minimize a cost function J(W) parameterized by the model parameters W. The gradient (or derivative) tells us the slope of the cost function. Hence, to minimize the cost function, we move in the direction opposite to the gradient.
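In symbols, each iteration applies the standard update rule (written here in LaTeX; alpha is the learning rate from the question):

W_{new} = W_{old} - \alpha \nabla J(W_{old})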
