Across
- 3. What is the term for the rate that controls the step size in gradient descent?
- 6. What type of learning uses labeled data?
- 8. Which activation function outputs values between 0 and 1?
- 9. What does backpropagation compute in a neural network?
- 11. What happens when a model memorizes training data instead of generalizing?
- 12. What does a neural network minimize during training?
Down
- 1. Which activation function is commonly used to avoid the vanishing gradient problem?
- 2. What problem occurs when gradients become too small in deep networks?
- 4. What does gradient descent update in a neural network?
- 5. Which mathematical rule is used in backpropagation to compute gradients?
- 7. What is one complete pass through the entire dataset called?
- 10. What is the term for the additional parameter that shifts the activation function?
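Most of the concepts behind these clues show up together in a single training loop. Below is a minimal sketch (a hypothetical illustration; the dataset, variable names, and hyperparameters are assumptions, not part of the puzzle) of gradient descent on a one-weight logistic model:

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Tiny labeled dataset (supervised learning): input x -> label y
data = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]

w, b = 0.0, 0.0          # weight and bias (the parameter that shifts the activation's input)
learning_rate = 0.1      # controls the step size of each gradient-descent update

for epoch in range(1000):            # one epoch = one complete pass through the dataset
    for x, y in data:
        p = sigmoid(w * x + b)       # forward pass
        # Backpropagation uses the chain rule; for the log loss with a
        # sigmoid output, the gradient simplifies to (p - y) times the input.
        grad_w = (p - y) * x
        grad_b = (p - y)
        # Gradient descent updates the parameters to minimize the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(sigmoid(w * 0.0 + b), sigmoid(w * 3.0 + b))
```

After training, the model should put x = 0 below 0.5 and x = 3 above it; memorizing such a tiny dataset rather than learning the boundary is exactly the overfitting risk one clue asks about.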
