Across
- 3. What is the term for the rate that controls the step size in gradient descent?
- 6. What type of learning uses labeled data?
- 8. Which activation function outputs values between 0 and 1?
- 9. What does backpropagation compute in a neural network?
- 11. What happens when a model memorizes training data instead of generalizing?
- 12. What does a neural network minimize during training?
Down
- 1. Which activation function is commonly used to avoid the vanishing gradient problem?
- 2. What problem occurs when gradients become too small in deep networks?
- 4. What does gradient descent update in a neural network?
- 5. Which mathematical rule is used in backpropagation to compute gradients?
- 7. What is one complete pass through the entire dataset called?
- 10. What is the term for the additional parameter that shifts the activation function?
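Most of the concepts behind these clues show up together in a single training loop. Below is a minimal sketch (a hypothetical illustration; the dataset, variable names, and hyperparameters are assumptions, not part of the puzzle) of gradient descent on a one-weight logistic model:

```python
import math

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

# Tiny labeled dataset (supervised learning): input x -> label y
data = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 1)]

w, b = 0.0, 0.0          # weight and bias (the parameter that shifts the activation's input)
learning_rate = 0.1      # controls the step size of each gradient-descent update

for epoch in range(1000):            # one epoch = one complete pass through the dataset
    for x, y in data:
        p = sigmoid(w * x + b)       # forward pass
        # Backpropagation uses the chain rule; for the log loss with a
        # sigmoid output, the gradient simplifies to (p - y) times the input.
        grad_w = (p - y) * x
        grad_b = (p - y)
        # Gradient descent updates the parameters to minimize the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(sigmoid(w * 0.0 + b), sigmoid(w * 3.0 + b))
```

After training, the model should put x = 0 below 0.5 and x = 3 above it; memorizing such a tiny dataset rather than learning the boundary is exactly the overfitting risk one clue asks about.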
