Across
- 5. mechanism to focus on parts of the input (Transformer core)
- 8. Rectified Linear Unit activation function
- 9. algorithm (e.g. Adam) that updates model weights
- 10. using a trained model to predict on new data
- 12. basic processing unit in a neural network
- 13. trained function mapping input → output
- 14. one full pass over the training dataset
Down
- 1. adapt a pretrained model to new data
- 2. reduce overfitting by adding a penalty or constraints
- 3. gradient-based method used to train neural networks
- 4. activation that produces a probability distribution
- 6. to fit a model to data
- 7. an attention-based neural-network architecture
- 11. dense vector representation of discrete items
