Across
- 2. ___ encoding: Adds positional information to input embeddings
- 4. ___ heads: Multiple attention mechanisms in parallel
- 7. Generates the output sequence
- 9. Processes the input sequence
- 10. ___ layer: Maps input tokens to dense vectors
Down
- 1. Mechanism for weighting input tokens
- 3. ___ network: Neural network layer applied to each token independently
- 5. ___ layers: Stacked layers of self-attention and feed-forward networks
- 6. ___ dot-product attention: Common attention mechanism in transformers
- 8. ___, key, value: Components of the self-attention mechanism
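
If you want to check your answers against working code, here is a minimal NumPy sketch, not a reference implementation, wiring together the clued components: an embedding-shaped input, the sinusoidal positional encoding introduced in "Attention Is All You Need", and scaled dot-product attention over query, key, and value projections. All function names, shapes, and toy dimensions here are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding, added to the input embeddings."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model)[None, :]             # (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])    # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])    # odd dimensions: cosine
    return encoding

def scaled_dot_product_attention(query, key, value):
    """Weights each value by the softmaxed, scaled query-key similarity."""
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ value

# Toy dimensions chosen for illustration only: 4 tokens, model width 8.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
embeddings = rng.normal(size=(seq_len, d_model))   # stand-in for an embedding layer
x = embeddings + positional_encoding(seq_len, d_model)

# Self-attention: query, key, and value are all projections of the same sequence.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8)
```

Multi-head attention runs several such attention computations in parallel on lower-dimensional projections and concatenates the results, and the position-wise feed-forward network then transforms each token's output independently; stacking those two sublayers gives the repeated layers the puzzle refers to.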
