Transformers

Across
  2. _____ encoding: Adds positional information to input embeddings
  4. _____ heads: Multiple attention mechanisms in parallel
  7. _____ output sequence
  9. _____ input sequence
  10. _____ layer: Maps input tokens to dense vectors
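
The Across clues point at two concrete pieces of a transformer's input pipeline: an embedding layer that maps token ids to dense vectors, and positional encoding added on top. A minimal NumPy sketch of those two steps, assuming the sinusoidal encoding from the original transformer paper and a toy random lookup table; the vocabulary size, model width, and token ids below are illustrative, not from the puzzle:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # Sinusoidal positional encoding (one common choice):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    pos = np.arange(seq_len)[:, None]                # (seq_len, 1)
    i = np.arange(d_model // 2)[None, :]             # (1, d_model / 2)
    angles = pos / np.power(10000, 2 * i / d_model)  # (seq_len, d_model / 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

# Toy embedding layer: a lookup table mapping token ids to dense vectors.
rng = np.random.default_rng(0)
vocab_size, d_model = 100, 16
embedding_table = rng.normal(size=(vocab_size, d_model))

token_ids = np.array([5, 42, 7, 99])   # a made-up input sequence
x = embedding_table[token_ids]         # embedding lookup: ids -> vectors
x = x + positional_encoding(len(token_ids), d_model)  # add position info
print(x.shape)  # (4, 16)
```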
Down
  1. _____ for weighting input tokens
  3. _____ network: Neural network layer applied to each token independently
  5. _____ layers: Stacked layers of self-attention and feed-forward networks
  6. _____ dot-product attention: Common attention mechanism in transformers
  8. _____, key, value: Components of self-attention mechanism
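
The Down clues name the core of a transformer layer: scaled dot-product attention over query, key, and value projections, followed by a position-wise feed-forward network applied to each token independently. A minimal NumPy sketch under those standard definitions; the projection matrices and dimensions here are made up for illustration:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)  # token-to-token attention scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)  # row softmax
    return weights @ V               # weighted sum of value vectors

def feed_forward(x, W1, b1, W2, b2):
    # Position-wise feed-forward network: the same two-layer MLP is
    # applied to each token's vector independently.
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

rng = np.random.default_rng(0)
seq_len, d_model, d_ff = 4, 16, 32
x = rng.normal(size=(seq_len, d_model))  # token representations

# Hypothetical projections producing query, key, and value from x.
Wq, Wk, Wv = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(x @ Wq, x @ Wk, x @ Wv)

W1, b1 = rng.normal(size=(d_model, d_ff)), np.zeros(d_ff)
W2, b2 = rng.normal(size=(d_ff, d_model)), np.zeros(d_model)
out = feed_forward(out, W1, b1, W2, b2)
print(out.shape)  # (4, 16)
```

Multi-head attention (4 Across) simply runs several such attention computations in parallel on lower-dimensional projections and concatenates the results.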