Across
- 2. ___ encoding: Adds positional information to input embeddings
- 4. ___ heads: Multiple attention mechanisms in parallel
- 7. Generates the output sequence
- 9. Processes the input sequence
- 10. ___ layer: Maps input tokens to dense vectors
Down
- 1. Mechanism for weighting input tokens
- 3. ___ network: Neural network layer applied to each token independently
- 5. ___ layers: Stacked layers of self-attention and feed-forward networks
- 6. ___ dot-product attention: Common attention mechanism in transformers
- 8. ___, key, value: Components of the self-attention mechanism
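
If you want to check your answers against working code, here is a minimal NumPy sketch, not a reference implementation, wiring together the clued components: an embedding-shaped input, the sinusoidal positional encoding introduced in "Attention Is All You Need", and scaled dot-product attention over query, key, and value projections. All function names, shapes, and toy dimensions here are illustrative assumptions.

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    """Sinusoidal positional encoding, added to the input embeddings."""
    positions = np.arange(seq_len)[:, None]        # (seq_len, 1)
    dims = np.arange(d_model)[None, :]             # (1, d_model)
    angles = positions / np.power(10000, (2 * (dims // 2)) / d_model)
    encoding = np.zeros((seq_len, d_model))
    encoding[:, 0::2] = np.sin(angles[:, 0::2])    # even dimensions: sine
    encoding[:, 1::2] = np.cos(angles[:, 1::2])    # odd dimensions: cosine
    return encoding

def scaled_dot_product_attention(query, key, value):
    """Weights each value by the softmaxed, scaled query-key similarity."""
    d_k = query.shape[-1]
    scores = query @ key.T / np.sqrt(d_k)          # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True) # softmax over the keys
    return weights @ value

# Toy dimensions chosen for illustration only: 4 tokens, model width 8.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
embeddings = rng.normal(size=(seq_len, d_model))   # stand-in for an embedding layer
x = embeddings + positional_encoding(seq_len, d_model)

# Self-attention: query, key, and value are all projections of the same sequence.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = scaled_dot_product_attention(x @ W_q, x @ W_k, x @ W_v)
print(out.shape)  # (4, 8)
```

Multi-head attention runs several such attention computations in parallel on lower-dimensional projections and concatenates the results, and the position-wise feed-forward network then transforms each token's output independently; stacking those two sublayers gives the repeated layers the puzzle refers to.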
