Crossword

Across
  2. Character-level tokenization
  5. Byte Pair Encoding
  7. Handling various data types
  9. Splits text into tokens
  10. Large Language Models
Down
  1. Vocabulary size
  3. Set of known tokens
  4. End-of-text, start-of-message
  6. Numerical representations of text
  8. Efficient token compression