article 1.4.5 Multimodal AIs

Across
  2. A non-text modality used by image recognition and multimodal models
  4. Task in which text in images is converted to another language
  9. Type of learning used with human feedback to improve behavior
  11. Model used to generate images from text prompts
  12. A modality mentioned in the article that involves moving visual information
  15. Spoken sounds that multimodal models can learn to understand
  17. The underlying neural network architecture used by LLMs and LMMs
  18. Large language models mentioned as state-of-the-art AI systems
  19. OpenAI's multimodal model that handles text and images
  20. The first step, in which models learn from massive datasets
Down
  1. Model used by ChatGPT to parse audio inputs
  3. Anthropic model claimed to have strong vision capabilities
  5. Unhealthy ideas that models may learn from internet-scale data
  6. Able to work across multiple kinds of data instead of just one
  7. The type of model large language models are based on
  8. Google's multimodal AI models, described as natively multimodal
  10. Acronym for reinforcement learning from human feedback
  13. Visual data that multimodal models can analyze and answer questions about
  14. Different kinds of data, such as text, images, audio, or video
  16. The main modality traditional large language models work with