Big Data Applications, Apache Spark, Workflow of MapReduce, Pig Latin Parser, HiveQL Data Definition

[Crossword grid with entries numbered 1–15]
Across
  3. Framework widely used for distributed storage and processing of big data.
  4. The phase in MapReduce that combines and aggregates mapper output.
  6. Data warehouse infrastructure that uses SQL-like queries for big data.
  9. Component of Apache Pig that performs syntax checking on Pig Latin scripts and produces a logical plan.
  11. Custom function written by users to extend Hive or Pig functionality.
  12. Fundamental data structure in Apache Spark representing immutable distributed collections.
  14. The first phase of MapReduce responsible for processing and filtering data.
Down
  1. A complete execution process in MapReduce, triggered by a user's submission.
  2. Fast in-memory data processing engine used as an alternative to MapReduce.
  5. The Spark program component that schedules tasks and maintains metadata.
  7. The Spark component responsible for executing tasks assigned by the driver.
  8. Resource management layer used by Hadoop to run applications like Spark.
  10. Structure that defines column names, types, and metadata in Hive tables.
  13. Category of HiveQL commands used to create, alter, and drop tables.
  15. High-level platform used for analyzing large datasets with a scripting language.