Across
- 1. (of data) structured, unstructured, and semistructured data that is gathered from multiple sources
- 3. a query language in Apache Hive for processing and analyzing structured data
- 7. (data) data that were collected in the past, usually for a purpose other than research
- 9. provides a complete record of the information resources maintained by an organisation.
- 11. a term for cloud-based software tools used for working with data, such as managing data in a data warehouse or analyzing data with business intelligence.
- 13. delay before a transfer of data begins following an instruction for its transfer.
- 14. (computing) a system for connecting a large number of computer nodes into a distributed architecture that delivers the compute resources necessary to solve complex problems.
- 15. the proportion of visitors to a web page who follow a hypertext link to a particular site
- 17. a process that allocates system resources to control the execution of unattended background programs.
- 19. (analytics) the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data.
- 21. (processing) the technique of linking together multiple computer servers over a network into a cluster, to share data and to coordinate processing power
- 24. (processing) a method of running high-volume, repetitive data jobs
Down
- 1. (of data) the speed at which data is entered into a system and must be processed
- 2. (hardware) a device or device component that is relatively inexpensive, widely available and more or less interchangeable with other hardware of its type.
- 4. the resource management and job scheduling technology in the open source Hadoop distributed processing framework
- 5. a set of computers that work together so that they can be viewed as a single system
- 6. (database) database that arranges data elements in vertical columns and horizontal rows.
- 8. (data) data that fits a predefined model or format.
- 10. a distributed file system that handles large data sets running on commodity hardware
- 12. database computer language designed for the retrieval and management of data in a relational database
- 16. (data) information that either does not have a predefined data model or is not organized in a predefined manner
- 18. (of data) how reliable and significant the data really is
- 20. (database) type of database that stores and provides access to data points that are related to one another
- 21. (data) any data that are essentially not alike, or are distinctly different in kind, quality, or character; they are unequal and cannot be readily integrated to meet the business information demand.
- 22. (also cleansing) the process of fixing incorrect, incomplete, duplicate or otherwise erroneous data in a data set
- 23. a collection of open-source software utilities that facilitates using a network of many computers to solve problems involving massive amounts of data and computation
