This set of MCQs helps students learn about MapReduce, the programming paradigm that enables massive scalability across hundreds or thousands of servers in a Hadoop cluster. As the processing component, MapReduce is the heart of Apache Hadoop. The term "MapReduce" refers to the two distinct tasks that Hadoop programs perform: Map and Reduce.
A ________ node acts as the slave and is responsible for executing a task assigned to it by the JobTracker.
The ___________ part of MapReduce is responsible for processing one or more chunks of data and producing the output results.
The _________ function is responsible for consolidating the results produced by each of the Map() functions/tasks.
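For context, here is a minimal sketch of such a consolidating Reduce function in Hadoop's classic org.apache.hadoop.mapred API (the same API the JobConf question below refers to). The class name SumReducer and the word-count use case are illustrative assumptions, not part of this quiz:

```java
import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Consolidates the (word, 1) pairs emitted by the map tasks into
// a single (word, total) pair per key.
public class SumReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable> {

  public void reduce(Text key, Iterator<IntWritable> values,
                     OutputCollector<Text, IntWritable> output,
                     Reporter reporter) throws IOException {
    int sum = 0;
    while (values.hasNext()) {
      sum += values.next().get();  // add up the counts for this key
    }
    output.collect(key, new IntWritable(sum));
  }
}
```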
________ is a utility that allows users to create and run jobs with any executables as the mapper and/or the reducer.
__________ maps input key/value pairs to a set of intermediate key/value pairs.
_________ is the default Partitioner for partitioning the key space.
Running a ___________ program involves running mapping tasks on many or all of the nodes in the cluster.
The Hadoop MapReduce framework spawns one map task for each __________ generated by the InputFormat for the job.
Users can control which keys (and hence records) go to which Reducer by implementing a custom _________.
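As a hedged sketch of what such a custom Partitioner could look like in the classic mapred API (the class name and the first-character routing rule are invented for illustration):

```java
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

// Hypothetical partitioner: routes keys to reducers by their first
// character instead of the full hashCode() used by the default
// HashPartitioner from the earlier question.
public class FirstCharPartitioner implements Partitioner<Text, IntWritable> {

  public void configure(JobConf job) {
    // no job configuration needed for this sketch
  }

  public int getPartition(Text key, IntWritable value, int numPartitions) {
    if (key.getLength() == 0) {
      return 0;  // send empty keys to the first reducer
    }
    // mask off the sign bit so the partition index is never negative
    return (key.charAt(0) & Integer.MAX_VALUE) % numPartitions;
  }
}
```

It would be registered on the job with conf.setPartitionerClass(FirstCharPartitioner.class).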
Applications can use the ____________ to report progress and set application-level status messages.
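A minimal sketch of a map task using the Reporter for exactly this; the counter group name, status text, and class name are made-up examples:

```java
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class ReportingMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    reporter.setStatus("processing offset " + key.get()); // application-level status message
    reporter.incrCounter("app", "records-seen", 1);       // bump a custom counter
    reporter.progress();                                  // signal the framework the task is alive
    output.collect(value, new IntWritable(1));            // emit the line with a count of 1
  }
}
```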
The right level of parallelism for maps seems to be around _________ maps per node.
The Mapper implementation processes one line at a time via the _________ method.
Map
Reduce
Mapper
Reducer
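In the style of the Hadoop WordCount tutorial, here is a minimal sketch of a Mapper whose map() method is invoked once per line of input (the class name and token-splitting details are assumptions for illustration):

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class TokenMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  // Called once per input line: the key is the line's byte offset,
  // the value is the line itself; emit (token, 1) for every token.
  public void map(LongWritable key, Text value,
                  OutputCollector<Text, IntWritable> output,
                  Reporter reporter) throws IOException {
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      output.collect(word, ONE);
    }
  }
}
```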
The number of reduce tasks for the job is set by the user via _________.
JobConf.setNumTasks(int)
JobConf.setNumReduceTasks(int)
JobConf.setNumMapTasks(int)
All of the mentioned
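A short driver sketch in the classic mapred API showing where this call fits; TokenMapper and SumReducer are the hypothetical classes sketched above, and the input/output paths come from the command line:

```java
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public class WordCountDriver {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCountDriver.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(TokenMapper.class);  // hypothetical Mapper from above
    conf.setReducerClass(SumReducer.class);  // hypothetical Reducer from above

    conf.setNumReduceTasks(2);  // the user-set number of reduce tasks

    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);  // submit the job and wait for completion
  }
}
```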
The framework groups Reducer inputs by key in the _________ stage.
Sort
Shuffle
Reduce
None of the mentioned
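For context, the grouping performed in this stage can even be customized. Below is a hedged sketch of a grouping comparator for composite "natural|secondary" keys, registered via the classic API's real JobConf.setOutputValueGroupingComparator() hook; the key format and class name are invented for illustration:

```java
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

// Hypothetical grouping comparator: treats composite "natural|secondary"
// keys as equal when their natural part matches, so one reduce() call
// receives all values for a natural key (the usual secondary-sort trick).
// Assumes every key contains a '|' separator.
public class NaturalKeyGroupingComparator extends WritableComparator {

  protected NaturalKeyGroupingComparator() {
    super(Text.class, true);  // ask the comparator to instantiate Text keys
  }

  public int compare(WritableComparable a, WritableComparable b) {
    String ka = a.toString();
    String kb = b.toString();
    // compare only the part before the '|'
    return ka.substring(0, ka.indexOf('|'))
             .compareTo(kb.substring(0, kb.indexOf('|')));
  }
}
```

It would be registered with conf.setOutputValueGroupingComparator(NaturalKeyGroupingComparator.class).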