BASICS OF BIG DATA

Question
The number of maps in MapReduce is usually driven by the total size of ____
A. Inputs
B. Outputs
C. Tasks
D. None of the mentioned
Correct answer: A (Inputs)

Explanation:

Detailed explanation-1: The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files. The right level of parallelism for maps seems to be around 10-100 maps per node, although it has been set as high as 300 maps for very CPU-light map tasks.

Detailed explanation-2: The number of maps is usually driven by the total size of the inputs; that is, the total number of blocks of the input files. The optimal level of parallelism for maps seems to be around 10-100 maps per host, although it can be set as high as 300 maps for very CPU-light map tasks.

Detailed explanation-3: The number of map tasks for a given job is driven by the number of input splits. For each input split (that is, each HDFS block) a map task is created, so over the lifetime of a MapReduce job the number of map tasks equals the number of input splits.

Detailed explanation-4: The number of mappers per MapReduce job depends on the number of InputSplits generated by the InputFormat (its getSplits method). If you have a 640 MB file and the data block size is 128 MB, then 5 mappers run for that MapReduce job.
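To make the arithmetic in explanation 4 concrete, here is a minimal, self-contained Java sketch of the split-count calculation. It is not Hadoop's actual FileInputFormat code: the class and method names are illustrative, and the small "slop" tolerance that FileInputFormat applies to the last split is omitted.

// Standalone sketch (assumed names, not Hadoop source) of the split arithmetic
// described above: map tasks ~= input splits ~= ceil(fileSize / splitSize).
public class SplitCountSketch {

    // Effective split size: the block size clamped between the configured
    // minimum and maximum split sizes.
    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Integer ceiling of fileSize / splitSize.
    static long numSplits(long fileSize, long splitSize) {
        return (fileSize + splitSize - 1) / splitSize;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize  = 640 * mb;   // the 640 MB file from explanation 4
        long blockSize = 128 * mb;   // 128 MB block size, as in explanation 4

        long split = splitSize(blockSize, 1L, Long.MAX_VALUE);
        System.out.println("splits = " + numSplits(fileSize, split));  // prints: splits = 5
    }
}

Running the sketch prints splits = 5, matching the 640 MB / 128 MB example above.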

Detailed explanation-5: "Total size of inputs" means the total number of blocks of the input files.
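Because the map count follows the number of input splits rather than being set directly, the practical way to influence it is through the split-size limits that the InputFormat's getSplits method honours. The driver below is a hedged sketch using the standard org.apache.hadoop.mapreduce API; the class name, the input/output paths, and the specific 128 MB and 256 MB limits are illustrative assumptions, not values taken from this question.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SplitSizeDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split-size-demo");
        job.setJarByClass(SplitSizeDriver.class);
        job.setInputFormatClass(TextInputFormat.class);

        // Input and output paths are placeholders for illustration.
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // The number of map tasks is not set directly; it follows the number of
        // splits that getSplits() produces. These limits bound the effective
        // split size and therefore the map count (values are examples only).
        FileInputFormat.setMinInputSplitSize(job, 128L * 1024 * 1024); // at least 128 MB per split
        FileInputFormat.setMaxInputSplitSize(job, 256L * 1024 * 1024); // at most 256 MB per split

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With these limits and a 128 MB block size, a 640 MB input still yields about 640 MB / 128 MB = 5 splits, and therefore about 5 map tasks.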
