DATABASE FUNDAMENTALS
CLOUD COMPUTING AND DATABASES
Question
inputs
outputs
tasks
None of the mentioned
Detailed explanation-1: The number of maps is usually driven by the total size of the inputs, that is, the total number of blocks of the input files. The optimal level of parallelism for maps seems to be around 10-100 maps per host, although it has been set as high as 300 maps for very CPU-light map tasks.
Detailed explanation-2: The number of map tasks for a given job is driven by the number of input splits. One map task is created for each input split, which typically corresponds to one HDFS block. So, over the lifetime of a MapReduce job, the number of map tasks equals the number of input splits.
Detailed explanation-3: The total size of the inputs means the total number of blocks of the input files.
Detailed explanation-4: A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage. In the map stage, the mapper's job is to process the input data.
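The three stages can be illustrated with a small in-memory word count. This is only a sketch of the idea in plain Python, not Hadoop's API; the function names (`map_stage`, `shuffle_stage`, `reduce_stage`, `run_job`) and the word-count task itself are assumptions chosen for illustration.

```python
from collections import defaultdict


def map_stage(line):
    """Map: turn one input record into (key, value) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_stage(pairs):
    """Shuffle: group all values that share the same key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


def reduce_stage(grouped):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in grouped.items()}


def run_job(lines):
    """Run the three stages in order on a list of input lines."""
    mapped = [pair for line in lines for pair in map_stage(line)]
    return reduce_stage(shuffle_stage(mapped))


counts = run_job(["big data big", "data maps"])
print(counts)
```

In a real cluster each stage runs in parallel across many hosts; here everything runs in one process so the data flow is easy to follow.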
Detailed explanation-5: The number of mappers for a MapReduce job is driven by the number of input splits, and input splits depend on the block size. For example, with 500 MB of data and an HDFS block size of 128 MB, the job gets approximately 4 mappers: three full 128 MB blocks plus one 116 MB remainder (500 / 128 ≈ 3.9, rounded up).
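The arithmetic in the example above can be sketched as a small helper. It assumes one split per block, computed per file and rounded up; the function name `estimate_map_tasks` is hypothetical, and real jobs can differ (for example, with custom split sizes or non-splittable input files).

```python
import math


def estimate_map_tasks(file_sizes_mb, block_size_mb=128):
    """Estimate map tasks as one per block-sized split, counted per file.

    Assumption: each file is split independently and the final partial
    block of a file still gets its own map task.
    """
    return sum(
        math.ceil(size / block_size_mb)
        for size in file_sizes_mb
        if size > 0
    )


# One 500 MB file with 128 MB blocks: 3 full blocks + 1 partial block.
print(estimate_map_tasks([500]))
```

Because splits are counted per file, many small files produce more map tasks than one large file of the same total size: `estimate_map_tasks([500, 10])` counts a separate split for the 10 MB file.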