MCQ IN COMPUTER SCIENCE & ENGINEERING

CLOUD COMPUTING

Question
The number of maps is usually driven by the total size of ____
A
inputs
B
outputs
C
tasks
D
None of the mentioned
Explanation: 

Detailed explanation-1: -The number of maps is usually driven by the total size of the inputs; that is, the total number of blocks of the input files. The optimal level of parallelism for maps seems to be around 10-100 maps per host, although it has been set up to 300 maps for very CPU-light map tasks.

Detailed explanation-2: -The number of map tasks depends on the input file and its format. Typically, a file in a Hadoop cluster is broken down into blocks, each with a default size of 128 MB. Depending on its size, the input file is split into one or more chunks, and one map task runs for each chunk.
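
The chunking described above can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper name split_file is hypothetical, the 128 MB block size is taken from the explanation, and the split rules a real Hadoop input format applies (file format, configured minimum and maximum split sizes) are ignored.

```python
MB = 1024 * 1024

def split_file(file_size, block_size=128 * MB):
    """Return (offset, length) pairs that cover a file of file_size bytes.

    Hypothetical helper: each pair stands for one chunk, and therefore
    for one map task in this simplified model.
    """
    splits = []
    offset = 0
    while offset < file_size:
        # The last chunk may be shorter than a full block.
        length = min(block_size, file_size - offset)
        splits.append((offset, length))
        offset += length
    return splits

# A 300 MB file yields two full 128 MB chunks and one 44 MB remainder.
chunks = split_file(300 * MB)
for offset, length in chunks:
    print(f"offset={offset // MB} MB, length={length // MB} MB")
```

In this model the number of map tasks is simply len(chunks).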

Detailed explanation-3: -The total size of the inputs means the total number of blocks of the input files.

Detailed explanation-4: -A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage. In the map stage, the mapper's job is to process the input data.
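
To make the three stages concrete, here is a minimal single-process sketch of a word count in plain Python. It does not use the Hadoop API; the function names map_stage, shuffle_stage, and reduce_stage and the sample input are assumptions chosen for illustration, and a real job would run these stages in parallel across the cluster.

```python
from collections import defaultdict

def map_stage(lines):
    """Map: turn each input line into (word, 1) pairs."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def shuffle_stage(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_stage(groups):
    """Reduce: combine the grouped values for each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

# Hypothetical sample input, one string per input line.
lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = reduce_stage(shuffle_stage(map_stage(lines)))
print(counts)
```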

Detailed explanation-5: -The number of mappers for a MapReduce job is driven by the number of input splits, and the input splits depend on the block size. For example, with 500 MB of data and a 128 MB HDFS block size, the job runs approximately 4 mappers (500 / 128 ≈ 3.9, rounded up).
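
The arithmetic in this example can be checked with a short Python sketch. The function name estimate_mappers is hypothetical, and the estimate assumes one mapper per block with no split spanning two files; the actual count on a cluster depends on the input format and the job configuration.

```python
import math

def estimate_mappers(file_sizes_mb, block_size_mb=128):
    """Rough mapper count: one mapper per block, computed per file.

    Assumption: splits never span files, so each file is rounded up
    to a whole number of blocks on its own.
    """
    return sum(math.ceil(size / block_size_mb) for size in file_sizes_mb)

# 500 MB with 128 MB blocks: ceil(500 / 128) = ceil(3.90625) = 4
mappers = estimate_mappers([500])
print(mappers)
```

Because the rounding happens per file, many small files produce more mappers than one large file of the same total size.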
