BASICS OF BIG DATA

Question
Apache Spark is considered ____ than Hadoop
A
Slower
B
Faster
C
Either A or B
D
None of the above
Explanation: 

Detailed explanation-1: -Apache Spark is well known for its speed. It can run up to 100 times faster in memory and up to ten times faster on disk than Hadoop MapReduce, because it processes data in memory (RAM), whereas Hadoop MapReduce has to persist data back to disk after every Map or Reduce action.

Detailed explanation-2: -Spark can process real-time data. Hadoop's MapReduce model reads from and writes to disk at each step, which slows down processing. Spark reduces the number of read/write cycles to disk and stores intermediate data in memory, hence its faster processing speed. Spark is a low-latency computing framework, whereas Hadoop MapReduce is a high-latency one.

Detailed explanation-3: -Spark has its own machine learning library, MLlib, whereas Hadoop must be interfaced with an external machine learning library, for example Apache Mahout. Because Spark is faster than Hadoop, it is better suited to advanced analytics operations such as real-time data processing.

Detailed explanation-4: -This is because Spark reduces the number of disk read/write cycles by storing intermediate data in memory, resulting in faster processing. Hence, Spark can process data up to 100 times faster in memory than Hadoop MapReduce.

Detailed explanation-5: -There are several reasons why Spark is faster than MapReduce: Spark uses RAM to store intermediate data during processing, while MapReduce uses disk to store intermediate data. Spark also makes efficient use of the underlying hardware cache.
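
The disk-versus-memory difference described in the explanations above can be illustrated with a toy sketch in plain Python. This is not Spark or Hadoop code: the function names run_disk_backed and run_in_memory, the JSON file layout, and the three-stage pipeline are all hypothetical, chosen only to show where the extra read/write cycles come from. It assumes that each stage is a simple per-record function.

```python
import json
import os
import tempfile


def run_disk_backed(records, stages, workdir):
    """MapReduce-style sketch: persist every stage's output to disk,
    then read it back before the next stage runs."""
    data = list(records)
    disk_ops = 0
    for i, stage in enumerate(stages):
        data = [stage(x) for x in data]
        path = os.path.join(workdir, f"stage_{i}.json")
        with open(path, "w") as f:
            json.dump(data, f)      # write intermediate result to disk
        disk_ops += 1
        with open(path) as f:
            data = json.load(f)     # read it back for the next stage
        disk_ops += 1
    return data, disk_ops


def run_in_memory(records, stages):
    """Spark-style sketch: keep intermediate results in RAM between stages."""
    data = list(records)
    for stage in stages:
        data = [stage(x) for x in data]
    return data, 0                  # no intermediate disk reads or writes


# Hypothetical three-stage pipeline over five records.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
records = list(range(5))

with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_ops = run_disk_backed(records, stages, workdir)
mem_result, mem_ops = run_in_memory(records, stages)

print(disk_result == mem_result)    # both approaches compute the same result
print(disk_ops, mem_ops)            # intermediate disk operations for each approach
```

Both functions produce the same output; the disk-backed version simply pays two extra disk operations per stage. The sketch does not measure or reproduce the speed-up figures quoted above, which depend on the workload and cluster.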
