BASICS OF BIG DATA

Question
True or False? Resilient Distributed Datasets (RDDs) are fault-tolerant and immutable.
A. True
B. False
C. Either A or B
D. None of the above
Explanation: The correct answer is A (True).

Detailed explanation-1: -A Resilient Distributed Dataset (RDD) is the fundamental data structure of Spark. RDDs are immutable, distributed collections of objects of any type. As the name suggests, an RDD is a resilient (fault-tolerant) set of records that resides on multiple nodes.

Detailed explanation-2: -At its core, an RDD is an immutable, distributed collection of data elements, partitioned across the nodes of your cluster, that can be operated on in parallel through a low-level API offering transformations and actions.
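
The difference between a transformation and an action can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's API: `ToyRDD` is a hypothetical class, and unlike Spark it evaluates transformations eagerly rather than lazily. It only illustrates that a transformation returns a new collection and leaves the original untouched, while an action returns a result to the caller.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToyRDD:
    """Hypothetical stand-in for an RDD: an immutable, partitioned collection."""

    partitions: Tuple[Tuple[int, ...], ...]

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformation: builds a new ToyRDD; this one is left unchanged.
        return ToyRDD(tuple(tuple(fn(x) for x in part) for part in self.partitions))

    def filter(self, pred: Callable[[int], bool]) -> "ToyRDD":
        # Transformation: keeps only the elements for which pred is true.
        return ToyRDD(tuple(tuple(x for x in part if pred(x)) for part in self.partitions))

    def collect(self) -> List[int]:
        # Action: gathers the elements of every partition into one list.
        return [x for part in self.partitions for x in part]


numbers = ToyRDD(((1, 2, 3), (4, 5, 6)))  # two partitions
evens_squared = numbers.filter(lambda x: x % 2 == 0).map(lambda x: x * x)

print(evens_squared.collect())  # result of the chained transformations
print(numbers.collect())        # the original collection is unchanged
```

Because the dataclass is frozen, assigning to `numbers.partitions` raises an error, which mirrors the idea that an RDD is never modified in place.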

Detailed explanation-3: -There are a few reasons for keeping RDDs immutable: (1) immutable data can be shared easily, because no reader can change it; (2) an immutable RDD can be created (or recreated) at any point in time; (3) immutable data can live in memory as easily as on disk.

Detailed explanation-4: -A Resilient Distributed Dataset (RDD) is a low-level API and Spark’s underlying data abstraction. An RDD is a static set of items distributed across the nodes of a cluster to allow parallel processing. It can store any Python, Java, Scala, or user-defined object.

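Detailed explanation-5: -Spark typically operates on data stored in fault-tolerant systems such as HDFS or S3, so an RDD built from that data can always be reloaded from its source. In addition, each RDD records the lineage of transformations that produced it, which lets Spark recompute a lost partition rather than lose the data.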
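
One way to picture this kind of recovery is through lineage: an RDD remembers its source partitions and the functions applied to them, so a lost partition can be rebuilt from durable storage. The plain-Python sketch below is a toy under that assumption, not Spark's implementation or API. `LineageRDD`, `lose_partition`, and `DURABLE_STORE` are hypothetical names introduced only for illustration.

```python
from typing import Callable, Dict, List, Optional


class LineageRDD:
    """Hypothetical toy: remembers how each partition is derived, not only its data."""

    def __init__(
        self,
        load_partition: Callable[[int], List[int]],
        num_partitions: int,
        steps: Optional[List[Callable[[int], int]]] = None,
    ) -> None:
        self._load = load_partition            # reads one partition from durable storage
        self.num_partitions = num_partitions
        self._steps = list(steps or [])        # lineage: ordered per-element functions
        self._cache: Dict[int, List[int]] = {}  # computed partitions held in memory

    def map(self, fn: Callable[[int], int]) -> "LineageRDD":
        # Returns a new object whose lineage has one more step.
        return LineageRDD(self._load, self.num_partitions, self._steps + [fn])

    def compute(self, index: int) -> List[int]:
        # Rebuilds the partition from its source if it is not in memory.
        if index not in self._cache:
            data = self._load(index)
            for fn in self._steps:
                data = [fn(x) for x in data]
            self._cache[index] = data
        return self._cache[index]

    def lose_partition(self, index: int) -> None:
        # Simulates a node failure that drops one in-memory partition.
        self._cache.pop(index, None)

    def collect(self) -> List[int]:
        return [x for i in range(self.num_partitions) for x in self.compute(i)]


# Assumed stand-in for a fault-tolerant store such as HDFS or S3.
DURABLE_STORE = {0: [1, 2], 1: [3, 4]}

rdd = LineageRDD(lambda i: list(DURABLE_STORE[i]), 2).map(lambda x: x + 1).map(lambda x: x * 10)
before = rdd.collect()
rdd.lose_partition(1)   # partition 1 disappears from memory
after = rdd.collect()   # it is recomputed from the source and the lineage
print(before == after)
```

The recovery works here only because the source data is durable and the recorded functions are deterministic. Both are assumptions of this sketch.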
