What are the advantages of using Apache Spark?


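– It is compatible with Hadoop (it can read data from HDFS and run on YARN)
– It provides ease of development
– It is fast, partly because it can cache intermediate data in memory
– It supports multiple languages (Scala, Java, Python, and R)
– It has a unified stack (Spark SQL, Spark Streaming, MLlib, and GraphX run on the same engine)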

Big Data Ecosystem and Snapshots


When you're selecting a Big Data Ecosystem for a mission-critical application, make sure you choose a product that allows you to take real-time snapshots. If your Big Data Ecosystem does not allow real-time snapshots, you need to plan for a higher backup cost.
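Snapshots are cheaper than full backups because a snapshot records references to existing data instead of copying it. HDFS, for example, does not copy DataNode blocks when a snapshot is taken; it only records the changes made afterwards. The toy model below is a minimal sketch of that idea, not HDFS, HBase, or MapR code: the class `ToyNamespace` and its integer block IDs are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Blocks = Tuple[int, ...]


@dataclass
class ToyNamespace:
    """Hypothetical toy model of a snapshot-capable file system (not real HDFS/MapR code)."""

    # Current state: file path -> immutable tuple of block IDs.
    files: Dict[str, Blocks] = field(default_factory=dict)
    # Snapshot name -> frozen path -> block-ID mapping (references only).
    snapshots: Dict[str, Dict[str, Blocks]] = field(default_factory=dict)

    def write(self, path: str, blocks: Blocks) -> None:
        # New data goes to new blocks; blocks a snapshot points at are never overwritten.
        self.files[path] = blocks

    def delete(self, path: str) -> None:
        self.files.pop(path, None)

    def snapshot(self, name: str) -> None:
        # Copies only metadata (path -> block-ID references), never block contents.
        self.snapshots[name] = dict(self.files)

    def read(self, path: str, snapshot: Optional[str] = None) -> Blocks:
        view = self.files if snapshot is None else self.snapshots[snapshot]
        return view[path]

    def retained_blocks(self) -> Set[int]:
        # Blocks that must stay on disk: referenced by the live view or by any snapshot.
        refs: Set[int] = set()
        for view in [self.files, *self.snapshots.values()]:
            for blocks in view.values():
                refs.update(blocks)
        return refs
```

After `ns.write("/logs/a", (1, 2))`, `ns.snapshot("s1")`, and `ns.write("/logs/a", (3,))`, the live file points at block 3 while `ns.read("/logs/a", "s1")` still returns blocks 1 and 2. Only the overwritten blocks cost extra space, which is why a product without cheap snapshots pushes you toward more expensive full backups.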

The following video and link may help you understand how snapshots work in different Big Data Ecosystems:

Big Data Ecosystems Snapshot and how it works: MapR vs HDFS Snapshots vs HBase Snapshots


Managing HDFS Snapshots Using Cloudera Manager


Browsing HDFS Directories

To browse the HDFS directories to view snapshot activity:

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the File Browser tab.
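The same information is available from the standard HDFS shell: `hdfs lsSnapshottableDir` lists snapshottable directories, and each directory's snapshots appear under its read-only `.snapshot` path. The Python sketch below only assembles and runs those commands. It assumes the `hdfs` client is on the `PATH` and configured for your cluster; the helper names are hypothetical.

```python
import subprocess
from typing import List


def snapshottable_dirs_cmd() -> List[str]:
    # Lists snapshottable directories where the current user has permission to take snapshots.
    return ["hdfs", "lsSnapshottableDir"]


def list_snapshots_cmd(directory: str) -> List[str]:
    # Each snapshot is exposed under the directory's read-only ".snapshot" path.
    return ["hdfs", "dfs", "-ls", directory.rstrip("/") + "/.snapshot"]


def run(cmd: List[str]) -> str:
    # Runs the command and returns stdout; raises CalledProcessError on a non-zero exit.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, `print(run(list_snapshots_cmd("/user/data")))` would list the snapshots of a hypothetical `/user/data` directory.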

Enabling an HDFS Directory for Snapshots

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the File Browser tab.
  3. Navigate to the directory you want to enable for snapshots.
  4. In the File Browser, click the drop-down menu next to the full file path and select Enable Snapshots.
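Outside Cloudera Manager, a directory is made snapshottable with the HDFS admin command `hdfs dfsadmin -allowSnapshot`, which requires HDFS superuser privileges. A minimal Python sketch, assuming the `hdfs` client is installed and configured; the function names and the example path are hypothetical:

```python
import subprocess
from typing import List


def allow_snapshot_cmd(directory: str) -> List[str]:
    # Marks the directory as snapshottable; this admin command needs HDFS superuser rights.
    return ["hdfs", "dfsadmin", "-allowSnapshot", directory]


def disallow_snapshot_cmd(directory: str) -> List[str]:
    # Reverses -allowSnapshot; HDFS rejects it while the directory still has snapshots.
    return ["hdfs", "dfsadmin", "-disallowSnapshot", directory]


def run(cmd: List[str]) -> str:
    # Runs the command and returns stdout; raises CalledProcessError on a non-zero exit.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, `run(allow_snapshot_cmd("/user/data"))` enables snapshots on a hypothetical `/user/data` directory.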


Taking Snapshots

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the File Browser tab.
  3. Navigate to the directory you want to snapshot.
  4. Click the drop-down menu next to the full path name and select Take Snapshot.

    The Take Snapshot screen displays.

  5. Enter a name for the snapshot.
  6. Click OK.

    The Take Snapshot button is now present, so you can take an immediate snapshot of the directory at any time.

  7. To take another snapshot, click Take Snapshot, specify a name for the snapshot, and click Take Snapshot. The snapshot is added to the snapshot list.

    Any snapshots that have been taken are listed by the time at which they were taken, along with their names and a menu button.
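From the command line, `hdfs dfs -createSnapshot` takes a snapshot and `hdfs dfs -deleteSnapshot` removes one. The sketch below assumes a configured `hdfs` client on the `PATH`; the helper names, directory, and snapshot name are hypothetical.

```python
import subprocess
from typing import List, Optional


def create_snapshot_cmd(directory: str, name: Optional[str] = None) -> List[str]:
    # The name is optional; without one, HDFS generates a timestamp-based name.
    cmd = ["hdfs", "dfs", "-createSnapshot", directory]
    if name:
        cmd.append(name)
    return cmd


def delete_snapshot_cmd(directory: str, name: str) -> List[str]:
    # Removes a named snapshot of the directory.
    return ["hdfs", "dfs", "-deleteSnapshot", directory, name]


def run(cmd: List[str]) -> str:
    # Runs the command and returns stdout; raises CalledProcessError on a non-zero exit.
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout
```

For example, `run(create_snapshot_cmd("/user/data", "before-upgrade"))` would create a snapshot readable at `/user/data/.snapshot/before-upgrade`.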


The following can negatively impact Hadoop cluster performance:


1. Nodes with disks of different sizes
2. Nodes with disks of different speeds
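The effect of mixed disk speeds can be shown with a deliberately simplified model: it assumes the work is split evenly across nodes, is disk-bound, and ignores speculative execution. Under those assumptions a parallel stage finishes only when its slowest node does. The function name and the node throughputs below are hypothetical.

```python
from typing import Dict


def stage_seconds(gb_per_node: float, read_mb_per_s: Dict[str, float]) -> float:
    """Simplified model: each node reads the same amount of data at its own disk speed."""
    # Per-node time = data / throughput; a parallel stage ends when its slowest node finishes.
    return max(gb_per_node * 1024 / speed for speed in read_mb_per_s.values())


uniform = stage_seconds(100, {"node1": 200, "node2": 200, "node3": 200})
mixed = stage_seconds(100, {"node1": 200, "node2": 200, "node3": 100})
```

Here `uniform` is 512 seconds and `mixed` is 1024 seconds: one disk at half speed doubles the stage time, and the two faster nodes sit idle for half of it.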

Data Recovery File System for Hadoop Cluster


The solution addresses the case in which the Hadoop cluster suddenly becomes unavailable because the NameNode is terminated.
It aims to increase performance, reduce delay, solve the automated failover problem, and increase the reliability of Hadoop.
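The passage does not say how the solution detects a terminated NameNode. As a generic illustration only, and not the method referred to here, the sketch below shows the heartbeat-timeout idea that automated failover is commonly built on: if the active NameNode stops reporting within a timeout, the standby is promoted. The class, the node names `nn1` and `nn2`, and the timeout are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FailoverMonitor:
    """Hypothetical monitor: promotes the standby when the active node stops heartbeating."""

    timeout_s: float
    active: str = "nn1"
    standby: str = "nn2"
    last_heartbeat: float = 0.0

    def heartbeat(self, node: str, now: float) -> None:
        # Only heartbeats from the current active node reset the timer.
        if node == self.active:
            self.last_heartbeat = now

    def check(self, now: float) -> bool:
        # Returns True if this check triggered a failover.
        if now - self.last_heartbeat > self.timeout_s:
            self.active, self.standby = self.standby, self.active
            self.last_heartbeat = now  # give the newly promoted node a full timeout window
            return True
        return False
```

With `timeout_s=5.0` and a last heartbeat from `nn1` at t=1, a check at t=4 does nothing, while a check at t=7 promotes `nn2`. The sketch deliberately omits fencing: HDFS's built-in high-availability mode uses the ZooKeeper Failover Controller and fences the old active NameNode so two NameNodes never serve writes at once, and a real design needs an equivalent safeguard.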
