Big Data Ecosystem and Snapshots


When you are selecting a Big Data Ecosystem for your mission-critical application, make sure you choose a product that lets you take real-time snapshots. If your Big Data Ecosystem does not allow real-time snapshots, you need to budget for a higher backup cost.

The following video and link may help you understand how snapshots work in different Big Data Ecosystems.

 

Big Data Ecosystems Snapshot and how it works: MapR vs HDFS Snapshots vs HBase Snapshots


 

What makes MapR superior to other Hadoop distributions


Why I prefer MapR

  • File system metadata is distributed (think of it as many mini NameNodes). No central NameNode is needed, which eliminates NameNode bottlenecks.
  • MapR-FS is written in C, so there are no JVM garbage collection pauses choking the file system.
  • NFS mount. You can mount the MapR-FS locally and read directly from it or write directly to it.
  • MapR-FS implements POSIX. There is no need to learn any new commands. Your Linux administrator can apply existing knowledge to navigate the file system. You can view the content on MapR-FS using standard Unix commands, e.g. to view the contents of a file on MapR-FS you can just use tail <file_name>.
  • While MapR-FS is proprietary, it is compatible with the Hadoop API. You don’t have to rewrite your applications if you want to migrate to MapR. hadoop fs -ls /user works on MapR-FS and lists the same contents as a plain ls of the user directory on the NFS mount.
  • You can load data directly into the file system. There is no need to land the data on the local file system first. Guess what? With NFS mounts there is no distinction between MapR-FS and the local filesystem. MapR-FS in a way is the local filesystem. No additional tools such as Flume are needed to ingest data.
  • True and consistent snapshots. Run point-in-time queries against your snapshots.
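
To make the NFS and snapshot points above a bit more concrete, here is a minimal Python sketch. It only uses ordinary file I/O, which is the whole point: once the file system is mounted, no Hadoop client is involved. Everything specific in it is an assumption for illustration: the mount point /mapr/my.cluster.com, the volume path, the file name app.log, the snapshot name nightly, and the idea that snapshots show up under a .snapshot directory at the volume root. Check the layout on your own cluster before relying on any of it.

```python
from collections import deque
from pathlib import Path


def tail(path, n=10):
    """Return the last n lines of a file using ordinary file I/O."""
    with open(path, "r", encoding="utf-8", errors="replace") as handle:
        # deque with maxlen keeps only the final n lines while streaming.
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]


def snapshot_path(volume_root, snapshot_name, relative_path):
    """Build the path of a file as it existed in a named snapshot.

    Assumption: snapshots are exposed under a ".snapshot" directory
    at the volume root. Verify this on your own cluster.
    """
    return Path(volume_root) / ".snapshot" / snapshot_name / relative_path


if __name__ == "__main__":
    # Hypothetical NFS mount point, volume, file, and snapshot name.
    volume = Path("/mapr/my.cluster.com/projects/logs")
    live = volume / "app.log"
    frozen = snapshot_path(volume, "nightly", "app.log")

    for label, path in (("live", live), ("snapshot", frozen)):
        if path.exists():
            print(label, tail(path, 5))
        else:
            print(label, "not found:", path)
```

If both paths exist, the script prints the last five lines of the live file and of the same file as it was when the snapshot was taken, which is a simple way to see a point-in-time view next to the current one.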