Introduction to Hadoop Security


A Hands-on Approach to Securing Big Data Clusters

In this Introduction to Hadoop Security training course, expert author Jeff Bean will teach you how to use Hadoop to secure big data clusters. This course is designed for users who are already familiar with the basics of Hadoop.

You will start by learning about tooling, then jump into learning about Hadoop insecurities. From there, Jeff will teach you about authentication with MIT Kerberos, authentication with Active Directory, and authorization. This video tutorial also covers encryption, developer topics, and administrator topics. Finally, you will learn about secure Hadoop topics.

Once you have completed this computer-based training course, you will have learned everything you need to know to secure big data clusters with Hadoop.

 Watch the Video: http://shop.oreilly.com/product/0636920044291.do

Big Data Ecosystem and Snapshots


When you are selecting a Big Data ecosystem for your mission-critical application, make sure you choose a product that allows you to take real-time snapshots. If your Big Data ecosystem does not allow you to take real-time snapshots, you need to plan for higher backup costs.

The following video and link may help you understand how snapshots work in different Big Data ecosystems.

 

Big Data Ecosystems Snapshot and how it works: MapR vs HDFS Snapshots vs HBase Snapshots


 

Use Toad to Access Hadoop Ecosystem


Source: https://weidongzhou.wordpress.com/ (Excellent Blog about how to access Hadoop using Toad)
Some time ago I wrote a blog post, Use SQL Developer to Access Hive Table on Hadoop. Recently I noticed a similar product, Toad for Hadoop, so I decided to give it a try.
Like many people, I like Toad products in general and use Toad in many of my projects. Toad for Hadoop is a new product in the Toad family. The current version is Toad for Hadoop 1.3.1 Beta, available on the Windows platform only. The software supports Cloudera CDH 5.x and Hortonworks Data Platform 2.3. The software is free for now, but you need to create an account with Dell before you can download the zip file. The entire process of installation and configuration is pretty simple and straightforward. Here are the steps:

Download the zip files
Go to Toad for Hadoop. Click the Download button. The zip file is 555 MB in size.

Installation
I installed the software in my Windows VM. Just double-click the ToadHadoop_1.3.1_Beta_x64.exe file and accept the default values on all of the installation screens. At the end of the installation, it will open the software automatically.

Configuration
Unlike the regular Toad software with its many buttons, this one looks quite simple.
Toad_config_1
Click the dropdown box on the right of the Ecosystem box, then click Add New Ecosystem. The Select your Hadoop Cluster setup screen shows up as follows.
Toad_config_2
Input the name you want for this connection. For this one, I configured the connection for our X3 Big Data Appliance (BDA) full rack cluster with 18 nodes, so I input the Name as Enk-X3-DBA. For Detection Method, you can see it supports Cloudera CDH via Cloudera Manager or Hortonworks HDP via Ambari. For this one, I chose CDH managed by Cloudera Manager for Detection Method.

The next screen is Enter your Cloudera Manager credentials. For Server Address, use the same URL and port number that you use to access Cloudera Manager. The user name is the one you use to log in to Cloudera Manager. Make sure you create your user directory on HDFS before you run the installation of the software; for example, create a folder /user/zhouw and give the zhouw user read/write access to it. Otherwise you will see a permission exception later on.
Toad_config_3
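The user directory mentioned above can also be created from the command line. Here is a minimal Python sketch that only builds the two hdfs commands and prints them. The helper name home_dir_commands is hypothetical, and I am assuming the commands will be run by an HDFS superuser on a node with the hdfs client installed.

```python
import shlex


def home_dir_commands(user, group=None):
    """Build the hdfs commands that create /user/<user> and give it to that user.

    Hypothetical helper for illustration; it only builds the argument lists
    and does not run anything against a cluster.
    """
    path = f"/user/{user}"
    owner = f"{user}:{group}" if group else user
    return [
        ["hdfs", "dfs", "-mkdir", "-p", path],   # create the home directory
        ["hdfs", "dfs", "-chown", owner, path],  # hand ownership to the user
    ]


if __name__ == "__main__":
    # Print the commands so they can be reviewed before running them by hand.
    for cmd in home_dir_commands("zhouw"):
        print(shlex.join(cmd))
```

Once zhouw owns the directory, the default owner permissions normally give that user the read/write access needed here.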
The next screen shows Autodetection. It does many checks and validations, and you should see a successful status for all of them.
Toad_config_4
The next one shows Ecosystem Configuration. In this screen, I just input zhouw for User Name, then clicked the Activate button. There is a bug in this version: sometimes both the Activate and Cancel buttons disappear. The workaround is just to close and restart the software.
Toad_config_5

SQL Screen
The most frequently used screen is the SQL Screen. You can run SQL statements against either the Hive or the Impala engine.
Toad_SQL_1

 


Managing HDFS Snapshots Using Cloudera Manager


Browsing HDFS Directories

To browse the HDFS directories to view snapshot activity:

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the File Browser tab.

Enabling an HDFS Directory for Snapshots

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the File Browser tab.
  3. Navigate to the directory you want to enable for snapshots.
  4. In the File Browser, click the drop-down menu next to the full file path and select Enable Snapshots:

    EnableSnapshots-Cloudera

EnableShanpshotsForHDFSDirectory-Cloudera
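Outside Cloudera Manager, the same step is usually done with the hdfs dfsadmin -allowSnapshot command. The sketch below only builds the command lines. The function names are hypothetical, the /user/zhouw path is just an example, and the commands are assumed to run as an HDFS superuser.

```python
def allow_snapshot_command(path):
    """Build the command that marks an HDFS directory as snapshottable."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return ["hdfs", "dfsadmin", "-allowSnapshot", path]


def disallow_snapshot_command(path):
    """Build the reverse command; HDFS refuses it while snapshots still exist."""
    if not path.startswith("/"):
        raise ValueError("expected an absolute HDFS path")
    return ["hdfs", "dfsadmin", "-disallowSnapshot", path]


def list_snapshottable_command():
    """Build the command that lists the snapshottable directories."""
    return ["hdfs", "lsSnapshottableDir"]


if __name__ == "__main__":
    print(" ".join(allow_snapshot_command("/user/zhouw")))
    print(" ".join(list_snapshottable_command()))
```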

 

Taking Snapshots

  1. From the Clusters tab, select your CDH 5 HDFS service.
  2. Go to the File Browser tab.
  3. Navigate to the directory you want to take a snapshot of.
  4. Click the drop-down menu next to the full path name and select Take Snapshot.

    The Take Snapshot screen displays.

  5. Enter a name for the snapshot.
  6. Click OK.

    The Take Snapshot button is present, enabling an immediate snapshot of the directory.

  7. To take a snapshot, click Take Snapshot, specify the name of the snapshot, and click Take Snapshot. The snapshot is added to the snapshot list.

    Any snapshots that have been taken are listed by the time at which they were taken, along with their names and a menu button.
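The steps above have a command-line counterpart in hdfs dfs -createSnapshot, and existing snapshots show up under the directory's .snapshot path. The following is a rough sketch that builds those commands without running them. The helper names and the timestamped naming scheme are my own assumptions, not something Cloudera Manager prescribes.

```python
from datetime import datetime


def snapshot_name(prefix, when):
    """Build a sortable snapshot name such as daily-20160102-030405."""
    return f"{prefix}-{when:%Y%m%d-%H%M%S}"


def create_snapshot_command(directory, name):
    """Build the command that snapshots a directory already enabled for snapshots."""
    return ["hdfs", "dfs", "-createSnapshot", directory, name]


def list_snapshots_command(directory):
    """Build the command that lists the snapshots kept under <directory>/.snapshot."""
    return ["hdfs", "dfs", "-ls", directory.rstrip("/") + "/.snapshot"]


if __name__ == "__main__":
    name = snapshot_name("daily", datetime.now())
    print(" ".join(create_snapshot_command("/user/zhouw", name)))
    print(" ".join(list_snapshots_command("/user/zhouw")))
```

Files inside a snapshot are read-only, so a file is normally restored by copying it out of the .snapshot path back to its original location.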

 

TakeSnapShot-Cloudera

 

TakeSnapShot-SnapShot-Name-CloudEra

CreatingSnapShotUsingCloudera

 

What makes MapR superior to other Hadoop distributions


Why I prefer MapR

  • File system metadata is distributed (think of it in terms of many mini name nodes). No central name node is needed. This eliminates name node bottlenecks.
  • MapR-FS is written in C. No JVM garbage collection choking.
  • NFS mount. You can mount the MapR-FS locally and read directly from it or write directly to it.
  • MapR-FS implements POSIX. There is no need to learn any new commands. Your Linux administrator can apply existing knowledge to navigate the file system. You can view the content on MapR-FS using standard Unix commands, e.g. to view the contents of a file on MapR-FS you can just use tail <file_name>.
  • While MapR-FS is proprietary it is compatible with the Hadoop API. You don’t have to rewrite your applications if you want to migrate to MapR. hadoop fs -ls /user on MapR-FS works the same as ls /user.
  • You can directly load the data into the file system. No need to set down the data on the local file system first. Guess what? Using NFS mounts there is no distinction between MapR-FS and the local filesystem. MapR-FS in a way is the local filesystem. No additional tools such as Flume etc. are needed to ingest data.
  • True and consistent snapshots. Run point in time queries against your snapshots.
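To illustrate the NFS and POSIX points above: once MapR-FS is mounted over NFS, ordinary file I/O should be enough to read from it. Here is a small Python sketch of a tail-like helper that uses nothing but the standard library. The mount path in the comment is a hypothetical example, and the helper itself works on any local path.

```python
from collections import deque


def tail(path, n=10):
    """Return the last n lines of a file using plain POSIX-style file I/O."""
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        # A bounded deque keeps only the most recent n lines while streaming.
        return [line.rstrip("\n") for line in deque(f, maxlen=n)]


# Example call, assuming the cluster is NFS-mounted under /mapr (hypothetical path):
# for line in tail("/mapr/my.cluster.com/user/zhouw/app.log", 5):
#     print(line)
```

No Hadoop client library is involved here, which is the point of the NFS mount.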