Running a Big Data Infrastructure

March 21, 2014

Running a Big Data Infrastructure: Five Areas That Need Your Attention
Download the free book (no signup required)


1. Elasticity
Scalability generally refers to the ability of a technology to support increasing demand, typically by adding more machines to the
infrastructure. Elasticity refers to the ability of a technology to flex up or down depending on demand at any given time. Working on an
elastic substrate allows a company to use the same machines for many kinds of operations, not just Big Data, without increasing the budget.
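
To make the difference between scaling up and flexing in both directions concrete, here is a minimal sketch of an elastic sizing rule. The function name, the per-node capacity, and the node limits are all hypothetical values chosen for illustration; they do not come from any particular platform.

```python
import math

def desired_nodes(pending_tasks, tasks_per_node, min_nodes=1, max_nodes=20):
    """Return the node count for the current demand, clamped to the allowed range.

    All parameters are illustrative assumptions, not values from a real cluster.
    """
    if tasks_per_node <= 0:
        raise ValueError("tasks_per_node must be positive")
    # Nodes needed to cover the demand, rounded up to whole machines.
    needed = math.ceil(pending_tasks / tasks_per_node)
    # Never drop below the floor or grow past the budget ceiling.
    return max(min_nodes, min(max_nodes, needed))

# The cluster grows when demand rises and shrinks again when it falls.
for demand in (0, 35, 400):
    print(demand, desired_nodes(demand, tasks_per_node=10))
```

The lower bound keeps a minimal cluster available, and the upper bound is what keeps elasticity from affecting the budget.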

2. Reliability
Business users count on data being available at regular intervals to run the company efficiently. Reliability begins with how effectively data
enters the infrastructure and ends with predictable data delivery. Companies such as Facebook use internal service level agreements
(SLAs) to ensure data availability across teams.
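
An internal SLA of this kind often comes down to a freshness check: has the data arrived within the agreed window? The sketch below assumes a hypothetical six-hour delivery window and made-up timestamps. It is not a description of how any named company implements its SLAs.

```python
from datetime import datetime, timedelta

def sla_breached(last_delivery, now, max_delay):
    """Return True when the most recent delivery is older than the agreed window."""
    return now - last_delivery > max_delay

# Hypothetical SLA: a dataset must be refreshed at least every 6 hours.
window = timedelta(hours=6)
last_delivery = datetime(2014, 3, 21, 2, 0)

print(sla_breached(last_delivery, datetime(2014, 3, 21, 7, 0), window))  # 5 hours old
print(sla_breached(last_delivery, datetime(2014, 3, 21, 9, 0), window))  # 7 hours old
```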

3. Self-Service Tools

Business analysts need data to do their jobs, yet without a deep technical skill
set, getting quick and easy access to that data can be practically impossible. In the past,
companies resolved this issue by hiring teams to act as liaisons between the data engineers and
business users. Today, self-service tools provide ready access to data through a user-friendly
interface. Productivity increases across the board: more people in the
organization can make data-driven decisions, and data teams can support
ever larger numbers of business users.

4. Monitoring

Active monitoring of the infrastructure minimizes issues and maximizes availability. How much
monitoring? The more, the better. With the introduction of self-service tools, more users rely on the data, so more of them will be
affected by inefficiencies or other problems. Monitoring provides insight into how the system is operating and quickly identifies minor
glitches that can be corrected before they develop into major problems.
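
Catching minor glitches early usually means comparing current metrics against thresholds. The following is a minimal sketch of that idea. The metric names and limits are hypothetical, and a real deployment would normally rely on a dedicated monitoring system instead of hand-written checks.

```python
def find_alerts(metrics, thresholds):
    """Compare current metric values against limits and return alert messages.

    A metric that is missing entirely is also reported, since silence from a
    component can itself be an early sign of trouble.
    """
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts

# Hypothetical readings and limits, for illustration only.
current = {"disk_used_pct": 91, "queue_depth": 12}
limits = {"disk_used_pct": 85, "queue_depth": 100, "failed_jobs": 0}

for alert in find_alerts(current, limits):
    print(alert)
```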

5. Open Source

Open source technology evolves rapidly, and new features, such as faster queries, could result
in increased productivity, improved cluster availability, or even a competitive advantage in the
marketplace. Data engineers must stay on top of the latest versions to ensure that the
infrastructure runs at peak performance. Working with a knowledgeable vendor can help data
engineers reduce the risk of missing an important update.

 
