Self-service BI with Pig, Impala and PowerBI


Source: http://baboonit.be/blog/self-service-bi-with-pig-impala-and-powerbi

Visualisations in PowerBI

Here are some charts that we generated with PowerBI. The nice thing is that you can drill down on any bar. This is ideal for exploring a dataset.

 

You can also easily build an animated chart. In the following example, the delays per airport are shown on a scatter chart, where the total delay is plotted against the likelihood of having a delay. If you ‘play’ the chart, you can see the evolution of the delays on a day-to-day basis. From this animation, it’s clear that Saturday is your best bet if you really don’t like delays.
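To make the two axes of that scatter chart concrete, here is a minimal sketch of how the numbers could be computed per airport and per day. The sample rows, the `flights` name, and the `delay_points` helper are all hypothetical, and we assume "likelihood of having a delay" means the share of flights with a delay greater than zero. It is not how PowerBI computes the chart internally.

```python
from collections import defaultdict

# Hypothetical sample rows: (airport, day_of_week, delay_minutes).
flights = [
    ("JFK", "Fri", 30),
    ("JFK", "Fri", 0),
    ("JFK", "Sat", 0),
    ("JFK", "Sat", 0),
    ("ORD", "Fri", 45),
    ("ORD", "Sat", 10),
]


def delay_points(rows):
    """Return {(airport, day): (total_delay, delay_likelihood)}.

    Assumption: a flight counts as delayed when delay_minutes > 0.
    """
    totals = defaultdict(int)   # summed delay minutes per (airport, day)
    delayed = defaultdict(int)  # number of delayed flights per (airport, day)
    counts = defaultdict(int)   # number of flights per (airport, day)
    for airport, day, delay in rows:
        key = (airport, day)
        totals[key] += delay
        counts[key] += 1
        if delay > 0:
            delayed[key] += 1
    return {key: (totals[key], delayed[key] / counts[key]) for key in counts}


points = delay_points(flights)
for (airport, day), (total, likelihood) in sorted(points.items()):
    print(airport, day, total, likelihood)
```

Each (airport, day) pair becomes one point on the chart, and stepping through the days is what the "play" button animates.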

 

Like any self-respecting BI tool, PowerBI also offers a Map chart. We’ve experimented with it and we’ve got some beautiful results already.

 

As I mentioned before, the search feature is also very powerful. For example:

Doing BI becomes as simple as doing a Google search. Well, I guess Microsoft calls it a Bing search, but anyways…

The only thing I really miss in PowerBI is “live queries”. PowerBI retrieves all the data you need from the source and does all calculations on your machine. This doesn’t work well with Big Data. For one, you’re moving your data around, not your processing. That’s a bad smell. You’re limited by the memory and processing power of your machine, and you’ve lost all the advantages of a distributed SQL database or a Hadoop platform. Also, downloading millions and millions of rows puts a heavy load on the network, and it will take a while before you can fire your first query. Typically, you can download only a subset of your data, which limits your analysis to whatever that subset happens to contain.

Tableau does offer those live queries. That means it doesn’t try to retrieve the entire dataset. Instead, it fires the right SQL query at the database and retrieves only the results. You can take full advantage of your powerful cluster, you’re not congesting the network, and you can start querying your dataset immediately. I hope this will be possible in future versions of PowerBI as well.
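The difference between the two approaches can be sketched in a few lines. This is only an illustration: an in-memory SQLite database stands in for Impala, and the `flights` table, its columns, and both function names are hypothetical. Neither PowerBI nor Tableau exposes its internals this way.

```python
import sqlite3

# SQLite stands in for the real cluster; table and column names are assumptions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (airport TEXT, delay_minutes INTEGER)")
conn.executemany(
    "INSERT INTO flights VALUES (?, ?)",
    [("JFK", 30), ("JFK", 0), ("ORD", 45), ("ORD", 10)],
)


def total_delay_extract(conn):
    """Extract style: pull every row to the client, then aggregate locally."""
    rows = conn.execute("SELECT airport, delay_minutes FROM flights").fetchall()
    totals = {}
    for airport, delay in rows:
        totals[airport] = totals.get(airport, 0) + delay
    return totals, len(rows)  # len(rows) = rows moved over the network


def total_delay_live(conn):
    """Live-query style: the database aggregates, only results come back."""
    rows = conn.execute(
        "SELECT airport, SUM(delay_minutes) FROM flights GROUP BY airport"
    ).fetchall()
    return dict(rows), len(rows)


extract_totals, extract_rows = total_delay_extract(conn)
live_totals, live_rows = total_delay_live(conn)
print(extract_totals == live_totals, extract_rows, live_rows)
```

Both functions give the same totals, but the live query transfers one row per airport instead of one row per flight. With four sample rows that hardly matters; with millions of rows it is the whole point.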

Detailed tutorial from Hortonworks

Hortonworks have done pretty much the same thing (obviously minus Impala), and have put together a very detailed tutorial about it. Well worth the read if you want to try this yourself: http://hortonworks.com/hadoop-tutorial/partner-tutorial-microsoft/

SQL to Pig Cheat Sheet


Comparing queries in SQL and Pig

Get the complete list from: http://mortar-public-site-content.s3-website-us-east-1.amazonaws.com/Mortar-Pig-Cheat-Sheet.pdf
