Predictive analytics and other categories of advanced analytics are becoming a major factor in the analytics market. We evaluate the leading providers of advanced analytics platforms that are used to build solutions from scratch.
Announcing the new Professional Master’s program in Big Data
The School of Computing Science at Simon Fraser University is offering a NEW Professional Master’s program in Big Data. This four-semester, hands-on program will prepare you for an exciting and well-paid career as a data scientist.
This program is intended for students with some previous programming experience who wish to learn about the state-of-the-art in big data analysis.
Applications are NOW OPEN
The application deadline for the Fall 2014 cohort is April 1, 2014. We will keep admissions open until we fill the available slots for the Fall 2014 cohort. Our goal is to notify students of acceptance to the Program by May 1, 2014.
This interactive illustration, created by Santiago Ortiz, represents how Twitter’s employees interact among themselves. The striking design vividly shows the pivotal employees in Twitter’s own corner of the twittersphere.
Have you ever wondered what a Twitter conversation looks like from 10,000 feet? A new report from the Pew Research Center, in association with the Social Media Research Foundation, provides an aerial view of the social media network. By analyzing many thousands of Twitter conversations, we identified six different conversational archetypes. Our infographic describes each type of conversation network and explains how it is shaped by the topic being discussed and the people driving the conversation.
Conversations on Twitter create networks with identifiable contours as people reply to and mention one another in their tweets. These conversational structures differ, depending on the subject and the people driving the conversation. Six structures are regularly observed: divided, unified, fragmented, clustered, and inward and outward hub-and-spoke structures. These structures are created as individuals choose whom to reply to or mention in their Twitter messages, and they tell a story about the nature of the conversation.
The graph represents a network of 176 Twitter users whose recent tweets contained “sunbelt14 OR sunbelt2014”, or who were replied to or mentioned in those tweets, taken from a data set limited to a maximum of 18,000 tweets. The network was obtained from Twitter on Sunday, 23 February 2014 at 16:55 UTC.
The tweets in the network were tweeted over the 7-day, 4-hour, 11-minute period from Sunday, 16 February 2014 at 12:38 UTC to Sunday, 23 February 2014 at 16:50 UTC.
There is an edge for each “replies-to” relationship in a tweet. There is an edge for each “mentions” relationship in a tweet. There is a self-loop edge for each tweet that is not a “replies-to” or “mentions”.
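The three edge rules above can be sketched in Python. This is a minimal illustration, not NodeXL's implementation: the `Tweet` record and its field names are hypothetical, and treating the number of times the same relationship recurs as its edge weight is an assumption.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Tweet:
    # Hypothetical record; real Twitter API payloads are structured differently.
    author: str
    in_reply_to: Optional[str] = None              # user replied to, if any
    mentions: list = field(default_factory=list)   # users @-mentioned

def build_edges(tweets):
    """Map (source, target, relationship) -> weight for a directed graph.

    Weight is assumed to be the number of times the same relationship recurs.
    """
    edges = Counter()
    for t in tweets:
        if t.in_reply_to:
            edges[(t.author, t.in_reply_to, "replies-to")] += 1
        for name in t.mentions:
            edges[(t.author, name, "mentions")] += 1
        if not t.in_reply_to and not t.mentions:
            # Neither a reply nor a mention: record a self-loop on the author.
            edges[(t.author, t.author, "tweet")] += 1
    return edges

def vertices(edges):
    """Every user appearing as an edge source or target."""
    return {src for src, _, _ in edges} | {dst for _, dst, _ in edges}
```

Counting `len(vertices(...))` over such an edge set is how a vertex total like the 176 users above would be obtained under these assumptions.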
The graph is directed.
The graph’s vertices were grouped by cluster using the Clauset-Newman-Moore cluster algorithm.
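The Clauset-Newman-Moore algorithm greedily merges whichever pair of clusters most increases modularity, stopping when no merge helps. The sketch below applies that same merge rule naively (CNM itself uses heaps and sparse updates to make it fast). It is a hypothetical illustration that assumes edge direction, duplicate edges and self-loops are ignored; it is not NodeXL's code.

```python
def greedy_modularity_clusters(edges):
    """Naive greedy modularity agglomeration (the merge rule CNM uses).

    Direction, duplicate edges and self-loops are ignored (an assumption
    of this sketch). Returns a list of sets of vertices.
    """
    # Undirected, unweighted adjacency without self-loops.
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set())
        adj.setdefault(v, set())
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    members = {n: {n} for n in adj}            # cluster id -> vertices
    if m == 0:
        return list(members.values())
    degree = {n: len(adj[n]) for n in adj}     # cluster id -> total degree
    # Number of edges running between each pair of clusters.
    links = {frozenset((u, v)): 1 for u in adj for v in adj[u]}

    while True:
        # Modularity gain of merging i and j: dQ = l_ij/m - d_i*d_j/(2m^2)
        best, best_dq = None, 0.0
        for pair, l in links.items():
            i, j = tuple(pair)
            dq = l / m - degree[i] * degree[j] / (2 * m * m)
            if dq > best_dq:
                best, best_dq = (i, j), dq
        if best is None:                       # no merge raises modularity
            break
        i, j = best
        members[i] |= members.pop(j)
        degree[i] += degree.pop(j)
        # Re-point j's links at i, summing links that now coincide.
        merged = {}
        for pair, l in links.items():
            if pair == frozenset((i, j)):
                continue
            new_pair = frozenset(i if x == j else x for x in pair)
            merged[new_pair] = merged.get(new_pair, 0) + l
        links = merged
    return list(members.values())
```

For two triangles joined by a single bridge edge, the sketch returns the two triangles as separate clusters. Each pass rescans every cluster pair, so it only suits small graphs.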
The graph was laid out using the Harel-Koren Fast Multiscale layout algorithm.
The edge colors, widths, and opacities are based on edge weight values.
In the big data talent wars, most companies feel they’re losing. Marketing leaders are finding it difficult to acquire the right analytical talent. In the latest CMO Survey, only 3.4% of senior marketers believe they have the right talent. Business-to-business companies have a bigger gap than business-to-consumer companies, as do companies with a lower percentage of their sales coming from the internet. And yet analytic skill is a must for effective marketing.
Results indicate that companies with above-average marketing analytics talent experienced significantly greater rates of marketing return on investment (MROI) than companies with below-average analytics talent (+4.18% vs. +2.51%). The same pattern emerged for profits: companies above average on analytics talent experienced profitability increases of +4.69%, compared with +2.71% for companies below average. In short, while any analytical skill is truly better than none, strong analytical skills are measurably better.
So how do you find those people? Given how tight the market for analytical talent is, and how critical that talent is to business growth, companies have to adopt different strategies for hiring and keeping people. Some large companies have taken to acquiring start-ups or developing “research labs” jointly with academic institutions or organizations. But there is a range of tactics that companies of any size can use to improve their analyst recruiting.
The first is simply using more specific language. At one top retailer, the analytics team was looking to fill a direct marketing measurement position but was not satisfied with the direct marketing experience in the CVs the recruiting team was sharing with them. So the analytics and recruiting teams came together to redefine the characteristics of the ideal candidate. This collaboration led to searching CVs for a more targeted set of keywords (not generic “measurement” skills but advanced “segmentation” and “predictive analytics” capabilities). The new approach led to the discovery of dozens of qualified candidates. Similarly, at General Mills, recruiters looking for senior marketing analytics managers found that using more precise and discerning language cut search times in half.
A second strategy is to use an “always on” approach to recruiting. As John Walthour, Director, Growth Insights & Analytics at General Mills, noted, “We know these positions will continue to be in demand at General Mills and so we no longer wait for a specific position to arise.” Still other employers search constantly in stealth mode for the best talent. For example, Beth Axelrod, SVP of Human Resources for eBay, works with companies such as Gild, which identifies prospective employees on the hard-science side of marketing analytics by examining the quality of their open code.
A third component is beefing up management’s analytical skill. We find that senior executives often don’t have a clear sense of what’s needed from the analysis and, therefore, don’t ask questions that lead to helpful answers. Senior managers need to be educated in the basics so that they can ask good questions, such as probing the quality of the statistics being used or asking how to incorporate new types of data.
Finally, in order to hire the best analysts, hiring managers may need to accept that some softer business skills won’t come packaged in the same person. Instead of holding out for the perfect total package, one banking company solved this issue by creating a mixed team of hard-core statisticians and marketers who together mined the data, analyzed the results, and developed marketing campaigns based on those results. After three months, the team was delivering better analytical insights, and both customer activity and revenues were nearly 10 times higher.
Whatever the strategy, however, acquiring the right array of marketing analytics talent is critical to turning big data into a powerful capability for companies.
The estimated growth of devices connected to the Internet is staggering. Cisco estimates that by 2020, 99% of devices (50 billion) will be connected to the Internet; today, by contrast, only around 1% are connected. The sheer numbers, as well as the complexity of new types of devices, will be problematic. Although the number of traditional computing devices such as personal computers, tablets and smartphones will increase, it is the Internet of Things (IoT) that will grow most significantly, to around 26 billion units. According to Gartner, that represents nearly a 30-fold increase.
The good thing about running SQL on Hadoop is that SQL is a declarative language, which means that you don’t need to know where the data is: you just ask for it, and the database works out how to get the information you need. However, unless you have a database optimiser, the performance will suck.
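To make the declarative point concrete, here is a minimal sketch that uses Python’s built-in `sqlite3` module as a stand-in for a SQL-on-Hadoop engine (it is not Impala, and the tables and data are hypothetical). The query states only what result is wanted; `EXPLAIN QUERY PLAN` then shows the access path that SQLite’s planner chose, and the exact wording of that plan varies between SQLite versions.

```python
import sqlite3

# Hypothetical tables held in memory; SQLite stands in for a SQL-on-Hadoop engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
    INSERT INTO customers VALUES (1, 'EMEA'), (2, 'APAC');
    INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# Declarative: the query says *what* to return, not which table to scan first
# or whether to use the index; the planner decides that.
QUERY = """
    SELECT c.region, SUM(o.amount)
    FROM orders AS o JOIN customers AS c ON o.customer_id = c.id
    GROUP BY c.region
    ORDER BY c.region
"""
rows = conn.execute(QUERY).fetchall()
print(rows)  # [('APAC', 75.0), ('EMEA', 350.0)]

# Ask the planner how it intends to execute the same query.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()
for step in plan:
    print(step[-1])  # last column holds the human-readable plan detail
```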
Now, there are various SQL initiatives around, but probably the most advanced is Impala. In version 1.2, which was released at the end of December, Cloudera introduced facilities to optimise join order; but while this is a step in the right direction, it hardly constitutes a full-blown optimiser.
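What “optimising join order” means can be illustrated with a toy cost-based search. This is a hypothetical sketch, not Impala’s algorithm: it assumes known table cardinalities, independent join selectivities, and a cost equal to the sum of intermediate result sizes, and it scores every left-deep order exhaustively. Real optimisers typically use dynamic programming or heuristics instead, because the number of orders grows factorially.

```python
from fractions import Fraction
from itertools import permutations
from math import prod

def result_size(tables, cards, sels):
    """Estimated rows from joining `tables`, applying every join predicate
    whose two tables are both present (independence assumption)."""
    size = Fraction(prod(cards[t] for t in tables))
    for (a, b), s in sels.items():
        if a in tables and b in tables:
            size *= s
    return size

def best_left_deep_order(cards, sels):
    """Score every left-deep order; cost = sum of intermediate result sizes."""
    best = None
    for order in permutations(cards):
        cost = sum(result_size(set(order[:k]), cards, sels)
                   for k in range(2, len(order) + 1))
        if best is None or cost < best[1]:
            best = (order, cost)
    return best

# Hypothetical star schema: one large fact table and two small dimensions.
cards = {"fact": 1_000_000, "dim_a": 100, "dim_b": 10}
sels = {("fact", "dim_a"): Fraction(1, 100), ("fact", "dim_b"): Fraction(1, 10)}
order, cost = best_left_deep_order(cards, sels)
print(order, cost)
```

With these made-up numbers the search picks `('dim_a', 'dim_b', 'fact')` at an estimated cost of 1,001,000 rows, against 2,000,000 for any order that starts with the fact table: cross-joining the two small tables first is cheaper, which a fixed, syntactic join order would miss. Exact `Fraction` arithmetic keeps the cost comparison free of floating-point noise.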
However, a couple of related announcements have caught my eye this week. The first was that Calpont has changed its name to that of its product, InfiniDB, has raised another round of funding, and has announced version 4.5 of its database with an Enterprise Management dashboard. None of this has much to do with Hadoop, except that it reminded me that Calpont (as it then was) announced the availability of InfiniDB running on Hadoop last year, along with an open source license. And, of course, InfiniDB has a grown-up optimiser.
Another product that has an adult optimiser is HP Vertica. And MapR has just announced an early access program (prior to general availability in March) for the HP Vertica Analytics Platform running on the MapR Hadoop distribution.
The truth is that you will get much better performance (orders of magnitude better) from either InfiniDB or Vertica than you will from Impala. So this poses three questions: firstly, will we see more vendors porting their warehouse products onto Hadoop (or HDFS); secondly, how quickly will Cloudera or Hortonworks (with its SQL implementation) be able to produce an optimiser that can compete reasonably well with these intruders into their market; and, thirdly, how much does this matter?
The answer to the first question is yes. I don’t know who or when, but this is the general trend, not just in data warehousing but across a variety of markets. The answer to the second question is not soon: it takes years to develop a good optimiser. Probably not as many years as it used to, because there is plenty of experience out there, which was not the case historically, but still a significant period.
Thirdly, yes it matters. You may have to pay a license fee for HP Vertica (or not, in the case of InfiniDB) but the performance advantages you get from having a decent optimiser will mean that you need significantly less hardware in order to get comparable performance, and that should more than offset any such license fees. And that also explains why I expect more vendors to do the same thing as InfiniDB and Vertica, because there is a window of opportunity while Cloudera gets its optimiser up to speed.
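The cost argument can be written as a rough break-even condition. The symbols are introduced here for illustration only: $L$ is the license fee, $c$ the cost of one node, and $N_{\text{no-opt}}$ and $N_{\text{opt}}$ the node counts needed to reach comparable performance without and with a mature optimiser.

```latex
\text{the licensed engine pays off} \iff L < c \,\bigl(N_{\text{no-opt}} - N_{\text{opt}}\bigr)
```

When the optimiser cuts the required hardware substantially, the right-hand side grows with it, which is why the hardware saving can more than offset the fee; with a free license ($L = 0$, as with InfiniDB) the condition holds whenever any hardware is saved.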
Long marketed as a way to master huge quantities of data, Hadoop is now booming because its proponents have learned to sell it small.
DBpedia: A Nucleus for a Web of Open Data
DBpedia is a community effort to extract structured information from Wikipedia and to make this information available on the Web. DBpedia allows you to ask sophisticated queries against datasets derived from Wikipedia and to link other datasets on the Web to Wikipedia data. We describe the extraction of the DBpedia datasets, and how the resulting information is published on the Web for human- and machine-consumption. We describe some emerging applications from the DBpedia community and show how website authors can facilitate DBpedia content within their sites. Finally, we present the current status of interlinking DBpedia with other open datasets on the Web and outline how DBpedia could serve as a nucleus for an emerging Web of open data.
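As an illustration of the kind of query DBpedia supports, the sketch below builds an HTTP GET request for a SPARQL query using only the Python standard library. The endpoint URL `https://dbpedia.org/sparql`, the `format` parameter value, and the example query are assumptions for illustration; check the current DBpedia service documentation before relying on them. `run_query` needs network access, while `build_request_url` does not.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed public SPARQL endpoint for DBpedia.
ENDPOINT = "https://dbpedia.org/sparql"

# Example query (hypothetical choice of resource): the English abstract of Berlin.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?abstract WHERE {
  dbr:Berlin dbo:abstract ?abstract .
  FILTER (lang(?abstract) = "en")
}
"""

def build_request_url(query, endpoint=ENDPOINT):
    """Encode a SPARQL query as a GET request asking for JSON results."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)

def run_query(query, endpoint=ENDPOINT):
    """Fetch results (needs network access) and flatten each binding to plain values."""
    with urlopen(build_request_url(query, endpoint), timeout=30) as resp:
        data = json.load(resp)
    # SPARQL JSON results: results.bindings is a list of {var: {"type", "value"}}.
    return [{var: cell["value"] for var, cell in binding.items()}
            for binding in data["results"]["bindings"]]
```

Keeping URL construction separate from the network call makes the request easy to inspect or log before it is sent.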
UD professor presented award for natural language processing work
The Association for Mathematics of Language (SIGMOL) honored the University of Delaware’s Vijay K. Shanker with the inaugural S.Y. Kuroda Prize on Jan. 7 for his work in natural language processing.
The award is given for long-lasting advancements in mathematical linguistics. Shanker, professor in the Department of Computer and Information Sciences, was selected for his work on the convergence of mildly context-sensitive (MCS) formalisms.
In this infographic, datascience@berkeley has collected some real-life examples to help explain the scope of data. We’ve also provided a timeline of hard drive innovation and a glimpse at where the data storage industry is heading. Feel free to share, because after all… Data Size Matters.