Predictive analytics and other categories of advanced analytics are becoming a major factor in the analytics market. We evaluate the leading providers of advanced analytics platforms that are used to build solutions from scratch.
Archives For Big Data
Announcing the new Professional Master’s program in Big Data
The School of Computing Science at Simon Fraser University is offering a NEW Professional Masters program in Big Data. This four-semester, hands-on program will prepare you for an exciting and well-paid career as a data scientist.
This program is intended for students with some previous programming experience who wish to learn about the state-of-the-art in big data analysis.
Applications are NOW OPEN
Application deadline is April 1, 2014 for the Fall 2014 Cohort. We will keep admissions open until we fill available slots for the Fall 2014 cohort. Our goal is to notify students of acceptance to the Program by May 1st, 2014.
This interactive illustration shows how Twitter’s employees interact among themselves. The design, created by Santiago Ortiz, vividly highlights the pivotal employees in Twitter’s twittersphere. To view the interactive visualization, click here.
- Machine Learning Tools
BigML is a cloud-based machine learning platform that allows users to create visual predictive models using raw data and structured datasets. Last month, BigML announced the availability of the 2014 winter release, which includes features that boost predictive modeling. The company also introduced a new paradigm called Programmatic Machine Learning that is the “ability to programmatically transform a dataset via a high-level language and a cloud-based API together.”
The BigML API makes it possible for developers to build applications that incorporate predictive models and near real-time predictions.
Datumbox is a machine learning platform that focuses on natural language processing (NLP). The Datumbox platform features a variety of functions including sentiment analysis, Twitter sentiment analysis, language detection, educational detection and keyword extraction.
Diffbot uses computer vision, machine learning and other technologies to extract text, images, links, HTML attributes and other elements from Web pages. In August 2013, the company released the Diffbot Product API, which can extract product information from the pages of e-commerce websites. Earlier this month, ProgrammableWeb reported on the release of 35+ new Diffbot client libraries in a variety of programming languages.
The company provides a suite of Diffbot APIs for extracting data from Web page news articles, Web site home pages, e-commerce product pages and other types of Web pages. There are also APIs for extracting Web page images and automatically classifying Web page links.
Ersatz Labs is a startup and developer of a new platform called Ersatz, described by the company as “the first cloud-based neural network platform.” The Ersatz platform allows developers to build applications that utilize deep neural networks without the need to have extensive knowledge in machine learning.
There is an API that can be accessed via HTTP, and a client library in Python is also available, so Ersatz can be easily integrated with Web, mobile and desktop applications. Ersatz is currently in private beta, and developers interested in participating can request an invitation on the official company Web site.
Google Prediction API
The Google Prediction API provides developers access to Google’s cloud-based machine learning platform and pattern-matching functions. The API is used in conjunction with the Google Cloud Storage API and allows developers to incorporate functions into their apps such as sentiment analysis, spam detection, message routing decisions, suspicious activity identification and more.
IBM Watson is a machine learning platform that focuses on NLP, hypothesis generation and evidence-based learning. In November 2013, ProgrammableWeb reported that IBM had launched the Watson Developer Cloud, a cloud-based marketplace that provides access to APIs, documentation, self-service training materials and other tools for developers to build IBM Watson-powered applications.
Last month, IBM announced that the company will invest more than $1 billion in the new Watson Group, which will be based in New York City’s “Silicon Alley.” The new group will focus on developing and promoting the IBM Watson platform and cognitive technologies. IBM also announced new Watson cognitive intelligence-based services, including IBM Watson Discovery Advisor, IBM Watson Analytics and IBM Watson Explorer.
Logical Glue is a machine-learning-as-a-service (MLaaS) platform that features predictive model building, real-time deployment of predictive models, and real-time predictive analytics. The platform is designed to predict customer behavior in many types of markets, particularly financial lending, insurance and marketing.
The Logical Glue platform is currently in private beta; however, companies can apply to participate in the beta program, which allows them access to the platform prerelease. The next release of the platform will include the Logical Glue prediction API.
Parse.ly is a predictive content optimization and analytics platform designed for blogs, news sites and other online publishers. The home page of the Parse.ly website describes the company as “The Content Performance Authority” and the platform provides users a real-time view of article traffic based on individual posts, authors, sections and referrers. The Parse.ly platform also provides views of content metrics, social network shares, site activity and other analytics.
The Parse.ly API allows developers to programmatically access platform features such as analytics, shares, referrers, real-time, search and recommendations. There are also mobile SDKs available that can be integrated into third-party apps so reader activity can be tracked.
PredictionIO is a machine learning server that allows developers to add predictive features to software, web and mobile applications. PredictionIO is open source and can be installed on a stand-alone server. There is also a cloud version available on Amazon EC2/Amazon EBS.
The PredictionIO API enables applications to collect and manage app data and add predictive features such as user preference prediction, personalized content, content discovery, content recommendations and more. ProgrammableWeb recently published an interview with Simon Chan (cofounder and CEO of PredictionIO), which covers PredictionIO features, compares other machine learning APIs and more.
SwiftKey is a developer of touchscreen keyboard applications and word prediction technology. SwiftKey’s products Keyboard, Flow and Note incorporate machine learning and SwiftKey’s language technology, available to developers via API and SDK.
A recent TechCrunch article featured SwiftKey’s word prediction technology. Nathan Matias, a PhD student at the MIT Media Lab, used SwiftKey technology to create a sonnet essentially co-authored by Shakespeare and generated entirely from the SwiftKey next word suggestions.
GraphLab Create is a Python package that enables developers and data scientists to apply machine learning to build state of the art data products. Get started fast with our fully customizable GraphLab DataApps. GraphLab Create is fast, scalable and makes it easy to deploy your apps to the Cloud.
The good thing about running SQL on Hadoop is that SQL is a declarative language, which means that you don’t need to know where the data is, you just have to ask for it and then the database works out how to get the information you need. However, unless you have a database optimiser the performance will suck.
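To make the "declarative" point concrete, here is a minimal sketch in Python. It uses the standard library's sqlite3 module purely as a stand-in for a SQL-on-Hadoop engine, and the table and column names are invented for illustration. The query says what is wanted and nothing about how to fetch it; the engine's planner decides that, and EXPLAIN QUERY PLAN shows what it chose.

```python
import sqlite3

# In-memory SQLite database standing in for a real SQL engine (assumption:
# nothing here is specific to Hadoop, Impala, InfiniDB or Vertica).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders(customer_id);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 20.0), (2, 1, 5.0), (3, 2, 12.5);
""")

# The query states the result we want; it does not say which table to read
# first or whether to use the index.
QUERY = """
    SELECT c.name, SUM(o.total)
    FROM customers AS c JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.name
    ORDER BY c.name
"""

results = conn.execute(QUERY).fetchall()

# Ask the engine how it decided to execute the query.
plan = conn.execute("EXPLAIN QUERY PLAN " + QUERY).fetchall()

if __name__ == "__main__":
    print(results)
    for step in plan:
        print(step)
```

The plan output varies between SQLite versions, which is rather the point: the same SQL text can be executed in different ways, and how good the chosen way is depends on the optimiser.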
Now there are various SQL initiatives around but probably the most advanced is Impala. And in version 1.2, which was introduced at the end of December, Cloudera introduced facilities to optimise join order but, while this is a step in the right direction, it hardly constitutes a full-blown optimiser.
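Why does join order matter so much? The toy sketch below scores every left-deep join order for three tables by the total number of intermediate rows produced, then picks the cheapest. This is not how Impala (or any real product) implements it: the row counts, selectivities and the cost formula are all hypothetical, chosen only to show how far apart a good and a bad ordering can be.

```python
from itertools import permutations

# Hypothetical row counts (illustrative numbers only).
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 10}

# Hypothetical join selectivities: the fraction of the cross product that
# survives the join predicate. Pairs not listed have no predicate (1.0).
SELECTIVITY = {
    frozenset({"orders", "customers"}): 1 / 50_000,
    frozenset({"customers", "regions"}): 1 / 10,
}


def plan_cost(order):
    """Sum of intermediate result sizes for a left-deep join in this order."""
    joined = {order[0]}
    size = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        size *= ROWS[table]
        # Apply every predicate linking the new table to those already joined.
        for other in joined:
            size *= SELECTIVITY.get(frozenset({table, other}), 1.0)
        joined.add(table)
        cost += size
    return cost


def best_order(tables):
    """Exhaustively pick the cheapest order (fine for a handful of tables)."""
    return min(permutations(tables), key=plan_cost)


if __name__ == "__main__":
    for order in permutations(ROWS):
        print(" -> ".join(order), f"{plan_cost(order):,.0f}")
    print("cheapest:", " -> ".join(best_order(ROWS)))
```

With these made-up figures, joining orders to regions first forces a cross product and costs roughly ten times more than starting with the two small tables. A full optimiser does a great deal more than this (statistics, access paths, join algorithms), which is why reordering joins alone is only a first step.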
However, a couple of related announcements have caught my eye this week. The first was that Calpont has changed its name to that of its product, InfiniDB; it has also raised another round of funding and announced version 4.5 of its database with an Enterprise Management dashboard. None of which has much to do with Hadoop, except that it reminded me that Calpont (as it then was) announced the availability of InfiniDB running on Hadoop last year, along with an open source license. And, of course, InfiniDB has a grown-up optimiser.
Another product that has an adult optimiser is HP Vertica. And MapR has just announced an early access program (prior to general availability in March) for the HP Vertica Analytics Platform running on the MapR Hadoop distribution.
The truth is that you will get much better performance (orders of magnitude better) from either InfiniDB or Vertica than you will from Impala. So this poses three questions: firstly, will we see more vendors porting their warehouse products onto Hadoop (or HDFS); secondly, how quickly will Cloudera or Hortonworks (with its SQL implementation) be able to produce an optimiser that can compete reasonably well with these intruders into their market; and, thirdly, how much does this matter?
The answer to the first question is yes. I don’t know who or when, but this is the general trend, not just in data warehousing but across a variety of markets. The answer to the second question is not soon: it takes years to develop a good optimiser. Probably not as many years as it used to, because there is plenty of experience out there, which was not the case historically, but still a significant period.
Thirdly, yes it matters. You may have to pay a license fee for HP Vertica (or not, in the case of InfiniDB) but the performance advantages you get from having a decent optimiser will mean that you need significantly less hardware in order to get comparable performance, and that should more than offset any such license fees. And that also explains why I expect more vendors to do the same thing as InfiniDB and Vertica, because there is a window of opportunity while Cloudera gets its optimiser up to speed.
Long marketed as a way to master huge quantities of data, Hadoop is now booming because its proponents have learned to sell it small.
UD professor presented award for natural language processing work
The Association for Mathematics of Language (SIGMOL) honored the University of Delaware’s Vijay K. Shanker with the inaugural S.Y. Kuroda Prize on Jan. 7 for his work in natural language processing.
The award is given for long-lasting advancements in mathematical linguistics. Shanker, professor in the Department of Computer and Information Sciences, was selected for his work on the convergence of mildly context-sensitive (MCS) formalisms.
#BigData skills pay top dollar: nine of the 10 highest-paying IT jobs are for skills related to Big Data
Service Oriented Architecture: $108,997
MongoDB: $107,825
Environment: Windows 7 (64-bit) and VirtualBox
VM Image: CDH4 Packages for Virtual Box
1. Create a new Virtual Machine
2. Enter a name for the new virtual machine and select the type of guest operating system you plan to install in it
3. Select the amount of base memory (RAM)
4. Select “Use existing hard disk” and navigate to the folder where you downloaded Cloudera-demo-vm.
If you don’t have demo VM, download it from here: CDH4 Packages for Virtual Box
5. You’re now going to create a new Cloudera Hadoop virtual machine on the Windows 7 host
6. Now turn on Cloudera Hadoop in Windows and run the demo
7. Starting Cloudera Hadoop in Windows virtual machine
8. Cloudera Hadoop demo is now ready in Windows
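If you would rather script the setup than click through the wizard, the steps above can be sketched roughly with VirtualBox's VBoxManage command-line tool. Treat the following as a sketch under assumptions: the VM name, disk path, memory size, the RedHat_64 guest type and the SATA controller are my guesses rather than anything stated in this walkthrough, so adjust them for your download. By default the script only prints the commands; nothing runs until you call run_all.

```python
import subprocess


def vboxmanage_commands(vm_name, disk_path, memory_mb=4096, os_type="RedHat_64"):
    """Build the VBoxManage calls that mirror the wizard steps.

    All argument values are assumptions for illustration; disk_path should
    point at the virtual disk file inside the downloaded demo VM folder.
    """
    return [
        # Steps 1-2: create and register the VM with a name and guest OS type.
        ["VBoxManage", "createvm", "--name", vm_name,
         "--ostype", os_type, "--register"],
        # Step 3: set the amount of base memory (RAM) in megabytes.
        ["VBoxManage", "modifyvm", vm_name, "--memory", str(memory_mb)],
        # Step 4: add a storage controller and attach the existing hard disk.
        ["VBoxManage", "storagectl", vm_name, "--name", "SATA", "--add", "sata"],
        ["VBoxManage", "storageattach", vm_name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd", "--medium", disk_path],
        # Steps 6-7: start the virtual machine.
        ["VBoxManage", "startvm", vm_name],
    ]


def run_all(commands):
    """Execute each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical name and path; print the commands rather than running them.
    for cmd in vboxmanage_commands("cdh4-demo", "cloudera-demo-vm.vmdk"):
        print(" ".join(cmd))
```

If the guest fails to boot from a SATA controller, attaching the disk to an IDE controller instead is worth trying.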
Running the VM
Once you launch the VM, you are automatically logged in as the cloudera user.
The account details are:
- username: cloudera
- password: cloudera
The cloudera account has sudo privileges in the VM.
To learn more about Hadoop, see the Hadoop Tutorial.
You can access status through the browser at the following URLs:
- NameNode status (localhost:50070)
- JobTracker status (localhost:50030)
- The Hue user interface (localhost:8888)
- The HBase web UI (localhost:60010)
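A quick way to confirm that the services are up is to probe the status pages listed above from a script. The sketch below uses only Python's standard library; the helper names (probe, report) and the three-second timeout are my own choices, and it assumes you run it inside the VM (or have forwarded these ports to the host).

```python
import http.client
import urllib.error
import urllib.request

# Status pages as listed above, reachable on localhost inside the VM.
STATUS_PAGES = {
    "NameNode": "http://localhost:50070",
    "JobTracker": "http://localhost:50030",
    "Hue": "http://localhost:8888",
    "HBase": "http://localhost:60010",
}


def probe(url, timeout=3.0):
    """Return True if something answers HTTP at this URL, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (e.g. a login page).
        return True
    except (urllib.error.URLError, OSError, http.client.HTTPException):
        # Connection refused, timed out, or the reply was not valid HTTP.
        return False


def report(pages=STATUS_PAGES):
    """Probe every page and return a mapping of service name to up/down."""
    return {name: probe(url) for name, url in pages.items()}


if __name__ == "__main__":
    for name, up in report().items():
        print(f"{name:<10} {STATUS_PAGES[name]:<26} {'up' if up else 'DOWN'}")
```

A service that shows as DOWN right after boot may simply still be starting, so give the VM a minute or two before worrying.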
Enjoy your Cloudera Hadoop demo in Windows