Speed of Spark: 100 terabytes in 23 minutes

March 22, 2016

Spark wins Daytona Gray Sort 100TB Benchmark

We are proud to announce that Spark won the 2014 Gray Sort Benchmark (Daytona 100TB category). A team from Databricks, including Spark committers Reynold Xin, Xiangrui Meng, and Matei Zaharia, entered the benchmark using Spark. Spark tied with the Themis team from UCSD, and the two jointly set a new world record in sorting.

The team sorted 100TB of data in 23 minutes using 206 EC2 i2.8xlarge machines. The previous world record was 72 minutes, set by a Hadoop MapReduce cluster of 2100 nodes. This means that Spark sorted the same data roughly 3X faster using roughly 10X fewer machines. All the sorting took place on disk (HDFS), without using Spark’s in-memory cache.
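
The "3X faster, 10X fewer machines" comparison can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above. The variable names are illustrative, and the node-minutes comparison is an added back-of-the-envelope measure that assumes nothing about the hardware differences between the two clusters.

```python
# Figures quoted in this post
spark_minutes = 23
spark_nodes = 206
hadoop_minutes = 72
hadoop_nodes = 2100

# How much faster the Spark run finished (wall-clock time)
speedup = hadoop_minutes / spark_minutes

# How many times fewer machines the Spark run used
node_ratio = hadoop_nodes / spark_nodes

# Rough resource comparison: machines multiplied by minutes.
# This ignores per-node hardware differences, so treat it as a crude estimate.
spark_node_minutes = spark_nodes * spark_minutes
hadoop_node_minutes = hadoop_nodes * hadoop_minutes
node_minutes_ratio = hadoop_node_minutes / spark_node_minutes

print(f"speedup: {speedup:.2f}x")
print(f"fewer machines: {node_ratio:.1f}x")
print(f"fewer node-minutes: {node_minutes_ratio:.1f}x")
```

The first two ratios come out slightly above 3 and 10, which matches the rounded figures in the text.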

Outperforming large Hadoop MapReduce clusters on sorting not only validates the vision and work done by the Spark community, but also demonstrates that Spark is fulfilling its promise to serve as a faster and more scalable engine for data processing of all sizes.

For more information, see the Databricks blog article written by Reynold Xin.
