Monday, March 31, 2014

Google and Facebook Team Up to Modernize Old-School Databases

03.27.14

A central cooling plant in Google’s Douglas County, Georgia, data center. Photo: Google/Connie Zhou

Google completely changed the world of computer databases when it published a research paper describing "BigTable," a sweeping system it built to store information across its online empire.

Released in 2006, the paper revealed an approach to data storage that did away with the traditional model used by relational databases, which are designed to store data in neat rows and columns on a single machine. Basically, BigTable made it easier to spread data across hundreds or even thousands of computer servers. Along with a paper Amazon published about its own adventures in data storage, the BigTable concept spawned dozens of open source imitators. These "NoSQL" databases play a big role inside the biggest names on the web and beyond, including Facebook, LinkedIn, and Twitter, as well as Google.

But the need for old-fashioned relational databases never went away. To this day, all the big web companies still depend on the open source database MySQL and its variants, such as MariaDB. There are still cases where it makes sense to store data in neat rows and columns, so that you can very quickly retrieve it, slice it, and dice it. But because their operations are so large, such companies also need ways of running these databases across many machines.

That's why Facebook, LinkedIn, Twitter, and Google have teamed up to create what they call WebScaleSQL, a custom version of MySQL designed specifically for large-scale web companies. Their changes to the database will be open sourced, meaning they'll be freely shared with the world at large, and the companies plan to contribute their changes back to the original MySQL project. "Our goal in launching WebScaleSQL is to enable the scale-oriented members of the MySQL community to work more closely together in order to prioritize the aspects that are most important to us," Facebook's Steaphan Greene writes in a blog post announcing the project this morning.

Details are scant, but the project includes new ways to stress test large-scale SQL databases, along with optimizations for certain types of queries.

The companies have worked together on the project for the past few months, reviewing one another's code contributions. "To introduce a code change, a WebScaleSQL engineer can propose a change," Greene writes. "Then a WebScaleSQL engineer from another company will review the code and provide feedback." If the engineers from both companies agree that it's a good change, it becomes an official part of the code base for everyone to use. If not, the company that came up with the idea can keep using the change or feature internally.

The project is a great example of how open source can help competitors work together to solve problems. Facebook, LinkedIn, and Twitter already collaborate with many other companies on Hadoop, an open source platform for storing and analyzing large amounts of data. But bringing Google to the table is a huge win.

The company open sources many of its own projects, including the Go programming language and the Android operating system, and it contributes to venerable open source projects like Debian Linux and FreeBSD. But it has been notably absent from many of today's open source projects for ultra-high-scale web engineering, even though its research papers helped pioneer many of the ideas that underpin them.
