Monday, February 23, 2015

Big Data, Hadoop Standards Group: Who's In, Who's Missing?


All eyes in the big data world are on the Open Data Platform -- a new association that strives to promote big data technologies and open source platforms like Hadoop. While promising and backed by big names like GE and IBM, the Open Data Platform initiative also lacks some key names. Here's a reality check.

First, the big picture. The Open Data Platform will "promote big data technologies based on open source software from the Apache Hadoop ecosystem and optimize testing among and across the ecosystem’s vendors. These efforts will accelerate the ability of enterprises to build or implement data-driven applications," according to a statement from the association's founders.

In decades past, similar standards groups have united to promote Linux, Unix, WiFi and other emerging platform technologies. But standards groups can also resemble the political landscape -- as vendors sometimes break off and move to the extreme left or right of the group's stated goals.

Big Names to Start

Several industry giants and startups are driving the Open Data Platform group -- including Altiscale, Capgemini, CenturyLink, EMC, GE, Hortonworks, IBM, Infosys, Pivotal, SAS, Splunk, Teradata, Verizon and VMware.

Still, some key names are missing from the effort. Chief among them:
  • Cloud services providers like Amazon Web Services, Google Cloud Platform, Microsoft Azure and Rackspace -- each of which promotes various Hadoop efforts on the public cloud.
  • Hadoop specialists like Cloudera (which now has a $100 million annual revenue run rate) and rival MapR -- both of which compete with Hortonworks.
  • Hardware providers that ship servers and tune their systems for Hadoop -- including HP, Dell and others.
  • NoSQL database providers that work closely with the Hadoop industry -- such as MongoDB and others.
Both Cloudera and MapR have publicly stated that they consider the Open Data Platform effort to be redundant with existing Hadoop standards work. And some critics view ODP as an EMC-VMware-Pivotal effort to boost the trio's relevance in the Hadoop game.

Eight Core Goals

Despite those points of debate, the Open Data Platform's powerful members should be able to flex some muscle in the weeks and months ahead. The existing members say they have eight core goals:
  1. Accelerate the delivery of big data solutions by providing a well-defined core platform to target.
  2. Define, integrate, test and certify a standard "ODP Core" of compatible versions of select big data open source projects. This area, Information Management believes, could be particularly tricky as vendors potentially try to promote their wares into the standards-based platform.
  3. Provide a stable base against which big data solutions providers can qualify solutions.
  4. Produce a set of tools and methods that enable members to create and test differentiated offerings based on the ODP core.
  5. Reinforce the role of the Apache Software Foundation (ASF) in the development and governance of upstream projects. This is particularly important, Information Management believes, since Apache has been so instrumental in Hadoop's development so far.
  6. Contribute to ASF projects in accordance with ASF processes and Intellectual Property guidelines. Here again, Information Management believes the group is trying to stress open, vendor-neutral collaboration through ASF.
  7. Support community development and outreach activities that accelerate the rollout of modern data architectures that leverage Apache Hadoop.
  8. Help minimize the fragmentation and duplication of effort within the industry.
That last point could be particularly difficult for ODP members to navigate, Information Management believes. By their very nature, technology companies strive to differentiate their wares through proprietary, high-value add-ons.

But on the flip side, there are numerous examples of open standards -- HTTP, Ethernet, WiFi, etc. -- that ultimately benefitted both vendors and their customers. Open Data Platform certainly hopes its efforts mirror those hugely successful outcomes.
