What is Apache Hadoop?

There’s a lot of excitement around Hadoop software these days, so here’s my definition of what “Hadoop” means:

Hadoop™ is the ASF’s trademark for our Apache Hadoop software product that provides a service and simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as the software that started the current “Big Data” processing model, which allows programmers to easily and effectively process huge data sets to get meaningful results.

The best place of all to learn about Hadoop is of course the Apache Hadoop project and community, which says this about the Hadoop software:

“(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”
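For readers new to the “simple programming model” mentioned above: Hadoop’s model is MapReduce, where a map function turns each input record into key/value pairs, the framework groups those pairs by key (the shuffle), and a reduce function combines each group into a result. The sketch below runs all three phases inside a single Python process purely to illustrate the idea. It is not Hadoop’s own API (Hadoop jobs are usually written against its Java MapReduce classes and spread across a cluster), and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Hypothetical mapper: emit (word, 1) for every word in one input record.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Group intermediate values by key, as the framework does between map and reduce.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Hypothetical reducer: combine all counts for one word.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Run map, shuffle, and reduce in sequence on a single machine.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the elephant", "the yellow elephant"]
    print(word_count(docs))  # {'the': 2, 'elephant': 2, 'yellow': 1}
```

On a real cluster, Hadoop runs many map and reduce tasks in parallel on the machines that hold the data, and reruns tasks elsewhere when a machine fails, which is the application-layer failure handling the quote above describes.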

The Apache Hadoop project at the ASF is related to or has created a large number of notable modules, subprojects, or full projects at Apache, including:

There is a wide variety of vendors who provide Hadoop-related software; however, the only source for Hadoop software itself is the Apache Hadoop project here at the ASF. We certainly appreciate the many companies who allow their employees to contribute work to Apache Hadoop and all of our projects, as well as the many Apache Corporate Sponsors. However, I do hope that companies working in the Hadoop and related Big Data industry take stock of their marketing strategies, and ensure that their corporate marketing doesn’t shortchange the credit owed to the Apache Hadoop community itself.

We very much appreciate those corporate supporters who do provide plenty of credit to the ASF and the Apache Hadoop community – both the old hats and the very new spinoffs in the Big Data space. I just hope that some of the other players in the industry will carefully consider their public crediting (or lack thereof) of the ASF’s Hadoop brand and the many individual committers and contributors to the Apache Hadoop project.

As always, the Apache Hadoop website and mailing lists are the best place to learn about Hadoop software!

Oh, and remember:

Apache Hadoop, Hadoop, the yellow elephant logo, the names of Apache software products, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Shane’s Apache Director Position Statement

The ASF is currently holding its annual Members’ meeting, where we elect a new board of directors, among other matters (and usually elect a number of new ASF Members!). I am fortunate enough to have been nominated again for the board election, something for which I am truly grateful.

Along with participating in Apache projects and in various organizational ways within the ASF, director candidates typically write a brief (or not so brief) position statement about what their objectives as a director are. These position statements are included in the board ballot that all active Members of the ASF vote on to elect a new board.

Re-reading my position statement, I’ve realized that there’s nothing in it that I would mind posting in public – in fact, the more I think about it, the more I feel I should post it publicly, to help explain just how Apache works. As the ASF grows, it becomes more and more important for Members to explain how the ASF works and why its core values and the Apache Way are important to us.


Shane Curcuru (curcuru) – Director position statement – v1.1

=============

My mission statement for the ASF

“The mission of the ASF is to provide high quality, open source software for the public good at no cost, and to showcase our meritocratic and community-driven method of building sustainable software projects.”

http://shane.curcuru.name/blog/2009/04/what-i-believe-asf-mission/

Why you should make me your first choice

Because you believe we need to understand the global community in which we interact, while still keeping our neutrality, our public mission, and our projects foremost in our minds. Because you believe in community over code. Because you believe in keeping hats separate, our communities public and open, our organization as simple as feasible, and in making it as easy as possible for our healthy communities to build great software. Because you want board members who recognize the need to work together productively.

My objective as a director and as VP, Brand Management is to provide our projects with the support they need to ensure that our organization and our projects can continue as independent and healthy communities for the long haul. Ensuring that we can control our public perception – our brands – without commercial influence is crucial to ensuring that our PMCs and the ASF as a whole remain a neutral place where all contributors are welcomed and can participate equally, and where PMC decisions are made for the benefit of the project itself.

My vision for the ASF

My ideal ASF is a respected, innovative, and neutral collaboration ground for communities and software projects. It’s a place where hobbyists, consultants, and developers from $bigcos collaborate together – easily, freely, equally and openly – building high quality software usable by all. The ideal ASF is known as a place that community-minded software projects can easily join, and that provides a rich technical infrastructure to facilitate the development process.

Our value to the world includes our software products, our community and consensus-based approach to software development, and the way our proven record of success has increased the global acceptance of open source products as a whole.

This past board has worked together well and has been very effective at finding consensus without rancor in the face of some significant challenges. As the ASF grows in members, projects, and technical influence, it is important to be able to keep true to our ideals and still maintain our friendships and respect within the membership.

About Shane

I recently had my 20th anniversary at IBM, where I work in the HR division as an Applications Architect. My employment and income have been unrelated to my work at the ASF for many years, and I will always clearly separate volunteer work from employer-funded work.

My involvement in the ASF is driven by a belief in, and a love of, the ASF, and is not influenced by politics or finances. I live in Arlington, MA with my wife, young daughter, and 2 cats. I view directorships and officer positions at the ASF as serious commitments. I will continue my attendance at every board meeting if re-elected.

Most importantly, my daughter Roxanne asked me to mention that you should vote for me because I’m the best dad ever. (unsolicited quote!)

Addenda – Public Work

I truly believe that we should work in as public a manner as practical, for public work is a key enabler of healthy and growing communities. As the ASF has grown in size and impact on the software world, it is clear that we do need to keep some specific discussions private, especially discussions on various legal matters.

In that light, I am publishing this statement on my blog in this posting.


Reminder: “The postings on this site are my own and don’t necessarily represent the positions, strategies or opinions of my employer nor of any volunteer organization I work with.”

Apache Office, anyone?

I imagine there will be a lot of news – and commentary – and, ahem, heated discussions about today’s submission of the OpenOffice.org codebase to the Apache Incubator by Oracle. Here are a few handy links and thoughts that may be helpful to ponder:

  • It’s official – here’s Oracle’s announcement on “Statements on OpenOffice.org Contribution to Apache”.
  • A key thing to read is the official Foundation blog posting on “Incubation at Apache: What’s it all about?”
  • Bertrand recently wrote how “Becoming an Apache project is a process, not just a decision”.
  • Key reminder: Incubation is a process, with many checkpoints. Just because something is submitted to the Apache Incubator does not mean that the Incubator PMC will accept it as a podling. And once we do have a podling, the most important work begins: proving that there can be a healthy community around the project, all before it can even be considered for graduation to a Top Level Project at Apache.
  • Newcomers to Apache may want to review the Apache Community Development project – think of it as an outreach group within the ASF, starting work on explaining to newcomers what the Apache Way is about and where to find the right information on technology and community rules at Apache.
  • Reading Planet Apache is a great way to see what many of the committers at the many Apache projects are saying on their personal blogs.
  • I almost forgot! The best way to learn about how Apache works is to read our mailing lists. You can follow along the Apache Incubator’s discussion yourself, right on general@incubator.apache.org!

Personally, I think one of the most important differences between a potential “Apache Office” podling and the existing (and amazing) LibreOffice product is the license. Obviously, both codebases are fairly similar, and aim to provide a fully open source office suite. It will be interesting to see, after the first wild set of commentary flies, which project – and which license – various developers and corporations alike choose to actively support with their contributions. I just hope that this license difference – and the way that the OO.o code came to Apache, which was not something we controlled – doesn’t cause any unnecessary friction between the two communities.

I’m glad that The Document Foundation, home of LibreOffice, has spoken out on this donation as well.

And a great external view of the submission comes from Ed Brill, saying “OpenOffice moving to Apache, good news for the desktop productivity market”, and similarly IBM’s Bob Sutor writes his own “Remarks on OpenOffice going to Apache”.

Ooooh, Rob Weir has an excellent “Invitation to Apache OpenOffice” as well! Great reading in there, especially about some other famous Apache projects.

Apache news roundup: raining cloud projects

It’s been a surprisingly quiet couple of months at Apache; well, at least in terms of new projects graduating from the Incubator.

  • Welcome to the Apache Libcloud project – “a standard Python library that abstracts away differences among multiple cloud provider APIs.”

The Apache Libcloud graduation does bring up an important point about Apache project governance and the Incubation process. Apache is happy to host any community-driven projects that wish to use the Apache license and follow the basic Apache Way. This includes potentially competing technologies. In fact, the Apache Incubator currently has 4 other podlings that all deal with cloud API abstractions in one way or another!

  • Apache Deltacloud – a REST-like interface allowing common operations across multiple cloud providers – is the most obvious technical competitor to Apache Libcloud.
  • Apache Nuvem is a podling attempting to put a higher level of data and operation abstraction atop cloud APIs, for a slightly different programming model.
  • Apache Whirr is aiming to provide a level of service abstraction for multiple cloud providers, perhaps allowing your Apache Cassandra or Apache Hadoop-related projects to easily move about the clouds.
  • Apache Tashi goes further along the services model, focusing on providing Apache Hadoop and big data processing services that can be pushed to the clouds.

Sound crazy to have competing technical projects? Not at all, once you realize that Apache is all about the communities behind our projects. As a public charity, Apache’s purpose is to provide software for the public good. The way we have found most effective to do that is to allow any healthy communities to compete and grow independently, within the general Apache Way. We’re happy to have multiple communities working on the same kind of technology, and all the better if they can each succeed at finding their niche, both for the software they provide and for the community they can build.

This also points to the special place that the Apache Incubator has at the ASF. Bertrand has a great post discussing some of the whys and hows of the Incubator, and the process that new communities (and their projects!) go through before becoming a top level project at Apache.

Apache projects are independent and non-commercial

This post has been improved and posted as guidance to all Apache projects on the Community Development website – please read the new version there!

While not all aspects of the Apache Way are practiced the same way by all projects at the ASF, there are a number of rules that Apache projects are required to follow – things like complying with PMC release voting, legal policy, brand policy, using mailing lists, etc., which are documented in various places. There are a few rules that I think may not be documented as succinctly as they should be.

A primary purpose of the basic requirements the ASF places on its projects is to help ensure long-lived and stable projects by having a broad enough community to maintain the project even in the potential absence of any individual volunteer or any sea change at a major vendor in that area. The Apache project governance model is explicitly based on a diverse community. This is different from other governance models, like the “benevolent dictator” idea or the often corporate-backed model that Eclipse uses.

Apache projects are independent

This is implicit in the fact that the Project Management Committee (PMC) runs the project and the fact that PMC members are expected to contribute to the project as individuals, wearing their “PMC hat”. The concept of hats means that when a PMC member votes on project matters, they are casting their vote as an individual acting in the best interests of that PMC, and not as an employee or representative of some third party. There are also certain expectations of diversity within a PMC; the board may apply extra scrutiny to PMCs with low diversity (i.e. PMCs that are dominated by people with a common employer). Similarly, the ASF does not allow corporations to participate directly in project management, only individuals.

There are two important aspects to this independence: project management, and project use by end users.

Apache projects are managed independently

Apache projects should be managed independently, and PMCs must ensure that they are acting in the best interests of the project as a whole. Note that it is similarly important that the PMC clearly show this independence within their project community. The perception of existing and new participants within the community that the PMC is run independently and without favoring any specific third parties over others is important, to allow new contributors to feel comfortable both joining the community and contributing their work. A community that obviously favors one specific vendor in some exclusive way will often discourage new contributors from competing vendors, which is an issue for the long term health of the project.

Apache products may be used independently

All Apache projects must release their code under the Apache License, which clearly specifies the minimum restrictions that users of Apache software must agree to. Apache software is all about being able to use it for virtually whatever our users want – open source, proprietary, secret – we’re happy to have users take our software (although not our name) for virtually any purpose. While our legal guidelines allow certain other software licenses to be used for specific dependencies, the software we release always uses our license.

Extending this idea, users of Apache software should be able to find our software, learn how to use it, and actually apply it to all its common use cases solely by going to the Apache project’s own website. Apache projects should provide sufficient documentation, install features, basic user help (through mailing lists) and services for the common use cases, without users having to rely on third parties. It is important that our users can make use of our software freely – both in terms of not having to pay for the software, and in terms of not having to worry about IP claims or more restrictive licenses on the software itself, its configurations, or other common materials required to actually use it.

Apache projects are non-commercial

The ASF’s mission is to produce software for the public good. All Apache software is always available for free, and solely under the Apache License. While our projects manage the technical implementation of their individual software products independently, Apache software is released from the ASF and is always meant to serve the public good.

We’re happy to have third parties, including for-profit corporations, take our software and use it for their own purposes – even when in some cases it may technically compete with Apache software. However, it is important in these cases to ensure that the brand and reputation of the Apache project is not misused by third parties for their own purposes. It is important for the longevity and community health of our projects that they get the appropriate credit for producing our freely available software.


Reminder: “The postings on this site are my own and don’t necessarily represent the positions, strategies or opinions of either my employer or the ASF.”

Apache winter news roundup: new and famous projects

It’s continued to be a busy winter at the ASF, with a number of new projects being announced – as well as this year’s ApacheCon!

  • Submit your ideas now to the CFP for ApacheCon NA 2011 – coming to Vancouver this 7–11 November. CFP submissions are open through April.
  • Welcome Apache Extras! Apache Extras is the place for all your Apache-related software that’s not an Apache project. That includes projects that might not use the Apache license or might not meet the community criteria for formal Apache projects, but that are still related to Apache technology. Apache Extras gives you all the infrastructure support of Google Code, and shows your project’s interest in Apache technologies.
  • Welcome to our new Executive Assistant! The ASF has hired an EA to assist with a broad array of administrative tasks, who is already helping out with our conferences and other corporate operations.
  • We’ve got new top level projects! Over the past few months, the Incubator has graduated the following projects:
    • Apache Thrift is a framework for scalable cross-language services development, using code generation to work across a wide variety of popular programming languages.
    • Apache ZooKeeper, an Apache Hadoop spinoff, provides a centralized service for maintaining configuration information and providing distributed synchronization, among other services.
    • Apache OODT (press release) is middleware for managing data used in critical scientific applications – and features original code and contributors from NASA and the JPL. Yes, real rocket scientists work on OODT!
    • Apache ESME stands for Enterprise Social Messaging Environment, and allows for secure and scalable microsharing and micromessaging applications.
    • Apache Aries implements the EEG’s enterprise OSGi specification for multi-bundle applications.
    • Apache River implements Jini services and allows construction of secure and distributed systems.
    • Apache Chemistry (press release) is an implementation of the OASIS CMIS standard, allowing access to a wide variety of different vendors’ CMIS repositories.
  • We also say goodbye to Apache Excalibur, which has been boxed up and stored in the Apache Attic for posterity – or until someone new comes along to draw the sword back out of the box.
  • There were several other interesting happenings in Apache land recently as well.

    • Apache UIMA and Hadoop technologies helped IBM’s Watson supercomputer defeat humanity in the TV game show Jeopardy! As one of the human contestants wrote: “I, for one, welcome our new computer overlords.”
    • The Apache Subversion project issued an open letter to a corporation that is an active contributor and user of Subversion. While this is an unfortunate situation of a third party effectively usurping some of the good will generated by the Subversion project itself, the issue is being addressed, and it looks like we’ll have a productive resolution. This underscores the importance of appropriate governance and trademark protection for open source projects.
    • Separately, users of Apache projects may be interested in a number of much more detailed trademark policies that the ASF is working on, in an effort to make it simpler for third parties to associate with our projects, while ensuring that our project communities get full and proper credit for their work.

Why trademarks are important in open source

Groklaw recently wrote about the upcoming OpenSUSE project creation, and just now Hudson project volunteers are renaming the project to Jenkins. These are both excellent examples of why trademarks are important to a successful open source project, and definitely deserve more attention.

Trademarks, you say? Isn’t that some complex legal stuff that big companies care about? Well, yes – it’s certainly complex law – but you should care about it too. Think of a trademark as a pointer to your project’s reputation: it is the symbol that represents your project, and associates that name with your product in the minds of consumers. In both the OpenSUSE and Jenkins cases, the community behind the project is paying attention to branding for the project – and working to ensure that control over the project’s name stays with the community, not a commercial company.

Trademarks ensure that consumers – in our case, either end users or developers – know where the Foo project comes from, and know to come to the correct Foo project to participate and get the code. You may have the best Foo in the world, but if no one knows about your Foo (or its name), then it’s hard to attract much interest.

This is a key reason for Apache being a non-profit corporation (and likewise a reason for many other truly community-led open source foundations). The Apache Software Foundation controls the trademarks associated with its projects, and manages them for our projects. As a vendor-neutral organization, the ASF can ensure that ownership of our trademarks stays with the larger Apache community, and can’t be co-opted by a commercial entity.

Apache’s trademark policies are posted publicly. We have guidelines for how our PMCs should represent Apache marks on our sites, and are working on important updates to the Apache Event Branding policy.

Another good resource on trademarks in open source is Passport Without A Visa: Open Source Software Licensing and Trademarks. It’s worth learning enough about trademarks to ensure that you consider them for any new projects you work on. Note, however, that trademark law is complex, and many of the answers to trademark questions are “It depends”. Thus it’s always recommended to consult legal counsel if you have serious questions about trademarks.

The JCP is dead; long live Java

The Apache Software Foundation has just announced its resignation from the Java SE/EE Executive Committee. After several other recent community departures from the EC, and the scathing commentary that other EC members supplied with their votes on the recent Java 7/8 JSRs, it’s clear that Apache is not alone in its dissatisfaction with Oracle’s complete and overt control over what is purportedly a community effort. As another Apache member has said:

The Executive Committee is clearly not a Committee of Executives, nor is the Java Community Process a Process involving the Java Community.

I applaud everyone who has done technical work on recent versions of Java, and I’m sure plenty of people will still want to program in Java. That’s great. But please – when you do use Java, remember that it is *not* built on open standards. It is built on technology (and patents) and licenses that Oracle controls, and Oracle is quite happy to exercise its control over all things Java.

If you’re happy paying Oracle more and more licensing fees in the future, more power to you. But if you’re not, then you really need to understand the problem of the TCK Trap.

Stay tuned for more updates from the ASF’s Foundation blog on what this means for the many, many excellent Java-based Apache projects. And follow the #jcpisdead hashtag to understand what impact it may have on your Java future.

For more reading, follow my set of ‘oraclemess’ Delicious bookmarks.

Java: no longer free as in speech

El Reg has breaking news that the JCP vote on the Java 7/8 JSRs has passed. Apache and Google voted no, and the rest of the players sadly (but perhaps unsurprisingly) voted yes. This effectively changes the game around Java standards stewardship – you might say it turns the JCP into “Just Customers, Please”, removing any real community input that Oracle doesn’t choose to accept.

There are plenty of past links to learn about what this really means, and I’m sure that Stephen Colebourne will have some insightful commentary once he wakes up and has a cup of jav… er, tea.

The signing statements that most of the “Yes” voters supplied are a key indicator of the JCP’s mood: all (except Oracle) agreed with Apache and Google that the FOU restrictions Oracle is mandating are objectionable and inappropriate (not to mention apparently incompatible with the JSPA). If only wishes were horses, and signing statements had real force, then we could all be happy.

I’m not that surprised that the larger software vendors cast their votes where they think their bottom lines are, and went with a “Yes”, FOU reservations in their signing statements notwithstanding. I am a little surprised that Eclipse and Red Hat caved in to Oracle, given that their businesses are also open source, and they’re clearly going to have to pony up to Oracle somehow to get sufficient licenses to continue shipping Java-related software.

Don’t get me wrong: for all the Java developers who don’t care about how their underlying technology is licensed, I’m happy for you! You now have Oracle’s Java roadmap, and presumably Oracle will start delivering some cool Java technology. But please: don’t fool yourself or your friends into believing that the Java standards ecosystem is open, free, or community based in any real way. Oracle owns the court now, and don’t be surprised if it starts charging some people for renting balls along with court time to play.

It’s a sad day, since I really do like programming in Java. I still will, sometimes; but not as often. And not without realizing that Java is no longer free as in speech. We’ll see how long Java remains free as in beer, now that Oracle’s realized they run the JCP.

ApacheCon NA 2010 Wrapup

Along with a few news tidbits today, here’s my long-awaited ApacheCon NA 2010 blog wrapup, featuring highlights from attendees.

And of course there’s the official roundup from the show floor at ApacheCon. If I missed your great blog post about this year’s ApacheCon, let me know!

In other news, BarCampApache Sydney is this weekend, on 11-Dec, with its own press release and discussion group.

And today is the big day! The Java EC’s vote on the Java 7/8 JSRs concludes today, which will determine the openness – or lack thereof – of the future of Java. I’m sure that Stephen Colebourne will be covering it.