Apache CMS: Adding static data tables easily?

Is there an easy way to add simple generated data tables from CSV or the like using the Apache CMS system for the apache.org website? That is, I want to check in a CSV (or other simple table of data) that certain committers can edit via a spreadsheet, and then display selected rows from that table on an apache.org/foundation/ webpage in some semi-pretty manner.

Did you know that the ASF has its own CMS / static generator / magic update system that runs the apache.org homepage and many Apache project homepages? While it’s more of an Apache infra tool than a full Apache top level project, it’s still a full service solution for allowing multiple static website builds that are integrated into our servers.

While there are plenty of great technical CMS systems, when choosing a system for your company, many of the questions are organizational and deployment related. How easy is it for your IT team to manage the core system? How easy is it for various teams (or projects) to store and update their own content, perhaps using different templates in the system? How can you support anonymous editing/patch submission from non-committers? Does it support a safe and processor-respectful static workflow, minimizing the load on production servers while maximizing backups? And how can you do all this with a permissive license, and only hosting your own work?

The Apache CMS – while a bit crufty – supports all these things (although the infra peeps might argue about the maintenance part!) Everything’s stored in SVN, so restoring a backup or bringing the production server back is just checking the tree out again. Many projects use a Markdown variant, although some projects configure in their own static generator tools. The web GUI, while sparse, does have a great tutorial for submitting anonymous patches to Apache websites.

My question is: what’s the simplest way to have an apache.org top level webpage pull in some sort of simple data source? In particular, I don’t want to have to maintain much code, and I only want to add this data table bit within an existing page, without having to run my own whole generation script.

The first specific use case is displaying /foundation/marks/list/registered, a normal a.o page that will display a data table of all the registered trademarks the ASF owns. I’ll check in a CSV that I get from our counsel that includes all the legal details of our trademarks.

Bonus points for a simple system that:

  • Can pull some columns from a separate table: namely, projects.a.o descriptions from the projects.
  • Can pull my CSV listing trademark numbers from a private repo (committers or foundation).
  • Uses Python or JS and not Perl.
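
To make the request concrete, here is a minimal sketch of the kind of helper I have in mind, using only the Python standard library. It is not part of the Apache CMS and does not use any CMS API; the function name `csv_to_html_table`, the column names, and the sample rows are all hypothetical placeholders. It reads CSV text, keeps only the rows that pass a filter, and emits an HTML table fragment for the chosen columns, which could then be included in an existing page.

```python
import csv
import html
import io


def csv_to_html_table(csv_text, columns, keep_row=lambda row: True):
    """Render selected columns of selected CSV rows as an HTML table string."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    lines = ["<table>", f"  <tr>{header}</tr>"]
    for row in reader:
        if not keep_row(row):
            continue  # skip rows the caller does not want displayed
        # Escape every cell so stray markup in the CSV cannot break the page.
        cells = "".join(
            f"<td>{html.escape(row.get(c) or '')}</td>" for c in columns
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical sample data; the real CSV would have its own columns.
SAMPLE = """mark,status,registration
Example Mark A,registered,0000001
Example Mark B,pending,
"""

# Show only registered marks, and only two of the three columns.
TABLE = csv_to_html_table(
    SAMPLE,
    ["mark", "registration"],
    keep_row=lambda r: r["status"] == "registered",
)
print(TABLE)
```

Pulling extra columns from a second table (such as project descriptions) would amount to a dictionary lookup keyed on a shared column before rendering; how the fragment gets wired into the CMS build is exactly the part I am asking about.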

Note: I have cut back my $dayjob recently, so I will actually have time to write some of the code for this work myself now – finally!

Even better than Hadoop!

You know what’s even better than using Hadoop? Using Apache Hadoop!

Even better is Apache Ambari to manage your Apache Cassandra data store through Apache Hive with Apache Pig to make it simpler to write Apache Spark compute flows… Or, if you want it assembled for you, just grab the latest Apache BigTop, which already includes a bunch of Apache Hadoop related packages all together.

How can we do a better job of getting at least a single “Apache Hadoop” into some of the many media stories about Hadoop these days? It’s great that all these vendors are making great technology and projects that power big data, but with all their success and fancy marketing campaigns, you’d think we could get just a tiny bit of credit in the popular press with the actual committers on the core Apache Hadoop project itself. Or any of the other Apache project technologies that these vendors, other software companies – and just about every other company too – rely on every day to help make their websites work.

Would it hurt marketers and journalists and bloggers to throw in just one extra “Apache” before talking about the many free Apache software products that help power more than half the internet?

The ASF and Apache projects give away a tremendous amount of technology every day under our permissive Apache license – always for free. All we ask is respect for our trademarks, and a little bit of credit for the many volunteer communities that build Apache software.

P.S. Apache projects love to get more code, documentation, testing, and other contributions too! And the ASF has a Sponsorship program.

But what we really want is what every human wants: just a little love. Just an extra Apache here and there makes us feel better.


What is Apache Hadoop?

There’s a lot of excitement around Hadoop software these days, so here’s my definition of what “Hadoop” means:

Hadoop™ is the ASF’s trademark for our Apache Hadoop software product that provides a service and simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as the software that started the current “Big Data” processing model, which allows programmers to easily and effectively process huge data sets to get meaningful results.

The best place of all to learn about Hadoop is of course the Apache Hadoop project and community, which says this about the Hadoop software:

“(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

The Apache Hadoop project at the ASF is related to or has created a large number of notable modules, subprojects, or full projects at Apache, including:

There are a wide variety of vendors who provide Hadoop-related software; however, the only source for Hadoop software itself is the Apache Hadoop project here at the ASF. We certainly appreciate the many companies who allow their employees to contribute work to Apache Hadoop and all of our projects, as well as the many Apache Corporate Sponsors. However, I do hope that companies working in the Hadoop and related Big Data industry take stock of their marketing strategies, and ensure that their corporate marketing doesn’t shortchange the credit owed to the Apache Hadoop community itself.

We very much appreciate those corporate supporters who do provide plenty of credit to the ASF and the Apache Hadoop community – both the old hats, and the very new spinoff in the Big Data space. I just hope that some of the other players in the industry will carefully consider their public crediting (or lack thereof) to the ASF’s Hadoop brand and the many individual committers and contributors to the Apache Hadoop project.

As always, the Apache Hadoop website and mailing lists are the best place to learn about Hadoop software!

Oh, and remember:

Apache Hadoop, Hadoop, the yellow elephant logo, the names of Apache software products, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.

Apache projects are independent and non-commercial

While not all aspects of the Apache Way are practiced the same way by all projects at the ASF, there are a number of rules that Apache projects are required to follow – things like complying with PMC release voting, legal policy, brand policy, using mailing lists, etc., which are documented in various places. There are a few rules that I think may not be documented as succinctly as they should be.

A primary purpose of the basic requirements the ASF places on its projects is to help ensure long-lived and stable projects by having a broad enough community to maintain the project even in the potential absence of any individual volunteer or any sea change at a major vendor in that area. The Apache project governance model is explicitly based on a diverse community. This is different from other governance models, like the “benevolent dictator” idea or the often corporate-backed model that Eclipse uses.

Apache projects are independent

This is implicit in the fact that the Project Management Committee (PMC) runs the project, and the fact that PMC members are expected to contribute to the project as individuals, wearing their “PMC hat”. The concept of hats means that when a PMC member votes on project matters, they are casting their vote as an individual acting in the best interests of that PMC, and not as an employee or representative of some third party. There are also certain expectations of diversity within a PMC; the board may apply extra scrutiny to PMCs with low diversity (i.e. PMCs that are dominated by people with a common employer). Similarly, the ASF does not allow corporations to participate directly in project management, only individuals.

There are two important aspects to this independence: project management, and project use by end users.

Apache projects are managed independently

Apache projects should be managed independently, and PMCs must ensure that they are acting in the best interests of the project as a whole. Note that it is similarly important that the PMC clearly show this independence within their project community. The perception of existing and new participants within the community that the PMC is run independently and without favoring any specific third parties over others is important, to allow new contributors to feel comfortable both joining the community and contributing their work. A community that obviously favors one specific vendor in some exclusive way will often discourage new contributors from competing vendors, which is an issue for the long term health of the project.

Apache products may be used independently

All Apache projects must release their code under the Apache License, which clearly specifies the minimum restrictions that users of Apache software must agree to. Apache software is all about being able to use it for virtually whatever our users want: open source, proprietary, secret: we’re happy to have users take our software (although not our name) for virtually any purpose. While our legal guidelines allow certain other software licenses to be used for specific dependencies, the software we release always uses our license.

Extending this idea, users of Apache software should be able to find our software, learn how to use it, and actually apply it to all its common use cases solely by going to the Apache project’s own website. Apache projects should provide sufficient documentation, install features, basic user help (through mailing lists) and services for the common use cases to the user, without them having to rely on third parties. It is important that our users can make use of our software freely – both in terms of not having to pay for the software, and in terms of not having to worry about IP claims or other more restrictive licenses on either the software or the configurations or other common materials required to actually use the software.

Apache projects are non-commercial

The ASF’s mission is to produce software for the public good. All Apache software is always available for free, and solely under the Apache License. While our projects manage the technical implementation of their individual software products independently, Apache software is released from the ASF, and is always meant to serve the public good.

We’re happy to have third parties, including for-profit corporations, take our software and use it for their own purposes – even when in some cases it may technically compete with Apache software. However, it is important in these cases to ensure that the brand and reputation of the Apache project is not misused by third parties for their own purposes. It is important for the longevity and community health of our projects that they get the appropriate credit for producing our freely available software.



Reminder: “The postings on this site are my own and don’t necessarily represent positions, strategies or opinions of either my employer nor the ASF.”

July Apache news roundup: Greg! Adobe+Day! FOP! FOP?

A brief listing of some of the news around the ASF this past month.

Oh, and the ASF elected a new board of directors as well – there are some different (and one new) faces, but overall, we expect steady sailing into better waters.

Want to get your own news about Apache projects? Read or feed from the announce list, official Foundation and project blogs, or get the Planet Apache community perspective.

apache.numprojects -= 1; apache.karaf.intro = “Welcome!”

For only the fourth time in our history, at last month’s June board meeting, we passed resolutions that effectively reduced the total number of Apache projects by one.

  • As was widely expected, the board terminated the Apache iBATIS project, and sent it to the Apache Attic. This recognizes that we don’t expect there to be an active Apache iBATIS community, and that we don’t expect there to be any new development in that project for a while. The Apache Attic will continue to provide all the project’s resources on a read-only basis for any existing users. (Note: current users may also be interested in the external fork over at mybatis.org)
  • The board also terminated the little-known Apache Quetzalcoatl project and moved it to the Attic. “Quetz” had been charged with developing the mod_python module, but it never really took off as an organized Apache project. Current users may be interested in finding the sources over at modpython.org
  • In happier news, the board voted to promote an Apache Felix subproject named Karaf to top-level status. Apache Karaf is a small OSGi based runtime which provides a lightweight container onto which various components and applications can be deployed. The Felix PMC had seen that there was sufficient community around just the Karaf subproject that it deserved to have its own project.

So that’s two projects down, but one project up for the month of June.

iBATIS and Quetz both join previously retired projects in the Attic: HiveMind, Shale, AxKit, Xang, Beehive, and Jakarta Taglibs. Each is a project that had lost an effective Apache community able to actively develop it.

In the past, the ASF has also terminated a handful of other projects before the Attic was opened in 2008; those include Apache Commons (the first version) and Apache Avalon, both terminated for community issues. The ASF also once had an Apache PHP project that was terminated; in that case it was a happy and mutual separation of the PHP Group from the ASF.

Resolutions for creating and terminating Apache projects are passed by the board, typically at monthly meetings, and our public records of formal board actions are always available.

Stay tuned for news of the upcoming Annual Members’ Meeting of the ASF being held in mid-July, where we’ll also be electing a new board of directors.