Apache CMS: Adding static data tables easily?

Is there an easy way to add simple generated data tables from CSV or the like using the Apache CMS system for the apache.org website? That is, I want to check in a CSV (or other simple table of data) that certain committers can edit via a spreadsheet, and then display selected rows from that table on an apache.org/foundation/ webpage in some semi-pretty manner.

Did you know that the ASF has its own CMS / static generator / magic update system that runs the apache.org homepage and many Apache project homepages? While it’s more of an Apache infra tool than a full Apache top level project, it’s still a full service solution for allowing multiple static website builds that are integrated into our servers.

While there are plenty of great technical CMS systems, when choosing a system for your company, many of the questions are organizational and deployment related. How easy is it for your IT team to manage the core system? How easy is it for various teams (or projects) to store and update their own content, perhaps using different templates in the system? How can you support anonymous editing/patch submission from non-committers? Does it support a safe and processor-respectful static workflow, minimizing the load on production servers while maximizing backups? And how can you do all this with a permissive license, and only hosting your own work?

The Apache CMS – while a bit crufty – supports all these things (although the infra peeps might argue about the maintenance part!) Everything’s stored in SVN, so restoring a backup or bringing the production server back is just checking the tree out again. Many projects use a Markdown variant, although some projects configure in their own static generator tools. The web GUI, while sparse, does have a great tutorial for submitting anonymous patches to Apache websites.

My question is: what’s the simplest way to have an apache.org top level webpage pull in some sort of simple data source? In particular, I don’t want to have to maintain much code, and I only want to add this data table bit within an existing page, without having to run my own whole generation script.

The first specific use case is displaying /foundation/marks/list/registered, a normal a.o page that will display a data table of all the registered trademarks the ASF owns. I’ll check in a CSV that I get from our counsel that includes all the legal details of our trademarks.

Bonus points for a simple system that:

  • Can pull some columns from a separate table: namely, projects.a.o descriptions from the projects.
  • Can pull my CSV listing trademark numbers from a private repo (committers or foundation).
  • Uses Python or JS and not Perl.
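To make the ask concrete, here is a minimal sketch in Python of the kind of helper I have in mind: read a CSV, keep only selected rows and columns, and emit a plain HTML table that could be dropped into an existing page. Everything in it is an assumption for illustration: the function name csv_to_html_table, the column names, and the sample rows are all made up, and nothing here reflects how the Apache CMS build actually hooks in scripts.

```python
import csv
import html
import io


def csv_to_html_table(csv_text, columns, keep_row=lambda row: True):
    """Render the chosen columns of the matching CSV rows as an HTML table.

    Hypothetical helper: the name and signature are illustrative only.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    header = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    lines = ["<table>", f"  <tr>{header}</tr>"]
    for row in reader:
        if not keep_row(row):
            continue  # skip rows the page should not display
        # Escape every cell so stray characters in the CSV cannot break the page.
        cells = "".join(
            f"<td>{html.escape(row.get(c) or '')}</td>" for c in columns
        )
        lines.append(f"  <tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Placeholder data: these column names and rows are invented, not real marks.
SAMPLE = """mark,status,number
Example Mark A,registered,0000001
Example Mark B,pending,
"""

if __name__ == "__main__":
    print(
        csv_to_html_table(
            SAMPLE,
            ["mark", "number"],
            keep_row=lambda r: r["status"] == "registered",
        )
    )
```

Pulling in project descriptions from a second table would be a dictionary lookup keyed on the project name before rendering each row; the open question is still where a script like this would run inside the existing CMS build.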

Note: I have cut back my $dayjob recently, so I will actually have time to write some of the code for this work myself now – finally!

Even better than Hadoop!

You know what’s even better than using Hadoop? Using Apache Hadoop!

Even better is Apache Ambari to manage your Apache Cassandra data store through Apache Hive with Apache Pig to make it simpler to write Apache Spark compute flows… Or, if you want it assembled for you, just grab the latest Apache BigTop, which already includes a bunch of Apache Hadoop related packages all together.

How can we do a better job of getting at least a single “Apache Hadoop” into some of the many media stories about Hadoop these days? It’s great that all these vendors are making great technology and projects that power big data, but with all their success and fancy marketing campaigns, you’d think we could get just a tiny bit of credit in the popular press for the actual committers on the core Apache Hadoop project itself. Or for any of the other Apache project technologies that these vendors, other software companies – and just about every other company too – rely on every day to help make their websites work.

Would it hurt marketers and journalists and bloggers to throw in just one extra “Apache” before talking about the many free Apache software products that help power more than half the internet?

The ASF and Apache projects give away a tremendous amount of technology every day under our permissive Apache license – always for free. All we ask is respect for our trademarks, and a little bit of credit for the many volunteer communities that build Apache software.

P.S. Apache projects love to get more code, documentation, testing, and other contributions too! And the ASF has a Sponsorship program.

But what we really want is what every human wants: just a little love. Just an extra Apache here and there makes us feel better.


What is Apache Hadoop?

There’s a lot of excitement around Hadoop software these days; here’s my definition of what “Hadoop” means:

Hadoop ™ is the ASF’s trademark for our Apache Hadoop software product that provides a service and simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as the software that started the current “Big Data” processing model, which allows programmers to easily and effectively process huge data sets to get meaningful results.

The best place of all to learn about Hadoop is of course the Apache Hadoop project and community, which says this about the Hadoop software:

“(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

The Apache Hadoop project at the ASF is related to or has created a large number of notable modules, subprojects, or full projects at Apache, including:

There are a wide variety of vendors who provide Hadoop-related software; however, the only source for Hadoop software itself is the Apache Hadoop project here at the ASF. We certainly appreciate the many companies who allow their employees to contribute work to Apache Hadoop and all of our projects, and also the many Apache Corporate Sponsors. However, I do hope that companies working in the Hadoop and related Big Data industry take stock of their marketing strategies, and ensure that their corporate marketing doesn’t shortchange the credit owed to the Apache Hadoop community itself.

We very much appreciate those corporate supporters who do provide plenty of credit to the ASF and the Apache Hadoop community – both the old hats, and the very new spinoff in the Big Data space. I just hope that some of the other players in the industry will carefully consider their public crediting (or lack thereof) to the ASF’s Hadoop brand and the many individual committers and contributors to the Apache Hadoop project.

As always, the Apache Hadoop website and mailing lists are the best place to learn about Hadoop software!

Oh, and remember:

Apache Hadoop, Hadoop, the yellow elephant logo, the names of Apache software products, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.