Dear Conference Organizers: Improving Speaker Emails

Juggling several upcoming speaking engagements, I’m reminded of how hard the job of conference organizers is. Having helped run ApacheCon as part of a volunteer team for years, I know how hard it is to select talks, wrangle speaker acceptances (and rejections), and ensure your final conference schedule is appealing. And wrangling your clunky CFP system and keeping the finicky schedule website updated are two problems that software hasn’t solved yet.

Equally important is how the conference acceptance & organization process works from the speaker’s side. Remember? Those people who create all the content your conference relies on? All those people who you love and appreciate – but who you don’t pay anything – and whose last-minute problems you’ll do anything to fix? While we can’t prevent every last-minute problem, a few simple steps can improve speaker communication and head off many of them.

What is Apache Hadoop? Website Brand Review

Website Brand Review of Apache Hadoop

We’ve all heard of Apache® Hadoop® – well, at least heard of Hadoop, and by now you should realize it’s an Apache project! But when was the last time you took a critical eye to the actual Apache Hadoop project’s homepage?

Here’s my quick review of the Apache Hadoop project, told purely from the point of view of a new user finding the project website.

What Is Apache Hadoop?

“Apache Hadoop (is) a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models”

“Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

What is Apache Mahout? Website Brand Review

Website Brand Review of Apache Mahout

While we’ve all heard about Apache Hadoop, did you know there are over a dozen big data projects at Apache? We host projects that provide everything for your big data stack: databases, storage, streaming, logging, analysis, machine learning, and more. Apache Mahout is one of the pieces that puts a big data stack to work doing higher-level tasks for you.

Here’s my quick review of the Apache Mahout project, told purely from the point of view of a new user finding the project website.

Happy Birthday! This month is the Apache Mahout project’s 6th #ApacheBirthday!

What Is Apache Mahout?

“The Apache Mahout™ project’s goal is to build an environment for quickly creating scalable performant machine learning applications.”

While this is a laudable statement – and nicely emphasizes the community behind the project – it doesn’t directly say what the software actually does.

“The three major components of Mahout are an environment for building scalable algorithms, many new Scala + Spark and H2O (Apache Flink in progress) algorithms, and Mahout’s mature Hadoop MapReduce algorithms.”

Who’s Who at Apache: Roles and Responsibilities

There’s a huge amount of volunteer energy that flows around Apache’s Annual Member Meeting every year. Old members and new alike come together and brainstorm all sorts of new ideas, both organizational and technical – and we have plenty of online… discussions, let us say. There is an amazing amount of energy from a lot of very smart people, and when we focus this energy, we make real improvements to the Foundation and sometimes in some of our projects.

As we’ve grown, keeping a full shared understanding of all the details of membership and corporate operations has become much harder. We have some documentation, but we also still have a lot of tribal knowledge and decisions hidden in our mailing list archives. For all of us to understand things the same way, we need to be able to see what rules or policies we’ve actually decided on – or at least written down.

So here is an overview of all the different roles that people can have with the ASF, either at the Foundation level or within specific Apache projects. In particular, I’m focusing on the specific agreements we make with individuals, or the explicitly posted policies that we expect people to abide by. For more information on how Apache works, see /dev, /governance, and Community.

Shane’s Apache Director Position Statement, 2016

The ASF is holding its annual Members’ Meeting this week to elect a new board and a number of new Members to the ASF. I’m honored to have been nominated to stand for the board election, and I’m continuing my tradition of publicly posting my vision for Apache each year.

We are lucky to have both a large involved membership, as well as another excellent slate of candidates including a couple of great new faces. No matter how Apache STeVe ends up computing the results, Apache will have a great board for the year to come.

Please read on for my take on what’s important for the ASF’s future…

What is Apache Hive? Website Branding Review

Website Brand Review of Apache Hive

While we’ve all heard about Apache Hadoop®, did you know there are over a dozen big data projects at Apache? We host projects that provide all the different functions for your big data stack: databases, storage, streaming, logging, analysis, and more. Apache Hive™ is one of these pieces of the whole big data ecosystem.

Here’s my quick review of the Apache Hive project, told purely from the point of view of a new user finding the project website.

What Is Apache Hive?

“The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage”.

What is Apache Flex? Website Branding Review

Website Brand Review of Apache Flex

Many projects come to Apache from software vendors donating them to the Apache community, where the Apache Incubator works to form an open and independent community around the project. Here, Adobe donated both the code and the brand for their Flex project to Apache. Now the ASF is both the steward of the vibrant Apache Flex community and the new owner of the Flex brand and registered trademark.

Here’s my quick review of the Apache Flex project, told purely from the point of view of a new user finding the project website. While we’re all familiar with the Adobe Flash browser plugin, not everyone may be familiar with the Flex environment for building Flash (and other!) applications.

What Is Apache Flex?

Apache Flex® is the open-source framework for building expressive web and mobile applications.

In other words, Flex is a toolkit for building general applications that can be run on a variety of web browsers and mobile platforms that include the Adobe Flash or Adobe AIR runtimes or application containers. Flex is the coding language and environment you use to write applications for the Flash/AIR containers.

No, Really, What Is Apache Flex For?

What is Apache HBase? Website Branding Review

Website Brand Review of Apache HBase

How do open source projects get popular? By providing some useful functionality that users want to have. How do open source projects thrive over the long term? By turning those users into contributors who then help improve and maintain the project. How well a project showcases itself on the web is an important part of the adoption and growth cycle.

Here’s my quick review of the Apache HBase project, told purely from the point of view of a new user finding the project website. HBase is a key part of the big data storage stack, so although you may not work directly with it, it’s probably underlying some systems you use.

What Is Apache HBase?

“Apache HBase™ is the Hadoop® database, a distributed, scalable, big data store”.

What Is Apache Mesos? Website Branding Review

Website Brand Review of Apache Mesos

How do open source projects get popular? By providing some useful functionality that users want to have. How do open source projects thrive over the long term? By turning those users into contributors who then help improve and maintain the project. How well a project showcases itself on the web is an important part of the adoption and growth cycle.

Here’s my quick review of the Apache Mesos project, told purely from the point of view of a new user finding the project website. Mesos is turning into a major project in the big data and cloud space; perhaps not yet as visibly popular as Apache Spark, but certainly big.

What Is Apache Mesos?

Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.

Apache CMS: Adding static data tables easily?

Is there an easy way to add simple generated data tables from CSV or the like using the Apache CMS system for the apache.org website? I.e., I want to check in a CSV (or other simple table of data) that certain committers can edit via a spreadsheet, and then display selected rows from that table on an apache.org/foundation/ webpage in some semi-pretty manner.

Did you know that the ASF has its own CMS / static generator / magic update system that runs the apache.org homepage and many Apache project homepages? While it’s more of an Apache infra tool than a full Apache top-level project, it’s still a full-service solution for running multiple static website builds that are integrated into our servers.

While there are plenty of technically great CMS options, when choosing a system for your company, many of the questions are organizational and deployment-related. How easy is it for your IT team to manage the core system? How easy is it for various teams (or projects) to store and update their own content, perhaps using different templates in the system? How can you support anonymous editing/patch submission from non-committers? Does it support a safe and processor-respectful static workflow, minimizing the load on production servers while maximizing backups? And how can you do all this with a permissive license, while hosting only your own work?

The Apache CMS – while a bit crufty – supports all these things (although the infra peeps might argue about the maintenance part!). Everything’s stored in SVN, so restoring a backup or bringing the production server back is just checking the tree out again. Many projects use a Markdown variant, although some projects plug in their own static generator tools. The web GUI, while sparse, does have a great tutorial for submitting anonymous patches to Apache websites.

My question is: what’s the simplest way to have an apache.org top level webpage pull in some sort of simple data source? In particular, I don’t want to have to maintain much code, and I only want to add this data table bit within an existing page, without having to run my own whole generation script.

The first specific use case is displaying /foundation/marks/list/registered, a normal a.o page that will display a data table of all the registered trademarks the ASF owns. I’ll check in a CSV that I get from our counsel that includes all the legal details of our trademarks.
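To make the core step concrete, here is a minimal Python sketch, assuming the CMS build can run a small script whose Markdown output gets included in the page, and assuming the page’s Markdown flavor supports pipe tables. The column names (`mark`, `jurisdiction`, `reg_number`, `status`) and the sample rows are placeholders, not the real layout of the file from counsel, and `registered_rows` / `to_markdown_table` are hypothetical helpers, not anything the Apache CMS provides.

```python
import csv
import io

# Hypothetical placeholder data: the real CSV from counsel will have its own
# column names and values.
SAMPLE_CSV = """mark,jurisdiction,reg_number,status
EXAMPLE ONE,US,0000001,Registered
EXAMPLE TWO,EU,0000002,Registered
EXAMPLE THREE,US,,Pending
"""


def registered_rows(csv_text, status="Registered"):
    """Return only the rows whose 'status' column equals `status`."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["status"] == status]


def to_markdown_table(rows, columns):
    """Render a list of dict rows as a Markdown pipe table in the given column order."""

    def cell(value):
        # Escape pipes so a value like "A|B" doesn't split into two cells.
        return str(value).replace("|", "\\|")

    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        lines.append("| " + " | ".join(cell(row[c]) for c in columns) + " |")
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_markdown_table(registered_rows(SAMPLE_CSV), ["mark", "jurisdiction", "reg_number"]))
```

Emitting Markdown rather than HTML keeps the generated fragment in the same format many Apache pages already use, and choosing the columns explicitly means extra legal-detail columns in the CSV never leak onto the public page by accident.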

Bonus points for a simple system that:

  • Can pull some columns from a separate table: namely, projects.a.o descriptions from the projects.
  • Can pull my CSV listing trademark numbers from a private repo (committers or foundation).
  • Uses Python or JS and not Perl.
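For the first bonus point, one hedged approach is a plain lookup join: key each trademark row by a project identifier and pull the description from a second table. In this sketch the `project` column, the `PROJECT_DESCRIPTIONS` mapping, and the `load_csv_text` helper are all hypothetical; it doesn’t assume any particular format for the data projects.apache.org publishes, and reading from a local checkout of a private repo is just one way the CSV might arrive.

```python
import csv
import io

# Hypothetical trademark CSV; in practice this might be read from a local
# checkout of a private (committers or foundation) repo via load_csv_text().
MARKS_CSV = """mark,project,reg_number
EXAMPLE ONE,example-one,0000001
EXAMPLE TWO,example-two,0000002
"""

# Hypothetical description lookup keyed by project id; the real source of
# these descriptions (projects.apache.org data) is not modeled here.
PROJECT_DESCRIPTIONS = {
    "example-one": "Placeholder description for project one.",
}


def load_csv_text(path):
    """Read a CSV file from disk as text (e.g. from a private repo checkout)."""
    with open(path, newline="", encoding="utf-8") as f:
        return f.read()


def join_descriptions(csv_text, descriptions, missing="(no description)"):
    """Return trademark rows with an added 'description' column.

    Rows whose 'project' has no entry in `descriptions` get `missing`
    instead, so one absent project never breaks the whole page build.
    """
    joined = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        merged = dict(row)
        merged["description"] = descriptions.get(row["project"], missing)
        joined.append(merged)
    return joined


if __name__ == "__main__":
    for row in join_descriptions(MARKS_CSV, PROJECT_DESCRIPTIONS):
        print(row["mark"], "-", row["description"])
```

Falling back to a placeholder instead of raising an error seems the safer default for a page that several committers edit through a spreadsheet.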

Note: I have cut back my $dayjob recently, so I will actually have time to write some of the code for this work myself now – finally!