What is Apache Mahout? Website Branding Review

Website Brand Review of Apache Mahout

While we’ve all heard about Apache Hadoop, did you know there are over a dozen big data projects at Apache? We host projects that provide everything for your big data stack: databases, storage, streaming, logging, analysis, machine learning, and more. Apache Mahout is one of the pieces that puts a big data stack to work doing higher-level analysis for you.

Here’s my quick review of the Apache Mahout project, told purely from the point of view of a new user finding the project website.

Happy Birthday! This month is the Apache Mahout project’s 6th #ApacheBirthday!

What Is Apache Mahout?

“The Apache Mahout™ project’s goal is to build an environment for quickly creating scalable performant machine learning applications.”

While this is a laudable statement – and it nicely emphasizes the community behind the project – it doesn’t directly say what the software they provide actually does.

“The three major components of Mahout are an environment for building scalable algorithms, many new Scala + Spark and H2O (Apache Flink in progress) algorithms, and Mahout’s mature Hadoop MapReduce algorithms.”

Continue reading

Who’s Who at Apache: Roles and Responsibilities

There’s a huge amount of volunteer energy that flows around Apache’s Annual Members’ Meeting every year.  Old members and new alike come together and brainstorm all sorts of new ideas, both organizational and technical – and we have plenty of online… discussions, let us say.  There is an amazing amount of energy from a lot of very smart people, and when we focus this energy, we make real improvements to the Foundation and sometimes to some of our projects.

As we’ve grown, keeping a full shared understanding of all the details of membership and corporate operations has become much harder.  We have some documentation, but we also still have a lot of tribal knowledge and decisions hidden in our mailing list archives.  For us all to understand the same things, we need to be able to see what rules or policies we’ve actually decided on – or at least written down.

So here is an overview of all the different roles that people can have at the ASF, either with the Foundation itself or with specific Apache projects.  In particular, I’m focusing on the specific agreements we make with individuals, and on the explicitly posted policies that we expect people to abide by.  For more information on how Apache works, see /dev, /governance, and Community.

Continue reading

Shane’s Apache Director Position Statement, 2016

The ASF is holding its annual Members’ Meeting this week to elect a new board and a number of new Members to the ASF.  I’m honored to have been nominated to stand for the board election, and I’m continuing my tradition of publicly posting my vision for Apache each year.

We are lucky to have both a large, involved membership and another excellent slate of candidates, including a couple of great new faces. No matter how Apache STeVe ends up computing the results, Apache will have a great board for the year to come.

Please read on for my take on what’s important for the ASF’s future…

Continue reading

What is Apache Hive? Website Branding Review

Website Brand Review of Apache Hive

While we’ve all heard about Apache Hadoop®, did you know there are over a dozen big data projects at Apache? We host projects that provide all the different functions your big data stack needs: databases, storage, streaming, logging, analysis, and more. Apache Hive™ is one of these pieces of the whole big data ecosystem.

Here’s my quick review of the Apache Hive project, told purely from the point of view of a new user finding the project website.

What Is Apache Hive?

“The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage”.

Continue reading

What is Apache Flex? Website Branding Review

Website Brand Review of Apache Flex

Many projects come to Apache from software vendors donating them to the Apache community, where the Apache Incubator works to form an open and independent community around the project. Here, Adobe donated both the code and the brand for their Flex project to Apache. Now, the ASF is both the steward of the vibrant Apache Flex community and the new owner of the Flex brand and registered trademark.

Here’s my quick review of the Apache Flex project, told purely from the point of view of a new user finding the project website. While we’re all familiar with the Adobe Flash browser plugin, not everyone may be familiar with the Flex environment for building Flash (and other!) applications.

What Is Apache Flex?

Apache Flex® is the open-source framework for building expressive web and mobile applications.

In other words, Flex is a toolkit for building general applications that can be run on a variety of web browsers and mobile platforms that include the Adobe Flash or Adobe AIR runtimes or application containers. Flex is the coding language and environment you use to write applications for the Flash/AIR containers.

No, Really, What Is Apache Flex For?

Continue reading

What is Apache HBase? Website Branding Review

Website Brand Review of Apache HBase

How do open source projects get popular? By providing some useful functionality that users want to have. How do open source projects thrive over the long term? By turning those users into contributors who then help improve and maintain the project. How well a project showcases itself on the web is an important part of the adoption and growth cycle.

Here’s my quick review of the Apache HBase project, told purely from the point of view of a new user finding the project website. HBase is a key part of the big data storage stack, so although you may not work with it directly, it probably underlies some systems you use.

What Is Apache HBase?

“Apache HBase™ is the Hadoop® database, a distributed, scalable, big data store”.

Continue reading

What Is Apache Mesos? Website Branding Review

Website Brand Review of Apache Mesos

How do open source projects get popular? By providing some useful functionality that users want to have. How do open source projects thrive over the long term? By turning those users into contributors who then help improve and maintain the project. How well a project showcases itself on the web is an important part of the adoption and growth cycle.

Here’s my quick review of the Apache Mesos project, told purely from the point of view of a new user finding the project website. Mesos is turning into a major project in the big data and cloud space; it may not have the obvious popularity of Apache Spark yet, but it’s certainly big.

What Is Apache Mesos?

Apache Mesos abstracts CPU, memory, storage, and other compute resources away from machines (physical or virtual), enabling fault-tolerant and elastic distributed systems to easily be built and run effectively.

Continue reading

Apache CMS: Adding static data tables easily?

Is there an easy way to add simple generated data tables from CSV or the like using the Apache CMS system for the apache.org website? I.e., I want to check in a CSV (or other simple table of data) that certain committers can edit via a spreadsheet, and then display selected rows from that table on an apache.org/foundation/ webpage in some semi-pretty manner.

Did you know that the ASF has its own CMS / static generator / magic update system that runs the apache.org homepage and many Apache project homepages? While it’s more of an Apache infra tool than a full Apache top-level project, it’s still a full-service solution that supports multiple static website builds integrated into our servers.

While there are plenty of great technical CMS systems, when choosing a system for your company, many of the questions are organizational and deployment-related. How easy is it for your IT team to manage the core system? How easy is it for various teams (or projects) to store and update their own content, perhaps using different templates in the system? How can you support anonymous editing/patch submission from non-committers? Does it support a safe, processor-respectful static workflow, minimizing the load on production servers while maximizing backups? And how can you do all this with a permissive license, while only hosting your own work?

The Apache CMS – while a bit crufty – supports all these things (although the infra peeps might argue about the maintenance part!). Everything’s stored in SVN, so restoring a backup or bringing the production server back up is just checking the tree out again. Many projects use a Markdown variant, although some projects configure their own static generator tools instead. The web GUI, while sparse, does have a great tutorial for submitting anonymous patches to Apache websites.

My question is: what’s the simplest way to have an apache.org top level webpage pull in some sort of simple data source? In particular, I don’t want to have to maintain much code, and I only want to add this data table bit within an existing page, without having to run my own whole generation script.

The first specific use case is /foundation/marks/list/registered, a normal a.o page that will show a data table of all the registered trademarks the ASF owns. I’ll check in a CSV that I get from our counsel that includes all the legal details of our trademarks. (See the sketch after the list below for the kind of thing I’m imagining.)

Bonus points for a simple system that:

  • Can pull some columns from a separate table: namely, project descriptions from projects.a.o.
  • Can pull my CSV listing trademark numbers from a private repo (committers or foundation).
  • Uses Python or JS and not Perl.
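
Since I asked for Python or JS, here’s a minimal sketch in Python of the kind of build step I’m imagining. Everything in it is hypothetical (the file paths, column names, and output fragment are placeholder assumptions, not the actual a.o layout), but it shows how small the CSV-to-HTML-table step itself could be:

```python
#!/usr/bin/env python3
# Minimal sketch: turn a checked-in CSV of registered marks into an HTML
# table fragment that a static page could include at build time.
# All paths and column names below are hypothetical placeholders.
import csv
import html

CSV_IN = "content/foundation/marks/registered.csv"           # assumed CSV location
HTML_OUT = "content/foundation/marks/registered-table.html"  # assumed fragment output
COLUMNS = ["Project", "Mark", "Registration", "Class"]       # assumed columns to show


def main():
    # Read every row of the CSV as a dict keyed by the header row.
    with open(CSV_IN, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))

    # Build a plain HTML table, escaping cell values along the way.
    out = ["<table class='marks'>", "  <tr>"]
    out += ["    <th>%s</th>" % html.escape(c) for c in COLUMNS]
    out.append("  </tr>")
    for row in rows:
        out.append("  <tr>")
        out += ["    <td>%s</td>" % html.escape(row.get(c, "")) for c in COLUMNS]
        out.append("  </tr>")
    out.append("</table>")

    with open(HTML_OUT, "w", encoding="utf-8") as f:
        f.write("\n".join(out) + "\n")


if __name__ == "__main__":
    main()
```

The part I still don’t know is how (or whether) the Apache CMS build would pick up a generated fragment like this without me running my own generation script – which is really the question I’m asking above.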

Note: I have cut back my $dayjob recently, so I will actually have time to write some of the code for this work myself now – finally!

What Is Apache Spark? Website Branding Review

Volunteering at the ASF and elsewhere in open source, I think a lot about open source brands. In particular: how do various open source projects – run by a wide variety of typically very geeky volunteers – present themselves publicly to new users? We spend so much time working on the great new code – and explaining it to other developers we already know – that sometimes I wonder if we’re really showcasing what our great new code can do for new users and contributors.

Here’s my quick review of the Apache Spark project, told purely from the point of view of a new user who just came to the project website. I’m trying to show what I think someone new to the project might think about it once they get to the homepage. Since Spark is a major project in the big data space, there are a lot of search hits for Spark, including hits from a wide variety of other software vendors.

Continue reading

ApacheCon Big Data/Core News Wrapup

Our annual Apache:Big Data and ApacheCon:Core events were held recently at the lovely Corinthia Hotel Budapest, and the content and attendees were amazing.  The weather was great too, and sightseeing and shopping in Budapest were lovely.  Attendance was still good even in the face of competing software conferences and the refugee crisis happening in the region.

While they were booked as separate events, many people stayed for the whole week.  Going forward, we will likely have a single event, but be even clearer about the strength of content on specific track days.  The broad array of very deep and well-received technical content in the big data space was truly impressive; Apache has over a dozen big data related projects and probably 20 more incoming Incubator podlings, so we certainly have the space covered!

We got some great press coverage and a few independent blog posts with key events at ApacheCon Budapest this year:

Overall, ApacheCon is always a good week for me, but this year it was exceptional. The Corinthia was as lovely as ever, and I finally had time to really take a walk and shop in the central market in Budapest. Plus, Thursday was a special day for me, and somehow everyone at the conference (including the hotel staff) found out, and was wishing me well. Many thanks to the friends who took me to an authentic Hungarian restaurant for dinner! Even the gypsy band playing a version of “Happy Birthday” was fun, and I’m glad I got to bring home the music of Norbert Salasovics!

Our conference producer, the Linux Foundation, has been really improving how we organize our CFP and put together highly focused tracks on a variety of Apache projects.  While it’s hard to put a spotlight on all 200+ projects and initiatives at the ASF, expect to see even better organized content and talks at ApacheCons to come, with full in-depth tracks on key technologies – along with excellent community and “how does Apache do it all” advice to boot.

Slides for all talks and videos for keynotes should be posted on the event archive websites:

Many of our speakers use Slideshare as well, and the Apache Community Development project has a separate listing of some key Apache Way slides.

Stay tuned for the CFP for ApacheCon North America, which will be returning to Vancouver, Canada on 9-13 May 2016. Hope to see you there!