What is Apache HBase? Website Branding Review

Website Brand Review of Apache HBase

How do open source projects get popular? By providing some useful functionality that users want to have. How do open source projects thrive over the long term? By turning those users into contributors who then help improve and maintain the project. How well a project showcases itself on the web is an important part of the adoption and growth cycle.

Here’s my quick review of the Apache HBase project, told purely from the point of view of a new user finding the project website. HBase is a key part of the big data storage stack, so although you may not work directly with it, it’s probably underlying some systems you use.

What Is Apache HBase?

“Apache HBase™ is the Hadoop® database, a distributed, scalable, big data store”.

In other words, HBase is a non-relational database meant for massively large tables of data that are implicitly distributed across clusters of commodity hardware. HBase provides “linear and modular scalability” and a variety of robust administration and data management features for your tables, all hosted atop Hadoop’s underlying HDFS file system.

No, Really, What Is Apache HBase For?

HBase is a solid, basic database or data store for massive amounts of data. As the documentation says: “If you have hundreds of millions or billions of rows, then HBase is a good candidate”. HBase provides a fairly simple set of NoSQL-style put/scan/delete commands for your data rather than a rich set of database functionality. But it is integrated tightly with HDFS, Hadoop, and ZooKeeper, and is built to be distributed and scalable by default. Just add new nodes, and you linearly scale both storage and processing power automatically.
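To make the put/scan/delete idea a bit more concrete, here is a minimal sketch of that style of interface. It is a toy in-memory model written for this review, not the HBase API: the class name ToyTable, its method signatures, and the sample row keys and column names are all made up for illustration. The only behaviour it tries to mirror is that rows are kept sorted by row key and that a scan walks a key range from a start key (inclusive) to a stop key (exclusive).

```python
import bisect


class ToyTable:
    """Toy in-memory stand-in for a sorted, row-keyed table (NOT the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys, always kept in sorted order
        self._rows = {}   # row key -> {column name: value}

    def put(self, row, column, value):
        """Insert or overwrite one cell in the given row."""
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def get(self, row):
        """Return a copy of the row's cells, or an empty dict if the row is absent."""
        return dict(self._rows.get(row, {}))

    def scan(self, start="", stop=None):
        """Yield (row key, cells) in key order for start <= key < stop."""
        i = bisect.bisect_left(self._keys, start)
        while i < len(self._keys):
            key = self._keys[i]
            if stop is not None and key >= stop:
                break
            yield key, dict(self._rows[key])
            i += 1

    def delete(self, row):
        """Remove a whole row if it exists."""
        if row in self._rows:
            del self._rows[row]
            self._keys.pop(bisect.bisect_left(self._keys, row))


# Hypothetical sample data: keys are inserted out of order on purpose.
table = ToyTable()
table.put("user#002", "info:name", "Bo")
table.put("user#001", "info:name", "Ada")
table.put("user#003", "info:name", "Cy")
table.delete("user#003")

# The scan returns rows in sorted key order, regardless of insertion order.
scanned = [key for key, _ in table.scan("user#001", "user#003")]
```

There are no joins, secondary indexes, or query language here; everything goes through the row key. That is the trade-off the paragraph above describes: a small command set in exchange for a layout that is easy to split across many machines.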

HBase offers a command shell, Java APIs, and REST APIs for managing everything, along with a variety of other integrations with popular big data storage, management, and processing/manipulation packages.

New User Website Perceptions

That is, what does a new user see “above the fold” when first coming to the Apache HBase project homepage? For their first impression, is it easy to find things, and is the design appealing and easy to follow?

The homepage is very simple in design and text, with a quick overview, Download link, and listing of features. The top navbar has links to everything else that comes with a Maven site build, plus a detailed list of links into the Documentation book the project produces. The documentation book seems very thorough and well edited, but its website (which uses different navigation from the main site) was fairly slow to respond. The documentation book itself is massive, and covers topics including setup, programming, performance, scaling, troubleshooting, and much more.

UI Design is very simple overall and consistent; the main website uses a basic Maven build and the documentation book uses a separate system. There’s no obvious “Help Contribute” link on the main website, but once you get into the documentation book there are detailed sections for coding styles and submitting patches. The team also uses ReviewBoard and JIRA heavily.

The website is generated by Apache Maven.

Apache Branding Requirements

Apache projects are expected to manage their own affairs, including making all technical and content decisions for their code and websites. However, to ensure a modicum of consistency – and to ensure users know that an Apache project is hosted at the ASF – there are a few requirements all Apache projects must meet on their projectname.apache.org websites (or wikis, etc.).

  • Apache HBase is used in the header and prominently and consistently across most of the website, and is carefully attributed with ™.
  • Navigation links to the required ASF pages (except Security!) are included in various places in the sitewide header navigation.
  • Logo does not include ™; footers do not include a trademark attribution.
  • DOAP file exists.
  • Powered By HBase page includes simple lists of major users of HBase in a clean layout.
  • Homepage includes prominent links to Export Control as well as a Code of Conduct, which is nice to see.

SEO / Search Hits / Related Sites

Well, SEO is far outside of our scope, but the question is: how does a new user find the Apache HBase homepage when they search for it?

Searching for “HBase”:

Top hit: homepage
Second hit: Wikipedia
Other hits: variety of HBase related sites, many from vendors discussing integration with their products.

Searching for “HBase software”:

A wide variety of how-to, what-it-is, and tutorial pages about HBase.

Social Media Presence

Many open source projects have a social media presence – although often not as polished or consistent a presence as a consumer or commercial brand presence would have.

  • https://twitter.com/hbase appears official, not very active, not linked on homepage.
  • https://www.linkedin.com/groups/1407857/profile “HBase” group, 6600+ members
  • https://plus.google.com/communities/114451070815428930431 “Apache HBase” has some traffic.
  • http://stackoverflow.com/questions/tagged/HBase is somewhat active.

What Do You Think Apache HBase Is?

So, what do you think? Is HBase something you’d use standalone for your own purposes, or do you expect most users simply use Hadoop or other higher-level tools to manage their tables? How important is the clean separation of the parts of the Hadoop stack, between HDFS, HBase, and the data distribution and management layers on top of them?

Note: I’m writing here as an individual, not wearing any Apache hat. I hope this is useful both to new users and to the Apache HBase community, not necessarily a call to change anything. I haven’t used HBase for any real deployments myself, so please do comment with corrections to anything I’ve messed up above!

What do you think?