What is Apache Hadoop? Website Brand Review

Website Brand Review of Apache Hadoop

We’ve all heard of Apache® Hadoop® – well, at least heard of Hadoop, and by now you should realize it’s an Apache project! But when was the last time you took a critical eye to the actual Apache Hadoop project’s homepage?

Here’s my quick review of the Apache Hadoop project, told purely from the point of view of a new user finding the project website.

What Is Apache Hadoop?

“Apache Hadoop (is) a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models”

“Hadoop is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”

Continue reading What is Apache Hadoop? Website Brand Review

What is Apache Mahout? Website Brand Review

Website Brand Review of Apache Mahout

While we’ve all heard about Apache Hadoop, did you know there are over a dozen big data projects at Apache? We host projects that provide everything for your big data stack: databases, storage, streaming, logging, analysis, machine learning, and more. Apache Mahout is one of the pieces that puts a big data stack to work doing higher-level tasks for you.

Here’s my quick review of the Apache Mahout project, told purely from the point of view of a new user finding the project website.

Happy Birthday! This month is the Apache Mahout project’s 6th #ApacheBirthday!

What Is Apache Mahout?

“The Apache Mahout™ project’s goal is to build an environment for quickly creating scalable performant machine learning applications.”

While this is a laudable statement – and nicely emphasises the community behind the project – it doesn’t directly say what the software they provide does.

“The three major components of Mahout are an environment for building scalable algorithms, many new Scala + Spark and H2O (Apache Flink in progress) algorithms, and Mahout’s mature Hadoop MapReduce algorithms.”

Continue reading What is Apache Mahout? Website Brand Review

What is Apache Hive? Website Branding Review

Website Brand Review of Apache Hive

While we’ve all heard about Apache Hadoop®, did you know there are over a dozen big data projects at Apache? We host projects that provide all the different functions of your big data stack: databases, storage, streaming, logging, analysis, and more. Apache Hive™ is one of these pieces of the whole big data ecosystem.

Here’s my quick review of the Apache Hive project, told purely from the point of view of a new user finding the project website.

What Is Apache Hive?

“The Apache Hive ™ data warehouse software facilitates querying and managing large datasets residing in distributed storage”.

Continue reading What is Apache Hive? Website Branding Review

What Is Apache Spark? Website Branding Review

Volunteering at the ASF and elsewhere in open source, I think a lot about open source brands. In particular: how do various open source projects – run by a wide variety of typically very geeky volunteers – present themselves publicly to new users? We sometimes spend so much time working on the great new code – and explaining it to other developers we already know – that sometimes I wonder if we’re really showcasing what our great new code can do for new users and contributors.

Here’s my quick review of the Apache Spark project, told purely from the point of view of a new user who just came to the project website. I’m trying to show what I think someone new to the project might think about the project once they get to the homepage. Since Spark is a major project in the big data space, there are a lot of search hits for Spark, including a wide variety of other software vendors.

Continue reading What Is Apache Spark? Website Branding Review

ApacheCon Big Data/Core News Wrapup

Our annual Apache:Big Data and ApacheCon:Core events were held recently at the lovely Corinthia Hotel Budapest, and the content and attendees were amazing. The weather was great too, and sightseeing and shopping in Budapest were lovely. Attendance was still good even in the face of competing software conferences held at the same time and the refugee crisis happening in the region.

While they were booked as separate events, many people stayed for the whole week. Going forward, we will likely have a single event, but be even clearer about the strength of content on specific track days. The broad array of very deep and well-received technical content in the big data space was truly impressive; Apache has over a dozen big data-related projects and probably 20 more incoming Incubator podlings, so we certainly have the space covered!

Continue reading ApacheCon Big Data/Core News Wrapup

What is Apache Hadoop?

There’s a lot of excitement around Hadoop software these days; here’s my definition of what “Hadoop” means:

Hadoop ™ is the ASF’s trademark for our Apache Hadoop software product that provides a service and simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as the software that started the current “Big Data” processing model, which allows programmers to easily and effectively process huge data sets to get meaningful results.

The best place of all to learn about Hadoop is of course the Apache Hadoop project and community, which says this about the Hadoop software:

“(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.”
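To make the “simple programming model” a little more concrete, here is a minimal sketch of the map/shuffle/reduce idea that Hadoop is best known for, written in plain Python. This is my own illustration and not the Hadoop API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, everything runs in a single process, and it ignores the distribution and failure handling that the real software provides.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would do between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values seen for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run the map, shuffle, and reduce steps in order (hypothetical helper)."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big clusters", "data across clusters"]))
```

The point of the model is that the programmer only writes the small map and reduce functions; in a real cluster the framework decides where they run and reruns them when a machine fails.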

The Apache Hadoop project at the ASF is related to or has created a large number of notable modules, subprojects, or full projects at Apache, including:

There are a wide variety of vendors who provide Hadoop-related software; however, the only source for Hadoop software itself is the Apache Hadoop project here at the ASF. We certainly appreciate the many companies who allow their employees to contribute work to Apache Hadoop and all of our projects, and also the many Apache Corporate Sponsors. However, I do hope that companies working in the Hadoop and related Big Data industry take stock of their marketing strategies, and ensure that their corporate marketing doesn’t shortchange the credit owed to the Apache Hadoop community itself.

We very much appreciate those corporate supporters who do provide plenty of credit to the ASF and the Apache Hadoop community – both the old hats, and the very new spinoff in the Big Data space. I just hope that some of the other players in the industry will carefully consider their public crediting (or lack thereof) to the ASF’s Hadoop brand and the many individual committers and contributors to the Apache Hadoop project.

As always, the Apache Hadoop website and mailing lists are the best place to learn about Hadoop software!

Oh, and remember:

Apache Hadoop, Hadoop, the yellow elephant logo, the names of Apache software products, and Apache are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.