<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Community Over Code &#187; hadoop</title>
	<atom:link href="http://communityovercode.com/tag/hadoop/feed/" rel="self" type="application/rss+xml" />
	<link>http://communityovercode.com</link>
	<description>Three +1's is what it's all about</description>
	<lastBuildDate>Wed, 16 May 2012 11:14:22 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	
		<item>
		<title>What is Apache Hadoop?</title>
		<link>http://communityovercode.com/2011/08/what-is-hadoop/</link>
		<comments>http://communityovercode.com/2011/08/what-is-hadoop/#comments</comments>
		<pubDate>Fri, 05 Aug 2011 01:18:59 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Code]]></category>
		<category><![CDATA[bigdata]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[trademark]]></category>

		<guid isPermaLink="false">http://communityovercode.com/?p=202</guid>
		<description><![CDATA[There&#8217;s a lot of excitement around Hadoop software these days, here&#8217;s my definition of what &#8220;Hadoop&#8221; means: Hadoop &#8482; is the ASF&#8217;s trademark for our Apache Hadoop software product that provides a service and simple programming model for the distributed processing of large data sets across clusters of commodity computers. Many people view Hadoop as [...]]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a lot of excitement around Hadoop software these days; here&#8217;s my definition of what &#8220;Hadoop&#8221; means:</p>
<p>Hadoop &trade; is the <a href="http://www.apache.org/foundation/marks/">ASF&#8217;s trademark</a> for our <a href="http://hadoop.apache.org/index.html#What+Is+Apache+Hadoop%3F">Apache Hadoop software product</a> that provides a service and simple programming model for the distributed processing of large data sets across clusters of commodity computers.  Many people view Hadoop as the software that started the current &#8220;Big Data&#8221; processing model, which allows programmers to <b>easily</b> and effectively process huge data sets to get meaningful results. </p>
<p>The best place of all to learn about Hadoop is of course the <a href="http://hadoop.apache.org/">Apache Hadoop project and community</a>, which says this about the Hadoop software:</p>
<p>&#8220;(Hadoop) is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the (simple to program) application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.&#8221;</p>
<p>The Apache Hadoop project at the ASF is related to or has created a large number of notable modules, subprojects, or full projects at Apache, including:</p>
<ul>
<li><a href="http://hadoop.apache.org/common/">Hadoop Common</a></li>
<li><a href="http://hadoop.apache.org/hdfs/">Hadoop HDFS</a></li>
<li><a href="http://hadoop.apache.org/mapreduce/">Hadoop MapReduce</a></li>
<li><a href="http://avro.apache.org/">Apache Avro</a></li>
<li><a href="http://cassandra.apache.org/">Apache Cassandra</a></li>
<li><a href="http://incubator.apache.org/chukwa/">Apache Chukwa (incubating)</a></li>
<li><a href="http://hbase.apache.org/">Apache HBase</a></li>
<li><a href="http://hive.apache.org/">Apache Hive</a></li>
<li><a href="http://mahout.apache.org/">Apache Mahout</a></li>
<li><a href="http://pig.apache.org/">Apache Pig</a></li>
<li><a href="http://zookeeper.apache.org/">Apache ZooKeeper</a></li>
</ul>
<p>There is a wide variety of vendors who provide Hadoop-related software; however, the only source for Hadoop software itself is the Apache Hadoop project here at the ASF.   We certainly appreciate the many companies who allow their employees to contribute work to Apache Hadoop and all of our projects, and also the many <a href="http://www.apache.org/foundation/thanks.html">Apache Corporate Sponsors</a>.  However, I do hope that companies working in the Hadoop and related Big Data industry take stock of their marketing strategies, and ensure that their corporate marketing doesn&#8217;t shortchange the credit owed to the Apache Hadoop community itself.</p>
<p>We very much appreciate those corporate supporters who <b>do</b> provide plenty of credit to the ASF and the Apache Hadoop community &#8211; both the old hats, and the very new spinoff in the Big Data space.  I just hope that some of the other players in the industry will carefully consider their public crediting (or lack thereof) to the ASF&#8217;s Hadoop brand and the many individual committers and contributors to the Apache Hadoop project.</p>
<p>As always, the Apache Hadoop <a href="http://hadoop.apache.org/">website</a> and <a href="http://hadoop.apache.org/mailing_lists.html">mailing lists</a> are the best places to learn about Hadoop software!</p>
<p>Oh, and remember:</p>
<p><a href="http://hadoop.apache.org/">Apache Hadoop, Hadoop, the yellow elephant logo</a>, the <a href="http://www.apache.org/foundation/marks/list/">names of Apache software products</a>, and Apache are either registered trademarks or trademarks of the <a href="http://www.apache.org/">Apache Software Foundation</a> in the United States and/or other countries.</p>
]]></content:encoded>
			<wfw:commentRss>http://communityovercode.com/2011/08/what-is-hadoop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Congratulations to six new Apache projects!</title>
		<link>http://communityovercode.com/2010/04/congratulations-to-six-new-apache-projects/</link>
		<comments>http://communityovercode.com/2010/04/congratulations-to-six-new-apache-projects/#comments</comments>
		<pubDate>Sun, 25 Apr 2010 15:08:29 +0000</pubDate>
		<dc:creator>Shane</dc:creator>
				<category><![CDATA[Community]]></category>
		<category><![CDATA[avro]]></category>
		<category><![CDATA[hadoop]]></category>
		<category><![CDATA[hbase]]></category>
		<category><![CDATA[lucene]]></category>
		<category><![CDATA[mahout]]></category>
		<category><![CDATA[nutch]]></category>
		<category><![CDATA[tika]]></category>

		<guid isPermaLink="false">http://communityovercode.com/?p=126</guid>
		<description><![CDATA[In last week&#8217;s monthly meeting of the Board of Directors of the ASF, we approved the creation of six new Top Level Projects (TLPs) at the ASF. This is the most new TLPs ever created at once, followed only by the meeting of November, 2008 where 5 new TLPs were created (CouchDB, Buildr, the Attic, [...]]]></description>
			<content:encoded><![CDATA[<p>In last week&#8217;s monthly meeting of the Board of Directors of the ASF, we approved the creation of six new Top Level Projects (TLPs) at the ASF.  This is the most new TLPs ever created at once, followed only by the <a href="http://www.apache.org/foundation/records/minutes/2008/board_minutes_2008_11_19.txt">meeting of November, 2008</a> where 5 new TLPs were created (CouchDB, Buildr, the Attic, Qpid, and Abdera).</p>
<p>In this particular case, much of the growth comes from within existing projects, wherein subproject communities within Hadoop and Lucene have matured sufficiently to deserve to manage their own fates, and to create their own Project Management Committees (PMCs) to take charge. To put this in another perspective, this is also reflective of the ASF&#8217;s growth; before this meeting we had over 70 TLPs and over 30 Incubator podlings, so an addition of 6 new TLPs is less than 10% growth for the month.</p>
<p>We should congratulate the <a href="http://incubator.apache.org/trafficserver/">Apache Traffic Server community</a> first, since they went through the Incubation process and successfully graduated from an Incubator Podling into their own TLP.  Soon to be served (once the website migration is complete) from <a href="http://trafficserver.apache.org/">http://trafficserver.apache.org/</a>, Apache Traffic Server is a fast, scalable and extensible HTTP/1.1 compliant caching proxy server.  Congratulations to the whole team on showing a strong and diverse community around this new product.</p>
<p>Next up come three subprojects within the well-known <a href="http://lucene.apache.org/">Apache Lucene project</a> which have grown organically from modules within Lucene to be diverse and active projects in their own right.  You may recognize some of these product names from the Lucene world.</p>
<ul>
<li><a href="http://lucene.apache.org/mahout/">Apache Mahout</a>, which is building a system for creating scalable and effective machine learning libraries which can perform recommendation mining, clustering, classification, and grouping into itemsets.</li>
<li><a href="http://lucene.apache.org/tika/">Apache Tika</a> is a toolkit for detecting and extracting metadata and structured text content from various documents using existing parser libraries.</li>
<li><a href="http://lucene.apache.org/nutch/">Apache Nutch</a>, integratable with both Lucene and Hadoop, adds web-specific crawling, fetching, and organization features.</li>
</ul>
<p>The Apache Hadoop project &#8211; another wildly distributed computing technology &#8211; has also grown two of its subprojects to the point where they deserve their own fame.</p>
<ul>
<li><a href="http://hadoop.apache.org/avro/">Apache Avro</a> is a fast data serialization system that includes rich and dynamic schemas in all its processing.</li>
<li><a href="http://hadoop.apache.org/hbase/">Apache HBase</a> is the Hadoop database &#8211; designed to provide random, realtime read/write access to Big Data &#8211; billions of records &#8211; using commodity hardware.</li>
</ul>
<p>Why did these subprojects spin out to become their own TLPs?  The driving factor is not the technology, but rather the community and oversight aspects of how the ASF organizes its mostly self-running projects.</p>
<p>From the <strong>oversight</strong> perspective, the ASF Board relies on every project&#8217;s PMC to manage their project&#8217;s operations within the broad guidelines of the Apache Way, and to report their project&#8217;s progress and issues to the board.  This means that there must be enough PMC members who can actively monitor and participate in their project&#8217;s activities, and can especially show due diligence and responsibility in voting on any official product releases the project makes.  With the rapid growth in both community and technology areas in the Hadoop and Lucene projects, it&#8217;s a difficult job for the PMCs to truly understand and help manage all the subprojects they&#8217;ve created or added over the past two years.</p>
<p>While the scope of oversight may have hinted that some subprojects should be promoted to TLP status, the gating factor is <strong>community</strong>.  Does a subproject have a strong and diverse enough community to provide their own, independent PMC that can manage their own affairs?  Becoming a TLP is both a benefit and a responsibility: the community through its new, more focused PMC can better run itself; however, the new PMC is also expected to provide accurate reports and responsible oversight of their community and product releases.</p>
<p>Congratulations to all six new projects!  Please note that as the websites are updated, each project will be moving its home page to http://projectname.apache.org in the near future.</p>
]]></content:encoded>
			<wfw:commentRss>http://communityovercode.com/2010/04/congratulations-to-six-new-apache-projects/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

