Hadoop Streaming Made Simple using Joins and Keys with Python

There are a lot of different ways to write MapReduce jobs!

Sample code for this post: https://github.com/joestein/amaunet

I find streaming scripts a good way to interrogate data sets, especially ones I have not worked with yet or am in the middle of creating. I enjoy the lifecycle in which the initial exploration of the data sets leads to the finalized scripts for an entire job (or, as is often the case, a series of jobs).
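To make the "interrogate a data set" step concrete, here is a minimal sketch of a profiling script in the Streaming style. It assumes tab-separated input; the file name `profile.py`, the `column` parameter, and the `__SHORT_ROW__` marker are hypothetical and are not taken from the sample repository. The mapper emits each value of one column with a count of 1, and the reducer sums the counts per key, relying on Hadoop to sort mapper output by key before the reduce phase. `run_local` imitates that sort in-process so the pair can be tried without a cluster.

```python
#!/usr/bin/env python3
"""profile.py: hypothetical streaming script that counts values in one column."""
import sys
from itertools import groupby


def mapper(lines, column=0):
    """Emit '<value><TAB>1' for the chosen column of each tab-separated line."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > column:
            yield f"{fields[column]}\t1"
        else:
            # Hypothetical marker so malformed rows show up in the counts.
            yield "__SHORT_ROW__\t1"


def reducer(lines):
    """Sum the counts for each key; input must already be grouped by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{key}\t{sum(int(count) for _, count in group)}"


def run_local(lines, column=0):
    """Imitate map -> sort by key -> reduce in a single process."""
    mapped = sorted(mapper(lines, column), key=lambda rec: rec.split("\t", 1)[0])
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked as `profile.py map` or `profile.py reduce` by Hadoop Streaming.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

On a workstation the same pipeline can be approximated with `cat data.tsv | python3 profile.py map | LC_ALL=C sort | python3 profile.py reduce`, where `sort` stands in for Hadoop's shuffle; the `C` locale keeps lines that share a key next to each other.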

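This is a link to a blog post demonstrating another example of using Python to write MapReduce functions. Hadoop Streaming is elegant, understandable, simple, and functional: any executable that reads records from stdin and writes tab-separated key/value lines to stdout can serve as the mapper or the reducer.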
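Since the title promises joins and keys, here is a minimal sketch of a reduce-side join written the Streaming way. It is not the code from the amaunet repository: the record layouts (a two-field `country_code, country_name` lookup file and a three-field `person, amount, country_code` sales file), the `C`/`T` source tags, and the file name `join.py` are all assumptions for illustration. The mapper re-keys every record on the country code and tags which input it came from; once Hadoop has sorted by that key, the reducer sees all records for one code together and joins them.

```python
#!/usr/bin/env python3
"""join.py: hypothetical reduce-side join for Hadoop Streaming."""
import sys
from itertools import groupby


def mapper(lines):
    """Re-key each record on its country code and tag its source."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 2:
            # Assumed lookup record: country_code, country_name
            code, name = fields
            yield f"{code}\tC\t{name}"
        elif len(fields) == 3:
            # Assumed sales record: person, amount, country_code
            person, amount, code = fields
            yield f"{code}\tT\t{person}\t{amount}"
        # Records with any other shape are dropped.


def reducer(lines):
    """Inner-join the tagged records; input must be grouped by key."""
    rows = (line.rstrip("\n").split("\t") for line in lines)
    for _code, group in groupby(rows, key=lambda f: f[0]):
        name, sales = None, []
        for fields in group:
            if fields[1] == "C":
                name = fields[2]
            else:
                sales.append((fields[2], fields[3]))
        # Buffering the sales means value order within a key does not matter.
        if name is not None:
            for person, amount in sales:
                yield f"{person}\t{amount}\t{name}"


def run_local(lines):
    """Imitate map -> sort by key -> reduce in a single process."""
    mapped = sorted(mapper(lines), key=lambda rec: rec.split("\t", 1)[0])
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Invoked as `join.py map` or `join.py reduce` by Hadoop Streaming.
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

Buffering one key's sales in memory keeps the reducer simple but assumes no single key is enormous. A common alternative is to make the tag part of the sort key (for example with Streaming's `stream.num.map.output.key.fields` setting plus `KeyFieldBasedPartitioner`), so that the lookup record, tagged `C`, always arrives before the `T` records and can be streamed without buffering.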

Personally, I’ll take MapReduce in Python with Hadoop streaming over Java any day of the week…

Via: All Things Hadoop