Magnetic Tape Could Be the Big Data Storage Medium of the Not-Too-Distant Future

Monitor: Magnetic tape to the rescue | The Economist

As the pace of data generation increases, the amount of that data we want to store for some period will also increase. At some point (if it hasn’t happened already), the volume of data we want to store will exceed our manufacturing capacity for hard drives (mechanical and solid-state). Now is probably the time to be thinking about how to reintroduce tape libraries into our data processing stacks.

Given how much faster the sequential streaming rate is for data coming off tape than for data being pulled from a hard drive, I wonder how difficult it would be to create a MapReduce job that gets its input from locally attached tape drives instead of traditional storage. With some up-front thought about how such processing would work, I think it would be a very interesting experiment.

Tape is the oldest computer storage medium still in use. It was first put to work on a UNIVAC computer in 1951. But although tape sales have been falling since 2008 and dropped by 14% in 2012, according to the Santa Clara Consulting Group, tape’s decline has now gone into reverse: sales grew by 1% in the last quarter of 2012 and a 3% rise is expected this year.


Hadoop Streaming Made Simple using Joins and Keys with Python

There are a lot of different ways to write MapReduce jobs!!!

Sample code for this post

I find streaming scripts a good way to interrogate data sets (especially when I have not worked with them yet or am creating new ones), and I enjoy the lifecycle in which the initial elaboration of the data sets leads to the construction of the finalized scripts for an entire job (or series of jobs, as is often the case).

This is a link to a blog post demonstrating another example of using Python to write MapReduce functions. Hadoop streaming is elegant, understandable, simple and functional.
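To give a feel for what a keyed join looks like in this style, here is a minimal sketch of a reduce-side inner join written as plain Python generators. It is not the code from the linked post: the function names (map_lines, reduce_lines), the "A"/"B" source tags, the comma-separated input layout, and the sample customer and order records are all assumptions made up for illustration. The call to sorted() stands in for the shuffle-and-sort step that Hadoop performs between the map and reduce phases.

```python
from itertools import groupby


def map_lines(lines, tag):
    """Map step: turn 'key,rest' into 'key<TAB>tag<TAB>rest'.

    The tag records which data set a line came from (hypothetical
    convention: "A" for the left side, "B" for the right side).
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        key, _, rest = line.partition(",")
        yield f"{key}\t{tag}\t{rest}"


def reduce_lines(sorted_lines):
    """Reduce step: inner-join records that share a key.

    Input must already be sorted by key, which is what the
    shuffle phase provides to a streaming reducer.
    """
    parsed = (line.rstrip("\n").split("\t", 2) for line in sorted_lines)
    for key, group in groupby(parsed, key=lambda rec: rec[0]):
        left, right = [], []
        for _, tag, value in group:
            (left if tag == "A" else right).append(value)
        # Emit one output row per (left, right) pair for this key.
        for l_val in left:
            for r_val in right:
                yield f"{key}\t{l_val}\t{r_val}"


# Made-up sample data, purely for illustration.
customers = ["1,alice", "2,bob"]
orders = ["1,book", "1,pen", "3,lamp"]

# sorted() simulates the shuffle/sort between map and reduce.
shuffled = sorted(list(map_lines(customers, "A")) + list(map_lines(orders, "B")))
joined = list(reduce_lines(shuffled))
for row in joined:
    print(row)
```

In a real streaming job the same two functions would presumably be split into separate mapper and reducer scripts that read from sys.stdin and write to stdout; the local simulation above is only a convenient way to try the logic on a small sample before running it on a cluster.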

Personally, I’ll take MapReduce in Python with Hadoop streaming over Java any day of the week…

Via: All Things Hadoop