The following link is to the Apache tutorial on setting up a multi-node Hadoop cluster.
If you have a machine (or a small handful of machines) with enough RAM and CPU cores, VMware or VirtualBox virtual machines can be used to simulate a larger cluster of nodes.
This document describes how to install, configure and manage non-trivial Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes.
To play with Hadoop, you may first want to install Hadoop on a single machine (see Single Node Setup).
Via: Apache Hadoop
The following link is to a simple tutorial demonstrating how to set up Eclipse for Hadoop development.
We will learn the following things with this tutorial:
- Setting up the Eclipse plugin for Hadoop
- Test-running Hadoop MapReduce jobs
Eclipse Setup for Hadoop Development
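For readers who have not yet seen a MapReduce job, the following is a minimal sketch of the word-count pattern that such a test run typically exercises. It is plain Java and deliberately does not use the Hadoop API; the class name `WordCountSketch` and its methods are hypothetical names chosen for illustration, not part of the linked tutorial.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration: imitates the map and reduce phases in memory.
// A real Hadoop job would implement Mapper and Reducer classes instead.
public class WordCountSketch {

    // "Map" phase: split one input line into lowercase words.
    static List<String> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\W+"))
                .filter(word -> !word.isEmpty())
                .collect(Collectors.toList());
    }

    // "Reduce" phase: sum the occurrences of each word across all lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            for (String word : map(line)) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hadoop runs MapReduce jobs",
                "MapReduce jobs run on Hadoop");
        System.out.println(wordCount(lines));
    }
}
```

On a real cluster the map and reduce steps run on different nodes and the framework handles grouping by key; here a single in-memory map stands in for that shuffle step.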
In 2011, MetroCards were swiped through the turnstiles of the New York City subway system 1.6 billion times. The New York Metropolitan Transportation Authority constantly churns out information, and thanks to the rapidly expanding movement for open data, it’s available to the public.
NYC Subway Metro Card Swipe Data
Visualizing the New York Subway System’s ‘Data Exhaust’ – Commute