Link

Apache Hadoop Cluster Setup

The following link is to the Apache tutorial on setting up a multi-node Hadoop cluster.

With a machine (or a small handful of machines) that has enough RAM and CPU cores, VMware or VirtualBox can be used to simulate a larger cluster of nodes.

Purpose

This document describes how to install, configure and manage non-trivial Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes.

To play with Hadoop, you may first want to install Hadoop on a single machine (see Single Node Setup).

Cluster Setup

Via: Apache Hadoop

Link

Eclipse Setup for Hadoop Development

The following link is to a simple tutorial demonstrating how to set up Eclipse for Hadoop development.

Objectives

This tutorial covers the following:
  • Setting up the Eclipse plugin for Hadoop
  • Test-running Hadoop MapReduce jobs

Eclipse Setup for Hadoop Development

Link

Visualizing the New York Subway System’s ‘Data Exhaust’ – Commute

In 2011, MetroCards were swiped through the turnstiles of the New York City subway system 1.6 billion times. New York's Metropolitan Transportation Authority constantly churns out information, and thanks to the rapidly expanding movement for open data, it’s available to the public.

NYC Subway Metro Card Swipe Data

Visualizing the New York Subway System’s ‘Data Exhaust’ – Commute