Link

Pig for Beginners


The following link is to a simple tutorial to get started with Pig.

Pig is a data flow platform for writing Hadoop operations in a language called Pig Latin. It adds a layer of abstraction on top of Hadoop and offers a SQL-like interface for processing data, which lets the programmer focus on business logic and increases productivity. It supports a variety of data types, as well as user-defined functions (UDFs) for writing custom operations in Java, Python and JavaScript. Due to its simple interface and its support for complex operations such as joins and filters, Pig is popular for performing query operations in Hadoop.
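To make the "joins and filters" idea concrete, here is a minimal Python sketch of the kind of data flow a Pig Latin script expresses. It is not Pig itself and does not touch Hadoop: the records, field names, and the helper functions `filter_rows` and `join_rows` are all hypothetical, and the Pig Latin statements in the comments are only illustrative equivalents.

```python
# Hypothetical sample records; names and fields are invented for illustration.
users = [
    {"id": 1, "name": "ana", "age": 34},
    {"id": 2, "name": "bo", "age": 16},
    {"id": 3, "name": "cy", "age": 52},
]
orders = [
    {"user_id": 1, "total": 20.0},
    {"user_id": 3, "total": 5.5},
    {"user_id": 3, "total": 12.0},
]


def filter_rows(rows, predicate):
    """Keep only rows that satisfy the predicate.

    Roughly what Pig Latin writes as:
        adults = FILTER users BY age >= 18;
    """
    return [row for row in rows if predicate(row)]


def join_rows(left, right, left_key, right_key):
    """Inner join of two lists of dicts on the given keys.

    Roughly what Pig Latin writes as:
        joined = JOIN adults BY id, orders BY user_id;
    """
    # Index the right side by its key so each left row is matched quickly.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l_row, **r_row}
        for l_row in left
        for r_row in index.get(l_row[left_key], [])
    ]


# Each step names an intermediate result, much like a Pig Latin script does.
adults = filter_rows(users, lambda u: u["age"] >= 18)
joined = join_rows(adults, orders, "id", "user_id")

for row in joined:
    print(row["name"], row["total"])
```

In a real Pig script, the same steps would be written as a handful of Pig Latin statements and executed by Pig as jobs on Hadoop, rather than in memory as they are here.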

Objective

The objective of this tutorial is to get you up and running Pig scripts on a real-world dataset stored in Hadoop.

Pig for Beginners

Via: Orzota

Link

Apache Hadoop Single Node Setup


The following link is to the Apache tutorial on setting up a single node instance of Hadoop.

Purpose

This document describes how to set up and configure a single-node Hadoop installation so that you can quickly perform simple operations using Hadoop MapReduce and the Hadoop Distributed File System (HDFS).

Single Node Setup

Via: Apache Hadoop

Link

Apache Hadoop Cluster Setup


The following link is to the Apache tutorial on setting up a multi-node Hadoop cluster.

If one has a machine (or a small handful of machines) with enough RAM and CPU cores, VMware or VirtualBox could easily be used to simulate a larger cluster of nodes.

Purpose

This document describes how to install, configure and manage non-trivial Hadoop clusters ranging from a few nodes to extremely large clusters with thousands of nodes.

To play with Hadoop, you may first want to install Hadoop on a single machine (see Single Node Setup).

Cluster Setup

Via: Apache Hadoop