Apache Pig Overview

Apache Pig is another Hadoop framework for developers who do not want to write Java.
Originally developed at Yahoo! (2007).
Pig can "eat anything", which means it can handle both structured and semi-structured data.
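Pig programs are written in a language called Pig Latin.
Pig Latin is a data-flow language: a script is a sequence of steps, each transforming the output of an earlier step.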
It sits between Java and Hive: it is higher-level than hand-written Java MapReduce, but more step-by-step than Hive's SQL-like queries.
If you want to work with data in a Hadoop cluster without writing hundreds or thousands of lines of Java MapReduce code, you will most likely use either Hive (with the Hive Query Language, HQL) or Pig.
Hive provides a SQL-like language that is compiled into MapReduce jobs. Pig provides Pig Latin, a data-flow language that lets you describe your MapReduce data pipelines using high-level abstractions.

Suppose you have user data in one file and page-visit data in another, and you need to find the top 5 most visited pages among users aged 18 to 25.
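In Pig Latin, this task becomes a short chain of steps:

1. LOAD both files.
2. FILTER the users by age.
3. JOIN the filtered users with the visits.
4. GROUP by page and COUNT the visits in each group.
5. ORDER by count and LIMIT to 5.

The Python below is only a sketch of that data flow on in-memory data. It is not Pig itself. Each step is commented with the Pig operator it stands in for. The sample rows and the names `USERS`, `PAGES`, and `top_pages` are hypothetical. The code assumes user names are unique, so a set lookup produces the same rows as an inner join.

```python
from collections import Counter

# Hypothetical sample data standing in for the two input files.
# USERS rows are (name, age); PAGES rows are (user, url), one row per visit.
USERS = [("amy", 19), ("bob", 32), ("cal", 24), ("dee", 17), ("eve", 22)]
PAGES = [
    ("amy", "cnn.com"), ("amy", "bbc.com"), ("bob", "cnn.com"),
    ("cal", "cnn.com"), ("cal", "espn.com"), ("dee", "espn.com"),
    ("eve", "bbc.com"), ("eve", "cnn.com"), ("eve", "wired.com"),
]


def top_pages(users, pages, min_age=18, max_age=25, n=5):
    # FILTER users BY age >= 18 AND age <= 25
    young = {name for name, age in users if min_age <= age <= max_age}
    # JOIN filtered users BY name, pages BY user
    # (a set lookup is equivalent here because names are assumed unique)
    joined = [(user, url) for user, url in pages if user in young]
    # GROUP joined BY url, then FOREACH group GENERATE group, COUNT(joined)
    counts = Counter(url for _, url in joined)
    # ORDER BY count DESC, then LIMIT n.
    # Ties are broken by url so the output is deterministic; Pig's ORDER
    # does not promise any particular order for ties.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:n]


if __name__ == "__main__":
    for url, visits in top_pages(USERS, PAGES):
        print(url, visits)
```

With the sample data, bob (32) and dee (17) are filtered out. cnn.com comes first with 3 visits, followed by bbc.com with 2. In Pig, each of these steps would be a named relation, and Pig would compile the whole chain into MapReduce jobs.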

We can use Pig in ETL for:

We can use Pig for research on raw data:

Hadoop Zone