My AWS Pig Job

I ran some Pig jobs on Elastic MapReduce by cloning the same cluster I used earlier (see the previous blog post). After that cluster setup, my billing details were these.

I am still learning Pig. Here is a sample of my Pig commands:

grunt> fs -mkdir /user/hadoop
grunt> fs -ls /user/hadoop
grunt> register s3n://
2014-08-20 15:10:26,625 [main] INFO - Downloading file s3n:// to path /tmp/pig8610216688759169361tmp/myudfs.jar
2014-08-20 15:10:26,632 [main] INFO  org.apache.hadoop.fs.s3native.NativeS3FileSystem - Opening 's3n://' for reading
2014-08-20 15:10:26,693 [main] INFO  org.apache.hadoop.util.NativeCodeLoader - Loaded the native-hadoop library
grunt> raw = LOAD 's3n://' USING TextLoader as (line:chararray);
grunt> ntriples = foreach raw generate FLATTEN(myudfs.RDFSplit3(line)) as (subject:chararray,predicate:chararray,object:chararray);
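
To make the last statement a little more concrete, here is a rough sketch in plain Python of what I understand that step to do: take each raw line and split it into a subject, a predicate and an object. This is not the source of myudfs.RDFSplit3; the function name rdf_split3, the whitespace-splitting rule and the example.org sample lines are all my own assumptions for illustration.

```python
def rdf_split3(line):
    """Hypothetical stand-in for myudfs.RDFSplit3 (assumed behaviour):
    split one N-Triples-style line into (subject, predicate, object)."""
    text = line.strip()
    # Drop the trailing "." statement terminator if present.
    if text.endswith("."):
        text = text[:-1].rstrip()
    # Split on whitespace at most twice so the object keeps its inner spaces.
    parts = text.split(None, 2)
    if len(parts) != 3:
        return None  # malformed line; the caller can skip it
    return tuple(parts)


# Made-up sample input standing in for the lines read by TextLoader.
raw = [
    '<http://example.org/a> <http://example.org/knows> <http://example.org/b> .',
    '<http://example.org/a> <http://example.org/name> "Alice Smith" .',
]

# Rough equivalent of: foreach raw generate FLATTEN(RDFSplit3(line))
ntriples = [t for t in (rdf_split3(line) for line in raw) if t is not None]

for subject, predicate, obj in ntriples:
    print(subject, predicate, obj)
```

In Pig the FLATTEN turns the tuple returned by the UDF into three separate columns; the list of tuples above plays the same role.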

After submitting the jobs, one can track them using the job tracker UI.

The Hadoop jobs completed successfully.


This is an emancipatory experience 🙂 One is set free from the local offshore job experience.


Recruiters are spellbound by this incantation: Hadoop. So I took the plunge and started working with it. I haven't made much progress, but I now use this Sandbox VM and follow the tutorial steps.
