Thursday, February 6, 2014

Running "WordCount" Map Reduce Job in Hadoop 1.0.3

This post explains the steps required to run the WordCount MapReduce job on Hadoop 1.0.3.

  1. Create a folder to hold the input files; words will be counted from these files. For the current setup we have three books in plain-text format.
  • su - hduser
  • mkdir /tmp/sandhu
2. Copy the three files into the /tmp/sandhu folder, then verify them with the following commands.
  • cd /tmp/sandhu
  • ls -l
The output will look like:
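Since the original screenshot is no longer available, here is a minimal sketch of steps 1–2; the file names book1.txt, book2.txt, and book3.txt are hypothetical stand-ins for whatever plain-text books you use.

```shell
# Illustrative sketch only: book1.txt etc. are hypothetical names;
# substitute the three plain-text books you actually have.
mkdir -p /tmp/sandhu
printf 'sample text\n' > /tmp/sandhu/book1.txt
printf 'sample text\n' > /tmp/sandhu/book2.txt
printf 'sample text\n' > /tmp/sandhu/book3.txt
cd /tmp/sandhu
ls -l    # should list the three text files
```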

3. Start the Hadoop cluster:
  • /home/hadoop/bin/start-all.sh
4. Before we run the actual MapReduce job, we first have to copy the files from our local file system to Hadoop’s HDFS.
  • cd /home/hadoop
  • bin/hadoop dfs -copyFromLocal /tmp/sandhu /home/hduser/sandhu
Check that the files were copied correctly to HDFS with the following command.
  • bin/hadoop dfs -ls /home/hduser/sandhu
The output will look like:
5. Now, we actually run the WordCount example job.
  • bin/hadoop jar hadoop*examples*.jar wordcount /home/hduser/sandhu /home/hduser/sandhu-output
The output will look like:

6. Retrieve the job result from HDFS:
  • bin/hadoop dfs -cat /home/hduser/sandhu-output/part-r-00000
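To see what the part-r-00000 file contains without a running cluster, the same word-count computation can be sketched locally with coreutils; the input file /tmp/wc-sample.txt and its contents are hypothetical, and the tab-separated key/count format only approximates Hadoop's output.

```shell
# Local sketch of what WordCount computes, using coreutils instead of
# Hadoop (input file and contents are hypothetical):
printf 'the quick fox\nthe lazy dog\n' > /tmp/wc-sample.txt
tr -s '[:space:]' '\n' < /tmp/wc-sample.txt | sort | uniq -c | sort -rn
# each line is count + word, analogous to the word/count pairs
# in part-r-00000
```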
7. Hadoop APIs
