This post lists the steps required to run your own Python code on a Hadoop v1.0.3 cluster.
1. Create a mapper Python script file.
- su - hduser
- nano mapper.py
Write the following code in the mapper.py file and save it.
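Here is a minimal sketch of what mapper.py could look like. It assumes the job is a word count over the Gutenberg texts used as input later in this post; the word-count task and the helper name map_line are my own assumptions for illustration. The script reads lines from standard input and writes one tab-separated record per word to standard output, which is the line format Hadoop Streaming reads by default.

```python
#!/usr/bin/env python
"""mapper.py (sketch): emit "<word> TAB 1" for every word read from stdin."""
import sys


def map_line(line):
    """Return one "word<TAB>1" record per whitespace-separated word in line."""
    return ["%s\t1" % word for word in line.split()]


def main():
    # Hadoop Streaming feeds the input split to the script line by line on stdin.
    for line in sys.stdin:
        for record in map_line(line):
            print(record)


if __name__ == "__main__":
    main()
```

Because the streaming command runs the script directly, it needs to be executable (chmod +x mapper.py).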
- nano reducer.py
Write the following code in the reducer.py file and save it.
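A matching sketch for reducer.py, again assuming a word-count job (the function name reduce_records is hypothetical). It relies on Hadoop sorting the mapper output by key before it reaches the reducer, so all records for one word arrive one after another and a running total is enough.

```python
#!/usr/bin/env python
"""reducer.py (sketch): sum the counts per word from sorted "word TAB count" lines."""
import sys


def reduce_records(lines):
    """Yield (word, total) pairs; input lines must already be grouped by word."""
    current_word, total = None, 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        word, _, count = line.partition("\t")
        try:
            count = int(count)
        except ValueError:
            continue  # silently skip records whose count is not a number
        if word == current_word:
            total += count
        else:
            if current_word is not None:
                yield current_word, total
            current_word, total = word, count
    # Emit the last word, which has no following key change to trigger it.
    if current_word is not None:
        yield current_word, total


def main():
    for word, total in reduce_records(sys.stdin):
        print("%s\t%d" % (word, total))


if __name__ == "__main__":
    main()
```

As with the mapper, make the file executable (chmod +x reducer.py) before submitting the job.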
I recommend testing your mapper.py and reducer.py scripts locally before using them in a MapReduce job. Otherwise your job might complete successfully but produce no result data at all, or not the results you expected.
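One way to see what such a local test checks is to run the map, sort, and reduce phases in a single Python process. The sketch below assumes a word-count job; the functions mapper, reducer, and run_local are hypothetical stand-ins for the logic in your own scripts, and sorted() plays the role of the sort that Hadoop performs between the two phases.

```python
# Local dry run of a streaming-style word count: map, sort, then reduce.
from itertools import groupby


def mapper(lines):
    """Yield a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word, 1


def reducer(pairs):
    """Sum the counts per word; pairs must already be sorted by word."""
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Run both phases with an in-memory sort between them."""
    shuffled = sorted(mapper(lines))  # stands in for Hadoop's sort by key
    return list(reducer(shuffled))


if __name__ == "__main__":
    sample = ["foo foo quux labs", "foo bar quux"]
    for word, total in run_local(sample):
        print("%s\t%d" % (word, total))
```

If the totals for a small sample like this are not what you expect, the same bug will show up on the cluster, where it is much slower to diagnose.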
- bin/hadoop jar contrib/streaming/hadoop-*streaming*.jar -mapper /home/hduser/mapper.py -reducer /home/hduser/reducer.py -input /user/hduser/gutenberg/* -output /user/hduser/gutenberg-output