My first Hadoop application

I am continuing my trip into Hadoop technologies. Now it's time to run my first application. I decided to run the classic WordCount example (you can get the source code here), so I created the WordCount.java file. Next, I compiled the code using the following command:

hadoop com.sun.tools.javac.Main WordCount.java

The next part is the most difficult: you need to create the jar file. I am not a Java guru, so I didn't really know what the result should look like. I created the jar using the following command:

jar cf wc.jar WordCount*.class

I will come back to the structure of wc.jar later. The jar file is now created and we only need to run it, but before that the HDFS directories should be created and the input file uploaded:

runas /user:hadoop cmd

%HADOOP_HOME%\bin\hadoop fs -mkdir -p /test_project/in
%HADOOP_HOME%\bin\hadoop fs -put G:\test\gena_test.txt /test_project/in

Everything is ready now, running the application:

hadoop jar G:\test\wc.jar WordCount /test_project/in /test_project/out

Doesn't work!!! Getting the following error:

java.lang.RuntimeException: java.lang.ClassNotFoundException: WordCount$TokenizerMapper
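A side note on the name in that error: the `$` is how Java names a nested class, so WordCount$TokenizerMapper is the TokenizerMapper class declared inside WordCount. A minimal sketch that shows the naming rule (NestedNameDemo is a made-up stand-in for WordCount, not code from the example):

```java
final class NestedNameDemo {
    // Hypothetical nested class, mirroring TokenizerMapper inside WordCount.
    static class TokenizerMapper { }

    // Returns the binary name, which is what the class loader is asked for.
    static String binaryName(Class<?> c) {
        return c.getName();
    }

    public static void main(String[] args) {
        // Outer and nested class names are joined with '$'.
        System.out.println(binaryName(TokenizerMapper.class));
    }
}
```

So the job found the outer class but could not load the nested mapper class by that name.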

Hm, this is unexpected. Let's google it. At first, I thought the problem was related to environment variables, so I set all of them (%JAVA_HOME%, %PATH%, %HADOOP_CLASSPATH%), but that didn't help. Next, I noticed the following warning:

WARN ... No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

Let's resolve this:

job.setJar("G:\\test\\wc.jar");
//job.setJarByClass(WordCount.class);

But the error still showed up. After a couple of hours of useless attempts to resolve this, I found the following topic on stackoverflow.com. The problem there was resolved by adding a package declaration to the Java class. I tried this and, oh miracle, the issue was resolved. I think the issue is related to the structure of the jar file. The jar file I had created previously had two folders:
  • test - contains all class files;
  • META-INF
Class files stored under the test folder are looked up by the name test.WordCount, so I added the "test" package to the Java class, recompiled it and ran it like this:

hadoop jar G:\test\wc.jar test.WordCount /test_project/in /test_project/out
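If you want to check the jar layout before submitting the job, a small sketch like the one below may help. It is not part of the WordCount example; JarLayoutCheck and its two helper methods are names I made up for illustration. It only checks that the jar has an entry at the path derived from the fully qualified class name, which is where the class loader looks for it.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.jar.JarFile;

final class JarLayoutCheck {
    // Maps a fully qualified class name to the jar entry it must live at,
    // e.g. test.WordCount -> test/WordCount.class
    static String entryFor(String className) {
        return className.replace('.', '/') + ".class";
    }

    // True if the jar contains the entry expected for the given class name.
    static boolean containsClass(Path jar, String className) throws IOException {
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            return jarFile.getEntry(entryFor(className)) != null;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: JarLayoutCheck <jar> <class name>...");
            return;
        }
        Path jar = Paths.get(args[0]);
        for (int i = 1; i < args.length; i++) {
            String verdict = containsClass(jar, args[i]) ? "found" : "MISSING";
            System.out.println(entryFor(args[i]) + " : " + verdict);
        }
    }
}
```

For the jar from this post, the arguments would be G:\test\wc.jar test.WordCount test.WordCount$TokenizerMapper, and both entries should be reported as found once the classes are in the test package. Running jar tf wc.jar lists the same entries if you prefer to read them yourself.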

Enjoy.

PS. I've attached the iPhone 6 picture (phone isn't in very good condition I can say :) ). It belongs to one of my associates. The weirdest thing is - it works properly :)