Wednesday 8 July 2015

MapReduce Format

A driver class
A mapper class
A reducer class

The driver:
Sets up the job configuration
Defines the mapper
Defines the reducer
Specifies the paths of the input and output files for the MapReduce program

There are separate folders for input and output.

The user-defined mapper class must extend Hadoop's Mapper class and must specify the following four type parameters:
<input key, input value, output key, output value>

Keys and values use Hadoop's Writable datatypes, such as:
IntWritable
DoubleWritable
Text
BooleanWritable

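StringTokenizer obj = new StringTokenizer("this is me");
The line above creates a tokenizer that splits the string into substrings (tokens) wherever there is whitespace. The tokens are then read one at a time with hasMoreTokens() and nextToken().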
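
As a minimal sketch of how the tokenizer is typically used, the plain Java class below collects the tokens into a list. The class name TokenizerDemo and the helper method tokenize are made up for illustration and are not part of Hadoop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical demo class; the names are assumptions chosen for illustration.
class TokenizerDemo {
    // Splits a line into tokens using StringTokenizer's default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("this is me")); // prints [this, is, me]
    }
}
```

In a mapper, the same loop would usually emit one key/value pair per token instead of adding it to a list.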

The framework sorts the map output by key (not by value) before it reaches the reducer, so the output of each reducer normally comes out in key order.
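
To see why the reduce output ends up in key order, here is a rough plain Java simulation of a word count. It does not use the Hadoop API: the class SortedReduceSketch and its methods map, reduce and run are hypothetical names, and a TreeMap stands in for the framework's sort step.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical simulation of map -> sort -> reduce; not real Hadoop code.
class SortedReduceSketch {
    // Map phase: emit a (word, 1) pair for every token in the line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            pairs.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
        }
        return pairs;
    }

    // Reduce phase: sum all the values that were grouped under one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Sort step (assumption: a TreeMap stands in for the framework here).
        // Values are grouped by key, and the TreeMap keeps the keys sorted.
        TreeMap<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                       .add(pair.getValue());
            }
        }
        // The reducer is called once per key, in key order, so the output
        // (kept in insertion order) is sorted by key, not by value.
        Map<String, Integer> output = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            output.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("pear apple", "apple fig pear apple")));
        // prints {apple=3, fig=1, pear=2}
    }
}
```

Note that the counts (3, 1, 2) are not in any order; only the keys are sorted.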
