Tuesday 7 July 2015

Introduction to MapReduce

Installing Eclipse:
apt-get install eclipse_package

MapReduce combines parallel processing with a distributed file system (DFS).

Map(): Gathers similar information from big data by emitting (key, value) pairs.

Reduce(): Derives conclusions from the map output by combining the values that share a key.

MapReduce is a generalisation of Google's web page indexing.

map(data document)
{
    Partition the document across different nodes [automatic]
    Generate (key, value) pairs in parallel on every node
    return (key, value) pairs
}
reduce((key, value) pairs)
{
    Shuffle and sort [usually done beforehand, by the framework]
    Reduce [merge, aggregate, etc.]
    return answer
}
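
The map and reduce outline above can be tried out on a single machine. The following is a minimal Python sketch of a word count, not Hadoop code: the function names (map_phase, shuffle_and_sort, reduce_phase, word_count) are made up for illustration, and the list of text chunks stands in for a document that a real cluster would partition across nodes.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of the document."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle_and_sort(pairs):
    """Group values by key and order the keys.

    In a real framework this step runs between map and reduce.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reduce_phase(key, values):
    """Aggregate all values that share a key (here: sum the counts)."""
    return key, sum(values)


def word_count(chunks):
    """Run map, shuffle/sort, and reduce in sequence on one machine."""
    pairs = []
    for chunk in chunks:
        # Assumption: on a cluster each chunk would be mapped on its own node.
        pairs.extend(map_phase(chunk))
    return dict(reduce_phase(key, values) for key, values in shuffle_and_sort(pairs))


if __name__ == "__main__":
    print(word_count(["big data big", "data map"]))
```

Here the loop over chunks runs sequentially; the point is only to show where the (key, value) pairs are produced, grouped, and aggregated.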
  
