Thursday, 2 July 2015

Comparison between RDBMS and BigData and some other concepts

An RDBMS typically stores data in the range of GBs, while MapReduce can go up to and beyond PBs. In terms of the nature of processing and the speed of access, an RDBMS can be both interactive and batch, while MapReduce is batch only. In an RDBMS, data is read and written frequently. In a MapReduce system, data is written once and read (for analysis) many times. The structure of an RDBMS is static (a fixed schema), while MapReduce-based systems have dynamic schemas. Data integrity is higher in an RDBMS than in MapReduce. But when it comes to scaling, an RDBMS lags far behind BigData solutions.

Scaling:
Horizontal: adding more machines.
Vertical: enhancing the hardware capabilities of an existing machine.

There are 2 layers of Hadoop:
HDFS (storage),
execution engine (MapReduce based)
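
As a rough illustration of what a MapReduce-based execution engine does, here is a minimal single-machine sketch in plain Python. It is not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example, and a real cluster would run the map and reduce steps on many slave machines in parallel.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group all emitted values by key (the step between map and reduce)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one word."""
    return key, sum(values)

def word_count(lines):
    """Run map, shuffle and reduce over a list of text lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["big data", "Big cluster"]))
```

The input is only read, never modified, which matches the write-once, read-many pattern described earlier.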

Among all the servers in a cluster, one machine is the Master and the others are slaves. The Master machine holds the metadata about the data stored on the slave machines.

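There is a secondary/backup of the master server as well. (In Hadoop 1.x this Secondary NameNode periodically checkpoints the master's metadata; it is not a hot standby that takes over automatically.)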

Metadata is like the index of a book.
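
To make the book-index analogy a bit more concrete, here is a toy sketch of the kind of lookup the master's metadata allows. The file path, block IDs, slave names and the locate function are all hypothetical, and the real on-disk and in-memory structures are more involved.

```python
# Hypothetical, simplified picture of the master's metadata:
# file path -> ordered list of block IDs, block ID -> slaves holding a copy.
file_to_blocks = {"/logs/2015-07-02.log": ["blk_1", "blk_2"]}
block_to_nodes = {
    "blk_1": ["slave1", "slave2", "slave3"],
    "blk_2": ["slave2", "slave3", "slave4"],
}

def locate(path):
    """Return, for each block of the file in order, the slaves that store it."""
    return [(blk, block_to_nodes[blk]) for blk in file_to_blocks[path]]

if __name__ == "__main__":
    for blk, nodes in locate("/logs/2015-07-02.log"):
        print(blk, "->", nodes)
```

Just as with a book index, the master only says where the data lives; the data itself is read from the slaves.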

The JobTracker keeps track of which task is given to which server.

The JobTracker runs only on the Master unit/server.

Nodes in a cluster are generally your regular commodity PCs.

HDFS is not suitable for:
Low latency data access.
Lots of small files.
Multiple writers (several clients writing to the same file).
Arbitrary modifications.

The NameNode holds the metadata. It maintains the filesystem tree. The default block size is 64 MB (128 MB from Hadoop 2.x onward).
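
A small arithmetic sketch shows what the block size means in practice, and why lots of small files are a problem for the NameNode. It assumes the 64 MB default quoted above; the helper name blocks_needed is made up for illustration.

```python
import math

BLOCK_SIZE_MB = 64  # default block size quoted above (an assumption for newer versions)

def blocks_needed(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file of the given size is split into."""
    return math.ceil(file_size_mb / block_size_mb)

if __name__ == "__main__":
    # One 1000 MB file versus 1000 files of 1 MB each: same data volume,
    # but very different numbers of blocks for the NameNode to track.
    print(blocks_needed(1000))
    print(sum(blocks_needed(1) for _ in range(1000)))
```

The last block of a file only takes up as much disk space as the data it actually holds, so a 1 MB file does not waste 63 MB; the cost of small files is the extra metadata the NameNode has to keep.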

Similarly, there is only one JobTracker, on the master node, and there are many TaskTrackers on the slave servers.

By default, each block is replicated in 3 places.
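
The effect of 3-way replication can be sketched as follows. The round-robin placement here is a toy policy chosen only for illustration (real HDFS placement also takes racks into account), and the function and node names are hypothetical.

```python
REPLICATION = 3  # default replication factor mentioned above

def raw_storage_mb(file_size_mb, replication=REPLICATION):
    """Disk space consumed across the cluster once every block is replicated."""
    return file_size_mb * replication

def place_replicas(block_ids, nodes, replication=REPLICATION):
    """Toy policy: put each block on `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for start, blk in enumerate(block_ids):
        placement[blk] = [nodes[(start + i) % len(nodes)] for i in range(replication)]
    return placement

if __name__ == "__main__":
    print(raw_storage_mb(1024))
    print(place_replicas(["b1", "b2"], ["n1", "n2", "n3", "n4"]))
```

Because every block lives on three different machines, losing a single commodity PC does not lose data, at the price of three times the raw disk usage.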

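Surveillance (monitoring) of tasks is done by the TaskTrackers, which report progress to the JobTracker through periodic heartbeats; the mappers themselves only process the data.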
