HDFS Architecture

HDFS Services/Daemons

          1. Name Node (NN)
          2. Data node (DN)
          3. Secondary Name Node (SNN)
          4. Standby Name Node (Standby)

These daemons make up the HDFS layer, which is used to store data in HDFS.

V1 Map Reduce Daemons     Yarn Daemons (Map Reduce V2)

          1. Job Tracker                  1. Resource Manager
          2. Task Tracker                 2. Node Manager

These daemons are responsible for running the MapReduce jobs.

start-dfs.sh starts the HDFS daemons and start-yarn.sh starts the YARN daemons separately; start-all.sh starts all of the above daemons at once.

Name Node (NN)


  • Only one (active) Name Node is available per HDFS cluster.
  • It is the master node.
  • It stores only the metadata of all the data (file names, block locations, permissions).
  • If the Name Node fails, HDFS goes down; this is called a Single Point Of Failure (SPOF).
  • High Availability (HA): if the Active Name Node goes down, the Standby Name Node takes over and acts as the Active Name Node.
How memory management works in the NN:
  • When we perform a transaction in Oracle, it is saved to permanent storage only when we commit it. Before the commit it is held in temporary storage, i.e., in the WAL (Write-Ahead Log) file. If we roll back, the change is undone from the WAL. The Name Node has the same split between permanent and temporary storage:
          1. Fsimage  -> Permanent storage  -> Persistent
          2. Edits    -> Temporary storage  -> Non-persistent
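The fsimage/edits split above can be illustrated with a minimal sketch (the class and methods are hypothetical, not the real Hadoop code): every change is appended to an edit log first, and a checkpoint folds the log into the fsimage.

```python
# Toy model of NameNode metadata persistence: edits act as a WAL,
# checkpoint() merges them into the fsimage snapshot.

class NameNodeMetadata:
    def __init__(self):
        self.fsimage = {}   # persistent snapshot: path -> block list
        self.edits = []     # log of recent, not-yet-checkpointed transactions

    def record(self, op, path, blocks=None):
        # Every namespace change is appended to the edit log first.
        self.edits.append((op, path, blocks))

    def checkpoint(self):
        # Replay the edit log onto the fsimage, then truncate the log.
        for op, path, blocks in self.edits:
            if op == "create":
                self.fsimage[path] = blocks or []
            elif op == "delete":
                self.fsimage.pop(path, None)
        self.edits.clear()

meta = NameNodeMetadata()
meta.record("create", "/user/test.txt", ["blk_1", "blk_2"])
meta.checkpoint()
print(meta.fsimage)   # {'/user/test.txt': ['blk_1', 'blk_2']}
```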

Data Nodes (DN)
  • It is the actual data storage: it stores the real data (the file blocks).
  • In version 1 a cluster could connect around 4,000 Data Node machines to the Name Node. In version 2 this was extended to more than 35,000 Data Node machines.
  • Each Data Node sends a heartbeat to the main NN every 3 seconds, similar to a normal TCP/IP keep-alive.
  • If a Data Node stops sending heartbeats, the main NN waits about 10 minutes for a response from it. If still nothing arrives, the main NN considers that Data Node a dead node.
Why does the main NN have to wait 10 minutes for a Data Node heartbeat?

Because of possible network outages or network congestion, heartbeat signals can take time to arrive; that delay is allowed for here. If the node really is dead, we replace the damaged Data Node with a new one. The heartbeat signals carry information about the node's system software, hardware, CPU, and memory.
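The roughly 10-minute wait is not an arbitrary constant: in stock HDFS the dead-node timeout is derived from two configuration values. A quick calculation with the defaults (shown as an illustration; verify against your hdfs-site.xml) gives about 10.5 minutes:

```python
# Dead-node timeout formula used by the HDFS NameNode:
#   timeout = 2 * dfs.namenode.heartbeat.recheck-interval
#           + 10 * dfs.heartbeat.interval

recheck_interval_ms = 300_000   # dfs.namenode.heartbeat.recheck-interval default (5 min)
heartbeat_interval_s = 3        # dfs.heartbeat.interval default

timeout_s = 2 * (recheck_interval_ms / 1000) + 10 * heartbeat_interval_s
print(timeout_s, "seconds =", timeout_s / 60, "minutes")   # 630.0 seconds = 10.5 minutes
```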

Secondary Name Node (SNN)
  • It is NOT a hot backup for the main NN.
  • It has existed since Hadoop 1 (do not confuse it with the Standby Name Node, which was introduced in V2).
  • It performs the checkpointing of the NN.
  • It takes a metadata backup from the NN at 1-hour intervals (by default).
  • It does housekeeping work, such as removing unnecessary edit-log entries.
How this SNN works?
  • If the main NN fails, a new Name Node is established immediately and restores its metadata from the SNN's latest checkpoint. Suppose the SNN took a backup at 10 AM and the main NN went down at 10:40 AM; the new NN we establish will be missing that 40 minutes of metadata, because the SNN would only have taken its next backup at 11 AM. The restored data is loaded into the new NN via the fsimage and edits.
  • Secondary NameNode downloads the FsImage and EditLogs from the NameNode and then it merges EditLogs with the Fsimage periodically. It keeps edits log size within a limit. After that, it stores the modified FsImage into persistent storage. So we can use FsImage in case of NameNode failure.
  • The secondary Name Node merges the fsimage and the edits log files periodically and keeps edits log size within a limit. It is usually run on a different machine than the primary Name Node since its memory requirements are on the same order as the primary Name Node.
How to set SNN configuration?
  • The start of the checkpoint process on the secondary Name Node is controlled by two configuration parameters.
  • dfs.namenode.checkpoint.period, set to 1 hour by default, specifies the maximum delay between two consecutive checkpoints, and dfs.namenode.checkpoint.txns, set to 1 million by default, defines the number of uncheckpointed transactions on the Name Node which will force an urgent checkpoint, even if the checkpoint period has not been reached.
  • The secondary Name Node stores the latest checkpoint in a directory structured the same way as the primary Name Node's directory, so that the checkpointed image is always ready to be read by the primary Name Node if necessary.
  • As an analogy, most banks maintain several (often 4) servers for their own data; if one goes down, another takes over.
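The two checkpoint parameters described above would typically be set in hdfs-site.xml; the fragment below is a sketch showing the default values (tune them for your cluster):

```xml
<!-- hdfs-site.xml: SNN checkpoint tuning (values shown are the defaults) -->
<property>
  <name>dfs.namenode.checkpoint.period</name>
  <value>3600</value>    <!-- seconds between two consecutive checkpoints (1 hour) -->
</property>
<property>
  <name>dfs.namenode.checkpoint.txns</name>
  <value>1000000</value> <!-- uncheckpointed transactions that force an urgent checkpoint -->
</property>
```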
Standby Namenode
  • The Active Name Node is responsible for all client operations in the cluster. The Standby Name Node maintains enough state to provide a fast fail over.
  • In Hadoop 1.0, the Name Node is a single point of failure (SPOF). If the Name Node fails, all clients are unable to read or write files, and the whole Hadoop system is out of service until a new Name Node comes up.
  • Hadoop 2.0 overcomes this SPOF by providing support for an extra Name Node (the Standby Name Node). The High Availability feature adds this extra Name Node to the Hadoop architecture and provides automatic failover: if the Active Name Node fails, the Standby Name Node takes over all the responsibilities of the active node, and the cluster continues to work.
  • The initial implementation of Name Node high availability provided for a single active/standby pair. However, some deployments require a higher degree of fault tolerance, so Hadoop 3.0 extends this feature by allowing the user to run multiple Standby Name Nodes.
How do the HDFS daemons work?

Let's walk through an example of how the HDFS daemons work. A client wants to store and process 200 MB of data in a file called test.txt. This 200 MB file is split into 4 blocks, because the default block size (in Hadoop 1) is 64 MB: 64, 64, 64, and 8 MB. The client cannot decide the placement itself; the Name Node decides where the blocks will be stored and selects the Data Nodes suitable for them. The client does not know which Data Nodes are free to store the data; the Name Node takes care of that.
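The block arithmetic above can be sketched in a few lines (64 MB is the Hadoop 1 default; Hadoop 2 defaults to 128 MB):

```python
# Split a file of a given size into HDFS-style fixed-size blocks,
# with the last block holding the remainder.

def split_into_blocks(file_size_mb, block_size_mb=64):
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        blocks.append(min(block_size_mb, remaining))
        remaining -= block_size_mb
    return blocks

print(split_into_blocks(200))   # [64, 64, 64, 8]
```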


The client approaches the Name Node to store its test.txt of 200 MB. The Name Node identifies some Data Nodes to store the data and replies to the client: "these Data Nodes are free, so go and store your 200 MB of data on DN1, DN3, DN5, and DN7." The client then stores the file's blocks on the respective Data Nodes.

Once HDFS stores a block on one Data Node, it takes 2 more copies of that block onto two different Data Nodes, so there are 3 replicas of each block by default. These three replicas are used to cover data-loss problems. Each primary Data Node then sends an acknowledgement to the client, saying in effect: "your data is stored on these DNs, and even if copies are lost on some Data Nodes we have extra backups." Initially the Name Node only knows the primary Data Nodes it assigned; it does not yet know where the replicas ended up. That is why every Data Node keeps sending a block report and a heartbeat to the Name Node at short intervals: the block report says "this is all the data stored on this DN," and the heartbeat says "I am still alive and processing data." Using the block reports and heartbeats received from the Data Nodes, the Name Node learns exactly which Data Nodes hold each redundant copy of the file.
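The placement step above can be sketched as a toy function (real HDFS placement is rack-aware; this simple round-robin, with made-up node names, ignores topology):

```python
# Hypothetical sketch of 3-way replica placement: for each block the
# NameNode picks three distinct DataNodes from the available pool.

import itertools

def place_replicas(blocks, datanodes, replication=3):
    placement = {}
    dn_cycle = itertools.cycle(datanodes)
    for block in blocks:
        placement[block] = [next(dn_cycle) for _ in range(replication)]
    return placement

plan = place_replicas(["blk_1", "blk_2"], ["DN1", "DN3", "DN5", "DN7"])
print(plan)   # {'blk_1': ['DN1', 'DN3', 'DN5'], 'blk_2': ['DN7', 'DN1', 'DN3']}
```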

What happens if a Data Node crashes or is lost?

Note that Data Nodes only start sending block reports once data is actually stored on them, because data is stored in blocks. In the worst case, if a Data Node does not send its heartbeat on time, the Name Node assumes that the corresponding Data Node is dead. The Name Node then removes the dead Data Node from its metadata and chooses other Data Nodes on which to re-place the blocks from the dead node. After that we can remove the dead Data Node; once a new hard disk is inserted into that node, it can start working again.
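The recovery just described can be modelled in miniature (all names here are illustrative): when a node is declared dead, each block it held gets a fresh replica on a surviving node.

```python
# Toy model of NameNode re-replication after a DataNode death.

def handle_dead_node(block_map, dead_dn, live_dns):
    # block_map: block id -> list of DataNodes currently holding a replica
    for block, holders in block_map.items():
        if dead_dn in holders:
            holders.remove(dead_dn)
            # Pick a live node that does not already hold this block.
            for dn in live_dns:
                if dn not in holders:
                    holders.append(dn)
                    break
    return block_map

blocks = {"blk_1": ["DN1", "DN3", "DN5"], "blk_2": ["DN3", "DN5", "DN7"]}
print(handle_dead_node(blocks, "DN3", ["DN1", "DN5", "DN7", "DN9"]))
# {'blk_1': ['DN1', 'DN5', 'DN7'], 'blk_2': ['DN5', 'DN7', 'DN1']}
```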

What happens if the Name Node or its metadata crashes or is lost?

Hadoop works entirely through the Name Node. The metadata it maintains is small relative to the data, so storing it does not need much memory, but Hadoop is useless once the metadata or the Name Node is lost: if you lose the Name Node hardware, the metadata is lost; if you lose the metadata, the cluster becomes inaccessible and HDFS stops working. This is the single point of failure. To avoid this failure, the Name Node should run on highly reliable hardware.

How do we read data from HDFS?

As we know, the Name Node stores the metadata, the Data Nodes store the actual data, and the Secondary Name Node checkpoints the Name Node. Now let's discuss how to read data from HDFS.



When a client wants to read data from HDFS, it raises a request to the Name Node for the file's metadata. The Name Node responds with the block locations, and using those addresses the client approaches the exact Data Nodes and collects the data.
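The read path can be sketched as follows (all the dictionaries and names are illustrative, not the real Hadoop client API): the Name Node answers only with locations, and the actual bytes flow directly from the Data Nodes.

```python
# Toy model of an HDFS read: metadata lookup at the NameNode,
# then block fetches from the first reachable replica.

NAMENODE_METADATA = {
    "/user/test.txt": [("blk_1", ["DN1", "DN2"]), ("blk_2", ["DN3", "DN4"])],
}
DATANODE_STORAGE = {
    ("DN1", "blk_1"): b"hello ",
    ("DN3", "blk_2"): b"hdfs",
}

def read_file(path):
    data = b""
    # Step 1: the NameNode returns the block list with replica locations.
    for block_id, locations in NAMENODE_METADATA[path]:
        # Step 2: the client reads the block from the first replica that has it.
        for dn in locations:
            if (dn, block_id) in DATANODE_STORAGE:
                data += DATANODE_STORAGE[(dn, block_id)]
                break
    return data

print(read_file("/user/test.txt"))   # b'hello hdfs'
```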

As we already know, the Secondary Name Node periodically downloads the fsimage and edit log from the Name Node, merges them, and then uploads the merged new image back to the Name Node, which loads it into RAM.
