Knowledge Globalization Conference, 13th International Knowledge Globalization Conference 2018

HADOOP Cluster Distribution Engine (HCDE): The NameNode failover Solution with data traffic control and balancing load between DataNodes.

Alamgir Bhuyan

Last modified: 2018-01-13


Abstract: Data replication is now commonly used in cloud storage systems. Traditional replication systems do not distribute load among NameNodes efficiently, and their inefficient synchronization processes cannot keep the data volumes of the DataNodes synchronized and up to date. The HADOOP Cluster Distribution Engine (HCDE) is designed to balance load among NameNodes, and its metadata synchronizer is designed to synchronize data volumes by comparing the DataNodes. The HCDE maintains a load log that is continuously updated with the current hit load of each NameNode, and it forwards each client request to the least loaded NameNode according to that log. Because requests are distributed according to the current load of the NameNodes, the HCDE reduces traffic on heavily loaded NameNodes and shortens search time where the data volume is huge, which increases the efficiency of the distributed database system. Failure of one or more DataNodes interrupts real-time operation in a distributed file system: between the time a DataNode goes down and the time it becomes active again, many changes may be made on the other active DataNodes. As a result, a new client can be given access to an out-of-date DataNode and receive stale data. To avoid such mismatches, the metadata synchronizer collects the current data status of the DataNodes, updates the metadata of the NameNodes, and synchronizes the DataNodes accordingly.
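
The two mechanisms described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use any Hadoop API. The names `LoadLog`, `dispatch`, `complete`, and `stale_datanodes` are hypothetical, and the sketch assumes that the load log counts in-flight requests per NameNode and that each DataNode reports a monotonically increasing version counter.

```python
from dataclasses import dataclass, field


@dataclass
class LoadLog:
    """Hypothetical load log: NameNode id -> number of requests in flight."""
    hits: dict[str, int] = field(default_factory=dict)

    def register(self, namenode: str) -> None:
        # A newly registered NameNode starts with no load.
        self.hits.setdefault(namenode, 0)

    def least_loaded(self) -> str:
        if not self.hits:
            raise RuntimeError("no NameNodes registered")
        # Ties are broken by name so the choice is deterministic.
        return min(self.hits, key=lambda n: (self.hits[n], n))

    def dispatch(self) -> str:
        """Pick the least loaded NameNode and record the new request."""
        target = self.least_loaded()
        self.hits[target] += 1
        return target

    def complete(self, namenode: str) -> None:
        """Record that a request on this NameNode has finished."""
        self.hits[namenode] = max(0, self.hits[namenode] - 1)


def stale_datanodes(versions: dict[str, int]) -> list[str]:
    """Return DataNodes whose (assumed) version counter lags the newest one."""
    if not versions:
        return []
    newest = max(versions.values())
    return sorted(d for d, v in versions.items() if v < newest)
```

In this sketch a client request would call `dispatch()` to obtain a target NameNode, and a recovered DataNode that appears in the result of `stale_datanodes()` would be synchronized before being offered to clients.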