This document provides an overview of the Hadoop Distributed File System (HDFS) and MapReduce, the two major components of Hadoop. It explains the architecture of HDFS, including the roles of the Namenode, Datanode, and blocks, and describes the anatomy of file read and write operations in HDFS. It also discusses how MapReduce, a programming model for processing large amounts of data, works: the Map and Reduce phases and the roles of the JobTracker and TaskTrackers in executing MapReduce tasks. Finally, it highlights the advantages of Hadoop, including its ability to handle varied data sources and its scalability for processing big data.
Answer: The Hadoop Distributed File System (HDFS) was developed using a distributed file system design. It runs on commodity hardware. Unlike other distributed systems, HDFS is highly fault tolerant and designed to run on low-cost hardware. HDFS holds very large amounts of data and provides easy access. To store such huge data, the files are stored across multiple machines. These files are stored in a redundant fashion to rescue the system from possible data loss in case of failure. HDFS also makes applications available for parallel processing.

HDFS Architecture
The architecture of the Hadoop File System is described below.
HDFS follows the master-slave architecture and it has the following elements.

Namenode
The namenode is the commodity hardware that contains the GNU/Linux operating system and the namenode software. It is software that can be run on commodity hardware. The system running the namenode acts as the master server, and it does the following tasks:
● Manages the file system namespace.
● Regulates clients' access to files.
● Executes file system operations such as renaming, closing, and opening files and directories.

Datanode
The datanode is a commodity hardware machine running the GNU/Linux operating system and the datanode software. For every node (commodity hardware/system) in a cluster, there will be a datanode. These nodes manage the data storage of their system.
● Datanodes perform read-write operations on the file systems, as per client request.
● They also perform operations such as block creation, deletion, and replication according to the instructions of the namenode.

Block
Generally the user data is stored in the files of HDFS. A file in the file system is divided into one or more segments and/or stored in individual data nodes. These file segments are called blocks; a block is the minimum amount of data that HDFS can read or write. The default block size is 128 MB, and it can be changed through the HDFS configuration.
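To make the division of labour between namenode and datanodes concrete, the fsck utility can report how a given file has been split into blocks and where the replicas live. A minimal sketch, assuming a running cluster; the path /data/sample.txt is a hypothetical example:

# Hypothetical file path; substitute a file that exists in your cluster.
bin/hdfs fsck /data/sample.txt -files -blocks -locations

The output lists every block of the file, its size, and the datanodes that hold each replica, which is exactly the namenode/datanode split described above.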
● Highly scalable - HDFS is highly scalable, as it can scale to hundreds of nodes in a single cluster.
● Replication - Due to unfavorable conditions, the node containing the data may be lost. To overcome such problems, HDFS always maintains a copy of the data on a different machine.
● Fault tolerance - In HDFS, fault tolerance signifies the robustness of the system in the event of failure. HDFS is so fault-tolerant that if any machine fails, another machine containing a copy of that data automatically becomes active.
● Distributed data storage - This is one of the most important features of HDFS, and it makes Hadoop very powerful. Here, data is divided into multiple blocks and stored across nodes.
● Portable - HDFS is designed in such a way that it can easily be ported from one platform to another.
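The replication factor behind the replication and fault-tolerance features above can be inspected and changed from the command line. A minimal sketch, assuming a running cluster; /data/sample.txt is again a hypothetical path:

# Set the replication factor of a file to 3 and wait (-w) until it is applied.
bin/hdfs dfs -setrep -w 3 /data/sample.txt
# Summarise cluster capacity and the state of each datanode.
bin/hdfs dfsadmin -report

The cluster-wide default replication factor is controlled by the dfs.replication property in hdfs-site.xml (3 unless overridden).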
Answer: A command-line interface (CLI) is a text-based user interface used to run programs, manage computer files, and interact with the computer. Command-line interfaces are also called command-line user interfaces, console user interfaces, and character user interfaces. CLIs accept commands entered by keyboard as input; the commands invoked at the command prompt are then run by the computer.
The HDFS CLI is an interactive command-line shell that makes interacting with HDFS simpler and more intuitive than the standard command-line tools that come with Hadoop. To use the HDFS commands, we first need to start the Hadoop services using the following command:

sbin/start-all.sh

To check that the Hadoop services are up and running, we use the following command:

jps

Commands:
● ls: this command is used to list all the files and directories under a given HDFS path.

bin/hdfs dfs -ls /
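Building on the ls example, here is a minimal sketch of a typical HDFS shell session; the directory /user/hadoop and the file localfile.txt are hypothetical names:

# Create a directory in HDFS (and any missing parents).
bin/hdfs dfs -mkdir -p /user/hadoop
# Copy a file from the local file system into HDFS.
bin/hdfs dfs -put localfile.txt /user/hadoop
# List the contents of the new directory.
bin/hdfs dfs -ls /user/hadoop
# Print the file's contents to standard output.
bin/hdfs dfs -cat /user/hadoop/localfile.txt
# Copy the file back out of HDFS to the local file system.
bin/hdfs dfs -get /user/hadoop/localfile.txt copy.txt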