Hadoop 3.3.6 Installation on Ubuntu 24.04.1 LTS

 1. Press Ctrl+Alt+T to open a terminal and run the following commands one by one,

    a) sudo apt-get update


                If the above command fails, run the commands below,
                
                    sudo rm -rf /var/lib/apt/lists/*
                    sudo apt-get update
                    sudo apt-get update -o Acquire::http::No-Cache=True
                    
    b) sudo apt-get install openjdk-8-jdk
    
        Apache Hadoop 3.x fully supports Java 8 (Hadoop 3.3 also supports Java 11 as a runtime). The openjdk-8-jdk package in Ubuntu contains both the runtime environment and the development kit.
    
    c) sudo apt-get install openssh-server
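
    After step 1(b), it is worth sanity-checking the Java install before going further. A minimal sketch; the JAVA_HOME path below is the usual location for this package on Ubuntu amd64, but verify it on your machine (the live `java -version` call is left commented out):

    ```shell
    # Expected install location of OpenJDK 8 on Ubuntu amd64 (verify on your system)
    JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64

    # Uncomment to confirm the active Java version and registered alternatives:
    # java -version
    # update-alternatives --list java

    # Sanity-check that the directory name matches the package we installed
    [ "${JAVA_HOME##*/}" = "java-8-openjdk-amd64" ] && echo "JAVA_HOME looks right"
    ```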

2. After openssh-server is installed, generate an RSA key pair with the command below; press Enter at each prompt (about four times) to accept the defaults,

    ssh-keygen -t rsa

3. cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    To verify that the key was added,
    
    cd ~/.ssh
    ls
    cat authorized_keys

4. ssh localhost
    Type "yes" when asked to confirm the host key fingerprint.

 

5. Download Hadoop 3.3.6 and do below config file changes.

    Download the Hadoop 3.3.6 tarball (hadoop-3.3.6.tar.gz) from the link below.

    https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/

    Extract the downloaded tarball into the directory below,

    /home/karpagaraj/hadoop_setup

    ('karpagaraj' is the username and 'hadoop_setup' is a directory we need to create under the home directory)

    Once the tarball is extracted, go to the directory hadoop-3.3.6/etc/hadoop (inside hadoop_setup) and follow the steps below,
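
    The download-and-extract portion of this step can also be scripted. A sketch; the wget/tar lines are commented out so you can review the URL first (the archive is several hundred MB):

    ```shell
    HADOOP_VERSION=3.3.6
    TARBALL="hadoop-${HADOOP_VERSION}.tar.gz"
    URL="https://archive.apache.org/dist/hadoop/common/hadoop-${HADOOP_VERSION}/${TARBALL}"

    # Create the target directory ('hadoop_setup' under your home directory)
    mkdir -p "$HOME/hadoop_setup"

    echo "$URL"
    # Uncomment to download and extract (large file):
    # wget -P "$HOME/hadoop_setup" "$URL"
    # tar -xzf "$HOME/hadoop_setup/$TARBALL" -C "$HOME/hadoop_setup"
    ```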

5.1) Open core-site.xml (right-click and open with gedit, or use any text editor).

Within core-site.xml, replace the empty <configuration> element with the following code,

<configuration>
 <property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>
 </property>
 <property>
  <name>hadoop.proxyuser.dataflair.groups</name>
  <value>*</value>
 </property>
 <property>
  <name>hadoop.proxyuser.dataflair.hosts</name>
  <value>*</value>
 </property>
 <property>
  <name>hadoop.proxyuser.server.hosts</name>
  <value>*</value>
 </property>
 <property>
  <name>hadoop.proxyuser.server.groups</name>
  <value>*</value>
 </property>
</configuration>

note: the hadoop.proxyuser.<user>.* entries let that user impersonate other users; replace 'dataflair' with your own username if you need this for your user.


5.2) Create the directories below,

mkdir -p /home/karpagaraj/hadoop_setup/yarn/namenode

mkdir -p /home/karpagaraj/hadoop_setup/yarn/datanode


5.3) Open hdfs-site.xml and paste the following code between the <configuration> tags, changing the username to your own.

<property>
 <name>dfs.replication</name>
 <value>1</value>
</property>

<property>
 <name>dfs.namenode.name.dir</name>
 <value>file:/home/karpagaraj/hadoop_setup/yarn/namenode</value>
</property>

<property>
 <name>dfs.datanode.data.dir</name>
 <value>file:/home/karpagaraj/hadoop_setup/yarn/datanode</value>
</property>

note: replace 'karpagaraj' with your username; the paths must match the directories created in step 5.2.

5.4) Open mapred-site.xml and paste the following code between the <configuration> tags,

 <property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
 </property>
 <property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
 </property>

5.5) Open yarn-site.xml and paste the following code between the <configuration> tags,

 <property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
 </property>
 <property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
 </property>

 

6. Set the .bashrc

Press Ctrl+H in your home directory (to show hidden files), open the .bashrc file, and append the lines below,

export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop_setup/hadoop-3.3.6
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh

export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin


07. Type either of the following in the terminal,

bash (OR) source ~/.bashrc

Note : this reloads .bashrc so the exports above take effect.
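
A quick way to confirm the exports took effect. A sketch that re-creates the key variables exactly as .bashrc defines them; the `hadoop version` call is commented out in case your extract path differs:

```shell
# Re-create the key variables exactly as .bashrc defines them
HADOOP_HOME="$HOME/hadoop_setup/hadoop-3.3.6"
PATH="$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin"

# Confirm the bin directory is now on PATH
case ":$PATH:" in
  *":$HADOOP_HOME/bin:"*) on_path=yes ;;
  *)                      on_path=no  ;;
esac
echo "hadoop bin on PATH: $on_path"

# Uncomment once Hadoop is extracted:
# hadoop version
```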

 

08. Format the namenode and filesystem,

Go to the bin directory in the terminal,
cd $HOME/hadoop_setup/hadoop-3.3.6/bin

Note : format only the first time after the Hadoop setup (formatting erases any existing HDFS data).

Run the below command,
 ./hdfs namenode -format

If start-up later complains about pdsh, make sure this variable is set (it is already exported in .bashrc above),

export PDSH_RCMD_TYPE=ssh

 

09. Start all services,
Go to the sbin directory,
cd ../sbin

Run the below command,
./start-all.sh

From next time onwards you can run start-all.sh from anywhere in the terminal, since $HADOOP_HOME/sbin is on the PATH.

 

10. Type jps in the terminal; you should see all five daemons running,

NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
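
The jps check can be scripted too. A sketch using a small helper; `check_daemons` is our own hypothetical function, not a Hadoop or JDK command:

```shell
# Report which of the five expected daemons appear in a jps-style listing.
# (check_daemons is a hypothetical helper, not a Hadoop command.)
check_daemons() {
  jps_output=$1
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if printf '%s\n' "$jps_output" | grep -q "$d"; then
      echo "OK: $d"
    else
      echo "MISSING: $d"
    fi
  done
}

# Real usage (requires the JDK's jps on PATH):
# check_daemons "$(jps)"
```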

 

11. Open "localhost:9870" in a browser -> under the "Utilities" menu find the option -> "Browse the file system" to view the HDFS files.
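
You can also put files into HDFS from the command line rather than the web UI. A sketch; the hdfs commands are commented out since they need the daemons running, and the /user/<name> layout shown is the conventional HDFS home-directory convention:

```shell
# Conventional per-user home directory inside HDFS
HDFS_USER_DIR="/user/$(whoami)"
echo "HDFS home would be: $HDFS_USER_DIR"

# Uncomment with the daemons running:
# hdfs dfs -mkdir -p "$HDFS_USER_DIR"           # create your HDFS home
# hdfs dfs -put /etc/hostname "$HDFS_USER_DIR"  # upload a small test file
# hdfs dfs -ls "$HDFS_USER_DIR"                 # list it
```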


