Hadoop 3.3.6 Installation on Ubuntu 24.04.1 LTS
1. Press Ctrl+Alt+T to open a terminal and run the following commands one by one:
a) sudo apt-get update
If the above command fails, clear the apt cache and retry:
sudo rm -rf /var/lib/apt/lists/*
sudo apt-get update
sudo apt-get update -o Acquire::http::No-Cache=True
b) sudo apt-get install openjdk-8-jdk
At the moment, Apache Hadoop 3.x fully supports Java 8. The OpenJDK 8 package in Ubuntu contains both the runtime environment and development kit.
c) sudo apt-get install openssh-server
2. After openssh-server is installed, generate an RSA key pair with the command below. Press Enter at each prompt to accept the defaults and an empty passphrase:
ssh-keygen -t rsa
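The key setup in steps 2 and 3 can also be done non-interactively. This is a sketch, demonstrated against a scratch directory so it can be re-run safely; replace $DEMO with "$HOME/.ssh" for the real setup.

```shell
# Non-interactive equivalent of the key setup (scratch-directory demo).
DEMO="${TMPDIR:-/tmp}/ssh-demo"
mkdir -p "$DEMO" && chmod 700 "$DEMO"
rm -f "$DEMO/id_rsa" "$DEMO/id_rsa.pub"        # allow safe re-runs
ssh-keygen -q -t rsa -N '' -f "$DEMO/id_rsa"   # -N '' = empty passphrase
cat "$DEMO/id_rsa.pub" >> "$DEMO/authorized_keys"
chmod 600 "$DEMO/authorized_keys"              # permissions sshd expects
```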
3. cat .ssh/id_rsa.pub >> .ssh/authorized_keys
To verify that the key was copied, list the .ssh directory and inspect the file:
cd .ssh
ls
cat authorized_keys
4. ssh localhost
When prompted to continue connecting, type yes.
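An optional sanity check (a sketch, not part of the original steps): BatchMode makes ssh fail instead of prompting for a password, so this reports whether passwordless login is actually configured.

```shell
# Report whether passwordless SSH to localhost works.
check_ssh() {
  if ssh -o BatchMode=yes -o ConnectTimeout=5 localhost true 2>/dev/null; then
    echo "passwordless SSH to localhost: OK"
  else
    echo "passwordless SSH to localhost: NOT working - re-check steps 2 and 3"
  fi
}
check_ssh
```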
5. Download Hadoop 3.3.6 and make the configuration changes below.
Download the Hadoop 3.3.6 archive (hadoop-3.3.6.tar.gz) from:
https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/
Extract the downloaded archive into the directory:
/home/karpagaraj/hadoop_setup
('karpagaraj' is the username and 'hadoop_setup' is a directory we need to create under the home directory.)
Once the archive is extracted, go to the directory hadoop-3.3.6/etc/hadoop and follow the steps below.
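The download-and-extract step can be scripted as a small function (a sketch; note the archive is a .tar.gz, not a zip, and $HOME stands in for /home/karpagaraj). It is wrapped in a function here because the download is large; run it once deliberately.

```shell
# Download and extract Hadoop 3.3.6 under $HOME/hadoop_setup.
download_hadoop() {
  base="$HOME/hadoop_setup"
  mkdir -p "$base" && cd "$base" || return 1
  wget https://archive.apache.org/dist/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
  tar -xzf hadoop-3.3.6.tar.gz   # produces the hadoop-3.3.6/ directory
}
# Run it once with: download_hadoop
```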
5.1) Open core-site.xml (right-click and open with gedit). Inside the <configuration> tags, paste the following:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
  <property>
    <name>hadoop.proxyuser.dataflair.groups</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.dataflair.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.server.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.server.groups</name>
    <value>*</value>
  </property>
</configuration>
5.2) Create the following directories:
mkdir -p /home/karpagaraj/hadoop_setup/yarn/namenode
mkdir -p /home/karpagaraj/hadoop_setup/yarn/datanode
5.3) Open hdfs-site.xml and paste the following between the <configuration> tags, changing the username to your own:
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/home/karpagaraj/hadoop_setup/yarn/namenode</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/home/karpagaraj/hadoop_setup/yarn/datanode</value>
</property>
Note: replace 'karpagaraj' with your username; the paths must match the directories created in step 5.2.
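A quick pre-format check (a sketch, assuming the paths from steps 5.2 and 5.3): the directories named in dfs.namenode.name.dir and dfs.datanode.data.dir must exist before the NameNode is formatted.

```shell
# Verify the HDFS storage directories from step 5.2 exist.
MISSING=0
for d in "$HOME/hadoop_setup/yarn/namenode" "$HOME/hadoop_setup/yarn/datanode"; do
  if [ -d "$d" ]; then
    echo "ok: $d"
  else
    echo "missing: $d  (create it with: mkdir -p $d)"
    MISSING=$((MISSING+1))
  fi
done
```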
5.4) Open mapred-site.xml and paste the following between the <configuration> tags:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.application.classpath</name>
  <value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
5.5) Open yarn-site.xml and paste the following between the <configuration> tags:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.env-whitelist</name>
  <value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
6. Set up .bashrc
In the file manager, press Ctrl+H in your home directory to show hidden files, open the .bashrc file, and append the following lines:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop_setup/hadoop-3.3.6
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
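The same exports can be appended without a GUI editor using a here-document. Shown against a scratch file for safety; point TARGET at "$HOME/.bashrc" (and use >> there so existing contents are preserved) to apply it for real.

```shell
# Write the environment exports from step 6 (scratch-file demo).
TARGET="${TMPDIR:-/tmp}/bashrc-demo"
cat > "$TARGET" <<'EOF'
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=$HOME/hadoop_setup/hadoop-3.3.6
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_STREAMING=$HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-3.3.6.jar
export HADOOP_LOG_DIR=$HADOOP_HOME/logs
export PDSH_RCMD_TYPE=ssh
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
EOF
```

The quoted 'EOF' delimiter stops the shell from expanding $HOME and $HADOOP_HOME now; they are expanded later, when .bashrc is sourced.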
07. Reload the shell configuration by typing one of the following in the terminal:
bash (OR) source ~/.bashrc
Note: this reloads .bashrc so the new environment variables take effect.
08. Format the namenode
In the terminal, go to the Hadoop bin directory:
cd $HOME/hadoop_setup/hadoop-3.3.6/bin
Note: format only once, the first time after the Hadoop setup - reformatting erases all HDFS metadata.
Run the command:
./hdfs namenode -format
If the start-up scripts later complain about pdsh, make sure the following is set (it is already in .bashrc from step 6):
export PDSH_RCMD_TYPE=ssh
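Because formatting is destructive, it can be wrapped in a guard (a sketch, assuming the paths from steps 5.2/5.3 and that hdfs is on the PATH from step 6). The NameNode writes a current/VERSION file when it is formatted, so that file's presence is a reasonable "already formatted" signal.

```shell
# Format the NameNode only if it has not been formatted before.
format_namenode_once() {
  if [ -f "$HOME/hadoop_setup/yarn/namenode/current/VERSION" ]; then
    echo "NameNode already formatted - skipping"
  else
    hdfs namenode -format
  fi
}
# Run with: format_namenode_once
```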
09. Start all services
Go to the sbin directory:
cd ../sbin
Run the command:
./start-all.sh
From next time onwards you can run start-all.sh from anywhere in the terminal, since sbin is on the PATH (step 6).
10. Type jps in the terminal; you should see all the daemons running:
NameNode
DataNode
SecondaryNameNode
ResourceManager
NodeManager
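The jps check above can be automated (a sketch; assumes the JDK's jps tool is on the PATH, which the openjdk-8-jdk package from step 1b provides):

```shell
# Compare jps output against the five expected daemons.
check_daemons() {
  running=$(jps 2>/dev/null)
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if echo "$running" | grep -qw "$d"; then
      echo "up:   $d"
    else
      echo "DOWN: $d"
    fi
  done
}
check_daemons
```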
11. Open "localhost:9870" (the NameNode web console) in a browser. Under the "Utilities" menu, choose "Browse the file system" to view HDFS files.
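As a final smoke test, a file can be written to and read back from HDFS (a sketch; requires the daemons from step 9 to be running, and /user/$USER is just a conventional home path in HDFS):

```shell
# Create an HDFS home directory, write one file, and read it back.
smoke_test_hdfs() {
  hdfs dfs -mkdir -p "/user/$USER" &&
  echo "hello hadoop" | hdfs dfs -put -f - "/user/$USER/hello.txt" &&
  hdfs dfs -cat "/user/$USER/hello.txt"
}
# Run with: smoke_test_hdfs   (should print "hello hadoop");
# the file then also appears under /user in the web console's file browser.
```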