Author: 康凯森
Date: 2016-04-10
Category: HBase
Today I (小梦) will walk you through building a Hadoop 2.6.0 / ZooKeeper 3.4.6 / HBase 0.98.13 cluster step by step, starting from bare Aliyun ECS machines.
My cluster layout is as follows; the columns are public IP, private IP, hostname, HDFS role, HBase role, ZooKeeper, and YARN role:
120.24.83.53   10.169.132.145  kks1  namenode  hmaster       zookeeper  ResourceManager
120.24.50.76   10.45.162.55    kks2  datanode  regionserver  zookeeper  NodeManager
120.24.50.27   10.45.162.0     kks3  datanode  regionserver  zookeeper  NodeManager  SecondaryNameNode
120.24.51.109  10.45.165.59    kks4  datanode  regionserver             NodeManager
First set the hostname (shown here for kks1; repeat on each node with its own name):
vim /etc/sysconfig/network
# change the line to: HOSTNAME=kks1
sudo hostname kks1
Then reboot.
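A quick sanity check after the reboot:
hostname    # should print kks1 (respectively kks2/kks3/kks4 on the other nodes)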
Next, partition, format, and mount the data disk:
fdisk -l
fdisk /dev/xvdb
At the prompts, enter n, p, 1, press Enter twice (accepting the default first and last sectors), then w to write the partition table and exit.
fdisk -l
mkfs.ext3 /dev/xvdb1
echo '/dev/xvdb1 /mnt ext3 defaults 0 0' >> /etc/fstab
mount -a
df -h
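If you need to script this across several machines, the interactive dialogue above can be fed to fdisk on stdin. This is only a sketch, and it is destructive, so verify the device name with fdisk -l first:
echo -e "n\np\n1\n\n\nw" | fdisk /dev/xvdb   # new primary partition 1, default start/end sectors, write and quit
mkfs.ext3 /dev/xvdb1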
Add every node's private IP to /etc/hosts on all machines:
vim /etc/hosts
10.169.132.145 kks1
10.45.162.55 kks2
10.45.162.0 kks3
10.45.165.59 kks4
Next, set up passwordless SSH. A is the local host (the machine that controls the others); B is a remote host (the controlled server), with IP 172.24.253.2 in this example; both A and B run Linux.
On A:
# ssh-keygen -t rsa (press Enter three times: this generates a key pair locally with no passphrase)
# ssh root@172.24.253.2 "mkdir .ssh;chmod 0700 .ssh" (password required; note: .ssh must have its permissions set to 700)
# scp ~/.ssh/id_rsa.pub root@172.24.253.2:.ssh/id_rsa.pub (password required)
For this cluster, for example:
ssh root@kks1 "mkdir .ssh;chmod 0700 .ssh"
scp ~/.ssh/id_rsa.pub root@kks1:.ssh/id_rsa.pub
On B:
# touch /root/.ssh/authorized_keys (skip this if the file already exists)
# chmod 600 ~/.ssh/authorized_keys (note: ~/.ssh/authorized_keys must have mode 600; it stores the public keys of SSH clients, and you can point the server at a different file via /etc/ssh/sshd_config)
# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys (append id_rsa.pub to authorized_keys; do not use >, which would wipe the existing contents and lock out anyone relying on the keys already there)
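On most Linux distributions, the three steps above can be collapsed into a single command with ssh-copy-id, which creates .ssh, fixes the permissions, and appends the key for you. A convenience sketch, assuming ssh-copy-id is installed:
for host in kks1 kks2 kks3 kks4; do
    ssh-copy-id root@$host    # prompts for the root password once per host
done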
yum install java-1.7.0-openjdk-devel.x86_64 -y
wget http://ftp.tsukuba.wide.ad.jp/software/apache/maven/maven-3/3.3.1/binaries/apache-maven-3.3.1-bin.tar.gz
Extract the tarball; assuming Maven ends up under /home/maven, run:
echo export PATH='$PATH':/home/maven/bin >> /etc/profile
After setting the environment variable, run: source /etc/profile
Then run mvn --version; if Maven prints its version information, the setup succeeded.
wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
Extract it (here to /home/hadoop), then in etc/hadoop/hadoop-env.sh set:
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HADOOP_PID_DIR=/var/hadoop/pids
Note 1: setting HADOOP_PID_DIR is strongly recommended; the default PID location is /tmp, which the OS may clean out, after which commands like sbin/stop-dfs.sh silently stop working.
Note 2: create every directory referenced in the configs before starting the cluster.
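Based on the paths used in the configs below, that means running something like this on every node before the first start:
mkdir -p /var/hadoop/pids                                 # PID files (hadoop-env.sh, yarn-env.sh, hbase-env.sh)
mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data    # hdfs-site.xml
mkdir -p /home/hadoop/yarn/local                          # yarn-site.xml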
Edit etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://kks1:8020</value>
</property>
</configuration>
Edit etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>kks3:9001</value>
</property>
</configuration>
Edit etc/hadoop/yarn-site.xml:
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>kks1</value>
</property>
<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>
<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/home/hadoop/etc/hadoop/fairscheduler.xml</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/yarn/local</value>
</property>
<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>
<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit etc/hadoop/mapred-site.xml:
<configuration>
<!-- MR YARN Application properties -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>kks2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>kks2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>
</configuration>
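Since mapreduce.jobhistory.address points at kks2, remember to start the history server there once the cluster is running, using the standard Hadoop 2.x daemon script:
sbin/mr-jobhistory-daemon.sh start historyserver    # run on kks2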
Edit etc/hadoop/slaves, listing the DataNode/NodeManager hosts:
kks2
kks3
kks4
Create etc/hadoop/fairscheduler.xml, the allocation file referenced by yarn.scheduler.fair.allocation.file above:
<?xml version="1.0"?>
<allocations>
<queue name="infrastructure">
<minResources>102400 mb, 50 vcores </minResources>
<maxResources>153600 mb, 100 vcores </maxResources>
<maxRunningApps>200</maxRunningApps>
<minSharePreemptionTimeout>300</minSharePreemptionTimeout>
<weight>1.0</weight>
<aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
</queue>
<queue name="tool">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>
<queue name="sentiment">
<minResources>102400 mb, 30 vcores</minResources>
<maxResources>153600 mb, 50 vcores</maxResources>
</queue>
</allocations>
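To route a job into one of these queues, pass the queue name at submission time. A sketch using the bundled examples jar; the jar path and the root. queue prefix follow the usual Hadoop 2.6 conventions, so treat them as assumptions:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi \
    -Dmapreduce.job.queuename=root.infrastructure 10 100    # submit the pi example to the infrastructure queue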
Likewise, in etc/hadoop/yarn-env.sh set:
export YARN_PID_DIR=/var/hadoop/pids
Copy the configured Hadoop directory to the other nodes:
scp -r hadoop root@kks2:/home
scp -r hadoop root@kks3:/home
scp -r hadoop root@kks4:/home
Note: all of the following commands are run from the Hadoop deployment directory.
On kks1, format the NameNode and start it:
bin/hdfs namenode -format
sbin/hadoop-daemon.sh start namenode
On kks1, start all the DataNodes:
sbin/hadoop-daemons.sh start datanode
Start YARN:
sbin/start-yarn.sh
Hadoop is now up. Use the jps command to check the JVM processes: at this point you should see NameNode and ResourceManager on kks1, DataNode and NodeManager on kks2-kks4, and SecondaryNameNode additionally on kks3.
To stop and start the cluster later, use:
sbin/stop-dfs.sh
sbin/start-dfs.sh
sbin/start-yarn.sh
sbin/stop-yarn.sh
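A quick smoke test of HDFS (standard hdfs dfs commands; the paths are arbitrary):
bin/hdfs dfs -mkdir -p /tmp/test
bin/hdfs dfs -put etc/hadoop/core-site.xml /tmp/test/
bin/hdfs dfs -ls /tmp/test    # the uploaded file should be listed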
wget http://mirrors.cnnic.cn/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz
tar -zxvf zookeeper-3.4.6.tar.gz
cd into the extracted ZooKeeper directory (here /home/zookeeper) and create the data directories:
mkdir data
mkdir datalog
Each server also needs its id written to data/myid: 1, 2, and 3 on kks1, kks2, and kks3 respectively, matching the server.N entries in zoo.cfg below.
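For example:
echo 1 > /home/zookeeper/data/myid    # on kks1; write 2 on kks2 and 3 on kks3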
Edit conf/zoo.cfg:
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper/data
dataLogDir=/home/zookeeper/datalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=kks1:2888:3888
server.2=kks2:2888:3888
server.3=kks3:2888:3888
Start ZooKeeper on kks1, kks2, and kks3, then check each node's state:
bin/zkServer.sh start
bin/zkServer.sh status    # shows whether this node is the leader or a follower
To shut a node down later:
bin/zkServer.sh stop
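You can also verify the ensemble with the bundled CLI, connecting to any member:
bin/zkCli.sh -server kks1:2181    # inside the shell, "ls /" should list the root znodes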
wget http://mirrors.koehn.com/apache/hbase/0.98.13/hbase-0.98.13-hadoop2-bin.tar.gz
Extract it (here to /home/hbase), then edit conf/hbase-env.sh:
vim hbase-env.sh
export HBASE_MANAGES_ZK=false    # use the external ZooKeeper ensemble set up above, not one managed by HBase
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HBASE_PID_DIR=/var/hadoop/pids
Edit conf/hbase-site.xml:
<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://kks1:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>kks1,kks2,kks3</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hbase/data</value>
</property>
</configuration>
Edit conf/regionservers:
kks2
kks3
kks4
Copy the configured HBase directory to the other nodes:
scp -r hbase root@kks2:/home
scp -r hbase root@kks3:/home
scp -r hbase root@kks4:/home
On kks1, start (and later stop) HBase with:
bin/start-hbase.sh
bin/stop-hbase.sh
At this point, the HBase cluster is up.
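As a final sanity check, create and scan a small table from the HBase shell (standard shell commands; the table and column-family names here are arbitrary):
bin/hbase shell
create 't1', 'cf1'                  # table with one column family
put 't1', 'row1', 'cf1:a', 'v1'     # write one cell
scan 't1'                           # should print the row just written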