Building a Hadoop 2.6.0 | ZooKeeper 3.4.6 | HBase 0.98.13 Cluster on Alibaba Cloud ECS


Author: 康凯森

Date: 2016-04-10

Category: HBase


Today I will walk through, step by step, how to build a Hadoop 2.6.0 | ZooKeeper 3.4.6 | HBase 0.98.13 cluster starting from bare Alibaba Cloud ECS instances.

My cluster layout is as follows; the columns are public IP, private IP, hostname, HDFS role, HBase role, ZooKeeper, and YARN role:

120.24.83.53   10.169.132.145  kks1  namenode  hmaster       zookeeper  ResourceManager
120.24.50.76   10.45.162.55    kks2  datanode  regionserver  zookeeper  NodeManager
120.24.50.27   10.45.162.0     kks3  datanode  regionserver  zookeeper  NodeManager  SecondaryNameNode
120.24.51.109  10.45.165.59    kks4  datanode  regionserver             NodeManager

1: Modify the hostname

vim /etc/sysconfig/network
Change: HOSTNAME=kks1
sudo hostname kks1

Reboot for the change to take effect, then repeat on each node with its own hostname (kks2, kks3, kks4).

2: Mount the data disk

fdisk -l
fdisk /dev/xvdb
Enter "n", "p", "1", press Enter twice to accept the defaults, then "w" to write the partition table.
fdisk -l

mkfs.ext3 /dev/xvdb1
echo '/dev/xvdb1 /mnt ext3 defaults 0 0' >> /etc/fstab
mount -a
df -h

3: Edit /etc/hosts

vim /etc/hosts
10.169.132.145 kks1
10.45.162.55 kks2
10.45.162.0 kks3
10.45.165.59 kks4
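
A quick check from kks1 that every hostname resolves and is reachable:

for h in kks2 kks3 kks4; do ping -c 1 $h; done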

4: Passwordless SSH login

A is the local host (the machine used to control the others);
B is the remote host (the machine being controlled), say with IP 172.24.253.2;
both A and B run Linux.

Commands on A:
# ssh-keygen -t rsa (press Enter three times; this generates a key pair locally with no passphrase)
# ssh root@172.24.253.2 "mkdir .ssh;chmod 0700 .ssh" (password required; note: the .ssh directory must have permission 700)
# scp ~/.ssh/id_rsa.pub root@172.24.253.2:.ssh/id_rsa.pub (password required)

ssh root@kks1 "mkdir .ssh;chmod 0700 .ssh"
scp ~/.ssh/id_rsa.pub root@kks1:.ssh/id_rsa.pub
Commands on B:
# touch /root/.ssh/authorized_keys (skip this if the file already exists)
# chmod 600 ~/.ssh/authorized_keys (note: ~/.ssh/authorized_keys must have permission 600; this file holds the public keys of SSH clients; a different file name can be configured in the server's /etc/ssh/sshd_config)
# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys (append the contents of id_rsa.pub to authorized_keys; do not use >, which would wipe the existing contents and lock out anyone who relies on the old keys)
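
After repeating the key distribution for kks2, kks3, and kks4, a simple way to confirm passwordless login from kks1 is:

for h in kks1 kks2 kks3 kks4; do ssh root@$h hostname; done    # each should print the hostname without asking for a password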

5: Install JDK 1.7

yum install java-1.7.0-openjdk-devel.x86_64 -y
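
A quick way to confirm the installation and to find the exact JDK path that hadoop-env.sh and hbase-env.sh will need later (the directory name in those files is the one on my machines; yours may differ slightly):

java -version
ls /usr/lib/jvm/    # the java-1.7.0-openjdk-* directory listed here is the JAVA_HOME used in later steps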

6: Install Maven (only needed on the kks1 node)

wget http://ftp.tsukuba.wide.ad.jp/software/apache/maven/maven-3/3.3.1/binaries/apache-maven-3.3.1-bin.tar.gz
Download Maven to the home directory and extract it (the PATH entry below assumes the extracted directory is /home/maven), then run:
echo export PATH='$PATH':/home/maven/bin >> /etc/profile
After setting the environment variable, run: source /etc/profile
Run mvn --version; if Maven prints its version information, the setup succeeded.

7: Download Hadoop

wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
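
The archive still needs to be unpacked. The configuration in this post assumes the deployment directory is /home/hadoop (for example, yarn.scheduler.fair.allocation.file points at /home/hadoop/etc/hadoop/fairscheduler.xml), so one way to lay it out is:

tar -zxvf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 /home/hadoop
cd /home/hadoop/etc/hadoop    # the configuration files edited in the following steps live here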

8: Edit hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HADOOP_PID_DIR=/var/hadoop/pids

Note 1: Setting HADOOP_PID_DIR is strongly recommended; otherwise commands such as sbin/stop-dfs.sh may stop working later (the default PID files live under /tmp and can get cleaned up).

Note 2: Create every directory referenced in the configuration before starting the cluster.
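
Following Note 2, the directories referenced in this post can be created up front on every node (paths taken from the config files below; adjust them if you change the configs):

mkdir -p /var/hadoop/pids
mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data    # NameNode/DataNode directories from hdfs-site.xml
mkdir -p /home/hadoop/yarn/local                          # yarn.nodemanager.local-dirs from yarn-site.xml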

9: core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://kks1:8020</value>
</property>
</configuration>

10: hdfs-site.xml

<configuration>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/data</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>kks3:9001</value>
</property>

</configuration>

11: yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>kks1</value>
</property>    

<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>

<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>

<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>

<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>

<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>

<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/home/hadoop/etc/hadoop/fairscheduler.xml</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/yarn/local</value>
</property>

<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>

<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

12: mapred-site.xml

<configuration>

<!-- MR YARN Application properties -->

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>kks2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>kks2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>
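
The JobHistory server configured above (kks2:10020 / kks2:19888) is not started by sbin/start-yarn.sh; if you want it running, it can be started separately on kks2 from the Hadoop deployment directory, e.g.:

sbin/mr-jobhistory-daemon.sh start historyserver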

13: slaves

kks2
kks3
kks4

14: fairscheduler.xml

<?xml version="1.0"?>
<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores</minResources>
    <maxResources>153600 mb, 100 vcores</maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>

  <queue name="tool">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>

  <queue name="sentiment">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>

</allocations>

15: yarn-env.sh

export YARN_PID_DIR=/var/hadoop/pids

16: Copy the Hadoop directory to the other hosts

scp -r hadoop root@kks2:/home
scp -r hadoop root@kks3:/home
scp -r hadoop root@kks4:/home

17: Start the cluster

Note: all of the following commands are run from the Hadoop deployment directory.

On kks1, format the NameNode and start it:
bin/hdfs namenode -format

sbin/hadoop-daemon.sh start namenode

On kks1, start all of the DataNodes:
sbin/hadoop-daemons.sh start datanode

Start YARN:
sbin/start-yarn.sh

At this point, Hadoop is up and running. You can check the JVM processes on each node with the jps command.
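
Based on the role assignment at the top of this post, jps should show roughly the following daemons (the SecondaryNameNode on kks3 only appears once the cluster has been started via sbin/start-dfs.sh):

kks1: NameNode, ResourceManager
kks2: DataNode, NodeManager
kks3: DataNode, NodeManager, SecondaryNameNode
kks4: DataNode, NodeManager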

The cluster can later be stopped and started again with the following commands:
sbin/stop-dfs.sh

sbin/start-dfs.sh

sbin/start-yarn.sh

sbin/stop-yarn.sh
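
As an end-to-end smoke test of HDFS and YARN, you can run the example job shipped with the Hadoop 2.6.0 binary distribution (from the deployment directory on kks1):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100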

18: Install ZooKeeper

wget http://mirrors.cnnic.cn/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

tar -zxvf zookeeper-3.4.6.tar.gz
cd into the extracted ZooKeeper directory (the zoo.cfg below assumes it lives at /home/zookeeper), then:
mkdir data
mkdir datalog

19: Create the myid file in the data directory

The file contents are 1, 2, and 3 on kks1, kks2, and kks3 respectively.
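
For example, assuming the data directory configured in zoo.cfg below (/home/zookeeper/data):

echo 1 > /home/zookeeper/data/myid    # on kks1
echo 2 > /home/zookeeper/data/myid    # on kks2
echo 3 > /home/zookeeper/data/myid    # on kks3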

20: Create zoo.cfg in the conf directory

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper/data
dataLogDir=/home/zookeeper/datalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=kks1:2888:3888
server.2=kks2:2888:3888
server.3=kks3:2888:3888

21: Start ZooKeeper on each node

bin/zkServer.sh stop      # stops a running instance, if any
bin/zkServer.sh start
bin/zkServer.sh status    # check the node status; one node should report Mode: leader, the others Mode: follower

22: Install HBase

wget http://mirrors.koehn.com/apache/hbase/0.98.13/hbase-0.98.13-hadoop2-bin.tar.gz

23: hbase-env.sh

vim hbase-env.sh
export HBASE_MANAGES_ZK=false
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HBASE_PID_DIR=/var/hadoop/pids

24: hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://kks1:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>kks1,kks2,kks3</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hbase/data</value>
</property>
</configuration>

25: regionservers

kks2
kks3
kks4

26: Copy the HBase directory to the other nodes

scp -r hbase root@kks2:/home
scp -r hbase root@kks3:/home
scp -r hbase root@kks4:/home

27: Start the HBase cluster

bin/start-hbase.sh

bin/stop-hbase.sh

At this point, the HBase cluster setup is complete.
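
As a final sanity check, the HBase shell can confirm that the HMaster and all three RegionServers are up and that reads and writes work (the table and column family names below are just examples):

bin/hbase shell
status                                  # should report 3 servers (kks2, kks3, kks4)
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'

The HMaster web UI on kks1 (port 60010 by default in HBase 0.98) also lists the RegionServers.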

