Building a Hadoop 2.6.0 | ZooKeeper 3.4.6 | HBase 0.98.13 Cluster on Alibaba Cloud ECS


Author: 康凯森

Date: 2016-04-10

Category: HBase


Today I will walk through, step by step, how to build a Hadoop 2.6.0 | ZooKeeper 3.4.6 | HBase 0.98.13 cluster starting from bare Alibaba Cloud ECS instances.

My cluster layout is as follows; the columns are public IP, private IP, hostname, HDFS role, HBase role, ZooKeeper, and YARN role:

120.24.83.53   10.169.132.145  kks1  namenode  hmaster       zookeeper  ResourceManager
120.24.50.76   10.45.162.55    kks2  datanode  regionserver  zookeeper  NodeManager
120.24.50.27   10.45.162.0     kks3  datanode  regionserver  zookeeper  NodeManager  SecondaryNameNode
120.24.51.109  10.45.165.59    kks4  datanode  regionserver             NodeManager

1: Modify the hostname

vim /etc/sysconfig/network
Change: HOSTNAME=kks1
sudo hostname kks1

Reboot for the change to take effect, then repeat on each node with its own hostname (kks2, kks3, kks4).

2: Mount the data disk

fdisk -l
fdisk /dev/xvdb
Enter "n", "p", "1", press Enter twice to accept the defaults, then "w" to write the partition table.
fdisk -l

mkfs.ext3 /dev/xvdb1
echo '/dev/xvdb1 /mnt ext3 defaults 0 0' >> /etc/fstab
mount -a
df -h

3: Edit /etc/hosts

vim /etc/hosts
10.169.132.145 kks1
10.45.162.55 kks2
10.45.162.0 kks3
10.45.165.59 kks4
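
A quick check from kks1 that every hostname resolves and is reachable:

for h in kks2 kks3 kks4; do ping -c 1 $h; done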

4: Passwordless SSH login

A is the local host (the machine used to control the others);
B is the remote host (the machine being controlled), say with IP 172.24.253.2;
both A and B run Linux.

Commands on A:
# ssh-keygen -t rsa (press Enter three times; this generates a key pair locally with no passphrase)
# ssh root@172.24.253.2 "mkdir .ssh;chmod 0700 .ssh" (password required; note: the .ssh directory must have permission 700)
# scp ~/.ssh/id_rsa.pub root@172.24.253.2:.ssh/id_rsa.pub (password required)

ssh root@kks1 "mkdir .ssh;chmod 0700 .ssh"
scp ~/.ssh/id_rsa.pub root@kks1:.ssh/id_rsa.pub
Commands on B:
# touch /root/.ssh/authorized_keys (skip this if the file already exists)
# chmod 600 ~/.ssh/authorized_keys (note: ~/.ssh/authorized_keys must have permission 600; this file holds the public keys of SSH clients; a different file name can be configured in the server's /etc/ssh/sshd_config)
# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys (append the contents of id_rsa.pub to authorized_keys; do not use >, which would wipe the existing contents and lock out anyone who relies on the old keys)
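
After repeating the key distribution for kks2, kks3, and kks4, a simple way to confirm passwordless login from kks1 is:

for h in kks1 kks2 kks3 kks4; do ssh root@$h hostname; done    # each should print the hostname without asking for a password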

5: Install JDK 1.7

yum install java-1.7.0-openjdk-devel.x86_64 -y
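
A quick way to confirm the installation and to find the exact JDK path that hadoop-env.sh and hbase-env.sh will need later (the directory name in those files is the one on my machines; yours may differ slightly):

java -version
ls /usr/lib/jvm/    # the java-1.7.0-openjdk-* directory listed here is the JAVA_HOME used in later steps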

6: Install Maven (only needed on the kks1 node)

wget http://ftp.tsukuba.wide.ad.jp/software/apache/maven/maven-3/3.3.1/binaries/apache-maven-3.3.1-bin.tar.gz
Download Maven to the home directory and extract it (the PATH entry below assumes the extracted directory is /home/maven), then run:
echo export PATH='$PATH':/home/maven/bin >> /etc/profile
After setting the environment variable, run: source /etc/profile
Run mvn --version; if Maven prints its version information, the setup succeeded.

7: Download Hadoop

wget http://apache.cs.utah.edu/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
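
The archive still needs to be unpacked. The configuration in this post assumes the deployment directory is /home/hadoop (for example, yarn.scheduler.fair.allocation.file points at /home/hadoop/etc/hadoop/fairscheduler.xml), so one way to lay it out is:

tar -zxvf hadoop-2.6.0.tar.gz
mv hadoop-2.6.0 /home/hadoop
cd /home/hadoop/etc/hadoop    # the configuration files edited in the following steps live here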

8: Edit hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HADOOP_PID_DIR=/var/hadoop/pids

Note 1: Setting HADOOP_PID_DIR is strongly recommended; otherwise commands such as sbin/stop-dfs.sh may stop working later (the default PID files live under /tmp and can get cleaned up).

Note 2: Create every directory referenced in the configuration before starting the cluster.
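
Following Note 2, the directories referenced in this post can be created up front on every node (paths taken from the config files below; adjust them if you change the configs):

mkdir -p /var/hadoop/pids
mkdir -p /home/hadoop/hdfs/name /home/hadoop/hdfs/data    # NameNode/DataNode directories from hdfs-site.xml
mkdir -p /home/hadoop/yarn/local                          # yarn.nodemanager.local-dirs from yarn-site.xml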

9: core-site.xml

<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://kks1:8020</value>
</property>
</configuration>

10: hdfs-site.xml

<configuration>

<property>
<name>dfs.namenode.name.dir</name>
<value>file:///home/hadoop/hdfs/name</value>
</property>

<property>
<name>dfs.datanode.data.dir</name>
<value>file:///home/hadoop/hdfs/data</value>
</property>

<property>
<name>dfs.namenode.secondary.http-address</name>
<value>kks3:9001</value>
</property>

</configuration>

11: yarn-site.xml

<configuration>

<!-- Site specific YARN configuration properties -->
<property>
<description>The hostname of the RM.</description>
<name>yarn.resourcemanager.hostname</name>
<value>kks1</value>
</property>    

<property>
<description>The address of the applications manager interface in the RM.</description>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>

<property>
<description>The address of the scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>

<property>
<description>The http address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>

<property>
<description>The https address of the RM web application.</description>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>

<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>

<property>
<description>The address of the RM admin interface.</description>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>

<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

<property>
<name>yarn.scheduler.fair.allocation.file</name>
<value>/home/hadoop/etc/hadoop/fairscheduler.xml</value>
</property>

<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/home/hadoop/yarn/local</value>
</property>

<property>
<description>Whether to enable log aggregation</description>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>

<property>
<description>Where to aggregate logs to.</description>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/tmp/logs</value>
</property>

<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>30720</value>
</property>

<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>12</value>
</property>

<property>
<description>the valid service name should only contain a-zA-Z0-9_ and can not start with numbers</description>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

12: mapred-site.xml

<configuration>

<!-- MR YARN Application properties -->

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>kks2:10020</value>
<description>MapReduce JobHistory Server IPC host:port</description>
</property>

<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>kks2:19888</value>
<description>MapReduce JobHistory Server Web UI host:port</description>
</property>

</configuration>
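
The JobHistory server configured above (kks2:10020 / kks2:19888) is not started by sbin/start-yarn.sh; if you want it running, it can be started separately on kks2 from the Hadoop deployment directory, e.g.:

sbin/mr-jobhistory-daemon.sh start historyserver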

13: slaves

kks2
kks3
kks4

14: fairscheduler.xml

<?xml version="1.0"?>
<allocations>

  <queue name="infrastructure">
    <minResources>102400 mb, 50 vcores</minResources>
    <maxResources>153600 mb, 100 vcores</maxResources>
    <maxRunningApps>200</maxRunningApps>
    <minSharePreemptionTimeout>300</minSharePreemptionTimeout>
    <weight>1.0</weight>
    <aclSubmitApps>root,yarn,search,hdfs</aclSubmitApps>
  </queue>

  <queue name="tool">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>

  <queue name="sentiment">
    <minResources>102400 mb, 30 vcores</minResources>
    <maxResources>153600 mb, 50 vcores</maxResources>
  </queue>

</allocations>

15: yarn-env.sh

export YARN_PID_DIR=/var/hadoop/pids

16: Copy the Hadoop directory to the other hosts

scp -r hadoop root@kks2:/home
scp -r hadoop root@kks3:/home
scp -r hadoop root@kks4:/home

17: Start the cluster

Note: all of the following commands are run from the Hadoop deployment directory.

On kks1, format the NameNode and start it:
bin/hdfs namenode -format

sbin/hadoop-daemon.sh start namenode

On kks1, start all of the DataNodes:
sbin/hadoop-daemons.sh start datanode

Start YARN:
sbin/start-yarn.sh

At this point, Hadoop is up and running. You can check the JVM processes on each node with the jps command.
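
Based on the role assignment at the top of this post, jps should show roughly the following daemons (the SecondaryNameNode on kks3 only appears once the cluster has been started via sbin/start-dfs.sh):

kks1: NameNode, ResourceManager
kks2: DataNode, NodeManager
kks3: DataNode, NodeManager, SecondaryNameNode
kks4: DataNode, NodeManager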

The cluster can later be stopped and started again with the following commands:
sbin/stop-dfs.sh

sbin/start-dfs.sh

sbin/start-yarn.sh

sbin/stop-yarn.sh
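
As an end-to-end smoke test of HDFS and YARN, you can run the example job shipped with the Hadoop 2.6.0 binary distribution (from the deployment directory on kks1):

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar pi 10 100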

18: Install ZooKeeper

wget http://mirrors.cnnic.cn/apache/zookeeper/zookeeper-3.4.6/zookeeper-3.4.6.tar.gz

tar -zxvf zookeeper-3.4.6.tar.gz
cd into the extracted ZooKeeper directory (the zoo.cfg below assumes it lives at /home/zookeeper), then:
mkdir data
mkdir datalog

19: Create the myid file in the data directory

The file contents are 1, 2, and 3 on kks1, kks2, and kks3 respectively.
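
For example, assuming the data directory configured in zoo.cfg below (/home/zookeeper/data):

echo 1 > /home/zookeeper/data/myid    # on kks1
echo 2 > /home/zookeeper/data/myid    # on kks2
echo 3 > /home/zookeeper/data/myid    # on kks3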

20: Create zoo.cfg in the conf directory

# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/zookeeper/data
dataLogDir=/home/zookeeper/datalog
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=kks1:2888:3888
server.2=kks2:2888:3888
server.3=kks3:2888:3888

21: Start ZooKeeper on each node

bin/zkServer.sh stop      # stops a running instance, if any
bin/zkServer.sh start
bin/zkServer.sh status    # check the node status; one node should report Mode: leader, the others Mode: follower

22: Install HBase

wget http://mirrors.koehn.com/apache/hbase/0.98.13/hbase-0.98.13-hadoop2-bin.tar.gz

23: hbase-env.sh

vim hbase-env.sh
export HBASE_MANAGES_ZK=false
export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
export HBASE_PID_DIR=/var/hadoop/pids

24: hbase-site.xml

<configuration>
<property>
<name>hbase.rootdir</name>
<value>hdfs://kks1:8020/hbase</value>
</property>
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<property>
<name>hbase.zookeeper.property.clientPort</name>
<value>2181</value>
</property>
<property>
<name>hbase.zookeeper.quorum</name>
<value>kks1,kks2,kks3</value>
</property>
<property>
<name>zookeeper.session.timeout</name>
<value>120000</value>
</property>
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/home/hbase/data</value>
</property>
</configuration>

25: regionservers

kks2
kks3
kks4

26: Copy the HBase directory to the other nodes

scp -r hbase root@kks2:/home
scp -r hbase root@kks3:/home
scp -r hbase root@kks4:/home

27: Start the HBase cluster

bin/start-hbase.sh

bin/stop-hbase.sh

At this point, the HBase cluster setup is complete.
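
As a final sanity check, the HBase shell can confirm that the HMaster and all three RegionServers are up and that reads and writes work (the table and column family names below are just examples):

bin/hbase shell
status                                  # should report 3 servers (kks2, kks3, kks4)
create 'test', 'cf'
put 'test', 'row1', 'cf:a', 'value1'
scan 'test'

The HMaster web UI on kks1 (port 60010 by default in HBase 0.98) also lists the RegionServers.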

