Big Data: Presto + Alluxio

This article walks through deploying Presto and the Alluxio in-memory storage system on a CDH test cluster; hopefully it is of help.
1. Presto Installation
Role assignment

IP address       Hostname                 Node ID        Role
172.16.16.241    incubator-test-dc-001    presto-cdh01   coordinator
172.16.16.246    incubator-test-dc-002    presto-cdh02   worker
172.16.16.250    incubator-test-dc-003    presto-cdh03   worker
172.16.16.242    incubator-test-dc-004    presto-cdh04   worker
172.16.16.249    incubator-test-dc-005    presto-cdh05   worker
Environment
Test environment:
1. CM 6.3
2. Presto version 0.226
3. OS: Red Hat 7.3
4. All operations are performed as the root user
Download
The Presto server is installed under /opt/cloudera/parcels/presto. Download the 0.226 release package:
https://repo1.maven.org/maven2/com/facebook/presto/presto-server/0.226/presto-server-0.226.tar.gz
Upload the downloaded presto-server-0.226.tar.gz to every server in the Presto cluster:
mkdir -p /opt/cloudera/parcels/presto
scp -r -P53742 presto-server-0.226.* root@incubator-test-dc-002:/opt/cloudera/parcels/presto/
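The scp line above pushes the package to a single node; below is a minimal sketch for copying it to all of the other nodes in the role table (hostnames from the table, SSH port 53742 as in the scp example):

# Sketch: distribute the Presto package to every other node
for host in incubator-test-dc-002 incubator-test-dc-003 incubator-test-dc-004 incubator-test-dc-005; do
  ssh -p 53742 root@${host} "mkdir -p /opt/cloudera/parcels/presto"
  scp -P 53742 presto-server-0.226.tar.gz root@${host}:/opt/cloudera/parcels/presto/
done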


Extract and install (on all machines in the Presto cluster)
Extract presto-server-0.226.tar.gz into the /opt/cloudera/parcels directory:
tar -zxvf presto-server-0.226.tar.gz -C /opt/cloudera/parcels/
cd /opt/cloudera/parcels/
mv presto presto-soft
mv presto-server-0.226/ presto
Java environment variables
Edit /opt/cloudera/parcels/presto/bin/launcher and add the JAVA environment variables:
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
PATH=$JAVA_HOME/bin:$PATH
Prepare the Presto configuration files
# Presto configuration directory
mkdir -p /opt/cloudera/parcels/presto/etc
# Data directory
mkdir -p /data/presto
vim /opt/cloudera/parcels/presto/etc/node.properties
node.environment=presto
node.id=presto-cdh01
node.data-dir=/data/presto
Configuration notes:
node.environment: the cluster name. All Presto nodes in the same cluster must use the same environment name.
node.id: the unique identifier of each Presto node. Every node's node.id must be unique, and it must stay the same across restarts and upgrades. If multiple Presto instances are installed on one machine (for example, several Presto nodes on the same host), each instance must have its own node.id.
node.data-dir: the data directory (a path on the local filesystem). Presto stores logs and other data here.
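Because node.id must differ on every host, here is a minimal sketch for generating node.properties on each node, using the node IDs from the role table above:

# Run on each node; set NODE_ID to that node's ID from the role table,
# e.g. presto-cdh02 on incubator-test-dc-002
NODE_ID=presto-cdh02
cat > /opt/cloudera/parcels/presto/etc/node.properties <<EOF
node.environment=presto
node.id=${NODE_ID}
node.data-dir=/data/presto
EOF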
Presto JVM configuration file
Configure Presto's JVM options by creating jvm.config:
vim /opt/cloudera/parcels/presto/etc/jvm.config
-server
-Xmx8G
-XX:+UseConcMarkSweepGC
-XX:+ExplicitGCInvokesConcurrent
-XX:+CMSClassUnloadingEnabled
-XX:+AggressiveOpts
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:ReservedCodeCacheSize=150M
The file format is a list of command-line options, one per line. Because these options are not interpreted by a shell, an option that contains spaces or other separators is not split; each line is passed to the JVM as a single argument (as with the OnOutOfMemoryError option above).
An OutOfMemoryError usually leaves the JVM in an inconsistent state, so the typical response is to write a heap dump (for debugging) and then forcibly terminate the process.
Presto compiles queries to bytecode and therefore generates many classes, so class-metadata space should be sized generously and JVM class unloading must be enabled.
Create the config.properties file
This file contains the configuration of the Presto server. Every Presto server can act as both a coordinator and a worker, but in large clusters it is recommended, for performance reasons, to dedicate a single machine to the coordinator role.
Coordinator node configuration:
vim /opt/cloudera/parcels/presto/etc/coordinator-config.properties
coordinator=true
node-scheduler.include-coordinator=false
http-server.http.port=6660
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://incubator-test-dc-001:6660
Worker node configuration:
vim /opt/cloudera/parcels/presto/etc/worker-config.properties
coordinator=false
http-server.http.port=6660
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery.uri=http://incubator-test-dc-001:6660


Configuration notes:
coordinator: whether this Presto instance acts as a coordinator (accepts queries from clients and manages the execution of each query).
node-scheduler.include-coordinator: whether to allow scheduling work on the coordinator. For larger clusters, running worker tasks on the coordinator degrades query performance, because the machine's resources are then not available for the critical tasks of scheduling, managing, and monitoring query execution.
http-server.http.port: the HTTP port of the Presto server. Presto uses HTTP for all communication, internal and external.
discovery.uri: the URI of the discovery server. Since the coordinator's embedded discovery service is enabled, this is simply the URI of the Presto coordinator; set it according to your environment. Note: this URI must not end with "/".
Create the logging configuration file log.properties
vim /opt/cloudera/parcels/presto/etc/log.properties
com.facebook.presto=INFO
Rename the config files
On the coordinator node:
cd /opt/cloudera/parcels/presto/etc/
mv coordinator-config.properties config.properties
On the worker nodes:
cd /opt/cloudera/parcels/presto/etc/
mv worker-config.properties config.properties
Starting and stopping Presto
# Start
/opt/cloudera/parcels/presto/bin/launcher start
# Stop
/opt/cloudera/parcels/presto/bin/launcher stop
Presto web UI: http://172.16.16.241:6660/ui/
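To confirm the server is actually up, the launcher's status command and a quick HTTP request against the coordinator (port 6660 from config.properties) can be used; a minimal sketch:

/opt/cloudera/parcels/presto/bin/launcher status
# The coordinator answers on its REST info endpoint once it is running:
curl -s http://incubator-test-dc-001:6660/v1/info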
Presto integration with Hive
1. Create the catalog directory on every node of the Presto cluster:
mkdir -p /opt/cloudera/parcels/presto/etc/catalog
2. Create hive.properties, which configures the integration with the Hive service:
vim /opt/cloudera/parcels/presto/etc/catalog/hive.properties
connector.name=hive-hadoop2
hive.metastore.uri=thrift://incubator-test-dc-003:9083

3. Edit Presto's jvm.config and add the user name Presto uses to access HDFS:
vim /opt/cloudera/parcels/presto/etc/jvm.config
Add: -DHADOOP_USER_NAME=presto

4. The configuration above specifies the presto user for HDFS access, so the presto user must be added on every node of the cluster:
useradd presto

After the changes, restart Presto (run on all cluster machines):
/opt/cloudera/parcels/presto/bin/launcher restart

Testing the Presto/Hive integration
The test uses the Presto CLI. The CLI is a self-executing JAR file, which means it can be used like any other command in a UNIX terminal.
1. Download presto-cli-0.226-executable.jar, rename it to presto, and make it executable:
https://repo1.maven.org/maven2/com/facebook/presto/presto-cli/0.226/presto-cli-0.226-executable.jar
2. Copy the client to all hosts:
scp -r -P53742 /home/testcx/presto-cli-0.226-executable.jar root@incubator-test-dc-005:/opt/cloudera/parcels/presto/etc/
cd /opt/cloudera/parcels/presto/etc/
mv presto-cli-0.226-executable.jar presto
chmod +x presto

3. The cluster has Sentry enabled. Since the presto user is used to access Hive, grant the presto user all privileges on the default database.

4. Create the role and grant privileges in Hive:
beeline
!connect jdbc:hive2://incubator-test-dc-001:10000/; user=hive; password=****
create role presto;
grant role presto to group presto;
grant ALL on database default to role presto;
5. Create the role and grant privileges in Impala:
su hive
impala-shell -i incubator-test-dc-002
create role presto;
grant role presto to group presto;
grant ALL on database default to role presto;
Run a query:
[root@incubator-test-dc-001 etc]# ./presto --server localhost:6660 --catalog hive --schema=default
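Inside the CLI session, a few simple statements are enough to confirm that the Hive metastore is reachable; a minimal sketch, where test_table stands for any existing table in the default database:

presto:default> show schemas;
presto:default> show tables;
presto:default> select count(*) from test_table;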

Testing Presto integration with Kudu
Add the Kudu catalog configuration and distribute it to all nodes:
vim /opt/cloudera/parcels/presto/etc/catalog/kudu.properties
connector.name=kudu
kudu.client.master-addresses=incubator-test-dc-001:7051,incubator-test-dc-002:7051,incubator-test-dc-003:7051
# Restart the service
/opt/cloudera/parcels/presto/bin/launcher restart
# Verify the Kudu connector
select * from kudu.default."default.test_kudu_table";
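The same check can be run through the Presto CLI by pointing it at the kudu catalog; a minimal sketch using the table from the query above:

./presto --server localhost:6660 --catalog kudu --schema default
presto:default> show tables;
presto:default> select * from "default.test_kudu_table" limit 10;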
Presto integration with LDAPS
Install ApacheDS (to provide LDAPS):
# Add the group
groupadd apacheds
# Add the user
useradd -s /bin/sh -g apacheds apacheds
# Download and make executable
wget http://mirrors.ocf.berkeley.edu/apache//directory/apacheds/dist/2.0.0.AM25/apacheds-2.0.0.AM25-64bit.bin
chmod +x apacheds-2.0.0.AM25-64bit.bin
./apacheds-2.0.0.AM25-64bit.bin
# Start the service
/etc/init.d/apacheds-2.0.0.AM25-default start
[root@incubator-test-dc-002 presto_hue]# netstat -anplt | grep 10389
tcp 0 0 0.0.0.0:10389 0.0.0.0:* LISTEN 24770/java
# Configure the user name, password, and IP address
Default credentials: user: uid=admin,ou=system, password: secret
# Connect with a client
Configure remote client access using Apache Directory Studio, as shown in the configuration screens below.


Open the configuration and add a partition


Save with Ctrl+S.
Restart the service:
[root@incubator-test-dc-002 presto_hue]# /etc/init.d/apacheds-2.0.0.AM25-default restart
Stopping ApacheDS - default...
Stopped ApacheDS - default.
Starting ApacheDS - default...
[root@incubator-test-dc-002 presto_hue]#



Add a group


Add


# Add a user


# Enable LDAPS
cd /var/lib/apacheds-2.0.0.AM25/default/conf/
# Generate the keystore (keystore password: testCDH123!)
/opt/jdk1.8.0_181/bin/keytool -genkeypair -alias apacheds -keyalg RSA -validity 7 -keystore ads.keystore
chown apacheds:apacheds ./ads.keystore
# Export the certificate apacheds.cer
/opt/jdk1.8.0_181/bin/keytool -export -alias apacheds -keystore ads.keystore -rfc -file apacheds.cer
# Import the certificate into the JDK system trust store (self-signed trust); the trust store password is the default, changeit:
/opt/jdk1.8.0_181/bin/keytool -import -file apacheds.cer -alias apacheds -keystore /usr/java/jdk1.8.0_181-cloudera/jre/lib/security/cacerts


# Configure the certificate (keystore path):


/var/lib/apacheds-2.0.0.AM25/default/conf/ads.keystore


/etc/init.d/apacheds-2.0.0.AM25-default restart
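Before pointing Presto at it, the LDAPS listener (ApacheDS defaults to port 10636) can be verified with an OpenLDAP client; a minimal sketch that skips certificate validation because the certificate is self-signed:

netstat -anplt | grep 10636
LDAPTLS_REQCERT=never ldapsearch -H ldaps://172.16.16.246:10636 \
  -D "uid=admin,ou=system" -w secret -b "dc=test,dc=hadoop" "(objectClass=*)" dn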
Configure the client




# Test Presto with LDAPS
cd /data/presto-server-0.228/etc
# Generate a keystore for Presto
/opt/jdk1.8.0_181/bin/keytool -genkeypair -alias presto -keyalg RSA -keystore presto.jks
Edit config.properties and add:
http-server.authentication.type=PASSWORD
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/data/presto-server-0.228/etc/presto.jks
http-server.https.keystore.key=testCDH123!

# vi password-authenticator.properties
password-authenticator.name=ldap
ldap.url=ldaps://172.16.16.246:10636
ldap.user-bind-pattern=uid=$USER,ou=people,dc=test,dc=hadoop
ldap.user-base-dn=dc=test,dc=hadoop
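With the HTTPS keystore and the LDAP password authenticator in place, the Presto CLI can connect over HTTPS and authenticate against ApacheDS; a minimal sketch, assuming the coordinator host and an account ldapuser that exists under ou=people,dc=test,dc=hadoop:

./presto --server https://incubator-test-dc-001:8443 \
  --keystore-path /data/presto-server-0.228/etc/presto.jks \
  --keystore-password testCDH123! \
  --user ldapuser --password \
  --catalog hive --schema default
# --password makes the CLI prompt for the LDAP password interactively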


2. Alluxio in-memory storage system deployment
2.1 Download and extract
wget https://downloads.alluxio.io/downloads/files/2.0.1/alluxio-2.0.1-bin.tar.gz
Extract the package into /opt/cloudera/parcels/alluxio/, then copy the configuration template:
cp conf/alluxio-site.properties.template conf/alluxio-site.properties
2.2 Copy the software to all nodes
scp -r -P53742 /opt/cloudera/parcels/alluxio/ root@incubator-test-dc-002:/opt/cloudera/parcels/

cd /opt/cloudera/parcels/alluxio/alluxio-2.0.1
cp conf/alluxio-site.properties.template conf/alluxio-site.properties
2.3 Modify the configuration (all machines in the cluster)
Set alluxio.master.hostname in conf/alluxio-site.properties to the hostname of the machine that will run the Alluxio master, and add the IP addresses of all worker nodes to the conf/workers file.
vim alluxio-site.properties
alluxio.home=/opt/cloudera/parcels/alluxio/alluxio-2.0.1
alluxio.work.dir=/opt/cloudera/parcels/alluxio/alluxio-2.0.1
alluxio.conf.dir=$alluxio.home/conf
alluxio.logs.dir=$alluxio.home/logs
# HDFS mount address for the Alluxio root under-filesystem
alluxio.master.mount.table.root.ufs=hdfs://incubator-test-dc-001:8020/alluxio
alluxio.metrics.conf.file=$alluxio.conf.dir/metrics.properties
alluxio.master.hostname=incubator-test-dc-001
alluxio.underfs.address=hdfs://incubator-test-dc-001:8020/alluxio
alluxio.underfs.hdfs.configuration=/etc/hadoop/conf/core-site.xml
alluxio.master.bind.host=172.16.16.241
alluxio.master.journal.folder=/opt/cloudera/parcels/alluxio/alluxio-2.0.1/journal
alluxio.master.web.bind.host=172.16.16.241
alluxio.master.web.hostname=incubator-test-dc-001
alluxio.master.web.port=6661
alluxio.worker.bind.host=0.0.0.0
alluxio.worker.memory.size=2048MB
alluxio.worker.tieredstore.levels=1
alluxio.worker.tieredstore.level0.alias=MEM
alluxio.worker.tieredstore.level0.dirs.path=/mnt/ramdisk
JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera
alluxio.user.network.netty.timeout.ms=600000
alluxio.master.security.impersonation.presto.users=*

# Copy the configuration and scripts to all machines
scp -r -P53742 alluxio-site.properties root@incubator-test-dc-002:/opt/cloudera/parcels/alluxio/alluxio-2.0.1/conf/
scp -r -P53742 alluxio-masters.sh alluxio-workers.sh alluxio-start.sh root@incubator-test-dc-002:/opt/cloudera/parcels/alluxio/alluxio-2.0.1/bin


vim workers
172.16.16.246
172.16.16.250
172.16.16.242
172.16.16.249



vim masters
172.16.16.241
cp -rf alluxio-env.sh.template alluxio-env.sh
vim alluxio-env.sh (all machines)
# Add:
export ALLUXIO_SSH_OPTS="-p 53742"
export JAVA_HOME=/usr/java/jdk1.8.0_181-cloudera

cd /opt/cloudera/parcels/alluxio/alluxio-2.0.1/bin
vim alluxio-masters.sh
Search for the ssh command and add -p 53742
vim alluxio-workers.sh
Search for the ssh command and add -p 53742



[root@incubator-test-dc-001 bin]# ln -s /opt/jdk1.8.0_181/bin/java /usr/bin/java
[root@incubator-test-dc-001 bin]# /usr/bin/java -version
java version "1.8.0_181"
Java(TM) SE Runtime Environment (build 1.8.0_181-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.181-b13, mixed mode)
[root@incubator-test-dc-001 bin]#


./alluxio format
The format step reports an error unless the ramdisk directory exists on every node, so create it first:
mkdir -p /mnt/ramdisk/alluxioworker
If the directory is not created, the error shown below is raised.
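A short loop creates the directory on every node in one go; a minimal sketch reusing the cluster IPs from the role table and the SSH port 53742:

for host in 172.16.16.241 172.16.16.246 172.16.16.250 172.16.16.242 172.16.16.249; do
  ssh -p 53742 root@${host} "mkdir -p /mnt/ramdisk/alluxioworker"
done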



2.4 Format Alluxio
cd /opt/cloudera/parcels/alluxio/alluxio-2.0.1/bin
[root@incubator-test-dc-001 bin]# ./alluxio format
Executing the following command on all worker nodes and logging to /opt/cloudera/parcels/alluxio/alluxio-2.0.1/logs/task.log: /opt/cloudera/parcels/alluxio/alluxio-2.0.1/bin/alluxio formatWorker
Waiting for tasks to finish...
All tasks finished
Executing the following command on all master nodes and logging to /opt/cloudera/parcels/alluxio/alluxio-2.0.1/logs/task.log: /opt/cloudera/parcels/alluxio/alluxio-2.0.1/bin/alluxio formatJournal
Waiting for tasks to finish...
All tasks finished
2.5 Start Alluxio
./alluxio-start.sh all NoMount
# or, to have Alluxio mount the ramdisk itself (requires sudo):
./alluxio-start.sh all SudoMount
Web UI: http://172.16.16.241:6661/overview
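Besides the web UI, the cluster state can be checked from the command line; a minimal sketch using the standard Alluxio admin command:

cd /opt/cloudera/parcels/alluxio/alluxio-2.0.1/bin
./alluxio fsadmin report    # shows the live master, registered workers, and capacity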
2.6 Test
[root@incubator-test-dc-001 bin]# echo "1.txt"> 1.txt
[root@incubator-test-dc-001 bin]# ll
total 68
-rw-r--r-- 1 root root 6 Oct 14 20:27 1.txt
-rwxrwxrwx 1 501 games 11808 Oct 12 14:35 alluxio
(remaining listing truncated)
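To exercise the storage layer itself, the local file can be written into Alluxio and read back; a minimal sketch using the standard Alluxio filesystem commands:

./alluxio fs copyFromLocal 1.txt /1.txt    # upload the local file into Alluxio
./alluxio fs ls /                          # list the Alluxio root directory
./alluxio fs cat /1.txt                    # read the file back from Alluxio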
