1. Introduction
HUE=Hadoop User Experience
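Hue is an open-source web UI for Apache Hadoop. It grew out of Cloudera Desktop, and Cloudera later released it as open source (under the Apache License) for the Hadoop community. It is built on the Python web framework Django. With Hue you can interact with a Hadoop cluster from a web console in the browser to analyze and process data: for example, work with data on HDFS, run MapReduce jobs, execute Hive SQL statements, and browse HBase tables.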
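For its own data, including user authentication and authorization, Hue uses an SQLite database by default; it can instead be configured to use MySQL, PostgreSQL, or Oracle. Its features include:
- HDFS access: browse the data on HDFS from the browser.
- Hive editor: write and run HQL queries and scripts, view the results, and use other Hive features.
- A Solr search application with matching data visualizations and dashboards.
- An Impala application for interactive queries.
- Recent versions also integrate a Spark editor and dashboard.
- A Pig editor that can run the scripts you write.
- An Oozie scheduler: submit and monitor workflows, coordinators, and bundles from a dashboard.
- HBase support: query, modify, and visualize data.
- Metastore browsing: access Hive metadata and the corresponding HCatalog.
- In addition, support for jobs, Sqoop, ZooKeeper, and databases (MySQL, SQLite, Oracle, etc.).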
2. Installation and Configuration
Start HiveServer2
hive --service hiveserver2
Without it, Hive queries cannot be executed from the Hue web console.
Edit Hadoop's core-site.xml and add the following:
<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
<!-- If pseudo-distributed.ini sets
       server_user=root
       server_group=root
       default_user=root
       default_hdfs_superuser=root
     then change "hue" to "root" in the two properties above to give root the proxy privilege. -->
<property>
  <name>hadoop.proxyuser.client.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.client.groups</name>
  <value>*</value>
</property>
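The proxy-user entries differ only in the user name, so they can be generated for whichever account Hue runs as (hue, or root when the ini uses root). A minimal Python sketch using the standard library's ElementTree; proxyuser_properties is a hypothetical helper, the <configuration> wrapper exists only so the output is well-formed XML, and ET.indent requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET


def proxyuser_properties(user: str, hosts: str = "*", groups: str = "*") -> str:
    """Return the hadoop.proxyuser.<user>.hosts/groups properties as XML text.

    Hypothetical helper: the properties are wrapped in <configuration> so the
    output parses; copy only the <property> elements into core-site.xml.
    """
    config = ET.Element("configuration")
    for suffix, value in (("hosts", hosts), ("groups", groups)):
        prop = ET.SubElement(config, "property")
        ET.SubElement(prop, "name").text = f"hadoop.proxyuser.{user}.{suffix}"
        ET.SubElement(prop, "value").text = value
    ET.indent(config)  # pretty-print with two-space indentation (Python 3.9+)
    return ET.tostring(config, encoding="unicode")


if __name__ == "__main__":
    # Use "root" instead of "hue" when server_user=root in pseudo-distributed.ini.
    print(proxyuser_properties("hue"))
```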
Edit Hadoop's hdfs-site.xml and add the following:
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
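After editing the file, it can be worth confirming that the property actually landed there. A minimal Python sketch using only the standard library; read_hadoop_conf and webhdfs_enabled are hypothetical helpers, and the file path in the usage note is assumed from the hadoop_conf_dir used in this guide's pseudo-distributed.ini.

```python
import xml.etree.ElementTree as ET


def read_hadoop_conf(xml_text: str) -> dict:
    """Map each <property>'s <name> to its <value> in a Hadoop *-site.xml document."""
    root = ET.fromstring(xml_text)
    return {
        prop.findtext("name", "").strip(): prop.findtext("value", "").strip()
        for prop in root.iter("property")
    }


def webhdfs_enabled(xml_text: str) -> bool:
    """True only if dfs.webhdfs.enabled is explicitly set to true in this file."""
    return read_hadoop_conf(xml_text).get("dfs.webhdfs.enabled", "").lower() == "true"
```

Usage, under that path assumption: read /home/oracle/hadoop-2.6.0/etc/hadoop/hdfs-site.xml and pass its text to webhdfs_enabled, which should return True. read_hadoop_conf works the same way for core-site.xml and yarn-site.xml.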
Add the Maven yum repository
As root:
wget http://repos.fedorapeople.org/repos/dchen/apache-maven/epel-apache-maven.repo -O /etc/yum.repos.d/epel-apache-maven.repo
Install the build dependencies with yum
As root:
yum install ant asciidoc cyrus-sasl-devel cyrus-sasl-gssapi cyrus-sasl-plain gcc gcc-c++ krb5-devel libffi-devel libxml2-devel libxslt-devel make mysql mysql-devel openldap-devel python-devel sqlite-devel gmp-devel
Install git with yum
As root:
yum install git   # the git executable goes to /usr/bin; its helper programs live under /usr/libexec/git-core
Download the source code
As root:
git clone https://github.com/cloudera/hue.git branch-3.9   # clones the default branch into a directory named branch-3.9
Install the Python development environment
As root:
yum install python-devel
Install dependencies that may still be missing
As root:
yum install libffi-devel
yum install gcc openssl-devel
yum install libxslt-devel
yum install mysql-server mysql mysql-devel
yum install gmp-devel
yum install sqlite-devel
yum install openldap-devel
Build the source; installing afterwards is optional
cd branch-3.9
make apps      # if the build fails, run make clean and then build again
make install   # optional
From the build directory branch-3.9, start Hue (at this point you can log in, but the individual components do not show any information yet)
As root:
build/env/bin/hue runserver                # Django development server, listens on 127.0.0.1:8000 by default
build/env/bin/hue runserver 0.0.0.0:8000   # the extra address argument allows external access: browse to http://<linux-ip>:8000
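To confirm from a script that the development server is answering (for example, before trying from another machine), a minimal Python probe can help. hue_is_up is a hypothetical helper; it treats any HTTP status below 500, including a redirect to a login page (which urlopen follows), as "up".

```python
import urllib.error
import urllib.request


def hue_is_up(base_url: str = "http://127.0.0.1:8000/", timeout: float = 5.0) -> bool:
    """Return True if the server at base_url answers with an HTTP status below 500."""
    try:
        # urlopen follows redirects, e.g. from / to a login page.
        with urllib.request.urlopen(base_url, timeout=timeout) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:  # a 4xx/5xx reply still means something is listening
        return err.code < 500
    except OSError:  # URLError (refused, unresolvable host) and socket timeouts
        return False
```

For the 0.0.0.0:8000 variant, run the same check from the other machine against http://<linux-ip>:8000/.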
Edit the configuration file desktop/conf/pseudo-distributed.ini
[desktop]
http_host=0.0.0.0
http_port=8000
server_user=root
server_group=root
default_user=root
default_hdfs_superuser=root

[hadoop]
# Enter the filesystem uri
fs_defaultfs=hdfs://oracle:9000
# Use WebHdfs/HttpFs as the communication mechanism.
# Domain should be the NameNode or HttpFs host.
# Default port is 14000 for HttpFs.
webhdfs_url=http://oracle:50070/webhdfs/v1
# Directory of the Hadoop configuration
hadoop_hdfs_home=/home/oracle/hadoop-2.6.0
hadoop_bin=/home/oracle/hadoop-2.6.0/bin
hadoop_conf_dir=/home/oracle/hadoop-2.6.0/etc/hadoop

[[yarn_clusters]]
# Enter the host on which you are running the ResourceManager
resourcemanager_host=oracle
# The port where the ResourceManager IPC listens on
resourcemanager_port=8032
# Whether to submit jobs to this cluster
submit_to=True
# Change this if your YARN cluster is Kerberos-secured
security_enabled=true
# URL of the ResourceManager API
resourcemanager_api_url=http://oracle:8088
# URL of the ProxyServer API
proxy_api_url=http://oracle:8088
# URL of the HistoryServer API
history_server_api_url=http://oracle:19888

[[mapred_clusters]]
# Enter the host on which you are running the Hadoop JobTracker
jobtracker_host=oracle
# The port where the JobTracker IPC listens on
jobtracker_port=8021
# Whether to submit jobs to this cluster
submit_to=False
# Change this if your MapReduce cluster is Kerberos-secured
security_enabled=true

[beeswax]
# Host where HiveServer2 is running.
# If Kerberos security is enabled, use fully-qualified domain name (FQDN).
hive_server_host=oracle
# Port where HiveServer2 Thrift server runs on.
hive_server_port=10000
# Hive configuration directory, where hive-site.xml is located
hive_conf_dir=/home/oracle/apache-hive-1.2.1-bin/conf
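If the File Browser cannot reach HDFS, one way to rule out Hue itself is to call the WebHDFS endpoint from webhdfs_url directly. A minimal Python sketch, assuming simple (non-Kerberos) authentication and the hostname oracle used throughout this guide; liststatus_url and list_hdfs_dir are hypothetical helpers, while op=LISTSTATUS, the user.name parameter, and the FileStatuses JSON shape come from the WebHDFS REST API.

```python
import json
import urllib.parse
import urllib.request


def liststatus_url(webhdfs_url: str, path: str = "/", user: str = "root") -> str:
    """Build a WebHDFS LISTSTATUS URL from Hue's webhdfs_url setting.

    user.name is the simple-auth user; root matches default_hdfs_superuser above.
    """
    query = urllib.parse.urlencode({"op": "LISTSTATUS", "user.name": user})
    return f"{webhdfs_url.rstrip('/')}{urllib.parse.quote(path)}?{query}"


def list_hdfs_dir(webhdfs_url: str, path: str = "/", user: str = "root") -> list:
    """Return the entry names under path as WebHDFS reports them (needs a running NameNode)."""
    with urllib.request.urlopen(liststatus_url(webhdfs_url, path, user), timeout=10) as resp:
        payload = json.load(resp)
    return [status["pathSuffix"] for status in payload["FileStatuses"]["FileStatus"]]
```

With the cluster running, list_hdfs_dir("http://oracle:50070/webhdfs/v1") should list the top-level HDFS directories; an error here points at HDFS or the network rather than at Hue.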
Edit Hadoop's yarn-site.xml and add the following:
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>oracle</value>
</property>
<!-- history server / log aggregation -->
<property>
  <name>yarn.log-aggregation-enable</name>
  <value>true</value>
</property>
<property>
  <name>yarn.log-aggregation.retain-seconds</name>
  <value>106800</value>
</property>
<!-- end of history server settings -->
Before opening the web page, start Hadoop, HBase, the HBase Thrift server, and Hive (HiveServer2):
hbase-daemon.sh start thrift -p 9090
hive --service hiveserver2   # keep this terminal tab open
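Since Hue only shows errors once you click into a component, a quick TCP check of each service's port before opening the browser can save time. A minimal Python sketch; the host/port table is assumed from the values in this guide (hostname oracle, Thrift on 9090, HiveServer2 on 10000, NameNode HTTP on 50070, ResourceManager on 8088, Hue on 8000), and is_listening and report are hypothetical helpers.

```python
import socket

# Host/port pairs assumed from this guide; "oracle" is the hostname used throughout.
SERVICES = {
    "NameNode HTTP (WebHDFS)": ("oracle", 50070),
    "ResourceManager web UI": ("oracle", 8088),
    "HBase Thrift": ("oracle", 9090),
    "HiveServer2": ("oracle", 10000),
    "Hue": ("oracle", 8000),
}


def is_listening(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # connection refused, timed out, or hostname not resolvable
        return False


def report(services: dict = SERVICES) -> None:
    """Print one status line per service."""
    for name, (host, port) in services.items():
        state = "up" if is_listening(host, port) else "DOWN"
        print(f"{name:<26} {host}:{port:<6} {state}")
```

Calling report() on the Hue machine prints one line per service; a DOWN entry for HiveServer2 or HBase Thrift matches the symptom of the Hive editor or HBase Browser failing to load. An open port only proves something is listening, not that the service is healthy.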
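From an external machine, open port 8000 (http://<linux-ip>:8000):
- Browsers → Files opens the File Browser, where you can browse HDFS files.
- Browsers → Jobs opens the Job Browser, which shows MapReduce job information.
- Browsers → HBase opens the HBase Browser, which shows the tables in HBase.
- Editor → Hive opens the Hive editor, where you can work with Hive tables.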
Fix ownership (Hue runs as root in this setup, per server_user=root)
chown -R root:root branch-3.9/
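If a "database is locked" error appears
Hue's default SQLite database copes poorly with concurrent access, so move Hue's own database to MySQL. The database named by name= below (hue) must already exist in MySQL.
Make sure you edit the right block: [[database]] under [desktop], not another database section.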
[[database]]
# Database engine is typically one of:
# postgresql_psycopg2, mysql, sqlite3 or oracle.
#
# Note that for sqlite3, 'name', below is a path to the filename. For other backends, it is the database name.
# Note for Oracle, options={"threaded":true} must be set in order to avoid crashes.
# Note for Oracle, you can use the Oracle Service Name by setting "host=" and "port=" and then "name=<host>:<port>/<service_name>".
# Note for MariaDB use the 'mysql' engine.
engine=mysql
host=localhost
port=3306
user=root
password=root
# conn_max_age option to make database connection persistent value in seconds
# https://docs.djangoproject.com/en/1.9/ref/databases/#persistent-connections
## conn_max_age=0
# Execute this script to produce the database password. This will be used when 'password' is not set.
## password_script=/path/script
name=hue
## options={}
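Because Hue's ini file contains more than one database-related block (for example [[databases]] under [librdbms]), a quick check of what the [[database]] block actually contains can help. A rough Python sketch: parse_database_block and check_database_settings are hypothetical helpers, the engine list is copied from the comments above, and the parser only understands the simple key=value lines shown here, not the full ini dialect Hue itself parses.

```python
# Engines named in the [[database]] comments above.
LISTED_ENGINES = {"postgresql_psycopg2", "mysql", "sqlite3", "oracle"}


def parse_database_block(ini_text: str) -> dict:
    """Collect key=value pairs under [[database]], skipping comments and blank lines."""
    settings, inside = {}, False
    for raw in ini_text.splitlines():
        line = raw.strip()
        if line.startswith("["):
            # Any section header ends the block; only an exact [[database]] starts it,
            # so [[databases]] under [librdbms] is ignored.
            inside = line == "[[database]]"
            continue
        if inside and line and not line.startswith("#") and "=" in line:
            key, value = line.split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def check_database_settings(settings: dict) -> list:
    """Return human-readable problems; an empty list means nothing obvious is missing."""
    problems = []
    engine = settings.get("engine")
    if engine not in LISTED_ENGINES:
        problems.append(f"engine {engine!r} is not one of the listed engines")
    # sqlite3 only needs a file path in name; server databases need connection details.
    required = ("name",) if engine == "sqlite3" else ("host", "port", "user", "name")
    problems.extend(f"missing {key}" for key in required if key not in settings)
    return problems
```

Usage: pass the text of desktop/conf/pseudo-distributed.ini to parse_database_block and print check_database_settings of the result; for the block above it should report no problems.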
As root, from the Hue installation directory, run:
build/env/bin/hue syncdb
build/env/bin/hue migrate
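The two commands must run in this order, and migrate should not run if syncdb failed. A small Python sketch of that sequencing; hue_db_commands and run_in_order are hypothetical names, and hue_home is assumed to be the build directory (branch-3.9 in this guide).

```python
import subprocess
from pathlib import Path


def hue_db_commands(hue_home: str) -> list:
    """argv lists for syncdb then migrate, using the hue launcher in the build tree."""
    hue = str(Path(hue_home) / "build" / "env" / "bin" / "hue")
    return [[hue, "syncdb"], [hue, "migrate"]]


def run_in_order(commands: list) -> None:
    """Run each command in turn; check=True raises CalledProcessError on the first failure."""
    for argv in commands:
        print("running:", " ".join(argv))
        # stdin is inherited, so syncdb's interactive superuser prompt still works.
        subprocess.run(argv, check=True)
```

For example, run_in_order(hue_db_commands("/root/branch-3.9")) as root, with the path adjusted to wherever the source was cloned.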
When you log in to Hue again, use the username and password you created during syncdb; in this guide:
username oracle, password 123456