Data Warehouse Lab 1

张开发
2026/4/15 1:01:12 · 15 min read


## Installing and Deploying Hive

### Installing the MySQL database (on the hadoop01 node)

(1) Install MySQL online

Log in to the hadoop01 node as the root user and run:

```shell
# switch to the root user
sudo -i
# install MySQL online
apt install mysql-server -y
```

(2) Start the MySQL service

```shell
# start the MySQL service
systemctl start mysql
# optionally, enable MySQL to start on boot
systemctl enable mysql
```

(3) Set the MySQL root user password

1) Log in to MySQL without a password:

```shell
mysql -u root -p
# press Enter directly at the password prompt
```

2) Set the root password. In the MySQL client, run:

```sql
ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'root';
FLUSH PRIVILEGES;
exit
```

3) Verify by logging in with the password:

```shell
mysql -u root -p
# enter password: root
```

(4) Create the Hive account

1) In the MySQL client, run:

```sql
CREATE USER 'hive'@'%' IDENTIFIED BY 'hive';
```

2) Grant privileges to the hive user:

```sql
-- grant all privileges on all databases
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
-- make sure the hive user is allowed to connect from any host
CREATE USER IF NOT EXISTS 'hive'@'%' IDENTIFIED BY 'hive';
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'%' WITH GRANT OPTION;
-- refresh privileges
FLUSH PRIVILEGES;
```

3) Verify the hive user can log in:

```shell
# the password is hive
mysql -h hadoop01 -u hive -p
```

Problem encountered: the hive user could not connect. Fix by binding MySQL to the correct IP:

```shell
# change bind-address to the correct IP
sudo sed -i 's/bind-address.*/bind-address = 192.168.255.145/' /etc/mysql/mysql.conf.d/mysqld.cnf
# verify the change
sudo grep bind-address /etc/mysql/mysql.conf.d/mysqld.cnf
sudo systemctl restart mysql
```

### Installing Hive

(1) Download Hive

Switch to the hadoop user:

```shell
su hadoop
cd /home/hadoop/app
```

Download from the official site, or use the Hive package bundled with the book's companion resources:

```shell
# if downloading with wget
wget http://archive.apache.org/dist/hive/hive-2.3.7/apache-hive-2.3.7-bin.tar.gz
```

(2) Extract Hive and create a symlink

```shell
# extract the Hive package
tar -zxvf apache-hive-2.3.7-bin.tar.gz
# create a symlink
ln -s apache-hive-2.3.7-bin hive
```

(3) Configure the Hive environment variables

Edit the hadoop user's bashrc file:

```shell
vim ~/.bashrc
```

Append the following at the end of the file:

```shell
export HIVE_HOME=/home/hadoop/app/hive
export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$HIVE_HOME/bin:$PATH
```

Apply the changes:

```shell
source ~/.bashrc
```

(4) Configure hive-site.xml

```shell
cd $HIVE_HOME/conf
cp hive-default.xml.template hive-site.xml
```

Edit hive-site.xml and change the following settings.

1) Configure the MySQL connection driver:

```xml
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.cj.jdbc.Driver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
```

2) Change the MySQL connection URL:

```xml
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://hadoop01:3306/hive?createDatabaseIfNotExist=true&amp;useSSL=false</value>
  <description>JDBC connect string for a JDBC metastore</description>
</property>
```

3) Configure the database username and password:

```xml
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
  <description>password to use against metastore database</description>
</property>
```

4) Add the Hive data directories:

```xml
<property>
  <name>hive.querylog.location</name>
  <value>/home/hadoop/app/hive/iotmp</value>
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.exec.local.scratchdir</name>
  <value>/home/hadoop/app/hive/iotmp</value>
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <value>/home/hadoop/app/hive/iotmp</value>
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
```

Blocks that must be deleted (the default embedded-Derby configuration):

```xml
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mine</value>
  <description>password to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>APP</value>
  <description>Username to use against metastore database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>org.apache.derby.jdbc.EmbeddedDriver</value>
  <description>Driver class name for a JDBC metastore</description>
</property>
```

Change the scratch directory from:

```xml
<property>
  <name>hive.exec.scratchdir</name>
  <value>/tmp/hive</value>
  <description>...</description>
</property>
```

to your own temporary directory:

```xml
<property>
  <name>hive.exec.scratchdir</name>
  <!-- change to the /home/hadoop/app/hive/iotmp directory you created -->
  <value>/home/hadoop/app/hive/iotmp</value>
  <description>...</description>
</property>
```

Replace the `${...}` placeholders with concrete paths:

```xml
<property>
  <name>hive.exec.local.scratchdir</name>
  <!-- original: ${system:java.io.tmpdir}/${system:user.name} -->
  <value>/home/hadoop/app/hive/iotmp</value>  <!-- replaced with a concrete path -->
  <description>Local scratch space for Hive jobs</description>
</property>
<property>
  <name>hive.downloaded.resources.dir</name>
  <!-- original: ${system:java.io.tmpdir}/${hive.session.id}_resources -->
  <value>/home/hadoop/app/hive/iotmp/${hive.session.id}_resources</value>  <!-- prefix replaced -->
  <description>Temporary local directory for added resources in the remote file system.</description>
</property>
```

as well as:

```xml
<property>
  <name>hive.querylog.location</name>
  <!-- original: ${system:java.io.tmpdir}/${system:user.name} -->
  <value>/home/hadoop/app/hive/iotmp</value>  <!-- replaced with a concrete path -->
  <description>Location of Hive run time structured log file</description>
</property>
<property>
  <name>hive.server2.logging.operation.log.location</name>
  <!-- original: ${system:java.io.tmpdir}/${system:user.name}/operation_logs -->
  <value>/home/hadoop/app/hive/iotmp/operation_logs</value>  <!-- replaced with a concrete path -->
  <description>...</description>
</property>
```

Finally, verify that the edited file is still valid XML.

(5) Add the MySQL driver package

Download the mysql-connector-java-8.0.33.jar driver into Hive's lib directory: https://downloads.mysql.com/archives/c-j/

(6) Create the Hive temporary directory

```shell
mkdir -p /home/hadoop/app/hive/iotmp
```

(7) Initialize the Hive metastore

Start HDFS, YARN, and MySQL first, then run:

```shell
cd /home/hadoop/app/hive
schematool -dbType mysql -initSchema
```

### Starting and Verifying Hive

(1) Start Hive

```shell
# make sure the Hadoop cluster is up: ZooKeeper, HDFS, YARN
cd /home/hadoop/app/zookeeper/bin
./zkServer.sh start
start-dfs.sh
start-yarn.sh
# start Hive
hive
```

(2) Verify the Hive installation. In the Hive command line, run:

```sql
-- show databases
show databases;
-- create a test table
create table test(id int, name string);
-- show tables
show tables;
-- insert data
insert into test values (1, 'test');
-- query data
select * from test;
```

### Troubleshooting

Problem 1: the MySQL connection fails.

Solution:

```shell
# check whether the MySQL service is running
systemctl status mysql
# check the hive user's privileges in MySQL and make sure
# the hive user is allowed to log in remotely
# check the firewall settings
systemctl stop firewalld
```

Problem 2: HDFS permission errors.

Solution: add the following to Hadoop's core-site.xml:

```xml
<property>
  <name>hadoop.proxyuser.hadoop.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hadoop.groups</name>
  <value>*</value>
</property>
```

Problem 3: metastore initialization fails.

Solution: drop the hive database in MySQL and re-initialize:

```shell
mysql -u root -p -e "DROP DATABASE hive"
schematool -dbType mysql -initSchema
```

## WordCount Experiment

7.1 Prepare the data file

```shell
# create a data directory
# sudo mkdir -p /opt/hive/data
# if you get a permission error, use sudo or switch to /home/hadoop/data;
# a directory the hadoop user owns is recommended to avoid permission problems
mkdir -p /home/hadoop/hive_data
# create the word file
vim /home/hadoop/hive_data/words.txt
```

Enter the following content:

```
hive hadoop mapreduce hadoop hive spark hive
```

7.2 Create a table in Hive and load the data

In the Hive command line:

```sql
CREATE TABLE words (line STRING);
LOAD DATA LOCAL INPATH '/home/hadoop/hive_data/words.txt' INTO TABLE words;
```

7.3 Run WordCount

```sql
SELECT * FROM words;

SELECT word, COUNT(*) AS count
FROM (SELECT explode(split(line, ' ')) AS word FROM words) w
GROUP BY word
ORDER BY count DESC;
```
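As a sanity check, the WordCount result above can be reproduced locally with plain coreutils, without a running cluster. This is a sketch that assumes words.txt contains exactly the single line shown in step 7.1:

```shell
# Reproduce the Hive WordCount locally with coreutils.
# Assumes words.txt holds the one line from step 7.1.
printf 'hive hadoop mapreduce hadoop hive spark hive\n' > /tmp/words.txt
# split on spaces (like split(line, ' ') + explode), then count and sort
tr ' ' '\n' < /tmp/words.txt | sort | uniq -c | sort -rn
```

The top row should show `hive` with a count of 3, matching the `ORDER BY count DESC` result returned by Hive.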
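A note on step (4): rather than hand-editing every `${system:java.io.tmpdir}`-based placeholder in hive-site.xml, the substitutions can be scripted. This is a sketch under the assumption that hive-site.xml lives at the path used in this tutorial and that the iotmp directory is the replacement target:

```shell
# Replace the java.io.tmpdir-based placeholders in hive-site.xml
# with the concrete iotmp directory used in this tutorial.
# Path below is the tutorial's assumed conf location.
CONF=/home/hadoop/app/hive/conf/hive-site.xml
sed -i \
  -e 's|${system:java.io.tmpdir}/${system:user.name}|/home/hadoop/app/hive/iotmp|g' \
  -e 's|${system:java.io.tmpdir}|/home/hadoop/app/hive/iotmp|g' \
  "$CONF"
# confirm no tmpdir placeholders remain
grep -c 'system:java.io.tmpdir' "$CONF"
```

The longer pattern is replaced first so that `${system:user.name}` suffixes are consumed together with their prefix; a remaining count of 0 from the final `grep -c` means every placeholder was rewritten.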
