Hive command line 01

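Hive is a data warehouse tool built on Hadoop for extracting, transforming and loading data (ETL); it provides a way to store, query and analyze large-scale data kept in Hadoop. Hive maps structured data files to database tables and offers SQL querying, translating SQL statements into MapReduce jobs for execution. Its main advantage is a low learning curve: SQL-like statements are enough to run MapReduce aggregations quickly, with no need to develop dedicated MapReduce applications. This makes Hive well suited to statistical analysis over a data warehouse. Below is the first part of my Hive study notes: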

set;
	Prints all variables in the hivevar, hiveconf, system and env namespaces.

set -v;
	In addition to the above, also prints all properties defined by Hadoop.

hivevar: user-defined variables
hiveconf: Hive configuration properties
system: configuration properties defined by Java
env: environment variables defined by the shell

set hivevar:tableName=var_test;
set hivevar:var=age;

create table czs.${hivevar:tableName}(i int,${hivevar:var} string);

describe czs.${hivevar:tableName};

drop table czs.${hivevar:tableName};
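
Hive replaces each ${namespace:name} reference with the variable's text before the statement runs. The following is a rough Python model of that substitution, not Hive's implementation; the `substitute` helper and the `namespaces` dict are hypothetical names used only for illustration.

```python
import re

# Hypothetical stand-in for Hive's variable namespaces; values mirror the set commands above.
namespaces = {
    "hivevar": {"tableName": "var_test", "var": "age"},
}

def substitute(statement, namespaces):
    """Replace every ${namespace:name} reference with the stored value."""
    pattern = re.compile(r"\$\{(\w+):([\w.]+)\}")

    def lookup(match):
        namespace, name = match.group(1), match.group(2)
        return namespaces[namespace][name]

    return pattern.sub(lookup, statement)

ddl = "create table czs.${hivevar:tableName}(i int,${hivevar:var} string);"
expanded = substitute(ddl, namespaces)
print(expanded)
```

Because the replacement is purely textual, a variable can stand for a table name, a column name, or any other fragment of the statement.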

set hiveconf:hive.cli.print.current.db;
	Prints the current value; the default is false.
	When set to true, the prompt shows the current database.

hive -e "select * from czs.docs"
	Runs a single query and exits.

hive -S -e "select * from czs.docs" > docs_select
	Silent mode: suppresses log messages, here redirecting the query result to a file.

hive -S -e "set" | grep warehouse
	Handy for looking up a property, here the ones whose name contains warehouse.

hive -f xxx.sql      runs a Hive script file
hive -S -f xxx.sql   the same in silent mode, which keeps the output cleaner

Running a script from inside the Hive shell works as in mysql
	source path/xxx.sql;


Running shell commands from Hive
	!pwd;	!ls;	!du -sh;

Using HDFS dfs commands from Hive
	dfs -ls /;
	dfs -cat *.sql;
	hive -S -e "dfs -ls /"

Hive comments start with --

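Showing column names in query results
	The setting only lasts for one session, so set it in the same hive invocation as the query:
	hive -S -e "set hive.cli.print.header=true; select * from czs.person"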
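
When the header is enabled and the result is redirected to a file, the first line holds the column names and the columns are tab-separated. As a minimal sketch of reading such a file back, assuming that layout; the sample text and the `parse_cli_output` helper are hypothetical and only stand in for a real result file:

```python
import csv
import io

# Hypothetical contents of a redirected result file (header line, then tab-separated rows).
sample_output = "id\tname\tage\n1\tczs\t25\n2\tbob\t31\n"

def parse_cli_output(text):
    """Parse tab-separated CLI output whose first line is the header."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]

rows = parse_cli_output(sample_output)
for row in rows:
    print(row["name"], row["age"])
```

All values come back as strings; any type conversion is left to the caller.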

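Create a file named .hiverc in ${HOME} to set Hive CLI properties
	-- show the current database name in the prompt
	set hive.cli.print.current.db=true;
	-- show column names in query results
	set hive.cli.print.header=true;
	-- enforce bucketing when writing to bucketed tables
	set hive.enforce.bucketing=true;
	-- compress Hive's intermediate results
	set hive.exec.compress.intermediate=true;
	-- use the BZip2 codec for map output
	set mapred.map.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
	-- compress Hive's final output
	set hive.exec.compress.output=true;
	-- use the BZip2 codec for the MapReduce output of Hive jobs
	set mapred.output.compression.codec=org.apache.hadoop.io.compress.BZip2Codec;
	-- let Hive try local mode for queries instead of submitting MapReduce jobs to the cluster
	set hive.exec.mode.local.auto=true;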


Testing timestamp data
	unix_timestamp function
		select unix_timestamp(); returns the current Unix timestamp in seconds
		select unix_timestamp('2018-06-29 00:00:00'); returns the timestamp of the given time
		select unix_timestamp('2018/06/29 09', 'yyyy/MM/dd HH'); parses the string with the given pattern

	from_unixtime function
		select from_unixtime(1000000000); default output format is yyyy-MM-dd HH:mm:ss
		select from_unixtime(1000000000, 'yyyy/MM/dd HH');

	from_utc_timestamp function
		select from_utc_timestamp('1970-01-01 00:00:00','PRC') as bj_time;
		Converts a UTC time to Beijing time (PRC time zone)
	to_utc_timestamp function
		select to_utc_timestamp('1970-01-01 08:00:00','PRC') as utc_time;
		Converts a Beijing time to UTC
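
To sanity-check values like these outside Hive, the same conversions can be reproduced with Python's datetime module. This is only a cross-check under stated assumptions: Hive's unix_timestamp and from_unixtime depend on the session time zone, while the sketch uses a fixed UTC+8 offset (the name `BEIJING` is introduced here) in place of 'PRC'.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
BEIJING = timezone(timedelta(hours=8))  # assumption: fixed UTC+8 offset standing in for 'PRC'
FMT = "%Y-%m-%d %H:%M:%S"

# Counterpart of from_unixtime(1000000000): epoch seconds to a formatted string.
utc_text = datetime.fromtimestamp(1000000000, UTC).strftime(FMT)
bj_text = datetime.fromtimestamp(1000000000, BEIJING).strftime(FMT)

# Counterpart of unix_timestamp('2018/06/29 09', 'yyyy/MM/dd HH'),
# reading the string as Beijing local time.
parsed = datetime.strptime("2018/06/29 09", "%Y/%m/%d %H").replace(tzinfo=BEIJING)
epoch_seconds = int(parsed.timestamp())

# Counterpart of from_utc_timestamp('1970-01-01 00:00:00','PRC').
shifted = datetime(1970, 1, 1, tzinfo=UTC).astimezone(BEIJING).strftime(FMT)

print(utc_text, bj_text, epoch_seconds, shifted)
```

The same epoch value prints as 01:46:40 in UTC and 09:46:40 in Beijing time, which is why results from Hive can differ between clusters configured with different time zones.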

drop table if exists tms;
create table tms as
select name,unix_timestamp() as tms from person;

drop table if exists dt_test;
create table dt_test as
select name,from_unixtime(tms) as dt from tms;

Type casting
	select id,cast(age as double) from person;


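Text file data encoding
	CSV  comma-separated values
	TSV  tab-separated values

	\n               line (record) separator
	^A (CTRL+A) \001 separates fields (columns)
	^B          \002 separates the elements of an array or struct, and the key-value pairs of a map
	^C          \003 separates a key from its value within a map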


CREATE TABLE `employees`(
  `name` string, 
  `salary` float, 
  `subordinates` array<string>, 
  `deductions` map<string,float>, 
  `address` struct<street:string,city:string,state:string,zip:int>)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://localhost:9000/user/hive/warehouse/czs.db/employees'
TBLPROPERTIES (
  'transient_lastDdlTime'='1595918775')
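
The table above declares no delimiters, so its text files would use the default ^A, ^B and ^C separators listed earlier. As a rough illustration of how one line of such a file breaks down into the five columns, here is a Python sketch; the sample row and the `decode_employee` helper are hypothetical, and this is not how Hive's SerDe is implemented.

```python
FIELD, ITEM, KEY = "\x01", "\x02", "\x03"  # ^A, ^B, ^C

# Hypothetical row for the employees table, built with the default delimiters.
row = FIELD.join([
    "John Doe",
    "100000.0",
    ITEM.join(["Mary Smith", "Todd Jones"]),
    ITEM.join(["Federal Taxes" + KEY + "0.2", "State Taxes" + KEY + "0.05"]),
    ITEM.join(["1 Michigan Ave.", "Chicago", "IL", "60600"]),
])

def decode_employee(line):
    """Split one text line into the employees columns."""
    name, salary, subordinates, deductions, address = line.rstrip("\n").split(FIELD)
    street, city, state, zip_code = address.split(ITEM)
    return {
        "name": name,
        "salary": float(salary),
        "subordinates": subordinates.split(ITEM),                      # array<string>
        "deductions": {key: float(value)                               # map<string,float>
                       for key, value in (pair.split(KEY) for pair in deductions.split(ITEM))},
        "address": {"street": street, "city": city,                    # struct<...>
                    "state": state, "zip": int(zip_code)},
    }

employee = decode_employee(row)
print(employee)
```

The control characters are chosen as defaults because they rarely appear inside real field values, unlike commas or tabs.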

ROW FORMAT DELIMITED must be written before the STORED AS, LOCATION and TBLPROPERTIES clauses

Hive does not validate data against the table schema when loading data files; validation happens at query time. This is schema on read.
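
The idea of schema on read can be sketched in a few lines of Python: the raw line is stored as-is, and the schema is applied only when the line is read, with fields that are missing or cannot be converted coming back as None (the counterpart of NULL). The `read_row` helper, the schema and the sample lines are hypothetical.

```python
def read_row(line, schema, delimiter="\t"):
    """Apply a schema to one raw text line at read time.

    Fields that are missing or cannot be converted become None.
    """
    raw = line.rstrip("\n").split(delimiter)
    record = {}
    for index, (column, cast) in enumerate(schema):
        try:
            record[column] = cast(raw[index])
        except (ValueError, IndexError):
            record[column] = None
    return record

# Hypothetical schema and data lines.
schema = [("id", int), ("name", str), ("age", int)]
good = read_row("1\tczs\t25", schema)
bad = read_row("2\tbob\tunknown", schema)   # age is not a number
short = read_row("3\tamy", schema)          # age column is missing

print(good, bad, short)
```

Nothing fails at load time; bad data only shows up as NULL values when a query reads it, so it is worth checking for unexpected NULLs after loading a new file.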


HQL -- Hive Query Language
	create database czs;
	create database if not exists czs;
	create database if not exists czs_cp location '/czs_cp';
	create database if not exists czs_cp location '/czs_cp' with dbproperties('creator'='czs','date'='2020/08/01','msg'='我好想你');
	
	show databases;
	show databases like 'h.*';
	describe database czs;
	describe database extended czs_cp;

	alter database czs_cp set dbproperties ('edited-by'='czs');
	drop database if exists czs_cp;


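Copying tables
	Shallow copy (copies the table definition only, not the data); a location can be specified
	create table person_cp like person location xxx; 

	Full copy (runs as a MapReduce job)
	create table person_cp_all as select * from person;

	TBLPROPERTIES
		Key-value pairs that store extra descriptive information about the table

	Listing and describing tables
		show tables in czs;  -- or: show tables in default;
		show tables 'person*';
		describe person;
		describe extended person;
		describe formatted person; -- used most often
		Show information about a single column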
		describe person.age;


ROW FORMAT SERDE 
  'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' 
WITH SERDEPROPERTIES ( 
  'field.delim'='\t', 
  'serialization.format'='\t', 
  'serialization.null.format'='') 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  'hdfs://earth/user/db_jian_cheng/biz_detail_all/';

  azcopy copy '/data2/tt_proc_hive/ddl_hive.zip' ''

About the author: 智云科技
