Hive 执行计划

2023-06-08,,

执行语句

hive> explain select s.id, s.name from student s left outer join student_tmp st on s.name = st.name;

结果,红色字体为我添加的注释

hive> explain select s.id, s.name from student s left outer join student_tmp st on s.name = st.name;
OK
ABSTRACT SYNTAX TREE:
(TOK_QUERY (TOK_FROM (TOK_LEFTOUTERJOIN (TOK_TABREF (TOK_TABNAME student) s) (TOK_TABREF (TOK_TABNAME student_tmp) st) (= (. (TOK_TABLE_OR_COL s) name) (. (TOK_TABLE_OR_COL st) name)))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR TOK_TMP_FILE)) (TOK_SELECT (TOK_SELEXPR (. (TOK_TABLE_OR_COL s) id)) (TOK_SELEXPR (. (TOK_TABLE_OR_COL s) name))))) STAGE DEPENDENCIES: “这个sql将被分成两个阶段执行。基本上每个阶段会对应一个mapreduce job,Stage-0除外。因为Stage-0只是fetch结果集,不需要mapreduce job”
Stage- is a root stage
Stage- is a root stage STAGE PLANS:
Stage: Stage-
Map Reduce
Alias -> Map Operator Tree: “map job开始”
s
TableScan
alias: s “扫描表student”
Reduce Output Operator “这里描述map的输出,也就是reduce的输入。比如key,partition,sort等信息。”
key expressions: “reduce job的key”
expr: name
type: string
sort order: + “这里表示按一个字段排序,如果是按两个字段排序,那么就会有两个+(++),更多以此类推”
Map-reduce partition columns: “partition的信息,由此也可以看出hive在join的时候会以join on后的列作为partition的列,以保证具有相同此列的值的行被分到同一个reduce中去”
expr: name
type: string
tag: 0 “用于标示这个扫描的结果,后面的join会用到它”
value expressions: “表示select 后面的列”
expr: id
type: int
expr: name
type: string
st
TableScan “开始扫描第二张表,和上面的一样”
alias: st
Reduce Output Operator
key expressions:
expr: name
type: string
sort order: +
Map-reduce partition columns:
expr: name
type: string
tag:
Reduce Operator Tree: “reduce job开始”
Join Operator
condition map:
Left Outer Join0 to 1 “tag 0 out join tag 1”
condition expressions: “这里也是描述select 后的列,和join没有关系。这里我们的select后的列是 s.id 和 s.name, 所以0后面有两个字段, 1后面没有”
{VALUE._col0} {VALUE._col2} handleSkewJoin: false
outputColumnNames: _col0, _col2
Select Operator
expressions:
expr: _col0
type: int
expr: _col2
type: string
outputColumnNames: _col0, _col1
File Output Operator
compressed: false
GlobalTableId:
table:
input format: org.apache.hadoop.mapred.TextInputFormat
output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat Stage: Stage-
Fetch Operator
limit: - Time taken: 0.216 seconds

Hive 执行计划的相关教程结束。

《Hive 执行计划.doc》

下载本文的Word格式文档,以方便收藏与打印。