The column KEY._col2:0._col0 is not in the vectorization context… – 小飞侠

Post author: xfxia
Post published: September 18, 2023
Post category: Other

When the problem occurs

Running an HQL statement from a shell script failed with this error:

FAILED: SemanticException org.apache.hadoop.hive.ql.metadata.HiveException: The column KEY._col2:0._col0 is not in the vectorization context column map {KEY._col0=0, KEY._col1=1, KEY._col2=2, VALUE._col1=3}.

The SQL statement being run:

select content_sort, content_type, count(distinct postid) as content_cnt, sum(topic_num) as topic_cnt
from tmp_post where content_sort = 'post' group by content_type, content_sort
union
select content_sort, content_type, count(distinct ugc_id) as content_cnt, sum(topic_num) as topic_cnt
from tmp_ugc where content_sort = 'blog' group by content_type, content_sort

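I still do not understand the root cause, but repeated testing led to this conclusion: combining count(distinct ...) with union in the same query makes Hive's vectorized execution fail (note: the execution engine is Spark).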

Solution

After reading this expert's blog post:

https://www.codenong.com/jscb200f6bd25b/

I opened the Hive client and applied the setting set hive.vectorized.execution.enabled = false;

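When I ran the shell script again, it failed with the same error, so the setting evidently did not apply globally. The most likely reason is that set only affects the session it is typed in, while the script starts a new session.
On the second attempt, I added the setting inside the script, directly before the SQL statement, so that vectorized execution is turned off every time before the statement runs. This time the script ran to completion.
Conclusion: running set hive.vectorized.execution.enabled = false; in the Hive client has no effect on the script. The setting has to be issued in the same session that executes the SQL.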
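
The fix described above can be sketched as a small shell wrapper. This is only an illustration under stated assumptions: the helper name with_vectorization_off and the placeholder query are hypothetical and not taken from the original script, and the hive CLI is assumed to be on the PATH with its standard -e option.

```shell
#!/bin/sh
# Hypothetical helper (not from the original script): prepend the
# session-level setting so it runs in the same session as the query.
with_vectorization_off() {
    printf 'set hive.vectorized.execution.enabled=false;\n%s\n' "$1"
}

# Placeholder query; substitute the real distinct + union statement here.
hql="select count(distinct postid) from tmp_post;"

# Only call hive when the CLI is actually installed, so the sketch
# does nothing harmful on a machine without it.
if command -v hive >/dev/null 2>&1; then
    hive -e "$(with_vectorization_off "$hql")"
fi
```

Because the set statement and the query travel in one hive -e invocation, they share a session. The Hive CLI also accepts --hiveconf hive.vectorized.execution.enabled=false on the command line, which may be an alternative worth trying; it was not tested in the original post.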

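One last word

If anything here is wrong, corrections from passing experts are welcome. I am sincerely willing to fix my mistakes and eager to learn.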

0.0

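Copyright notice: This is an original article by mjjyszazc, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.

Original link: https://blog.csdn.net/mjjyszazc/article/details/119423935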