Bug log – comparing decimals smaller than 1 with zero in a Hive temp table



While testing some previously written code against real data, I hit a very strange bug. The SQL looked correct, and four rows clearly had index_value > 0, yet the query returned only two of them. The production data set was too large to verify by hand, so I wrote a small demo to reproduce the problem.

	public static void main(String[] args) {
		SparkUtil sparkUtil = SparkUtil.getInstance();
		JavaSparkContext sparkContext = sparkUtil.getSparkContext(null, "f251");
		HiveContext hiveContext = sparkUtil.getHiveContext(sparkContext.sc());
		
		// Four sample rows; note that indexValue is stored as a String
		List<Index> indexList = new ArrayList<>();
		indexList.add(new Index("F005", "100"));
		indexList.add(new Index("F005", "0.1"));
		indexList.add(new Index("F005", "5"));
		indexList.add(new Index("F005", "0.6"));
		JavaRDD<Index> statMonth0100Rdd = sparkContext.parallelize(indexList);
		Dataset<Row> dataSet = hiveContext.createDataFrame(statMonth0100Rdd, Index.class);
		dataSet.show();
		dataSet.toDF().registerTempTable("F005TB");
		
		// Compare the string column directly with the integer literal 0
		String sql = "select * from F005TB where indexValue > 0";
		Dataset<Row> groupPowerDS = sparkUtil.executeSqlHive(hiveContext, sql);
		groupPowerDS.show();
	}

The query result:

+---------+----------+
|indexCode|indexValue|
+---------+----------+
|     F005|       100|
|     F005|         5|
+---------+----------+
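The two-row result looks as if the string values were implicitly cast to an integer before the comparison, so that "0.1" and "0.6" fail the cast and get dropped. A minimal plain-Java sketch of that suspected behavior (no Spark involved; castToInt is a hypothetical helper mimicking SQL cast-to-int semantics, where a failed cast yields NULL):

```java
import java.util.Arrays;
import java.util.List;

public class CastDemo {
    // Hypothetical analogue of a SQL cast(x AS int):
    // a string that is not a valid integer becomes null (SQL NULL)
    static Integer castToInt(String s) {
        try {
            return Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("100", "0.1", "5", "0.6");
        for (String v : values) {
            Integer asInt = castToInt(v);
            // NULL > 0 is not true in SQL, so null values are filtered out
            boolean kept = asInt != null && asInt > 0;
            System.out.println(v + " -> " + (kept ? "kept" : "dropped"));
        }
    }
}
```

Under this assumption only "100" and "5" survive the filter, which matches the two rows returned above.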

Suspecting a type-conversion problem, I modified the code as follows:

	public static void main(String[] args) {
		SparkUtil sparkUtil = SparkUtil.getInstance();
		JavaSparkContext sparkContext = sparkUtil.getSparkContext(null, "f251");
		HiveContext hiveContext = sparkUtil.getHiveContext(sparkContext.sc());
		
		List<Index> indexList = new ArrayList<>();
		indexList.add(new Index("F005", "100"));
		indexList.add(new Index("F005", "0.1"));
		indexList.add(new Index("F005", "5"));
		indexList.add(new Index("F005", "0.6"));
		JavaRDD<Index> statMonth0100Rdd = sparkContext.parallelize(indexList);
		Dataset<Row> dataSet = hiveContext.createDataFrame(statMonth0100Rdd, Index.class);
		dataSet.show();
		dataSet.toDF().registerTempTable("F005TB");
		
		// Explicitly cast the string column to float before comparing
		String sql = "select * from F005TB where cast(indexValue AS float) > 0";
		Dataset<Row> groupPowerDS = sparkUtil.executeSqlHive(hiveContext, sql);
		groupPowerDS.show();
	}

The result:

+---------+----------+
|indexCode|indexValue|
+---------+----------+
|     F005|       100|
|     F005|       0.1|
|     F005|         5|
|     F005|       0.6|
+---------+----------+

I then tried a second variant, comparing against the decimal literal 0.0 instead:

	public static void main(String[] args) {
		SparkUtil sparkUtil = SparkUtil.getInstance();
		JavaSparkContext sparkContext = sparkUtil.getSparkContext(null, "f251");
		HiveContext hiveContext = sparkUtil.getHiveContext(sparkContext.sc());
		
		List<Index> indexList = new ArrayList<>();
		indexList.add(new Index("F005", "100"));
		indexList.add(new Index("F005", "0.1"));
		indexList.add(new Index("F005", "5"));
		indexList.add(new Index("F005", "0.6"));
		JavaRDD<Index> statMonth0100Rdd = sparkContext.parallelize(indexList);
		Dataset<Row> dataSet = hiveContext.createDataFrame(statMonth0100Rdd, Index.class);
		dataSet.show();
		dataSet.toDF().registerTempTable("F005TB");
		
		// Compare against a decimal literal rather than an integer literal
		String sql = "select * from F005TB where indexValue > 0.0";
		Dataset<Row> groupPowerDS = sparkUtil.executeSqlHive(hiveContext, sql);
		groupPowerDS.show();
	}

The result:

+---------+----------+
|indexCode|indexValue|
+---------+----------+
|     F005|       100|
|     F005|       0.1|
|     F005|         5|
|     F005|       0.6|
+---------+----------+
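For completeness, the same plain-Java analogy (again only a sketch, not Spark's actual code path) suggests why casting to float, or comparing with the decimal literal 0.0, keeps all four rows: every sample value parses cleanly as a floating-point number, so none of them becomes NULL before the comparison.

```java
import java.util.Arrays;
import java.util.List;

public class FloatCastDemo {
    // Hypothetical analogue of cast(indexValue AS float):
    // all four sample strings are valid floating-point literals
    static Float castToFloat(String s) {
        try {
            return Float.parseFloat(s);
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("100", "0.1", "5", "0.6");
        long kept = values.stream()
                .map(FloatCastDemo::castToFloat)
                .filter(v -> v != null && v > 0)
                .count();
        System.out.println(kept + " of " + values.size() + " rows kept");
        // prints "4 of 4 rows kept"
    }
}
```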

This problem does not occur when querying a real (persisted) table.

The exact cause and mechanism still need further investigation. A plausible explanation (not verified here) is implicit type coercion: when a string column is compared with the integer literal 0, the engine may cast the strings to an integer type, so "0.1" and "0.6" either fail the cast or truncate to 0 and are filtered out, whereas an explicit cast to float, or the decimal literal 0.0, forces a floating-point comparison.



Copyright notice: this is an original article by chishuazhi5205, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original article and this notice.