Problem:
- Running a Hive count(*) query through ODAS can return no output, as observed on an EMR 5.9 cluster.
Cause:
- A Hive query using count(*) can be answered without running MapReduce: the execution plan computes the result from table statistics instead. If the table is EXTERNAL and its underlying storage has been modified directly, those statistics are not updated, so a count(*) query under these conditions returns an inaccurate value.
If an EXTERNAL VIEW is defined in ODAS over a large dataset without explicitly computing statistics, a count(*) query will appear to find no data.
This behavior is documented in HIVE-11266 and by many vendors whose distributions ship Hive 2.3.x or earlier. The fix is targeted for release in Apache Hive 3.0.
This issue is also documented in the EMR release notes. EMR 5.9 includes Hive 2.3.0; the latest EMR version at the time of writing (5.20) includes Hive 2.3.4.
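The stale-statistics mechanism described in this section can be illustrated with a toy model. The sketch below is not Hive or ODAS code: `ToyExternalTable`, its methods, and the choice to model never-computed statistics as a row count of 0 are hypothetical simplifications chosen to mirror the "no data" symptom. It only shows why an answer taken from cached metadata diverges from a real scan once files are written directly to storage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyExternalTable:
    """Hypothetical stand-in for an EXTERNAL table: data files plus a cached row-count statistic."""

    files: dict = field(default_factory=dict)  # file name -> number of rows in that file
    # Toy assumption: statistics that were never computed are modeled as 0 rows.
    num_rows_stat: int = 0

    def compute_stats(self) -> None:
        # Analogous to explicitly computing stats: refresh the cached count from the data.
        self.num_rows_stat = sum(self.files.values())

    def write_file_directly(self, name: str, rows: int) -> None:
        # Modifying the underlying storage directly does NOT touch the cached statistic.
        self.files[name] = rows

    def count_star(self, use_stats: bool) -> int:
        if use_stats:
            # Answered from metadata alone, like a plan that skips the MapReduce job.
            return self.num_rows_stat
        # Answered by actually scanning every file.
        return sum(self.files.values())


if __name__ == "__main__":
    table = ToyExternalTable()
    table.write_file_directly("part-0000", 500)
    print(table.count_star(use_stats=True))   # 0: statistics never computed
    print(table.count_star(use_stats=False))  # 500: full scan
    table.compute_stats()
    table.write_file_directly("part-0001", 250)
    print(table.count_star(use_stats=True))   # 500: stale after the direct write
    print(table.count_star(use_stats=False))  # 750: full scan sees the new file
```

In this model, `use_stats=True` plays the role of hive.compute.query.using.stats being enabled; turning it off trades a cheap metadata lookup for a correct scan.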
Workaround:
- Vendors whose distributions rely on Hive versions earlier than 2.4.0 recommend setting the property hive.compute.query.using.stats to false, so that count(*) is computed from the data rather than from table statistics. Make this change in Okera’s /etc/hive/conf/hive-site.xml file and restart the service; count(*) queries should then return accurate results.
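The hive-site.xml edit in the workaround can be scripted. The sketch below is one possible approach, assuming the standard Hadoop-style layout (`<configuration>` containing `<property>` elements, each with `<name>` and `<value>`); the helper name `set_hadoop_property` is hypothetical. It updates the property if present and appends it otherwise, working on strings so the result can be reviewed before anything is written.

```python
import xml.etree.ElementTree as ET


def set_hadoop_property(xml_text: str, name: str, value: str) -> str:
    """Return xml_text with <property> `name` set to `value`, adding the property if absent.

    Hypothetical helper; assumes the Hadoop-style layout used by hive-site.xml:
    <configuration><property><name>...</name><value>...</value></property>...</configuration>
    """
    # Keep comments inside the root element, which ElementTree would otherwise discard.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    for prop in root.findall("property"):
        if prop.findtext("name", default="").strip() == name:
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # No matching <property> found: append a new one at the end of <configuration>.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

To apply it, read /etc/hive/conf/hive-site.xml, pass its contents with the name hive.compute.query.using.stats and the value "false", and write the result back before restarting the service. Content outside the root element (the XML declaration, a leading license comment) is not reproduced by this sketch, so keep a backup of the original file.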