Problem
ODAS Workers restart and produce a core dump when queries access tables that are backed by Parquet files and use certain complex MAP data types in their schema.
Answer
For current information regarding Okera data types, refer here. Specific information regarding the map data types can be found here.
Okera provides limited support for complex MAP data types for schemas backed by Parquet files. A user can do the following:
- Create a table with MAP<STRING, STRING>
For example, this table stores key/value pairs for users. Both keys and values are stored as STRING types:
CREATE EXTERNAL TABLE referencedb.user_settings (
  user_id BIGINT,
  key_values MAP<STRING, STRING>
)
STORED AS PARQUET
LOCATION 's3://examplebucket/warehouse/usersettings';
- Create a table with MAP<STRING, STRUCT> and MAP<STRING, STRUCT<ARRAY>>
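A table in the second category might be declared as follows. This is a minimal sketch assuming Hive-style STRUCT<name: TYPE, ...> syntax; the table referencedb.user_profiles, its columns, and its S3 location are hypothetical placeholders. Creating such a table is supported, but queries should not read the devices or groups columns directly.

```sql
-- Hypothetical table: name, columns, and location are illustrative only.
CREATE EXTERNAL TABLE referencedb.user_profiles (
  user_id BIGINT,
  -- MAP whose values are STRUCTs (MAP<STRING, STRUCT>)
  devices MAP<STRING, STRUCT<model: STRING, last_seen: BIGINT>>,
  -- MAP whose STRUCT values contain an ARRAY (MAP<STRING, STRUCT<ARRAY>>)
  groups MAP<STRING, STRUCT<role: STRING, members: ARRAY<BIGINT>>>
)
STORED AS PARQUET
LOCATION 's3://examplebucket/warehouse/userprofiles';
```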
Note, however, that Okera does not support direct access to these complex types. To avoid application failures, create a VIEW on the table that does not refer to the complex MAP columns. For example, if the key_values column of referencedb.user_settings were declared with STRUCT values instead of STRING values, the following view on the base table would expose only the remaining column:
CREATE VIEW referencedb.user_settings_safe
AS SELECT user_id FROM referencedb.user_settings;
WARNING: If a query accesses MAP columns with STRUCT members directly, the ODAS Worker process restarts and the query fails. Use the VIEW approach shown above to access the remaining fields of the table.
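In practice, this means applications should query the view rather than the base table. A minimal sketch, reusing the names from the view example and assuming, as that example does, that the base table's MAP column has STRUCT values; the comments restate the behavior described in this article and are not verified output:

```sql
-- Safe: the view does not reference the complex MAP column.
SELECT * FROM referencedb.user_settings_safe;

-- Avoid: SELECT * on the base table also reads the complex MAP column,
-- which, per the warning above, can restart the ODAS Worker.
-- SELECT * FROM referencedb.user_settings;
```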