Question
- How do you access tables from an Okera-enabled EMR cluster when their type is not natively supported in ODAS?
- We need to use DynamoDB tables.
Answer
This answer uses DynamoDB as an example of a table type that ODAS does not yet support.
While Okera does not yet provide native support for DynamoDB, you can access DynamoDB-based tables from an Okera-enabled EMR cluster by using a special cluster-local database, called `localdb`, that bypasses ODAS.
- On Okera-enabled EMR clusters, any table created in the `localdb` database bypasses ODAS and behaves just as a Hive table would without Okera.
- The steps to do this are as follows:
1. Create a database called `localdb` on the EMR cluster where you wish to access a DynamoDB table:
hive> create database localdb;
Please note: The `localdb` database must be created on each EMR cluster separately.
2. Get the location of the table you want to use. You can find it with the following command:
describe formatted <source_db_name.source_tbl_name>;
Here is an example; look at the `Location` field in the output (if the Okera-enabled cluster does not support the table, you might need to run this step from a non-Okera cluster):
hive> describe formatted rs.sometypes;
OK
# col_name data_type comment
int_col int
float_col float
string_col string
# Detailed Table Information
Database: rs
Owner: root
CreateTime: Fri Jul 20 14:46:30 UTC 2018
LastAccessTime: Fri Jul 20 14:46:30 UTC 2018
Retention: 0
Location: s3://test-bucket/folder-name/or-something-else
Table Type: EXTERNAL_TABLE
Table Parameters:
EXTERNAL TRUE
numFiles 1000
numRows 100000000
rawDataSize 100000000000
spark.sql.sources.provider com.cerebro.recordservice.spark.DefaultSource
spark.sql.sources.schema {\"fields\":[{\"name\":\"bool_col\",\"nullable\":true,\"type\":\"boolean\"},{\"name\":\"tinyint_col\",\"nullable\":true,\"type\":\"byte\"},{\"name\":\"smallint_col\",\"nullable\":true,\"type\":\"short\"},{\"name\":\"int_col\",\"nullable\":true,\"type\":\"integer\"},{\"name\":\"bigint_col\",\"nullable\":true,\"type\":\"long\"},{\"name\":\"float_col\",\"nullable\":true,\"type\":\"float\"},{\"name\":\"double_col\",\"nullable\":true,\"type\":\"double\"},{\"name\":\"string_col\",\"nullable\":true,\"type\":\"string\"},{\"name\":\"varchar_col\",\"nullable\":true,\"type\":\"string\"},{\"name\":\"char_col\",\"nullable\":true,\"type\":\"string\"},{\"name\":\"timestamp_col\",\"nullable\":true,\"type\":\"timestamp\"},{\"name\":\"decimal_col\",\"nullable\":true,\"type\":\"decimal(24,10)\"}],\"type\":\"struct\"}
storage_handler com.cloudera.recordservice.hive.RecordServiceStorageHandler
totalSize 100000000000
transient_lastDdlTime 1532097990
# Storage Information
SerDe Library: com.cloudera.recordservice.hive.RecordServiceSerDe
InputFormat: com.cloudera.recordservice.hive.RecordServiceHiveInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: 0
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
RecordServiceTable rs.sometypes
RecordServiceTableSize 100000000000
Time taken: 1.057 seconds, Fetched: 36 row(s)
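If you script this step, the `Location` value can be pulled out of the captured output instead of being copied by hand. The following is a minimal sketch in Python, assuming the output of `describe formatted` has already been saved as text (for example, captured from `hive -e`). The function name `extract_location` is hypothetical and is not part of Okera or Hive.

```python
import re
from typing import Optional


def extract_location(describe_output: str) -> Optional[str]:
    """Return the value of the 'Location:' field, or None if it is absent.

    Hypothetical helper: it only scans text that has already been captured
    from `describe formatted`; it does not talk to Hive itself.
    """
    for line in describe_output.splitlines():
        # Hive pads the field name with spaces or tabs, so allow any whitespace.
        match = re.match(r"\s*Location:\s*(\S+)", line)
        if match:
            return match.group(1)
    return None


# Sample text taken from the example output in step 2.
sample = """\
Database: rs
Owner: root
Location: s3://test-bucket/folder-name/or-something-else
Table Type: EXTERNAL_TABLE
"""
print(extract_location(sample))
```

If the function returns `None`, the output did not contain a `Location` field, which can happen on an Okera-enabled Hive when you do not have ALL access on the table.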
3. Replicate this table in `localdb`:
- Use the Hive `create external table like` command with the location of the original table.
Please note: If you are using an Okera-enabled Hive, make sure you have the actual location of the table, which is visible only if you have ALL access on the table.
The format for this is:
CREATE EXTERNAL TABLE <target_db_name.target_tbl_name> LIKE <source_db_name.source_tbl_name> LOCATION '<location>';
For the example in step 2, this would be:
CREATE EXTERNAL TABLE localdb.sometypes LIKE rs.sometypes LOCATION 's3://test-bucket/folder-name/or-something-else/';
Once this is done, you should be able to read the DynamoDB table from its `localdb` replica.
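If you need to replicate several tables, the statement from step 3 can be generated rather than typed by hand. This is a rough sketch in Python, assuming you already know the source table name and its location. The function name `build_create_like` and its parameters are hypothetical; the script only builds the statement text, and you would still run the result in Hive yourself.

```python
def build_create_like(source: str, location: str, target_db: str = "localdb") -> str:
    """Build the step 3 statement for one table.

    Hypothetical helper: `source` is `<source_db_name.source_tbl_name>` and
    `location` is the Location value obtained in step 2.
    """
    if "." not in source:
        raise ValueError("source must look like db_name.tbl_name")
    if "'" in location:
        # Keep the sketch simple: refuse values that would break the quoting.
        raise ValueError("location must not contain a single quote")
    table_name = source.split(".")[-1]
    return (
        f"CREATE EXTERNAL TABLE {target_db}.{table_name} "
        f"LIKE {source} LOCATION '{location}';"
    )


# Reproduces the example statement from step 3.
print(build_create_like("rs.sometypes", "s3://test-bucket/folder-name/or-something-else/"))
```

Because `localdb` has to exist on each EMR cluster separately, the generated statements would need to be run on every cluster where the tables should be visible.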