Scenario
A user wants to take Okera datasets and save them in the Databricks metastore.
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark.sql import SQLContext

# Create the contexts. SQLContext is the legacy entry point; note that
# neither it nor the SparkSession is built with Hive support here.
sc = SparkContext()
sql_ctx = SQLContext(sc)
spark = SparkSession.builder.getOrCreate()

# Register the S3 data as an external table, then read it back.
sql_ctx.sql("""create external table testdb.test_create_tb (id int) location 's3://cerberus-cerebro/test/data/'""")

query = "SELECT * FROM testdb.test_create_tb"
result = sql_ctx.sql(query)
result.show()
However, creating an external table from within Spark generates an error:
pyspark.sql.utils.AnalysisException: Hive support is required to CREATE Hive TABLE (AS SELECT);
'CreateTable `testdb`.`test_create_tb`, org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe, ErrorIfExists
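You can check which catalog implementation the running session is actually using. A minimal diagnostic sketch, assuming the same SparkSession as above ("in-memory" is used as the fallback when the setting has never been applied):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
# Prints "hive" when the Hive metastore catalog is active, otherwise "in-memory".
print(spark.conf.get("spark.sql.catalogImplementation", "in-memory"))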
- It seems the job is not able to get a Hive context.
- To correct this, we need to tell Spark to use Hive for metadata.
- This can be done at spark-submit time by adding the catalog implementation to the command line:
spark-submit --conf spark.sql.catalogImplementation=hive 356.py
Alternatively, you can configure it for all jobs by adding the following to /etc/spark/conf/spark-defaults.conf:
spark.sql.catalogImplementation=hive
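If you control the job code, the same setting can also be applied when the SparkSession is built, since enableHiveSupport() sets spark.sql.catalogImplementation=hive for the session. This is a minimal sketch of the earlier script with that change; it has to run before any SparkContext or SparkSession exists in the process, because the catalog implementation cannot be changed afterwards:

from pyspark.sql import SparkSession

# enableHiveSupport() switches the catalog implementation to "hive",
# so the external table is registered in the Hive metastore.
spark = SparkSession.builder.enableHiveSupport().getOrCreate()

spark.sql("""create external table testdb.test_create_tb (id int) location 's3://cerberus-cerebro/test/data/'""")
spark.sql("SELECT * FROM testdb.test_create_tb").show()

Note that this requires a Spark build with the Hive classes on the classpath; otherwise session creation fails because the Hive classes cannot be found.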