Problem
You encounter the following error when using a token sent from the Spark config in Databricks on an Okera-enabled cluster. The Spark config includes the following parameter at the cluster level:
spark.recordservice.delegation-token.token
Error: org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Could not get database: ktur10; (NOTE: If you wish to use SparkR, import it by calling 'library(SparkR)'.)
org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: Could not get database: ktur10;
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1$$anonfun$apply$1.apply(HiveExternalCatalog.scala:149)
at org.apache.spark.sql.hive.HiveExternalCatalog.org$apache$spark$sql$hive$HiveExternalCatalog$$maybeSynchronized(HiveExternalCatalog.scala:103)
at org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$withClient$1.apply(HiveExternalCatalog.scala:138)
at com.databricks.backend.daemon.driver.ProgressReporter$.withStatusCode(ProgressReporter.scala:317)
at com.databricks.spark.util.SparkDatabricksProgressReporter$.withStatusCode(ProgressReporter.scala:23)
at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:136)
at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:274)
at org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.databaseExists(ExternalCatalogWithListener.scala:73)
at org.apache.spark.sql.catalyst.catalog.SessionCatalog.databaseExists(SessionCatalog.scala:257)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.isRunningDirectlyOnFiles(Analyzer.scala:706)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$.resolveRelation(Analyzer.scala:637)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:669)
at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveRelations$$anonfun$apply$8.applyOrElse(Analyzer.scala:662)
at org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$6.apply(TreeNode.scala:301)
Solution
There are certain cases for which Databricks' plumbing for setting the user token in a thread-local context does not yet exist. As a result, there is no token for the Okera client libraries to read, and the request fails to authenticate.
Some of the cases where this plumbing doesn't yet exist are:
- spark-submit
- R notebooks
However, you can set the token on a per-cluster basis (thereby using the Databricks cluster as a single-tenant cluster) and still take advantage of the Okera-Databricks integration. To do so:
- Acquire a user token that the ODAS cluster can understand.
- Open the Clusters tab in the left-hand menu of your Databricks workspace.
- Select the ODAS-integrated Databricks cluster you want to use.
- Click Edit to edit the cluster configs.
- Scroll to the bottom and click the Spark tab to edit the Spark configs.
- Set the following two configs with the token you acquired earlier:
  recordservice.delegation-token.token
  spark.recordservice.delegation-token.token
- Click Start to start your cluster.
For example, if your token is foo, add the following two lines to the Spark config on your ODAS-integrated Databricks cluster:
recordservice.delegation-token.token foo
spark.recordservice.delegation-token.token foo
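Once the cluster has restarted, you can confirm the token is visible to Spark. The following is a minimal sketch, assuming a Scala notebook cell on the Databricks cluster, where the spark SparkSession object is predefined by the runtime; foo stands in for your real token:

// Read the token back from the cluster-level Spark config.
val token = spark.conf.get("spark.recordservice.delegation-token.token")

// Avoid printing the token itself; just confirm it is present.
println(s"Delegation token configured: ${token.nonEmpty}")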
This should let you use R notebooks or spark-submit on Databricks with Okera.
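If you submit a standalone application with spark-submit rather than using a notebook, the same two settings can also be supplied when the application builds its own session. This is a hedged sketch, under the assumption that the Okera client libraries read these keys from the Spark config regardless of how they were set; the object name OkeraExample, the app name okera-example, and the token foo are placeholders, not part of the original article:

import org.apache.spark.sql.SparkSession

object OkeraExample {
  def main(args: Array[String]): Unit = {
    // Set both token keys, mirroring the cluster-level Spark config above.
    val spark = SparkSession.builder()
      .appName("okera-example")
      .config("recordservice.delegation-token.token", "foo")
      .config("spark.recordservice.delegation-token.token", "foo")
      .enableHiveSupport()
      .getOrCreate()

    // A catalog access that previously failed with "Could not get database"
    // should now authenticate through the token.
    spark.sql("SHOW DATABASES").show()
  }
}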