Catalog¶
The catalog exposes the server's metadata: catalogs, databases, tables,
columns, and functions, plus management of temporary views and the table cache.
It mirrors Apache Spark's Catalog API and is reached through
spark.catalog on a connected session:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
val catalog = spark.catalog
Catalogs and databases¶
catalog.currentCatalog() // the active catalog name
catalog.setCurrentCatalog("spark_catalog")
catalog.listCatalogs().show() // a DataFrame of available catalogs
catalog.currentDatabase // the active database
catalog.setCurrentDatabase("analytics")
catalog.listDatabases().show() // a DataFrame of databases
catalog.databaseExists("analytics") // Boolean
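Switching databases can be guarded with databaseExists so that a missing database does not raise an error. The following is a minimal sketch rather than a prescribed pattern. It assumes the connection string used at the top of this page and a database called `analytics` that may or may not exist on the server; `previous` and `target` are illustrative names.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
val catalog = spark.catalog

// Remember the starting database so the session can be restored afterwards.
val previous = catalog.currentDatabase
val target = "analytics" // assumed database name, reused from the examples on this page

if (catalog.databaseExists(target)) {
  catalog.setCurrentDatabase(target)
  // Unqualified table names now resolve against `target`.
  catalog.listTables().show()
  // Switch back so that later code is unaffected.
  catalog.setCurrentDatabase(previous)
}
```

Restoring the previous database keeps the change local to this snippet, which matters when the same session is shared by other code.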
Tables and columns¶
The listing methods return DataFrames, so you can filter and collect them like any other query result:
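catalog.listTables().show() // tables in the current database
catalog.listTables("analytics").show() // tables in the named database
catalog.listColumns("daily_summary").show() // columns of a table in the current database
catalog.listColumns("analytics", "daily_summary").show() // columns of analytics.daily_summary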
catalog.tableExists("daily_summary") // Boolean
catalog.tableExists("analytics", "daily_summary") // Boolean
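Because a listing is an ordinary query result, it can be filtered and collected on the client. The sketch below is one possible way to do that. It assumes the listing has the `name` and `isTemporary` columns found in Apache Spark's table listing; a server that mirrors the API only partially may expose different columns. `tableNames` is an illustrative name.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
val catalog = spark.catalog

// Keep only persistent tables (drop temporary views) in the current database,
// then bring just their names back to the client as plain strings.
val tableNames: Array[String] = catalog
  .listTables()
  .filter(col("isTemporary") === false) // column name assumed from Apache Spark's schema
  .select("name")
  .collect()
  .map(_.getString(0))

tableNames.foreach(println)
```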
Functions¶
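A minimal sketch of function lookups, assuming the catalog follows the `listFunctions` and `functionExists` signatures of Apache Spark's Catalog API. Whether the server implements every one of these calls is not confirmed here, and `my_udf` is a hypothetical function name.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
val catalog = spark.catalog

// List functions visible in the current database (signature assumed from Apache Spark).
catalog.listFunctions().show()

// List functions visible in a named database; "analytics" is reused from the examples on this page.
if (catalog.databaseExists("analytics")) {
  catalog.listFunctions("analytics").show()
}

// Check for a function by name; both calls return a Boolean.
val hasAbs: Boolean = catalog.functionExists("abs")
val hasUdf: Boolean = catalog.functionExists("my_udf") // hypothetical name
println(s"abs: $hasAbs, my_udf: $hasUdf")
```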
Temporary views¶
spark.range(10).createOrReplaceTempView("t")
catalog.dropTempView("t") // returns true if a view was dropped
catalog.dropGlobalTempView("g") // drops the global temporary view "g", if one exists
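For completeness, here is a sketch of the full life cycle of a global temporary view: create it, query it, then drop it. It assumes the server supports `createOrReplaceGlobalTempView` and resolves global views through the `global_temp` database, which is Apache Spark's default name; both are assumptions about this server rather than documented behavior.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
val catalog = spark.catalog

// Register ten rows (ids 0 through 9) as a global temporary view named "g".
spark.range(10).createOrReplaceGlobalTempView("g")

// Global temporary views are addressed through a reserved database;
// `global_temp` is Apache Spark's default and is assumed here.
val rows = spark.sql("SELECT count(*) FROM global_temp.g").collect()
println(rows.head.getLong(0))

// Drop the view; the Boolean reports whether a view was actually removed.
val dropped: Boolean = catalog.dropGlobalTempView("g")
```

In Apache Spark, a plain temporary view is visible only to the session that created it, while a global temporary view is shared across sessions of the same application.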
Caching¶
Cache tables and views in cluster memory, and manage the cache:
catalog.cacheTable("daily_summary")
catalog.isCached("daily_summary") // Boolean
catalog.uncacheTable("daily_summary") // remove this table from the cache
catalog.clearCache() // remove every cached table and view
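The calls above can be combined so that a cached entry is always released, even if a query fails. This is a sketch under stated assumptions: it caches a throwaway temporary view named `cache_demo` (a hypothetical name) so that it does not depend on any existing table, and it assumes caching behaves as in Apache Spark, where `cacheTable` only marks the data for caching and the first action materializes it.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
val catalog = spark.catalog

// A throwaway view to cache; "cache_demo" is a hypothetical name.
spark.range(1000).createOrReplaceTempView("cache_demo")

catalog.cacheTable("cache_demo")
val wasCached: Boolean = catalog.isCached("cache_demo")

try {
  // Assumed lazy caching: the first action populates the cache,
  // and the second one can then read from it.
  spark.table("cache_demo").count()
  spark.table("cache_demo").count()
} finally {
  // Release the cache entry and the view even if a query above throws.
  catalog.uncacheTable("cache_demo")
  catalog.dropTempView("cache_demo")
}
```

Using uncacheTable on the one entry, instead of clearCache, leaves anything else cached in the session untouched.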