Connecting and Authentication¶
Connections use the standard Apache Spark Connect connection string, passed
to the driver as the ADBC uri option.
The connection string¶
The scheme is always sc://. The host and port default to localhost:15002,
which is where a local Spark Connect server listens. Everything after the path
separator (/;) is a list of key=value pairs separated by semicolons.
| Field | Meaning | Default |
|---|---|---|
host:port |
Spark Connect server address | localhost:15002 |
token |
Bearer token (sent as Authorization: Bearer) |
none |
use_ssl |
true or false; enable TLS for the gRPC channel |
inferred true when a token is set |
user_id |
Spark Connect user id for the session | none |
user_agent |
User agent advertised to the server | adbc-driver-spark/<version> |
session_id |
Reuse an existing session id (a UUID) | new session |
Note
Each option in the connection string also has an equivalent ADBC option key
(for example adbc.spark.token). Inline string fields and option
keys can be mixed; option keys take precedence. See the
Configuration Reference.
Connecting without authentication¶
For a local, plaintext Spark Connect server, the URI is all you need.
/* The uri option is all you need for a local, plaintext server. */
AdbcDatabaseNew(&database, &error);
AdbcDatabaseSetOption(&database, "driver",
"/path/to/libadbc_driver_spark.so", &error);
AdbcDatabaseSetOption(&database, "uri", "sc://localhost:15002", &error);
AdbcDatabaseInit(&database, &error);
Note
The full setup/teardown (error checking, opening a connection, releases) and the compile command live in Using from C and C++.
// driver_path is the path to libadbc_driver_spark.{so,dylib,dll}.
let mut driver = ManagedDriver::load_dynamic_from_filename(driver_path, None, AdbcVersion::V110)?;
let mut database = driver.new_database_with_opts(
[(OptionDatabase::Uri, "sc://localhost:15002".into())])?;
let mut connection = database.new_connection()?;
Note
For the full Rust and Ruby setup see Using from Rust and Using from Ruby. The token, TLS, and header options below are passed the same way for those languages.
TLS and bearer tokens¶
To reach a TLS endpoint that requires a bearer token, set the token and enable SSL. When a token is supplied the driver enables TLS automatically, because a bearer token over plaintext would leak the credential.
import adbc_driver_spark.dbapi as dbapi
# Inline in the URI.
uri = "sc://spark.example.com:443/;token=eyJhbGci...;use_ssl=true"
with dbapi.connect(uri) as conn:
...
# Or with the convenience keyword (implies use_ssl=True).
with dbapi.connect("sc://spark.example.com:443", token="eyJhbGci...") as conn:
...
/* Auth settings are plain database options. Setting a token enables TLS. */
AdbcDatabaseSetOption(&database, "uri", "sc://spark.example.com:443", &error);
AdbcDatabaseSetOption(&database, "adbc.spark.token", "eyJhbGci...",
&error);
AdbcDatabaseSetOption(&database, "adbc.spark.tls.enabled", "true", &error);
// Auth settings are extra database options. Setting a token enables TLS.
let mut database = driver.new_database_with_opts([
(OptionDatabase::Uri, "sc://spark.example.com:443".into()),
(OptionDatabase::Other("adbc.spark.token".into()), "eyJhbGci...".into()),
(OptionDatabase::Other("adbc.spark.tls.enabled".into()), "true".into()),
])?;
Warning
Never commit tokens to source control. Read them from the environment or a secret store and inject them at connect time.
Custom headers¶
Arbitrary gRPC metadata headers can be attached to every request using the
adbc.spark.headers.<NAME> option prefix, or inline header fields in
the URI where the server supports them.
/* Attach a gRPC metadata header with the headers.<NAME> option prefix. */
AdbcDatabaseSetOption(&database, "uri", "sc://spark.example.com:443", &error);
AdbcDatabaseSetOption(&database, "adbc.spark.token", "eyJhbGci...",
&error);
AdbcDatabaseSetOption(&database,
"adbc.spark.headers.x-request-source",
"analytics-team", &error);
// Attach a gRPC metadata header with the headers.<NAME> option prefix.
let mut database = driver.new_database_with_opts([
(OptionDatabase::Uri, "sc://spark.example.com:443".into()),
(OptionDatabase::Other("adbc.spark.token".into()), "eyJhbGci...".into()),
(
OptionDatabase::Other("adbc.spark.headers.x-request-source".into()),
"analytics-team".into(),
),
])?;
Databricks-style bearer token¶
Databricks Connect compatible endpoints authenticate with a personal access token (or OAuth token) carried as a bearer token over TLS, and identify the workspace cluster through a header. The shape is the same as any other token-and-TLS endpoint.
import os
import adbc_driver_spark.dbapi as dbapi
headers = "adbc.spark.headers."
with dbapi.connect(
"sc://dbc-12345678-90ab.cloud.databricks.com:443",
token=os.environ["DATABRICKS_TOKEN"], # implies use_ssl=True
db_kwargs={
f"{headers}x-databricks-cluster-id": os.environ["DATABRICKS_CLUSTER_ID"],
},
) as conn:
with conn.cursor() as cur:
cur.execute("SHOW CATALOGS")
print(cur.fetchall())
/* Token over TLS plus the cluster id header, read from the environment. */
AdbcDatabaseSetOption(&database, "uri",
"sc://dbc-12345678-90ab.cloud.databricks.com:443", &error);
AdbcDatabaseSetOption(&database, "adbc.spark.token",
getenv("DATABRICKS_TOKEN"), &error);
AdbcDatabaseSetOption(&database, "adbc.spark.tls.enabled", "true", &error);
AdbcDatabaseSetOption(&database,
"adbc.spark.headers.x-databricks-cluster-id",
getenv("DATABRICKS_CLUSTER_ID"), &error);
use std::env;
// Token over TLS plus the cluster id header, read from the environment.
let mut database = driver.new_database_with_opts([
(OptionDatabase::Uri, "sc://dbc-12345678-90ab.cloud.databricks.com:443".into()),
(
OptionDatabase::Other("adbc.spark.token".into()),
env::var("DATABRICKS_TOKEN")?.into(),
),
(OptionDatabase::Other("adbc.spark.tls.enabled".into()), "true".into()),
(
OptionDatabase::Other("adbc.spark.headers.x-databricks-cluster-id".into()),
env::var("DATABRICKS_CLUSTER_ID")?.into(),
),
])?;
# Token over TLS plus the cluster id header, read from the environment.
ADBC::Database.open(
driver: driver,
uri: "sc://dbc-12345678-90ab.cloud.databricks.com:443",
token: ENV.fetch("DATABRICKS_TOKEN"),
use_ssl: true,
"adbc.spark.headers.x-databricks-cluster-id": ENV.fetch("DATABRICKS_CLUSTER_ID"),
) do |database|
database.connect do |connection|
puts connection.query("SHOW CATALOGS")
end
end
Session reuse¶
Pass session_id (a UUID) to attach to an existing Spark Connect session
instead of creating a new one. This lets temporary views and session
configuration carry across connections.
with dbapi.connect(
"sc://localhost:15002/;session_id=2a1b9c64-7c1d-4e2f-9a3b-0f8e6d5c4b3a"
) as conn:
...
See Configuration Reference for the full option list and Troubleshooting if a connection fails.