Type Mapping¶
Spark Connect returns results as Arrow IPC batches, so the driver maps Spark SQL types to Arrow types directly. This page is the reference for that mapping in both directions (result columns and bound parameters).
Spark to Arrow¶
| Spark SQL type | Arrow type |
|---|---|
boolean |
bool |
byte (tinyint) |
int8 |
short (smallint) |
int16 |
int (integer) |
int32 |
long (bigint) |
int64 |
float |
float32 |
double |
float64 |
decimal(p, s) |
decimal128(p, s) |
string |
utf8 |
binary |
binary |
date |
date32 |
timestamp |
timestamp[us, tz] |
timestamp_ntz |
timestamp[us] (no zone) |
array<T> |
list<T> |
map<K, V> |
map<K, V> |
struct<...> |
struct<...> |
null |
null |
Notes on specific types¶
Timestamps¶
Spark has two timestamp types and they map differently:
timestampis an instant and carries a time zone. It maps to Arrowtimestamp[us, tz], where the zone reflects the session time zone.timestamp_ntz(no time zone) is wall-clock time. It maps to Arrowtimestamp[us]with no zone attached.
Both use microsecond precision, matching Spark's internal representation.
Note
The effective time zone for timestamp follows the session's
spark.sql.session.timeZone. Set it through a connection configuration
option if you need a specific zone.
Decimal¶
decimal(p, s) maps to Arrow decimal128(p, s) with the same precision and
scale, so values are exact (no floating point rounding).
Nested types¶
array, map, and struct map to the corresponding Arrow nested types and can
be composed arbitrarily (for example array<struct<...>>). Field names and
nullability are preserved.
Parameter binding¶
When you bind parameters, build an Arrow record whose column types match the
Spark types expected by the query, using the same mapping in reverse. For
example bind an int64 array for a bigint column and a decimal128(p, s)
array for a decimal(p, s) column. See Querying Data for binding
examples.
Tip
Because the mapping is one to one and columnar, results from this driver
drop straight into pandas, Polars, or DuckDB through pyarrow without any
per-row conversion.