Skip to content

Type Mapping

Spark Connect returns results as Arrow IPC batches, so the driver maps Spark SQL types to Arrow types directly. This page is the reference for that mapping in both directions (result columns and bound parameters).

Spark to Arrow

Spark SQL type Arrow type
boolean bool
byte (tinyint) int8
short (smallint) int16
int (integer) int32
long (bigint) int64
float float32
double float64
decimal(p, s) decimal128(p, s)
string utf8
binary binary
date date32
timestamp timestamp[us, tz]
timestamp_ntz timestamp[us] (no zone)
array<T> list<T>
map<K, V> map<K, V>
struct<...> struct<...>
null null

Notes on specific types

Timestamps

Spark has two timestamp types and they map differently:

  • timestamp is an instant and carries a time zone. It maps to Arrow timestamp[us, tz], where the zone reflects the session time zone.
  • timestamp_ntz (no time zone) is wall-clock time. It maps to Arrow timestamp[us] with no zone attached.

Both use microsecond precision, matching Spark's internal representation.

Note

The effective time zone for timestamp follows the session's spark.sql.session.timeZone. Set it through a connection configuration option if you need a specific zone.

Decimal

decimal(p, s) maps to Arrow decimal128(p, s) with the same precision and scale, so values are exact (no floating point rounding).

Nested types

array, map, and struct map to the corresponding Arrow nested types and can be composed arbitrarily (for example array<struct<...>>). Field names and nullability are preserved.

Parameter binding

When you bind parameters, build an Arrow record whose column types match the Spark types expected by the query, using the same mapping in reverse. For example bind an int64 array for a bigint column and a decimal128(p, s) array for a decimal(p, s) column. See Querying Data for binding examples.

Tip

Because the mapping is one to one and columnar, results from this driver drop straight into pandas, Polars, or DuckDB through pyarrow without any per-row conversion.