ArrowSerializer

org.apache.spark.sql.connect.client.ArrowSerializer

Encodes local rows into an Apache Arrow IPC stream so that the client can ship local data to the server as a LocalRelation (used by SparkSession.createDataFrame).

This is the symmetric ENCODE counterpart to SparkResult (which DECODES the Arrow batches returned by the server). The Spark-type-to-Arrow-type mapping and the per-cell value conversions here mirror the decode logic in SparkResult.getValue so that a value round-trips faithfully.

Modelled on the Ruby reference implementation by the same author (spark_connect/arrow.rb: from_rows, build_arrow_schema, arrow_field_type).

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def serialize(rows: Seq[Seq[Any]], schema: StructType, allocator: BufferAllocator): Array[Byte]

Serializes rows into a single, self-contained Arrow IPC stream (schema + one record batch).

Value parameters

allocator

the Arrow BufferAllocator used to back the vectors (caller owns its lifecycle; the VectorSchemaRoot created here is closed before returning).

rows

the local data, one inner Seq per row, ordered to match schema.fields.

schema

the Spark schema describing the columns.

Attributes

Returns

the Arrow IPC stream bytes.
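
The "schema + one record batch" layout and the allocator ownership rule described above can be illustrated with the Arrow Java API directly. The following is a minimal sketch, not the actual body of serialize: the object name IpcStreamSketch, the helpers encode and decode, and the fixed two-column schema (id, name) are all hypothetical and exist only for illustration. It writes one batch with ArrowStreamWriter, closes the VectorSchemaRoot before returning, and leaves the allocator to the caller.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.nio.charset.StandardCharsets
import java.util.Arrays

import org.apache.arrow.memory.{BufferAllocator, RootAllocator}
import org.apache.arrow.vector.{BigIntVector, VarCharVector, VectorSchemaRoot}
import org.apache.arrow.vector.ipc.{ArrowStreamReader, ArrowStreamWriter}
import org.apache.arrow.vector.types.pojo.{ArrowType, Field, Schema}

// Hypothetical illustration only; not part of ArrowSerializer.
object IpcStreamSketch {
  // Assumed example schema: id (signed 64-bit int) and name (UTF-8 string).
  val schema: Schema = new Schema(Arrays.asList(
    Field.nullable("id", new ArrowType.Int(64, true)),
    Field.nullable("name", ArrowType.Utf8.INSTANCE)))

  // Encode rows as one IPC stream: schema message followed by a single record batch.
  def encode(rows: Seq[(Long, String)], allocator: BufferAllocator): Array[Byte] = {
    val root = VectorSchemaRoot.create(schema, allocator)
    try {
      root.allocateNew()
      val ids = root.getVector("id").asInstanceOf[BigIntVector]
      val names = root.getVector("name").asInstanceOf[VarCharVector]
      rows.zipWithIndex.foreach { case ((id, name), i) =>
        ids.setSafe(i, id)
        if (name == null) names.setNull(i)
        else names.setSafe(i, name.getBytes(StandardCharsets.UTF_8))
      }
      root.setRowCount(rows.length)

      val out = new ByteArrayOutputStream()
      val writer = new ArrowStreamWriter(root, null, out) // no dictionaries
      writer.start()      // writes the schema
      writer.writeBatch() // writes the single record batch
      writer.end()        // writes the end-of-stream marker
      writer.close()
      out.toByteArray
    } finally root.close() // root is released here; the allocator stays open
  }

  // Decode the stream again to check that values round-trip.
  def decode(bytes: Array[Byte], allocator: BufferAllocator): Seq[(Long, String)] = {
    val reader = new ArrowStreamReader(new ByteArrayInputStream(bytes), allocator)
    try {
      val root = reader.getVectorSchemaRoot
      val result = Seq.newBuilder[(Long, String)]
      while (reader.loadNextBatch()) {
        val ids = root.getVector("id").asInstanceOf[BigIntVector]
        val names = root.getVector("name").asInstanceOf[VarCharVector]
        for (i <- 0 until root.getRowCount) {
          val name =
            if (names.isNull(i)) null
            else new String(names.get(i), StandardCharsets.UTF_8)
          result += ((ids.get(i), name))
        }
      }
      result.result()
    } finally reader.close()
  }

  def main(args: Array[String]): Unit = {
    val allocator = new RootAllocator() // the caller owns the allocator
    try {
      val rows: Seq[(Long, String)] = Seq((1L, "a"), (2L, null))
      val bytes = encode(rows, allocator)
      assert(decode(bytes, allocator) == rows)
    } finally allocator.close()
  }
}
```

Closing the allocator in the caller's finally block doubles as a leak check: RootAllocator.close() fails if any buffer allocated from it is still open, which is why the root and the reader are both closed first.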

def toArrowSchema(schema: StructType): Schema

Builds an Arrow Schema from a Spark StructType.
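
As a rough illustration of what such a schema conversion involves, the sketch below maps a handful of column types to Arrow types and assembles an Arrow Schema. It is an assumption-laden example, not the method's implementation: ColType, Col, arrowType and toSchema are hypothetical stand-ins for the Spark StructType machinery, and the type pairs shown (32-bit and 64-bit signed ints, double-precision float, UTF-8 string, boolean) are the conventional ones rather than a confirmed list of what toArrowSchema supports.

```scala
import org.apache.arrow.vector.types.FloatingPointPrecision
import org.apache.arrow.vector.types.pojo.{ArrowType, Field, FieldType, Schema}
import scala.jdk.CollectionConverters._

// Hypothetical illustration only; not part of ArrowSerializer.
object SchemaSketch {
  // Stand-ins for a few Spark data types (assumed names).
  sealed trait ColType
  case object BoolCol extends ColType
  case object IntCol extends ColType
  case object LongCol extends ColType
  case object DoubleCol extends ColType
  case object StringCol extends ColType

  // Stand-in for a Spark StructField: name, type, nullability.
  final case class Col(name: String, tpe: ColType, nullable: Boolean)

  // Map each column type to its Arrow logical type.
  def arrowType(t: ColType): ArrowType = t match {
    case BoolCol   => ArrowType.Bool.INSTANCE
    case IntCol    => new ArrowType.Int(32, true) // signed 32-bit
    case LongCol   => new ArrowType.Int(64, true) // signed 64-bit
    case DoubleCol => new ArrowType.FloatingPoint(FloatingPointPrecision.DOUBLE)
    case StringCol => ArrowType.Utf8.INSTANCE
  }

  // Build one Arrow Field per column, preserving column order and nullability.
  def toSchema(cols: Seq[Col]): Schema = {
    val fields = cols.map { c =>
      val t = arrowType(c.tpe)
      val fieldType = if (c.nullable) FieldType.nullable(t) else FieldType.notNullable(t)
      new Field(c.name, fieldType, null) // null children: a leaf (non-nested) field
    }
    new Schema(fields.asJava)
  }
}
```

For example, SchemaSketch.toSchema(Seq(Col("id", LongCol, false), Col("name", StringCol, true))) yields a two-field schema whose first field is a non-nullable signed 64-bit integer.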