SparkSession

org.apache.spark.sql.SparkSession
See theSparkSession companion object
class SparkSession extends AutoCloseable

The entry point to programming Spark with the DataFrame API over Spark Connect.

 val spark = SparkSession.builder
   .remote("sc://localhost:15002")
   .appName("my-app")
   .getOrCreate()

 spark.range(10).filter(col("id") % 2 === 0).show()
 spark.stop()

A session owns the SparkConnectClient, a monotonic plan-id allocator (so every relation is uniquely identifiable to the server), an Arrow allocator for decoding results, and the RuntimeConfig facade.

Attributes

Companion
object
Graph
Supertypes
trait AutoCloseable
class Object
trait Matchable
class Any

Members list

Type members

Classlikes

object implicits extends SQLImplicits

Implicit conversions ($"col", Seq(...).toDF(...)); use import spark.implicits.*.

Implicit conversions ($"col", Seq(...).toDF(...)); use import spark.implicits.*.

Attributes

Supertypes
class SQLImplicits
class Object
trait Matchable
class Any
Self type
implicits.type

Value members

Concrete methods

override def close(): Unit

Attributes

Definition Classes
AutoCloseable
def createDataFrame(rows: Seq[Row], schema: StructType): DataFrame

Creates a DataFrame from a local sequence of Rows using the given schema.

Creates a DataFrame from a local sequence of Rows using the given schema.

Attributes

def createDataFrame(rows: List[Row], schema: StructType): DataFrame

Creates a DataFrame from a Java list of Rows using the given schema.

Creates a DataFrame from a Java list of Rows using the given schema.

Attributes

def createDataset[T](data: Seq[T])(using enc: Encoder[T]): Dataset[T]

Creates a typed Dataset from a local sequence of T using its Encoder. The data is serialized via the encoder and shipped as a local relation; no server-side closure is involved.

Creates a typed Dataset from a local sequence of T using its Encoder. The data is serialized via the encoder and shipped as a local relation; no server-side closure is involved.

Attributes

def createDataset[T](data: List[T])(using enc: Encoder[T]): Dataset[T]

Creates a typed Dataset from a Java list of T using its Encoder.

Creates a typed Dataset from a Java list of T using its Encoder.

Attributes

Returns a DataFrame with no rows or columns.

Returns a DataFrame with no rows or columns.

Attributes

Starts a new independent session against the same endpoint (fresh server-side session).

Starts a new independent session against the same endpoint (fresh server-side session).

Attributes

def pipeline(defaultCatalog: Option[String], defaultDatabase: Option[String], sqlConf: Map[String, String]): Pipeline

Creates a new Spark Declarative Pipeline (a dataflow graph) in this session. Available on Spark 4.1 and later servers.

Creates a new Spark Declarative Pipeline (a dataflow graph) in this session. Available on Spark 4.1 and later servers.

Attributes

def range(end: Long): DataFrame

Creates a DataFrame with a single id column of [0, end).

Creates a DataFrame with a single id column of [0, end).

Attributes

def range(start: Long, end: Long): DataFrame

Creates a DataFrame with a single id column of [start, end).

Creates a DataFrame with a single id column of [start, end).

Attributes

def range(start: Long, end: Long, step: Long): DataFrame
def range(start: Long, end: Long, step: Long, numPartitions: Int): DataFrame

Returns a DataFrameReader that can be used to read non-streaming data as a DataFrame.

Returns a DataFrameReader that can be used to read non-streaming data as a DataFrame.

Attributes

Returns a org.apache.spark.sql.streaming.DataStreamReader for reading streaming data.

Returns a org.apache.spark.sql.streaming.DataStreamReader for reading streaming data.

Attributes

def sessionId: String

The client session id (a UUID).

The client session id (a UUID).

Attributes

Make this the active session for the current thread.

Make this the active session for the current thread.

Attributes

def sql(query: String): DataFrame

Executes a SQL query and returns a lazy DataFrame over its result.

Executes a SQL query and returns a lazy DataFrame over its result.

Attributes

def sql(query: String, args: Array[Any]): DataFrame

Executes a SQL query with positional parameters bound into the query.

Executes a SQL query with positional parameters bound into the query.

Attributes

def sql(query: String, args: Map[String, Any]): DataFrame

Executes a SQL query with named parameters bound into the query.

Executes a SQL query with named parameters bound into the query.

Attributes

def stop(): Unit

Releases the server-side session resources and closes the channel.

Releases the server-side session resources and closes the channel.

Attributes

def table(tableName: String): DataFrame

Returns the named table or view as a DataFrame.

Returns the named table or view as a DataFrame.

Attributes

def version: String

The version of Spark on which the connected server is running.

The version of Spark on which the connected server is running.

Attributes

Concrete fields

lazy val catalog: Catalog

Returns the org.apache.spark.sql.catalog.Catalog interface for this session.

Returns the org.apache.spark.sql.catalog.Catalog interface for this session.

Attributes

lazy val conf: RuntimeConfig

Runtime configuration for Spark.

Runtime configuration for Spark.

Attributes