org.apache.spark.sql
Members list
Packages
Type members
Classlikes
Thrown when a query fails to analyze on the server: an unknown table or column, a type mismatch, an ambiguous reference, and similar. Mirrors org.apache.spark.sql.AnalysisException.
Thrown when a query fails to analyze on the server: an unknown table or column, a type mismatch, an ambiguous reference, and similar. Mirrors org.apache.spark.sql.AnalysisException.
Attributes
- Supertypes
-
class SparkExceptionclass RuntimeExceptionclass Exceptionclass Throwabletrait Serializableclass Objecttrait Matchableclass AnyShow all
- Known subtypes
-
class ParseException
A column in a Dataset: a lazily-evaluated reference to a column or a computation over columns. Columns are immutable; operators and methods return new Columns.
A column in a Dataset: a lazily-evaluated reference to a column or a computation over columns. Columns are immutable; operators and methods return new Columns.
Build columns with functions.col, functions.lit, by indexing a DataFrame (df("id")), or by combining other columns with operators.
import org.apache.spark.sql.functions._
(col("age") + 1).as("next_age")
col("name").like("a%") && (col("age") >= 18)
Attributes
- Companion
- object
- Supertypes
-
class Objecttrait Matchableclass Any
Turns a local sequence into a DataFrame. Product elements (tuples / case classes) become multi-column rows; any other value becomes a single column. Column types are inferred from the first row.
Turns a local sequence into a DataFrame. Product elements (tuples / case classes) become multi-column rows; any other value becomes a single column. Column types are inferred from the first row.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Functionality for working with missing data in a Dataset, reached via df.na. Mirrors PySpark's DataFrame.na (DataFrameNaFunctions).
Functionality for working with missing data in a Dataset, reached via df.na. Mirrors PySpark's DataFrame.na (DataFrameNaFunctions).
df.na.drop()
df.na.fill(0)
df.na.fill(Map("name" -> "unknown", "age" -> 0))
df.na.replace("name", Map("UNKNOWN" -> "unnamed"))
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Loads data from external storage systems (e.g. file systems, key-value stores, JDBC) into a DataFrame. Use SparkSession.read to access this.
Loads data from external storage systems (e.g. file systems, key-value stores, JDBC) into a DataFrame. Use SparkSession.read to access this.
Mirrors the public surface of org.apache.spark.sql.DataFrameReader over the Spark Connect protocol.
Attributes
- Example
-
spark.read.format("csv").option("header", true).load("data.csv") spark.read.json("events.json") spark.read.table("my_table") - Supertypes
-
class Objecttrait Matchableclass Any
Statistical functions for a Dataset, reached via df.stat. Mirrors PySpark's DataFrame.stat (DataFrameStatFunctions).
Statistical functions for a Dataset, reached via df.stat. Mirrors PySpark's DataFrame.stat (DataFrameStatFunctions).
df.stat.corr("x", "y")
df.stat.approxQuantile("x", Array(0.25, 0.5, 0.75), 0.01)
df.stat.crosstab("a", "b").show()
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Saves the contents of a DataFrame to external storage systems (e.g. file systems, key-value stores, tables). Use Dataset.write to access this.
Saves the contents of a DataFrame to external storage systems (e.g. file systems, key-value stores, tables). Use Dataset.write to access this.
Mirrors the public surface of org.apache.spark.sql.DataFrameWriter over the Spark Connect protocol.
Attributes
- Example
-
df.write.format("parquet").mode("overwrite").save("out.parquet") df.write.mode("append").saveAsTable("my_table") - Supertypes
-
class Objecttrait Matchableclass Any
Interface used to write a Dataset to external storage using the v2 (catalog) API. Use Dataset.writeTo to access this.
Interface used to write a Dataset to external storage using the v2 (catalog) API. Use Dataset.writeTo to access this.
df.writeTo("catalog.db.table").using("parquet").create()
df.writeTo("catalog.db.table").append()
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
A distributed collection of rows. A Dataset is lazy: transformations build up a protobuf logical plan, and nothing executes until an action (e.g. collect, show, count) is called.
A distributed collection of rows. A Dataset is lazy: transformations build up a protobuf logical plan, and nothing executes until an action (e.g. collect, show, count) is called.
Mirrors the public surface of org.apache.spark.sql.Dataset over the Spark Connect protocol.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Converts between JVM values of type T and the Row representation produced and consumed by Spark Connect. Encoders are derived at compile time for primitives, Option, collections, maps, tuples, and case classes, so df.as[Person] and spark.createDataset(people) work with no server-side closures.
Converts between JVM values of type T and the Row representation produced and consumed by Spark Connect. Encoders are derived at compile time for primitives, Option, collections, maps, tuples, and case classes, so df.as[Person] and spark.createDataset(people) work with no server-side closures.
Encoders intentionally do NOT cover Dataset.map/flatMap/groupByKey/reduce/mapPartitions: those evaluate a JVM closure per row on the server (the same mechanism as a UDF), which Spark Connect for Scala 3 does not support.
Attributes
- Companion
- object
- Supertypes
-
trait Serializableclass Objecttrait Matchableclass Any
Companion holding the derived Encoder instances. Because they live here, df.as[T] and spark.createDataset resolve an encoder for any supported T with no explicit import.
A Row backed by an Array[Any].
Builder for a MERGE INTO statement, obtained from Dataset.mergeInto. Specify one or more whenMatched / whenNotMatched / whenNotMatchedBySource clauses, then call merge to execute the statement against the target table.
Builder for a MERGE INTO statement, obtained from Dataset.mergeInto. Specify one or more whenMatched / whenNotMatched / whenNotMatchedBySource clauses, then call merge to execute the statement against the target table.
spark.table("source")
.mergeInto("target", col("source.id") === col("target.id"))
.whenMatched().updateAll()
.whenNotMatched().insertAll()
.whenNotMatchedBySource().delete()
.merge()
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Holder for named aggregate metrics computed while a Dataset is being materialised, without an extra pass over the data. Pair with Dataset.observe.
Holder for named aggregate metrics computed while a Dataset is being materialised, without an extra pass over the data. Pair with Dataset.observe.
val obs = new Observation("metrics")
df.observe(obs, count(lit(1)).as("rows"), max("id").as("max_id")).collect()
obs.get // Map("rows" -> 100, "max_id" -> 99)
The result of get is a map from metric column name to its value. get blocks until the metrics have been observed (i.e. until an action on the observed Dataset has run and the result code has called setMetricsFromLiterals).
Value parameters
- name
-
a unique name for this observation.
Attributes
- Companion
- object
- Supertypes
-
class Objecttrait Matchableclass Any
Companion for Observation. Maintains a registry mapping observation names to instances so the result-collection code can route observed metric values back to the originating Observation, and provides a helper for decoding proto.Expression.Literal values.
Companion for Observation. Maintains a registry mapping observation names to instances so the result-collection code can route observed metric values back to the originating Observation, and provides a helper for decoding proto.Expression.Literal values.
Attributes
- Companion
- class
- Supertypes
-
class Objecttrait Matchableclass Any
- Self type
-
Observation.type
A set of methods for aggregations on a Dataset, created by Dataset.groupBy, Dataset.rollup, Dataset.cube, or Dataset.pivot.
A set of methods for aggregations on a Dataset, created by Dataset.groupBy, Dataset.rollup, Dataset.cube, or Dataset.pivot.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Represents one row of output from a relational operator. Mirrors org.apache.spark.sql.Row.
Represents one row of output from a relational operator. Mirrors org.apache.spark.sql.Row.
Attributes
- Companion
- object
- Supertypes
-
trait Serializableclass Objecttrait Matchableclass Any
- Known subtypes
-
class GenericRowclass GenericRowWithSchema
Runtime configuration interface for Spark, accessible via spark.conf. Reads and writes are delegated to the Spark Connect server's Config RPC.
Runtime configuration interface for Spark, accessible via spark.conf. Reads and writes are delegated to the Spark Connect server's Config RPC.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Implicit conversions available via import spark.implicits.*: the $"columnName" column syntax and a .toDF method on local sequences of values, tuples, or case classes.
Implicit conversions available via import spark.implicits.*: the $"columnName" column syntax and a .toDF method on local sequences of values, tuples, or case classes.
import spark.implicits.*
val df = Seq((1, "a"), (2, "b")).toDF("id", "name")
df.select($"id", $"name")
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
- Known subtypes
-
object implicits
Specifies the expected behavior when saving a DataFrame to a data source that already has data. Mirrors org.apache.spark.sql.SaveMode.
Specifies the expected behavior when saving a DataFrame to a data source that already has data. Mirrors org.apache.spark.sql.SaveMode.
Attributes
- Companion
- object
- Supertypes
-
trait Enumtrait Serializabletrait Producttrait Equalsclass Objecttrait Matchableclass AnyShow all
The entry point to programming Spark with the DataFrame API over Spark Connect.
The entry point to programming Spark with the DataFrame API over Spark Connect.
val spark = SparkSession.builder
.remote("sc://localhost:15002")
.appName("my-app")
.getOrCreate()
spark.range(10).filter(col("id") % 2 === 0).show()
spark.stop()
A session owns the SparkConnectClient, a monotonic plan-id allocator (so every relation is uniquely identifiable to the server), an Arrow allocator for decoding results, and the RuntimeConfig facade.
Attributes
- Companion
- object
- Supertypes
-
trait AutoCloseableclass Objecttrait Matchableclass Any
Attributes
- Companion
- class
- Supertypes
-
class Objecttrait Matchableclass Any
- Self type
-
SparkSession.type
A WHEN MATCHED [AND condition] clause of a MergeIntoWriter.
A WHEN MATCHED [AND condition] clause of a MergeIntoWriter.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
A WHEN NOT MATCHED [AND condition] clause of a MergeIntoWriter.
A WHEN NOT MATCHED [AND condition] clause of a MergeIntoWriter.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
A WHEN NOT MATCHED BY SOURCE [AND condition] clause of a MergeIntoWriter.
A WHEN NOT MATCHED BY SOURCE [AND condition] clause of a MergeIntoWriter.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
Built-in functions for working with Columns, mirroring org.apache.spark.sql.functions.
Built-in functions for working with Columns, mirroring org.apache.spark.sql.functions.
import org.apache.spark.sql.functions._
df.select(col("id"), upper(col("name")), (col("x") + 1).as("x1"))
df.groupBy("dept").agg(avg("salary"), count(lit(1)))
This object exposes a comprehensive subset of Spark's function library. Any Spark function not listed here can still be invoked by name via callUDF / expr.
Following Spark's convention, a String argument denotes a column name for most functions (e.g. sum("salary") aggregates the salary column), while functions whose parameters are genuinely literal (regex patterns, date formats, JSON paths, ...) treat their String arguments as literal values.
Attributes
- Supertypes
-
class Objecttrait Matchableclass Any
- Self type
-
functions.type