Dataset
A distributed collection of rows. A Dataset is lazy: transformations build up a protobuf logical plan, and nothing executes until an action (e.g. collect, show, count) is called.
Mirrors the public surface of org.apache.spark.sql.Dataset over the Spark Connect protocol.
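A minimal sketch of the laziness described above, written against the org.apache.spark.sql API that this class mirrors. The import path, the use of `spark.range`, and the object name `LazyPlanSketch` are assumptions for illustration; this client's own package is not shown on this page and will likely differ.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the documented API.
object LazyPlanSketch {
  // Transformations only extend the logical plan; nothing executes here.
  def evens(spark: SparkSession): DataFrame =
    spark.range(0, 10).toDF("id").filter(col("id") % 2 === 0)

  // count() is an action, so this call is what triggers execution.
  def evenCount(spark: SparkSession): Long = evens(spark).count()
}
```

Calling `evens` any number of times costs nothing on the server; only `evenCount` sends a plan for execution.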
Supertypes: class Object, trait Matchable, class Any
Members list
Value members
Concrete methods
Returns a new Dataset where each record is mapped to type U via its Encoder. This is a purely client-side reinterpretation (no server-side closure), so it works over Spark Connect.
Persists this Dataset with the default storage level.
Eagerly checkpoints this Dataset to reliable storage and returns the checkpointed copy.
Checkpoints this Dataset to reliable storage.
Selects a column by name, qualified by this Dataset's plan id so that it resolves unambiguously even in self-joins.
Selects columns based on a column name regular expression.
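A short sketch of regex-based column selection, assuming the method is named `colRegex` and accepts a backtick-quoted pattern as in org.apache.spark.sql.Dataset. The import path, the `val_` prefix, and the object name are illustrative assumptions.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, not part of the documented API.
object ColRegexSketch {
  // Keeps only the columns whose names start with "val_".
  // The pattern is wrapped in backticks, which is how Spark's colRegex
  // recognizes the argument as a regular expression.
  def valueColumns(df: DataFrame): DataFrame =
    df.select(df.colRegex("`val_.*`"))
}
```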
Computes basic statistics (count, mean, stddev, min, max) for numeric and string columns.
Drops duplicates within the event-time watermark, keeping state bounded for streaming.
Lateral join with a correlated right relation.
Eagerly locally checkpoints this Dataset.
Locally checkpoints this Dataset.
Merges this Dataset (the source) into the table (the target) using condition to match rows. Returns a MergeIntoWriter to configure the WHEN clauses; call merge() to run it.
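The builder flow described above might look like the following sketch. It assumes the method is named `mergeInto` and that the returned MergeIntoWriter offers `whenMatched()`, `updateAll()`, `whenNotMatched()`, and `insertAll()` as in Spark 4.0; the import path, the object name, and the parameters are hypothetical, and the target table must live in a catalog that supports MERGE.

```scala
import org.apache.spark.sql.{Column, DataFrame}

// Hypothetical helper object, not part of the documented API.
object MergeSketch {
  // Upserts `source` into the catalog table named `target`.
  // `condition` decides which source rows match which target rows.
  def upsert(source: DataFrame, target: String, condition: Column): Unit =
    source
      .mergeInto(target, condition)
      .whenMatched().updateAll()     // matched rows: overwrite all columns
      .whenNotMatched().insertAll()  // unmatched source rows: insert
      .merge()                       // only this call runs the merge
}
```

Everything before `merge()` only configures the writer, so the WHEN clauses can be assembled conditionally before anything is sent for execution.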
Returns a DataFrameNaFunctions for working with missing data.
Defines named observed metrics computed while this Dataset is processed.
Persists this Dataset with the default storage level (MEMORY_AND_DISK).
Persists this Dataset with the given storage level.
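A sketch of how persisting with an explicit level, reading the level back, and releasing it could fit together. It assumes the methods are named `persist`, `storageLevel`, and `unpersist` and that levels come from `org.apache.spark.storage.StorageLevel`, as in the Spark API this class mirrors; the object name is hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.storage.StorageLevel

// Hypothetical helper object, not part of the documented API.
object PersistSketch {
  // Persists df in memory only, reads the level back, then releases it.
  // Returns the level observed while persisted and the level afterwards.
  def roundTrip(df: DataFrame): (StorageLevel, StorageLevel) = {
    val cached = df.persist(StorageLevel.MEMORY_ONLY)
    val whilePersisted = cached.storageLevel
    cached.unpersist()
    (whilePersisted, cached.storageLevel)
  }
}
```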
Randomly splits this Dataset with the given weights and a fixed seed.
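A minimal sketch of a seeded train/test split, assuming the method is named `randomSplit` and takes `(Array[Double], Long)` as in org.apache.spark.sql.Dataset. The 0.8/0.2 weights and the object name are illustrative choices, not values from this page.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, not part of the documented API.
object SplitSketch {
  // Splits df into roughly 80% and 20%. The split is random per row,
  // so the sizes are approximate; reusing the seed repeats the split.
  def trainTest(df: DataFrame, seed: Long): (DataFrame, DataFrame) = {
    val parts = df.randomSplit(Array(0.8, 0.2), seed)
    (parts(0), parts(1))
  }
}
```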
Randomly splits this Dataset with the given weights.
Range-partitions by the given expressions into numPartitions.
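A sketch of range partitioning with an explicit partition count, assuming the method is named `repartitionByRange` and takes `(Int, Column*)` as in the mirrored Spark API. The `id` column and the object name are assumptions for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the documented API.
object RangePartitionSketch {
  // Redistributes rows so that each partition holds a contiguous range of `id`.
  def byId(df: DataFrame, numPartitions: Int): DataFrame =
    df.repartitionByRange(numPartitions, col("id"))
}
```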
Range-partitions by the given expressions.
The schema of this Dataset.
Returns a DataFrameStatFunctions for statistic functions.
Returns the current storage level of this Dataset.
Computes the requested summary statistics; defaults match Spark's summary().
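A sketch contrasting a custom set of summary statistics with the basic statistics for one column. It assumes the methods are named `summary` and `describe` and accept string varargs, as in org.apache.spark.sql.Dataset; the object name is hypothetical.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, not part of the documented API.
object SummarySketch {
  // Requests only three statistics instead of the default set.
  // The result has one row per statistic, labelled in a "summary" column.
  def compact(df: DataFrame): DataFrame = df.summary("count", "min", "max")

  // Basic statistics restricted to a single named column.
  def basic(df: DataFrame, column: String): DataFrame = df.describe(column)
}
```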
Returns a new Dataset where each row is reconciled to match the specified schema (by column name, reordering and casting as needed).
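A sketch of schema reconciliation, assuming the method is named `to` and takes a `StructType` as in org.apache.spark.sql.Dataset. The target schema, its field names, and the object name are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Hypothetical helper object, not part of the documented API.
object ReconcileSketch {
  // Target layout: `name` first, then `id` widened to a long.
  val target: StructType = StructType(Seq(
    StructField("name", StringType),
    StructField("id", LongType)
  ))

  // Columns are matched by name, reordered to the target order,
  // and cast to the target types where the cast is compatible.
  def reconcile(df: DataFrame): DataFrame = df.to(target)
}
```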
Returns the content as a DataFrame of JSON strings in a single value column.
Concisely applies a transformation to this Dataset.
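A sketch of chaining reusable steps, assuming the method is named `transform` and accepts a `Dataset => Dataset` function as in the mirrored Spark API. The two step functions, the `id` column, and the object name are hypothetical.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the documented API.
object TransformSketch {
  // Two small, independently testable steps.
  def withDoubled(df: DataFrame): DataFrame = df.withColumn("doubled", col("id") * 2)
  def onlyPositive(df: DataFrame): DataFrame = df.filter(col("id") > 0)

  // transform lets the steps read left to right instead of nesting calls.
  def pipeline(df: DataFrame): DataFrame =
    df.transform(withDoubled).transform(onlyPositive)
}
```

Without `transform`, the same pipeline would be written inside out as `onlyPositive(withDoubled(df))`.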
Transposes the DataFrame, turning the first column into the new column names.
Transposes the DataFrame using indexColumn for the new column names.
Marks this Dataset as non-persistent.
Marks this Dataset as non-persistent.
Unpivots (melts) a DataFrame from wide to long format.
Unpivots, inferring the value columns from those not in ids.
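A sketch of both unpivot forms described above, assuming the method is named `unpivot` with the `(ids, values, variableColumnName, valueColumnName)` and `(ids, variableColumnName, valueColumnName)` overloads found in org.apache.spark.sql.Dataset. The column names `id`, `q1`, `q2`, `quarter`, `sales` and the object name are made up for illustration.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

// Hypothetical helper object, not part of the documented API.
object UnpivotSketch {
  // Wide to long with the value columns listed explicitly:
  // (id, q1, q2) becomes one (id, quarter, sales) row per value column.
  def explicit(df: DataFrame): DataFrame =
    df.unpivot(Array(col("id")), Array(col("q1"), col("q2")), "quarter", "sales")

  // Same result when every column that is not an id should be a value column.
  def inferred(df: DataFrame): DataFrame =
    df.unpivot(Array(col("id")), "quarter", "sales")
}
```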
Defines an event-time watermark for this streaming Dataset.
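A sketch of how a watermark and watermark-bounded deduplication could be combined on a streaming Dataset. It assumes the methods are named `withWatermark` and `dropDuplicatesWithinWatermark` as in the mirrored Spark API; the `eventTime` and `id` columns, the 10 minute threshold, and the object name are illustrative assumptions.

```scala
import org.apache.spark.sql.DataFrame

// Hypothetical helper object, not part of the documented API.
object WatermarkSketch {
  // `events` is expected to be a streaming DataFrame with a timestamp
  // column `eventTime` and a key column `id`.
  def dedupe(events: DataFrame): DataFrame =
    events
      .withWatermark("eventTime", "10 minutes") // tolerate up to 10 minutes of lateness
      .dropDuplicatesWithinWatermark("id")      // duplicate state can expire with the watermark
}
```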
Interface for saving the content of this Dataset to external storage.
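A sketch of a batch write through the writer interface, assuming the accessor is named `write` and returns a writer with `mode` and `parquet` as in org.apache.spark.sql.DataFrameWriter. The Parquet format, the overwrite mode, and the object name are illustrative choices.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Hypothetical helper object, not part of the documented API.
object WriteSketch {
  // Writes df as Parquet files under `path`, replacing any existing output.
  def saveParquet(df: DataFrame, path: String): Unit =
    df.write.mode(SaveMode.Overwrite).parquet(path)
}
```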
Interface for saving the content of a streaming Dataset to external storage.
Creates a v2 (catalog) write configuration builder.