org.apache.spark.sql.pipelines

Members list

Type members

Classlikes

sealed abstract class OutputType(val toProto: OutputType)

The type of output registered in a Pipeline dataflow graph.

Mirrors spark.connect.OutputType but only exposes the kinds that can be defined without user-defined functions.
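Because OutputType is sealed and lists three known subtypes, a pattern match over them can be checked for exhaustiveness by the compiler. The following is a minimal sketch of that pattern using a local stand-in hierarchy. `OutputKind`, its `label` field, and `describe` are hypothetical names introduced for illustration, not part of this package, and the real class carries a `toProto` value that is omitted here.

```scala
// Hypothetical stand-in for the sealed OutputType hierarchy (illustrative names only).
sealed abstract class OutputKind(val label: String)

object OutputKind {
  case object Table extends OutputKind("table")
  case object TemporaryView extends OutputKind("temporary view")
  case object Sink extends OutputKind("sink")
}

// Matching on a sealed hierarchy: the compiler can warn when a subtype is left out.
def describe(kind: OutputKind): String = kind match {
  case OutputKind.Table         => s"output kind: ${kind.label}"
  case OutputKind.TemporaryView => s"output kind: ${kind.label}"
  case OutputKind.Sink          => s"output kind: ${kind.label}"
}
```

Removing one of the three cases from `describe` should produce a non-exhaustive match warning, which is the main benefit of sealing the hierarchy.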

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
Known subtypes
object Sink
object Table
object TemporaryView
object OutputType

Attributes

Companion
class
Supertypes
trait Sum
trait Mirror
class Object
trait Matchable
class Any
Self type
OutputType.type
class Pipeline

A Spark Declarative Pipeline (SDP) dataflow graph.

A pipeline is built by registering outputs (tables, materialized views, temporary views, or sinks) and the flows that populate them, then started with startRun. Each flow is defined by a DataFrame (an unresolved relation), so flows are composed with the same API used for ordinary queries.

Create one with Pipeline.create.

Attributes

Note

foreach/foreachBatch flows and query-function evaluation are not supported (they require user-defined functions); define each flow with a relation instead.

Example
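 import org.apache.spark.sql.functions.col
 import org.apache.spark.sql.pipelines.Pipeline

 // `spark` is assumed to be an existing, active Spark session.
 val pipe = Pipeline.create(spark, storage = Some("/tmp/pipeline_storage"))
 pipe.createMaterializedView("bronze", Some(spark.read.json("/data/raw")))
 pipe.createTable("silver", Some(pipe.read("bronze").filter(col("ok"))))
 val events = pipe.startRun()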
Companion
object
Supertypes
class Object
trait Matchable
class Any
object Pipeline

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type
Pipeline.type
final case class PipelineEvent(timestamp: Option[Timestamp], message: Option[String])

A timestamped event emitted by the server during a pipeline run.

Value parameters

message

the human-readable message for the event, or None if absent.

timestamp

the time the event occurred, or None if the server did not provide one.
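Since both fields are optional, code that consumes events has to handle the absent cases explicitly. The sketch below shows one way to do that. It redefines a local stand-in with the documented signature so that it runs without Spark, it assumes that `Timestamp` refers to `java.sql.Timestamp`, and `render` is a hypothetical helper, not part of the API.

```scala
import java.sql.Timestamp

// Local stand-in mirroring the documented signature.
// Assumption: Timestamp is java.sql.Timestamp.
final case class PipelineEvent(timestamp: Option[Timestamp], message: Option[String])

// Hypothetical helper: render an event as a single log line,
// substituting placeholders for fields the server did not provide.
def render(event: PipelineEvent): String = {
  val time = event.timestamp.map(_.toString).getOrElse("<no timestamp>")
  val text = event.message.getOrElse("<no message>")
  s"[$time] $text"
}
```

Using `map` and `getOrElse` keeps the missing-value handling in one place rather than scattering `isDefined` checks through the calling code.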

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any