org.apache.spark.sql.pipelines
Members list
Type members
Classlikes
The type of output registered in a Pipeline dataflow graph.
Attributes
- Companion
- class
- Supertypes
-
trait Sum, trait Mirror, class Object, trait Matchable, class Any
- Self type
-
OutputType.type
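The sketch below shows why a sum type is convenient for output kinds: a pattern match over it can be checked for exhaustiveness by the compiler. It uses a local stand-in named `OutputKind`, not the library's `OutputType`. The case names are assumptions taken from the kinds of output mentioned in the Pipeline description (tables, materialized views, temporary views, sinks), and the real members may be named differently.

```scala
// Local stand-in, NOT the library's OutputType. The case names are
// assumptions based on the output kinds the Pipeline description lists.
sealed trait OutputKind

object OutputKind {
  case object Table extends OutputKind
  case object MaterializedView extends OutputKind
  case object TemporaryView extends OutputKind
  case object Sink extends OutputKind

  // Because OutputKind is sealed, the compiler can warn when a case is missing.
  def describe(kind: OutputKind): String = kind match {
    case Table            => "table"
    case MaterializedView => "materialized view"
    case TemporaryView    => "temporary view"
    case Sink             => "sink"
  }
}
```

Calling `OutputKind.describe(OutputKind.Sink)` returns `"sink"`. The same matching style should carry over to the real type once its actual member names are known.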
A Spark Declarative Pipeline (SDP) dataflow graph.
A pipeline is built by registering outputs (tables, materialized views, temporary views, or sinks) and the flows that populate them, then started with startRun. Each flow is defined by a DataFrame (an unresolved relation), so flows are composed with the same API used for ordinary queries.
Create one with Pipeline.create.
Attributes
- Note
-
foreach/foreachBatch flows and query-function evaluation are not supported (they require user-defined functions); define each flow with a relation instead.
- Example
-
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.pipelines.Pipeline

val pipe = Pipeline.create(spark, storage = Some("/tmp/pipeline_storage"))
pipe.createMaterializedView("bronze", Some(spark.read.json("/data/raw")))
pipe.createTable("silver", Some(pipe.read("bronze").filter(col("ok"))))
val events = pipe.startRun()
- Companion
- object
- Supertypes
-
class Object, trait Matchable, class Any
A timestamped event emitted by the server during a pipeline run.
Value parameters
- message
-
the human-readable message for the event, or None if absent.
- timestamp
-
the time the event occurred, or None if the server did not provide one.
Attributes
- Supertypes
-
trait Serializable, trait Product, trait Equals, class Object, trait Matchable, class Any
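Since both value parameters of the event may be None, code that consumes events has to handle the absent cases explicitly. The following is a minimal sketch of that handling. `RunEvent` is a hypothetical stand-in for the documented class, and the use of `java.time.Instant` for the timestamp is an assumption; this section does not state the real class name, field types, or parameter order.

```scala
import java.time.Instant

// Hypothetical stand-in for the documented event class. The class name and
// the Instant timestamp type are assumptions, not taken from the library.
final case class RunEvent(timestamp: Option[Instant], message: Option[String])

object RunEventDemo {
  // Render an event as one log line, substituting placeholders for absent fields.
  def render(event: RunEvent): String = {
    val ts  = event.timestamp.map(_.toString).getOrElse("<no timestamp>")
    val msg = event.message.getOrElse("<no message>")
    s"[$ts] $msg"
  }

  def main(args: Array[String]): Unit = {
    val events = Seq(
      RunEvent(Some(Instant.parse("2024-01-01T00:00:00Z")), Some("flow started")),
      RunEvent(None, None) // the server supplied neither field
    )
    events.map(render).foreach(println)
  }
}
```

Using `map` and `getOrElse` keeps the absent cases visible at the call site rather than relying on `.get`, which would throw on None.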