org.apache.spark.sql.streaming

Members list

Type members

Classlikes

Loads a streaming DataFrame from a streaming source. Use SparkSession.readStream to access this.

Loads a streaming DataFrame from a streaming source. Use SparkSession.readStream to access this.

Mirrors the public surface of org.apache.spark.sql.streaming.DataStreamReader over the Spark Connect protocol. A streaming read is expressed as a Read relation whose is_streaming flag is set, so the resulting DataFrame is unbounded and is meant to be consumed through a DataStreamWriter.

Attributes

Example
 val df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
Supertypes
class Object
trait Matchable
class Any

Writes a streaming Dataset to an external sink and starts the streaming query. Use Dataset.writeStream to access this.

Writes a streaming Dataset to an external sink and starts the streaming query. Use Dataset.writeStream to access this.

Mirrors the public surface of org.apache.spark.sql.streaming.DataStreamWriter over the Spark Connect protocol. Calling start or toTable builds a WriteStreamOperationStart command, sends it to the server, and returns a StreamingQuery handle built from the returned query id, run id and name.

foreach and foreachBatch are intentionally unsupported: they require user-defined functions, whose Spark Connect protobuf definitions are not handled here.

Attributes

Example
 val query = df.writeStream
   .format("console")
   .outputMode("append")
   .trigger(Trigger.ProcessingTime("1 second"))
   .start()
 query.stop()
Supertypes
class Object
trait Matchable
class Any

A handle to a query that is executing continuously in the background as new data arrives. Returned by DataStreamWriter#start and StreamingQueryManager.

A handle to a query that is executing continuously in the background as new data arrives. Returned by DataStreamWriter#start and StreamingQueryManager.

Mirrors the public surface of org.apache.spark.sql.streaming.StreamingQuery over the Spark Connect protocol. Every method issues a StreamingQueryCommand to the server, identified by the query's id and runId, and reads the typed result back from the response stream.

Value parameters

id

the stable query id that persists across restarts from a checkpoint.

name

the user-specified query name, or null if none was set.

runId

the run id that is unique for each start or restart of the query.

spark

the session that owns this query.

Attributes

Supertypes
class Object
trait Matchable
class Any
case class StreamingQueryException(message: String, errorClass: String, stackTrace: String)

Describes the exception that terminated a StreamingQuery.

Describes the exception that terminated a StreamingQuery.

Value parameters

errorClass

the error class of the exception, if any.

message

the exception message, matching the server-side StreamingQueryException.toString.

stackTrace

the stack trace of the exception, if any.

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Show all
abstract class StreamingQueryListener extends Serializable

Interface for listening to streaming query lifecycle events. Register an implementation with spark.streams.addListener(...).

Interface for listening to streaming query lifecycle events. Register an implementation with spark.streams.addListener(...).

Over Spark Connect the callbacks run on the client: the server forwards serialized events on a dedicated stream and this client dispatches them to the registered listeners. No user code runs on the server, so listeners (unlike foreach/foreachBatch) are fully supported.

 spark.streams.addListener(new StreamingQueryListener {
   def onQueryStarted(e: StreamingQueryListener.QueryStartedEvent): Unit = println(s"started ${e.id}")
   def onQueryProgress(e: StreamingQueryListener.QueryProgressEvent): Unit = println(e.json)
   def onQueryTerminated(e: StreamingQueryListener.QueryTerminatedEvent): Unit = println(s"done ${e.id}")
 })

Attributes

Companion
object
Supertypes
trait Serializable
class Object
trait Matchable
class Any

Attributes

Companion
class
Supertypes
class Object
trait Matchable
class Any
Self type

Manages all the StreamingQuery instances active in a single SparkSession. Use SparkSession.streams to access this.

Manages all the StreamingQuery instances active in a single SparkSession. Use SparkSession.streams to access this.

Mirrors the public surface of org.apache.spark.sql.streaming.StreamingQueryManager over the Spark Connect protocol. Every method issues a StreamingQueryManagerCommand to the server and reads the typed result back from the response stream.

Value parameters

spark

the session whose streaming queries are managed.

Attributes

Supertypes
class Object
trait Matchable
class Any
case class StreamingQueryStatus(message: String, isDataAvailable: Boolean, isTriggerActive: Boolean, isActive: Boolean)

A snapshot of a StreamingQuery's current status.

A snapshot of a StreamingQuery's current status.

Value parameters

isActive

whether the query is still running.

isDataAvailable

whether there is any data available to be processed.

isTriggerActive

whether a trigger is currently active (a micro-batch is being processed).

message

a human-readable description of what the query is currently doing.

Attributes

Supertypes
trait Serializable
trait Product
trait Equals
class Object
trait Matchable
class Any
Show all
sealed trait Trigger

Policy used to indicate how often results should be produced by a StreamingQuery. Mirrors org.apache.spark.sql.streaming.Trigger. Pass an instance to DataStreamWriter#trigger.

Policy used to indicate how often results should be produced by a StreamingQuery. Mirrors org.apache.spark.sql.streaming.Trigger. Pass an instance to DataStreamWriter#trigger.

Attributes

Companion
object
Supertypes
class Object
trait Matchable
class Any
object Trigger

Factory methods for the supported Trigger policies.

Factory methods for the supported Trigger policies.

Attributes

Companion
trait
Supertypes
class Object
trait Matchable
class Any
Self type
Trigger.type