DataStreamReader

org.apache.spark.sql.streaming.DataStreamReader

Loads a streaming DataFrame from a streaming source. Use SparkSession.readStream to access this.

Mirrors the public surface of org.apache.spark.sql.streaming.DataStreamReader over the Spark Connect protocol. A streaming read is expressed as a Read relation whose is_streaming flag is set, so the resulting DataFrame is unbounded and is meant to be consumed through a DataStreamWriter.

Attributes

Example
 val df = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
Graph
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def csv(path: String): DataFrame

Loads CSV file(s) and returns the result as a streaming DataFrame.

Loads CSV file(s) and returns the result as a streaming DataFrame.

Attributes

def format(source: String): DataStreamReader

Specifies the input data source format (e.g. "rate", "kafka").

Specifies the input data source format (e.g. "rate", "kafka").

Attributes

Returns

this reader, for chaining.

def json(path: String): DataFrame

Loads JSON file(s) and returns the result as a streaming DataFrame.

Loads JSON file(s) and returns the result as a streaming DataFrame.

Attributes

def load(): DataFrame

Loads input as a streaming DataFrame, for data sources that do not require a path (e.g. "rate" or "kafka").

Loads input as a streaming DataFrame, for data sources that do not require a path (e.g. "rate" or "kafka").

Attributes

def load(path: String): DataFrame

Loads input from the given path as a streaming DataFrame.

Loads input from the given path as a streaming DataFrame.

Attributes

def option(key: String, value: String): DataStreamReader

Adds an input option for the underlying data source.

Adds an input option for the underlying data source.

Attributes

Returns

this reader, for chaining.

def option(key: String, value: Boolean): DataStreamReader

Adds a boolean input option. @return this reader, for chaining.

Adds a boolean input option. @return this reader, for chaining.

Attributes

def option(key: String, value: Long): DataStreamReader

Adds a long input option. @return this reader, for chaining.

Adds a long input option. @return this reader, for chaining.

Attributes

def option(key: String, value: Double): DataStreamReader

Adds a double input option. @return this reader, for chaining.

Adds a double input option. @return this reader, for chaining.

Attributes

def options(options: Map[String, String]): DataStreamReader

Adds multiple input options.

Adds multiple input options.

Attributes

Returns

this reader, for chaining.

def orc(path: String): DataFrame

Loads ORC file(s) and returns the result as a streaming DataFrame.

Loads ORC file(s) and returns the result as a streaming DataFrame.

Attributes

def parquet(path: String): DataFrame

Loads Parquet file(s) and returns the result as a streaming DataFrame.

Loads Parquet file(s) and returns the result as a streaming DataFrame.

Attributes

def schema(schemaString: String): DataStreamReader

Specifies the input schema using a DDL-formatted string (e.g. "a INT, b STRING").

Specifies the input schema using a DDL-formatted string (e.g. "a INT, b STRING").

Attributes

Returns

this reader, for chaining.

Specifies the input schema using a StructType.

Specifies the input schema using a StructType.

Attributes

Returns

this reader, for chaining.

def table(tableName: String): DataFrame

Loads a streaming DataFrame from a registered table.

Loads a streaming DataFrame from a registered table.

Attributes

def text(path: String): DataFrame

Loads text file(s) and returns the result as a streaming DataFrame with a single value column.

Loads text file(s) and returns the result as a streaming DataFrame with a single value column.

Attributes