DataFrameReader

org.apache.spark.sql.DataFrameReader

Loads data from external storage systems (e.g. file systems, key-value stores, JDBC) into a DataFrame. Use SparkSession.read to access this.

Mirrors the public surface of org.apache.spark.sql.DataFrameReader over the Spark Connect protocol.

Attributes

Example
 spark.read.format("csv").option("header", true).load("data.csv")
 spark.read.json("events.json")
 spark.read.table("my_table")
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def csv(path: String): DataFrame

Loads CSV file(s) and returns the result as a DataFrame.

def csv(paths: String*): DataFrame

Loads CSV file(s) and returns the result as a DataFrame.

def csv(csvDataset: Dataset[String]): DataFrame

Parses each row of a Dataset[String] as a CSV record, returning a DataFrame.

def format(source: String): DataFrameReader

Specifies the input data source format (e.g. "csv", "json", "parquet", "orc").

Attributes

Returns

this reader, for chaining.
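
Because format returns the reader itself, calls can be chained before a terminal load. The following is a minimal sketch, assuming an already-connected SparkSession is passed in and that SparkSession and DataFrame live in the org.apache.spark.sql package alongside this class. The helper name readDelimited is hypothetical, and "header" and "sep" are Apache Spark CSV option names that the server is assumed to honor.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: `spark` is assumed to be an already-connected session.
def readDelimited(spark: SparkSession, path: String): DataFrame =
  spark.read
    .format("csv")           // returns this reader, so further calls can follow
    .option("header", true)  // Boolean overload of option
    .option("sep", ";")      // String overload of option
    .load(path)              // terminal call: returns a DataFrame
```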

def jdbc(url: String, table: String, properties: Map[String, String]): DataFrame

Constructs a DataFrame representing the database table accessible via JDBC.

Value parameters

properties

connection properties (e.g. "user", "password"); these are merged into the read options.

table

the name of the table in the external database (or a subquery).

url

the JDBC URL of the form jdbc:subprotocol:subname.
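
The three parameters above might be combined as in the sketch below. The host, database, table name, user, and the DB_PASSWORD environment variable are placeholders, the helper name readOrders is hypothetical, and the session is assumed to be connected already.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper; every connection detail here is a placeholder.
def readOrders(spark: SparkSession): DataFrame = {
  // URL of the form jdbc:subprotocol:subname
  val url = "jdbc:postgresql://db.example.com:5432/shop"

  // Connection properties are merged into the read options.
  val properties = Map(
    "user"     -> "reporting",
    "password" -> sys.env.getOrElse("DB_PASSWORD", "")
  )

  spark.read.jdbc(url, "orders", properties)
}
```

To read a subquery instead of a table, Apache Spark conventionally expects a parenthesized, aliased query such as "(SELECT id, total FROM orders) AS t". Whether this client forwards that form unchanged is an assumption to verify.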

def json(path: String): DataFrame

Loads JSON file(s) and returns the result as a DataFrame.

def json(paths: String*): DataFrame

Loads JSON file(s) and returns the result as a DataFrame.

def json(jsonDataset: Dataset[String]): DataFrame

Parses each row of a Dataset[String] as a JSON object, returning a DataFrame.

def load(): DataFrame

Loads input as a DataFrame, for data sources that do not require a path (e.g. external key-value stores).

def load(path: String): DataFrame

Loads input from the given path as a DataFrame.

def load(paths: String*): DataFrame

Loads input from the given paths as a DataFrame, for data sources that support reading multiple paths.

def option(key: String, value: String): DataFrameReader

Adds an input option for the underlying data source.

Attributes

Returns

this reader, for chaining.

def option(key: String, value: Boolean): DataFrameReader

Adds a boolean input option.

Attributes

Returns

this reader, for chaining.

def option(key: String, value: Long): DataFrameReader

Adds a long input option.

Attributes

Returns

this reader, for chaining.

def option(key: String, value: Double): DataFrameReader

Adds a double input option.

Attributes

Returns

this reader, for chaining.

def options(options: Map[String, String]): DataFrameReader

Adds multiple input options.

Attributes

Returns

this reader, for chaining.
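
When several options are known up front, passing them as one Map can be tidier than repeated option calls. The following is a sketch under the same assumptions as the rest of this page: a connected session is supplied by the caller, the helper name readTsv is hypothetical, and "header" and "sep" are Apache Spark CSV option names.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: reads a tab-separated file with a header row.
def readTsv(spark: SparkSession, path: String): DataFrame = {
  // All values are strings, matching Map[String, String].
  val csvOptions = Map(
    "header" -> "true",
    "sep"    -> "\t"
  )
  spark.read.format("csv").options(csvOptions).load(path)
}
```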

def orc(paths: String*): DataFrame

Loads ORC file(s) and returns the result as a DataFrame.

def parquet(paths: String*): DataFrame

Loads Parquet file(s) and returns the result as a DataFrame.

def schema(schemaString: String): DataFrameReader

Specifies the input schema using a DDL-formatted string (e.g. "a INT, b STRING").

Attributes

Returns

this reader, for chaining.
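
Supplying a schema up front is typically used to avoid schema inference on the source. A minimal sketch using the DDL string from the description above follows. The helper name readTyped is hypothetical and the session is assumed to be connected already.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: reads JSON with an explicit two-column schema.
def readTyped(spark: SparkSession, path: String): DataFrame =
  spark.read
    .schema("a INT, b STRING")  // DDL-formatted schema string
    .json(path)
```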

def schema(schema: StructType): DataFrameReader

Specifies the input schema using a StructType.

Attributes

Returns

this reader, for chaining.

def table(tableName: String): DataFrame

Returns the specified table/view as a DataFrame.

def text(paths: String*): DataFrame

Loads text file(s) and returns the result as a DataFrame with a single value column.

def textFile(paths: String*): DataFrame

Loads text file(s), returning each line as a row in a single-column (value) DataFrame.

Apache Spark's textFile returns a Dataset[String]; this client is untyped (closures and custom encoders are not transported over Spark Connect), so the equivalent single-string-column DataFrame is returned instead.
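
The difference in return type shows up in the type annotation on the calling side. In this sketch the file names and the helper name readLogs are placeholders, and the session is assumed to be connected already.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper: the result is a DataFrame, not a Dataset[String].
def readLogs(spark: SparkSession): DataFrame =
  spark.read.textFile("app-1.log", "app-2.log")  // single string column named value
```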

def xml(paths: String*): DataFrame

Loads XML file(s) and returns the result as a DataFrame.
