DataFrameReader
Loads data from external storage systems (e.g. file systems, key-value stores, JDBC) into a DataFrame. Use SparkSession.read to access this.
Mirrors the public surface of org.apache.spark.sql.DataFrameReader over the Spark Connect protocol.
Attributes
- Example
-
spark.read.format("csv").option("header", true).load("data.csv")
spark.read.json("events.json")
spark.read.table("my_table")
- Supertypes
-
class Object, trait Matchable, class Any
Members list
Value members
Concrete methods
Specifies the input data source format (e.g. "csv", "json", "parquet", "orc").
Attributes
- Returns
-
this reader, for chaining.
Constructs a DataFrame representing the database table accessible via JDBC.
Value parameters
- properties
-
connection properties (e.g. "user", "password"); these are merged into the read options.
- table
-
the name of the table in the external database (or a subquery).
- url
-
the JDBC URL of the form jdbc:subprotocol:subname.
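The merge described for properties can be illustrated with a small, self-contained sketch. It does not call this client: JdbcOptionsSketch and jdbcOptions are hypothetical names, and the "url" and "dbtable" keys, as well as the rule that the explicit url and table arguments win over same-named properties, are assumptions borrowed from upstream Apache Spark's JDBC data source.

```scala
import java.util.Properties
import scala.jdk.CollectionConverters._

// Hypothetical helper, not part of this client: models the option map that a
// jdbc(url, table, properties) call might end up with.
object JdbcOptionsSketch {
  def jdbcOptions(url: String, table: String, properties: Properties): Map[String, String] = {
    // Copy every string-valued connection property into an immutable map.
    val fromProps: Map[String, String] = properties.stringPropertyNames().asScala
      .map(name => name -> properties.getProperty(name))
      .toMap
    // Assumption (upstream Spark behaviour): explicit url/table override
    // any "url" or "dbtable" entries that were present in the properties.
    fromProps ++ Map("url" -> url, "dbtable" -> table)
  }
}
```

A subquery is passed in the table position as a parenthesised, aliased SELECT, for example "(SELECT id FROM orders) AS t"; the sketch stores it under "dbtable" unchanged.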
Adds an input option for the underlying data source.
Attributes
- Returns
-
this reader, for chaining.
Adds a boolean input option.
Attributes
- Returns
-
this reader, for chaining.
Adds a long input option.
Attributes
- Returns
-
this reader, for chaining.
Adds a double input option.
Attributes
- Returns
-
this reader, for chaining.
Adds multiple input options.
Attributes
- Returns
-
this reader, for chaining.
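The option overloads and options all return the reader itself, which is what makes chained calls possible. The following is a minimal sketch of that pattern, not this client's implementation: SketchReader, SketchReaderDemo and snapshot are hypothetical names, and storing typed values as their string form is an assumption modelled on upstream Apache Spark.

```scala
// Hypothetical stand-in for the reader; it only records options.
final class SketchReader {
  // Assumption: every option is ultimately kept as a string.
  private var opts = Map.empty[String, String]

  def option(key: String, value: String): this.type = {
    opts = opts.updated(key, value)
    this // returning the same instance enables chaining
  }

  // Typed overloads delegate to the string overload.
  def option(key: String, value: Boolean): this.type = option(key, value.toString)
  def option(key: String, value: Long): this.type = option(key, value.toString)
  def option(key: String, value: Double): this.type = option(key, value.toString)

  // Adds several options at once; later keys replace earlier ones.
  def options(more: Map[String, String]): this.type = {
    opts = opts ++ more
    this
  }

  // Immutable view of what has been set so far (for illustration only).
  def snapshot: Map[String, String] = opts
}

object SketchReaderDemo {
  def run(): Map[String, String] =
    new SketchReader()
      .option("header", true)        // Boolean overload
      .option("maxColumns", 100L)    // Long overload
      .option("samplingRatio", 0.5)  // Double overload
      .options(Map("sep" -> ";"))
      .snapshot
}
```

Writing the literals as 100L and 0.5 makes it explicit which overload is selected.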
Specifies the input schema using a DDL-formatted string (e.g. "a INT, b STRING").
Attributes
- Returns
-
this reader, for chaining.
Specifies the input schema using a StructType.
Attributes
- Returns
-
this reader, for chaining.
Loads text file(s), returning each line as a row in a single-column (value) DataFrame.
Apache Spark's textFile returns a Dataset[String]; this client is untyped (closures and custom encoders are not transported over Spark Connect), so the equivalent single-string-column DataFrame is returned instead.