DataFrameWriter

org.apache.spark.sql.DataFrameWriter

Saves the contents of a DataFrame to external storage systems (e.g. file systems, key-value stores, tables). Use Dataset.write to access this.

Mirrors the public surface of org.apache.spark.sql.DataFrameWriter over the Spark Connect protocol.

Example
 df.write.format("parquet").mode("overwrite").save("out.parquet")
 df.write.mode("append").saveAsTable("my_table")
Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def bucketBy(numBuckets: Int, colName: String, colNames: String*): DataFrameWriter

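Buckets the output by the given columns into numBuckets buckets. If specified, the output is laid out on the file system similar to Hive's bucketing scheme, but with a different bucket hash function, so it is not compatible with Hive's bucketing. In Apache Spark, bucketed output is supported through saveAsTable; save and insertInto reject a bucketed writer.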

Returns

this writer, for chaining.
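
As a rough model of what bucketBy does, each row is assigned to bucket hash(key) mod numBuckets, using a non-negative modulo so that negative hashes still land in range. Apache Spark uses its own Murmur3-based hash here; this sketch substitutes String.hashCode for illustration only, so its bucket ids will not match the files Spark writes. BucketingSketch, bucketOf and assignBuckets are hypothetical names.

```scala
object BucketingSketch {
  // Non-negative modulo: negative hash codes still map into [0, n).
  def pmod(a: Int, n: Int): Int = {
    val r = a % n
    if (r < 0) r + n else r
  }

  // Hypothetical bucket function; String.hashCode stands in for Spark's Murmur3 hash.
  def bucketOf(key: String, numBuckets: Int): Int = {
    require(numBuckets > 0, "numBuckets must be positive")
    pmod(key.hashCode, numBuckets)
  }

  // Groups keys by the bucket they would be written to.
  def assignBuckets(keys: Seq[String], numBuckets: Int): Map[Int, Seq[String]] =
    keys.groupBy(k => bucketOf(k, numBuckets))
}
```

Because the bucket depends only on the key, all rows sharing a key end up in the same bucket; this is what can let later joins or aggregations on that key avoid a shuffle. With the real API the call looks like df.write.bucketBy(4, "user_id").saveAsTable("events") (column and table names hypothetical).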

def csv(path: String): Unit

Saves the content as CSV at the given path.

def format(source: String): DataFrameWriter

Specifies the output data source format (e.g. "csv", "json", "parquet", "orc").

Returns

this writer, for chaining.
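
In Apache Spark the format-specific methods (csv, json, orc, parquet, text) are shorthands: csv(path) does the same as format("csv").save(path). The sketch below models that delegation with a hypothetical immutable WriterSpec that only records the chosen source and path; it performs no I/O and is not this client's implementation.

```scala
object FormatSketch {
  // Hypothetical record of a configured write; nothing is written anywhere.
  final case class WriterSpec(source: Option[String] = None, path: Option[String] = None) {
    def format(s: String): WriterSpec = copy(source = Some(s))
    // Unlike the real save, which returns Unit, this returns the spec so it can be inspected.
    def save(p: String): WriterSpec = copy(path = Some(p))
    // Shorthands: set the format, then save.
    def csv(p: String): WriterSpec = format("csv").save(p)
    def json(p: String): WriterSpec = format("json").save(p)
    def orc(p: String): WriterSpec = format("orc").save(p)
    def parquet(p: String): WriterSpec = format("parquet").save(p)
    def text(p: String): WriterSpec = format("text").save(p)
  }
}
```

Since each shorthand calls format itself, it overrides any format set earlier in the chain.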

def insertInto(tableName: String): Unit

Inserts the content of the DataFrame into the given existing table. Columns are matched by position, not by name, so the DataFrame's column order must match the table's.
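
Position-based resolution matters when the DataFrame's columns are in a different order from the table's. This hypothetical sketch models it over plain Scala collections: insertByPosition pairs values with the table's columns by index and ignores names, and reorderByName shows the usual fix of putting the values into the table's column order first (in Spark, typically by selecting the table's columns in order before calling insertInto).

```scala
object InsertIntoSketch {
  // Positional resolution: the i-th value goes to the i-th table column, whatever its name was.
  def insertByPosition(tableCols: Seq[String], row: Seq[Any]): Map[String, Any] = {
    require(tableCols.size == row.size, "column count must match")
    tableCols.zip(row).toMap
  }

  // Reorders a row from the DataFrame's column order into the table's order, by name.
  def reorderByName(dfCols: Seq[String], row: Seq[Any], tableCols: Seq[String]): Seq[Any] = {
    val byName: Map[String, Any] = dfCols.zip(row).toMap
    tableCols.map(byName)
  }
}
```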

def json(path: String): Unit

Saves the content as JSON at the given path.

def mode(saveMode: SaveMode): DataFrameWriter

Specifies the behavior when data or table already exists, using a SaveMode.

Returns

this writer, for chaining.
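
The SaveMode values differ only in what happens when the target already exists. The sketch below is a hypothetical decision function: its Mode cases mirror the names of org.apache.spark.sql.SaveMode, but the types, the Outcome values and the exception are stand-ins that touch no storage (Spark reports the ErrorIfExists case with its own exception).

```scala
object SaveModeSketch {
  // Local stand-ins named after org.apache.spark.sql.SaveMode.
  sealed trait Mode
  case object Append extends Mode
  case object Overwrite extends Mode
  case object ErrorIfExists extends Mode
  case object Ignore extends Mode

  // What the writer would do.
  sealed trait Outcome
  case object WriteNew extends Outcome
  case object AppendToExisting extends Outcome
  case object ReplaceExisting extends Outcome
  case object Skip extends Outcome

  // Decides the outcome from the mode and whether the target already exists.
  def decide(mode: Mode, exists: Boolean): Outcome =
    if (!exists) WriteNew
    else mode match {
      case Append        => AppendToExisting
      case Overwrite     => ReplaceExisting
      case Ignore        => Skip
      case ErrorIfExists => throw new IllegalStateException("target already exists")
    }
}
```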

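def mode(saveMode: String): DataFrameWriter

Specifies the behavior when data or table already exists, using the name of a SaveMode (e.g. "overwrite", "append", "ignore", "errorifexists").

Returns

this writer, for chaining.
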
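
In Apache Spark the string overload is matched case-insensitively: "overwrite", "append" and "ignore" map to the corresponding SaveMode, "error", "errorifexists" and "default" all mean ErrorIfExists, and any other name is rejected. A minimal sketch of that mapping, assuming this client accepts the same names; parseMode is hypothetical and returns the mode's name as a string rather than a SaveMode.

```scala
import java.util.Locale

object ModeParsingSketch {
  // Hypothetical parser mirroring Spark's accepted names; returns the SaveMode name as a String.
  def parseMode(s: String): String = s.toLowerCase(Locale.ROOT) match {
    case "overwrite"                           => "Overwrite"
    case "append"                              => "Append"
    case "ignore"                              => "Ignore"
    case "error" | "errorifexists" | "default" => "ErrorIfExists"
    case _ => throw new IllegalArgumentException(s"Unknown save mode: $s")
  }
}
```
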
def option(key: String, value: String): DataFrameWriter

Adds an output option for the underlying data source.

Returns

this writer, for chaining.
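
Apache Spark documents that writer option keys are case-insensitive and that a later option with the same key overrides the earlier one. The hypothetical OptionsSketch below models that by normalizing keys to lower case; it is a simplification and does not reproduce how Spark stores the keys internally.

```scala
import java.util.Locale

object OptionsSketch {
  // Immutable option set; keys are normalized to lower case so lookups ignore case.
  final case class Options(entries: Map[String, String] = Map.empty) {
    // Adding a key that already exists (in any case) replaces its value: later wins.
    def option(key: String, value: String): Options =
      copy(entries = entries + (key.toLowerCase(Locale.ROOT) -> value))

    def get(key: String): Option[String] = entries.get(key.toLowerCase(Locale.ROOT))
  }
}
```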

def option(key: String, value: Boolean): DataFrameWriter

Adds a boolean output option.

Returns

this writer, for chaining.

def option(key: String, value: Long): DataFrameWriter

Adds a long output option.

Returns

this writer, for chaining.

def option(key: String, value: Double): DataFrameWriter

Adds a double output option.

Returns

this writer, for chaining.
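
In Apache Spark the Boolean, Long and Double overloads convert the value with toString and delegate to the String overload, so option("header", true) and option("header", "true") configure the same thing. A sketch of that delegation on a hypothetical Writer builder:

```scala
object TypedOptionSketch {
  // Hypothetical immutable builder holding string-valued options.
  final case class Writer(opts: Map[String, String] = Map.empty) {
    def option(key: String, value: String): Writer = copy(opts = opts + (key -> value))
    // The typed overloads only stringify and delegate to the String overload.
    def option(key: String, value: Boolean): Writer = option(key, value.toString)
    def option(key: String, value: Long): Writer = option(key, value.toString)
    def option(key: String, value: Double): Writer = option(key, value.toString)
  }
}
```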

def options(options: Map[String, String]): DataFrameWriter

Adds multiple output options.

Returns

this writer, for chaining.
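
In Apache Spark, options merges the given entries into those already set, with entries from the map overriding existing keys, which amounts to calling option once per entry. A hypothetical sketch of that fold:

```scala
object OptionsMapSketch {
  // Hypothetical immutable builder holding string-valued options.
  final case class Writer(opts: Map[String, String] = Map.empty) {
    def option(key: String, value: String): Writer = copy(opts = opts + (key -> value))

    // Folding the map through option(...) means map entries override existing keys.
    def options(m: Map[String, String]): Writer =
      m.foldLeft[Writer](this) { case (w, (k, v)) => w.option(k, v) }
  }
}
```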

def orc(path: String): Unit

Saves the content as ORC at the given path.

def parquet(path: String): Unit

Saves the content as Parquet at the given path.

def partitionBy(colNames: String*): DataFrameWriter

Partitions the output by the given columns on the file system.

Returns

this writer, for chaining.
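
partitionBy produces a Hive-style layout: one nested directory per partition column, named column=value, in the order the columns were given, and the partition columns are not stored inside the data files. The hypothetical helpers below only compute that directory and the remaining data columns; they skip the escaping of special characters and the null-value handling that Spark applies.

```scala
object PartitionSketch {
  // Builds "col1=v1/col2=v2" in the order the columns were passed to partitionBy.
  def partitionDir(partitionCols: Seq[String], row: Map[String, Any]): String =
    partitionCols.map(c => s"$c=${row(c)}").mkString("/")

  // The remaining columns are what ends up inside the data files.
  def dataColumns(allCols: Seq[String], partitionCols: Seq[String]): Seq[String] =
    allCols.filterNot(partitionCols.contains)
}
```

So df.write.partitionBy("year", "month").parquet("out") would place a row with year 2024 and month 5 under out/year=2024/month=5/.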

def save(): Unit

Saves the DataFrame as output, for data sources that do not require a path (e.g. external key-value stores).

def save(path: String): Unit

Saves the DataFrame at the given path.

def saveAsTable(tableName: String): Unit

Saves the DataFrame as the given managed/registered table.

def sortBy(colName: String, colNames: String*): DataFrameWriter

Sorts the output in each bucket by the given columns. In Apache Spark, sortBy must be used together with bucketBy.

Returns

this writer, for chaining.
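
Sorting applies within each bucket rather than across the whole output: rows are first assigned to buckets, then each bucket's rows are ordered by the sort columns. The hypothetical sketch below models that layout over (bucketKey, sortKey) pairs, with String.hashCode standing in for Spark's bucket hash.

```scala
object SortWithinBucketsSketch {
  // Non-negative modulo so every hash maps into [0, n).
  private def pmod(a: Int, n: Int): Int = {
    val r = a % n
    if (r < 0) r + n else r
  }

  // Rows are (bucketKey, sortKey); returns each bucket's rows sorted by sortKey.
  def layout(rows: Seq[(String, Int)], numBuckets: Int): Map[Int, Seq[(String, Int)]] =
    rows
      .groupBy { case (key, _) => pmod(key.hashCode, numBuckets) } // bucketBy (stand-in hash)
      .map { case (bucket, rs) => bucket -> rs.sortBy(_._2) }      // sortBy within each bucket
}
```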

def text(path: String): Unit

Saves the content as text at the given path. The DataFrame must have a single string column; each row becomes one line in the output.
