DataFrameNaFunctions

org.apache.spark.sql.DataFrameNaFunctions

Functionality for working with missing data in a Dataset, reached via df.na. Mirrors PySpark's DataFrame.na (DataFrameNaFunctions).

 df.na.drop()
 df.na.fill(0)
 df.na.fill(Map("name" -> "unknown", "age" -> 0L))
 df.na.replace("name", Map("UNKNOWN" -> "unnamed"))

Attributes

Supertypes
class Object
trait Matchable
class Any

Members list

Value members

Concrete methods

def drop(): DataFrame

Returns a new Dataset that drops rows containing any null values.

def drop(how: String): DataFrame

Returns a new Dataset that drops rows containing null values.

Value parameters

how

"any" drops a row if it contains at least one null; "all" drops a row only if every value in it is null.
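
The difference between the two how modes is easiest to see on a small table. The following is a minimal sketch, not a definitive usage pattern: the object name, the column names and the sample rows are made up for illustration, and it assumes a session can be built with SparkSession.builder().master("local[*]") and that spark.sql accepts an inline VALUES table, which may differ in your deployment.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object DropHowExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-drop-how").getOrCreate()

    // Three rows: no nulls, one null, only nulls.
    val df = spark.sql(
      """SELECT * FROM VALUES
        |  (1, 'alice'),
        |  (2, CAST(NULL AS STRING)),
        |  (CAST(NULL AS INT), CAST(NULL AS STRING))
        |AS t(id, name)""".stripMargin)

    // "any": a single null is enough to drop the row.
    val anyDropped = df.na.drop("any")
    // "all": only the row whose values are all null is dropped.
    val allDropped = df.na.drop("all")

    println(s"any: ${anyDropped.count()} row(s), all: ${allDropped.count()} row(s)")
    spark.stop()
  }
}
```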

def drop(minNonNulls: Int): DataFrame

Returns a new Dataset that drops rows containing fewer than minNonNulls non-null values.
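
A threshold is useful when a row is still worth keeping with a few gaps. The sketch below illustrates drop(minNonNulls) under the same assumptions as any standalone example here: the object name, columns and rows are hypothetical, and the local SparkSession builder and the inline VALUES query are assumed to be available.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object DropMinNonNullsExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-drop-min").getOrCreate()

    // Rows with three, one and zero non-null values respectively.
    val df = spark.sql(
      """SELECT * FROM VALUES
        |  (1, 'alice', 'alice@example.com'),
        |  (2, CAST(NULL AS STRING), CAST(NULL AS STRING)),
        |  (CAST(NULL AS INT), CAST(NULL AS STRING), CAST(NULL AS STRING))
        |AS t(id, name, email)""".stripMargin)

    // Keep rows that have at least two non-null values.
    val atLeastTwo = df.na.drop(2)
    // Keep rows that have at least one non-null value.
    val atLeastOne = df.na.drop(1)

    atLeastTwo.show()
    atLeastOne.show()
    spark.stop()
  }
}
```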

def drop(cols: Seq[String]): DataFrame

Returns a new Dataset that drops rows containing any null values in the given columns.

def drop(how: String, cols: Seq[String]): DataFrame

Returns a new Dataset that drops rows containing null values in the given columns.

Value parameters

how

"any" drops a row if any of cols is null; "all" drops a row only if all of cols are null.
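
Restricting the check to a subset of columns lets nulls elsewhere pass through. This is a rough sketch of drop(how, cols); the object name, column names and data are hypothetical, and the local SparkSession builder and inline VALUES query are assumptions rather than requirements of this class.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object DropHowColsExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-drop-cols").getOrCreate()

    val df = spark.sql(
      """SELECT * FROM VALUES
        |  (1, 'alice', 'alice@example.com'),
        |  (2, 'bob', CAST(NULL AS STRING)),
        |  (3, CAST(NULL AS STRING), CAST(NULL AS STRING))
        |AS t(id, name, email)""".stripMargin)

    val contact = Seq("name", "email")

    // Drops rows 2 and 3: each has a null in name or email.
    val complete = df.na.drop("any", contact)
    // Drops only row 3: both name and email are null there.
    val reachable = df.na.drop("all", contact)

    complete.show()
    reachable.show()
    spark.stop()
  }
}
```

The id column is never inspected in either call, so a null id would not cause a row to be dropped.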

def fill(value: Long): DataFrame

Returns a new Dataset that replaces null values in all columns with value.

def fill(value: Double): DataFrame

Returns a new Dataset that replaces null values in all columns with value.

def fill(value: String): DataFrame

Returns a new Dataset that replaces null values in all columns with value.

def fill(value: Boolean): DataFrame

Returns a new Dataset that replaces null values in all columns with value.

def fill(value: Long, cols: Seq[String]): DataFrame

Returns a new Dataset that replaces null values in cols with value.

def fill(value: Double, cols: Seq[String]): DataFrame

Returns a new Dataset that replaces null values in cols with value.

def fill(value: String, cols: Seq[String]): DataFrame

Returns a new Dataset that replaces null values in cols with value.

def fill(value: Boolean, cols: Seq[String]): DataFrame

Returns a new Dataset that replaces null values in cols with value.
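
The column-scoped fill overloads can be chained, one call per value type. The sketch below is illustrative only: the object name, columns and rows are hypothetical, each fill value is paired with a column of the matching type to avoid relying on any implicit conversion, and the local SparkSession builder and inline VALUES query are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object FillColumnsExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-fill-cols").getOrCreate()

    // One complete row and one row made entirely of nulls.
    val df = spark.sql(
      """SELECT * FROM VALUES
        |  ('alice', CAST(30 AS BIGINT), true),
        |  (CAST(NULL AS STRING), CAST(NULL AS BIGINT), CAST(NULL AS BOOLEAN))
        |AS t(name, age, active)""".stripMargin)

    // Each call returns a new DataFrame, so .na is re-entered after every fill.
    val filled = df.na
      .fill("unknown", Seq("name")) // String overload
      .na.fill(0L, Seq("age"))      // Long overload
      .na.fill(false, Seq("active")) // Boolean overload

    filled.show()
    spark.stop()
  }
}
```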

def fill(valueMap: Map[String, Any]): DataFrame

Returns a new Dataset that replaces null values per column, keyed by column name.

Value parameters

valueMap

a column -> fill value mapping; values must be Long, Double, String or Boolean.
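
When several columns need different defaults, a single valueMap call can stand in for a chain of fill calls. A minimal sketch follows; the object name, columns and data are hypothetical, the map is annotated as Map[String, Any] explicitly and uses 0L so the value is a Long as the parameter description requires, and the local SparkSession builder and inline VALUES query are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object FillMapExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-fill-map").getOrCreate()

    val df = spark.sql(
      """SELECT * FROM VALUES
        |  ('alice', CAST(30 AS BIGINT)),
        |  (CAST(NULL AS STRING), CAST(NULL AS BIGINT))
        |AS t(name, age)""".stripMargin)

    // Column name -> default value; note 0L (Long) rather than 0 (Int).
    val defaults: Map[String, Any] = Map(
      "name" -> "unknown",
      "age"  -> 0L
    )

    val filled = df.na.fill(defaults)

    filled.show()
    spark.stop()
  }
}
```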

def replace[T](col: String, replacement: Map[T, T]): DataFrame

Returns a new Dataset that replaces values matching keys of replacement in col.

Value parameters

col

the column to apply the replacement to.

replacement

an old -> new value mapping.
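
Unlike fill, replace targets concrete values rather than nulls, which makes it handy for normalising sentinel strings. The sketch below is an illustration under assumptions: the object name, column names and data are hypothetical, and the local SparkSession builder and inline VALUES query may need adjusting for your environment.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object ReplaceColumnExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-replace-col").getOrCreate()

    val df = spark.sql(
      """SELECT * FROM VALUES
        |  (1, 'alice'),
        |  (2, 'UNKNOWN')
        |AS t(id, name)""".stripMargin)

    // Keys are the values to look for in "name"; map values are their replacements.
    val cleaned = df.na.replace("name", Map("UNKNOWN" -> "unnamed"))

    cleaned.show()
    spark.stop()
  }
}
```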

def replace[T](cols: Seq[String], replacement: Map[T, T]): DataFrame

Returns a new Dataset that replaces values matching keys of replacement in cols.

Value parameters

cols

the columns to apply the replacement to.

replacement

an old -> new value mapping.
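
The multi-column overload applies one mapping to every listed column, which suits a placeholder that appears in several fields. This is a sketch only; the object name, columns and rows are hypothetical, and the local SparkSession builder and inline VALUES query are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Hypothetical example object; names and data are illustrative only.
object ReplaceColumnsExample {
  def main(args: Array[String]): Unit = {
    // Assumption: a local session is available; adjust the builder as needed.
    val spark = SparkSession.builder().master("local[*]").appName("na-replace-cols").getOrCreate()

    val df = spark.sql(
      """SELECT * FROM VALUES
        |  ('alice', 'Oslo', 'Norway'),
        |  ('bob', 'N/A', 'N/A')
        |AS t(name, city, country)""".stripMargin)

    // The same old -> new mapping is applied to both city and country;
    // the name column is left untouched because it is not listed.
    val cleaned = df.na.replace(Seq("city", "country"), Map("N/A" -> "unknown"))

    cleaned.show()
    spark.stop()
  }
}
```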