functions

org.apache.spark.sql.functions
object functions

Built-in functions for working with Columns, mirroring org.apache.spark.sql.functions.

 import org.apache.spark.sql.functions._
 df.select(col("id"), upper(col("name")), (col("x") + 1).as("x1"))
 df.groupBy("dept").agg(avg("salary"), count(lit(1)))

This object exposes a large subset of Spark's function library. A Spark function that is not listed here can still be invoked by name with callUDF or call_function, or written as a SQL expression string and parsed with expr.

Following Spark's convention, a String argument denotes a column name for most functions (e.g. sum("salary") aggregates the salary column), while functions whose parameters are genuinely literal (regex patterns, date formats, JSON paths, ...) treat their String arguments as literal values.
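
The difference between the two String conventions is easiest to see side by side. The following is a minimal sketch, not taken from the library's own documentation: the column names dept, salary, and email are hypothetical, and it assumes DataFrame and Column behave as in the usage example above.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// A String passed to an aggregate names a column, so these two are equivalent.
val totalByName: Column = sum("salary")
val totalByColumn: Column = sum(col("salary"))

// Here the String is a literal regex pattern, not a column name.
val domain: Column = regexp_extract(col("email"), "@(.+)$", 1)

// Hypothetical input with `dept`, `salary`, and `email` columns.
def totals(df: DataFrame): DataFrame =
  df.groupBy("dept").agg(totalByName.as("total_a"), totalByColumn.as("total_b"))

def domains(df: DataFrame): DataFrame =
  df.select(domain.as("domain"))
```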

Supertypes
class Object
trait Matchable
class Any
Self type
functions.type

Members list

Value members

Concrete methods

def abs(e: Column): Column
def acos(e: Column): Column
def acos(columnName: String): Column
def acosh(e: Column): Column
def add_months(start: Column, numMonths: Int): Column
def add_months(start: Column, numMonths: Column): Column
def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) => Column, finish: Column => Column): Column

Applies a binary operator to an initial state and all array elements, then a finish step.
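
As an illustration, the sketch below folds an array column into a running sum and then doubles it in the finish step. The column name scores is hypothetical and is assumed to be an array of integers so that the state type matches the initial value lit(0).

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// `scores` is a hypothetical array<int> column.
val doubledTotal: Column = aggregate(
  col("scores"),
  lit(0),                                // initial state
  (acc: Column, x: Column) => acc + x,   // merge: running sum
  (acc: Column) => acc + acc             // finish: double the sum
)

def withDoubledTotal(df: DataFrame): DataFrame =
  df.select(doubledTotal.as("doubled_total"))
```

The three-argument overload is the same fold without the finish step.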

def aggregate(expr: Column, initialValue: Column, merge: (Column, Column) => Column): Column
def any(e: Column): Column
def any_value(e: Column, ignoreNulls: Column): Column
def approx_count_distinct(columnName: String): Column
def approx_count_distinct(e: Column, rsd: Double): Column
def approx_count_distinct(columnName: String, rsd: Double): Column
def array(cols: Column*): Column
def array(colName: String, colNames: String*): Column
def array_append(column: Column, element: Any): Column
def array_compact(column: Column): Column
def array_contains(column: Column, value: Any): Column
def array_except(col1: Column, col2: Column): Column
def array_insert(arr: Column, pos: Column, value: Column): Column
def array_intersect(col1: Column, col2: Column): Column
def array_join(column: Column, delimiter: String): Column
def array_join(column: Column, delimiter: String, nullReplacement: String): Column
def array_position(column: Column, value: Any): Column
def array_prepend(column: Column, element: Any): Column
def array_remove(column: Column, element: Any): Column
def array_repeat(e: Column, count: Int): Column
def array_repeat(e: Column, count: Column): Column
def array_union(col1: Column, col2: Column): Column
def asc(columnName: String): Column
def asc_nulls_first(columnName: String): Column
def asc_nulls_last(columnName: String): Column
def ascii(e: Column): Column
def asin(e: Column): Column
def asin(columnName: String): Column
def asinh(e: Column): Column
def atan(e: Column): Column
def atan(columnName: String): Column
def atan2(y: Column, x: Column): Column
def atan2(y: Column, xName: String): Column
def atan2(yName: String, x: Column): Column
def atan2(yName: String, xName: String): Column
def atan2(y: Column, xValue: Double): Column
def atan2(yValue: Double, x: Column): Column
def atanh(e: Column): Column
def avg(e: Column): Column
def avg(columnName: String): Column
def base64(e: Column): Column
def bin(e: Column): Column
def bin(columnName: String): Column
def bit_and(e: Column): Column
def bit_or(e: Column): Column
def bit_xor(e: Column): Column
def bool_or(e: Column): Column
def broadcast(df: Dataset[_]): DataFrame

Marks a DataFrame as small enough for a broadcast join.
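
A typical use is joining a large table against a small lookup table. This is a sketch only: the table roles and the country_code join column are hypothetical, and it assumes DataFrame offers a join(right, usingColumn) method as in Spark.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.broadcast

// `orders` is assumed to be large and `countries` a small lookup table.
// `join(right, usingColumn)` is assumed to be available as in Spark.
def enrich(orders: DataFrame, countries: DataFrame): DataFrame =
  orders.join(broadcast(countries), "country_code")
```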

def bround(e: Column): Column
def bround(e: Column, scale: Int): Column
def btrim(str: Column): Column
def btrim(str: Column, trim: Column): Column
def callUDF(funcName: String, cols: Column*): Column

Calls a Spark function by name with the given columns as arguments.
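
For example, array_distinct is a Spark SQL function that has no dedicated method in the list on this page, so it can be reached by name. A minimal sketch, assuming a hypothetical array column called tags:

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Invoke Spark's `array_distinct` by name on the hypothetical `tags` column.
val distinctTags: Column = callUDF("array_distinct", col("tags"))

def dedupe(df: DataFrame): DataFrame =
  df.select(distinctTags.as("distinct_tags"))
```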

def call_function(funcName: String, cols: Column*): Column
def cbrt(e: Column): Column
def cbrt(columnName: String): Column
def ceil(e: Column): Column
def ceil(columnName: String): Column
def ceil(e: Column, scale: Column): Column
def ceiling(e: Column): Column
def coalesce(e: Column*): Column
def col(colName: String): Column

Returns a Column based on the given column name.

def collect_list(columnName: String): Column
def collect_set(columnName: String): Column
def column(colName: String): Column

Returns a Column based on the given column name. Alias of col.

def concat(exprs: Column*): Column
def concat_ws(sep: String, exprs: Column*): Column
def contains(left: Column, right: Column): Column
def conv(num: Column, fromBase: Int, toBase: Int): Column
def corr(column1: Column, column2: Column): Column
def corr(columnName1: String, columnName2: String): Column
def cos(e: Column): Column
def cos(columnName: String): Column
def cosh(e: Column): Column
def cosh(columnName: String): Column
def cot(e: Column): Column
def count(e: Column): Column
def count(columnName: String): Column
def countDistinct(expr: Column, exprs: Column*): Column
def countDistinct(columnName: String, columnNames: String*): Column
def count_distinct(expr: Column, exprs: Column*): Column
def covar_pop(column1: Column, column2: Column): Column
def covar_pop(columnName1: String, columnName2: String): Column
def covar_samp(column1: Column, column2: Column): Column
def covar_samp(columnName1: String, columnName2: String): Column
def crc32(e: Column): Column
def csc(e: Column): Column
def cume_dist(): Column
def date_add(start: Column, days: Int): Column
def date_add(start: Column, days: Column): Column
def date_diff(end: Column, start: Column): Column
def date_format(dateExpr: Column, format: String): Column
def date_part(field: Column, source: Column): Column
def date_sub(start: Column, days: Int): Column
def date_sub(start: Column, days: Column): Column
def date_trunc(format: String, timestamp: Column): Column
def datediff(end: Column, start: Column): Column
def datepart(field: Column, source: Column): Column
def day(e: Column): Column
def decode(value: Column, charset: String): Column
def degrees(e: Column): Column
def degrees(columnName: String): Column
def desc(columnName: String): Column
def desc_nulls_first(columnName: String): Column
def desc_nulls_last(columnName: String): Column
def element_at(column: Column, value: Any): Column
def encode(value: Column, charset: String): Column
def endswith(str: Column, suffix: Column): Column
def every(e: Column): Column
def exists(column: Column, f: Column => Column): Column

True if the predicate holds for any element of the array.

def exp(e: Column): Column
def exp(columnName: String): Column
def explode(e: Column): Column
def expm1(e: Column): Column
def expm1(columnName: String): Column
def expr(expr: String): Column

Parses the expression string into the column it represents.
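
The sketch below shows two expression strings written in Spark SQL syntax. The salary column and the threshold are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Arithmetic written as a SQL expression over the hypothetical `salary` column.
val raised: Column = expr("salary * 1.1")

// Any SQL expression is accepted, including CASE.
val label: Column =
  expr("CASE WHEN salary > 100000 THEN 'high' ELSE 'standard' END")

def annotate(df: DataFrame): DataFrame =
  df.select(col("salary"), raised.as("raised"), label.as("label"))
```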

def extract(field: Column, source: Column): Column
def filter(column: Column, f: Column => Column): Column

Filters an array keeping elements for which the predicate holds.

def filter(column: Column, f: (Column, Column) => Column): Column

Filters an array using the (element, index) predicate.
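
Both filter overloads can be sketched as follows. The array column values is hypothetical; the Column operators >, %, and === are assumed to match Spark's, and the index is assumed to be zero-based as in Spark. The lambda parameters are annotated explicitly so that overload selection does not depend on type inference.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Element-only predicate: keep the positive elements.
val positives: Column =
  filter(col("values"), (x: Column) => x > 0)

// (element, index) predicate: keep elements at even positions.
val evenPositions: Column =
  filter(col("values"), (x: Column, i: Column) => i % 2 === 0)

def filtered(df: DataFrame): DataFrame =
  df.select(positives.as("positives"), evenPositions.as("even_positions"))
```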

def first(e: Column): Column
def first(columnName: String): Column
def first(e: Column, ignoreNulls: Boolean): Column
def first_value(e: Column, ignoreNulls: Column): Column
def flatten(e: Column): Column
def floor(e: Column): Column
def floor(columnName: String): Column
def floor(e: Column, scale: Column): Column
def forall(column: Column, f: Column => Column): Column

True if the predicate holds for every element of the array.
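
A short sketch contrasting forall with exists on the same predicate. The array column scores and the pass mark of 50 are hypothetical, and the >= operator on Column is assumed to match Spark's.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// True only when every score reaches the pass mark.
val allPassed: Column = forall(col("scores"), (s: Column) => s >= 50)

// True when at least one score reaches the pass mark.
val anyPassed: Column = exists(col("scores"), (s: Column) => s >= 50)

def passFlags(df: DataFrame): DataFrame =
  df.select(allPassed.as("all_passed"), anyPassed.as("any_passed"))
```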

def format_number(x: Column, d: Int): Column
def format_string(format: String, arguments: Column*): Column
def from_csv(e: Column, schema: Column): Column
def from_json(e: Column, schema: String): Column
def from_json(e: Column, schema: Column): Column
def from_unixtime(ut: Column, f: String): Column
def from_utc_timestamp(ts: Column, tz: String): Column
def get(column: Column, index: Column): Column
def get_json_object(e: Column, path: String): Column
def greatest(exprs: Column*): Column
def grouping(columnName: String): Column
def grouping_id(cols: Column*): Column
def grouping_id(colName: String, colNames: String*): Column
def hash(cols: Column*): Column
def hex(column: Column): Column
def hour(e: Column): Column
def hypot(l: Column, r: Column): Column
def hypot(l: Column, rightName: String): Column
def hypot(leftName: String, r: Column): Column
def hypot(leftName: String, rightName: String): Column
def hypot(l: Column, r: Double): Column
def hypot(l: Double, r: Column): Column
def ifnull(col1: Column, col2: Column): Column
def initcap(e: Column): Column
def inline(e: Column): Column
def instr(str: Column, substring: String): Column
def isnan(e: Column): Column
def isnull(e: Column): Column
def json_tuple(json: Column, fields: String*): Column
def kurtosis(columnName: String): Column
def lag(e: Column, offset: Int): Column
def lag(columnName: String, offset: Int): Column
def lag(e: Column, offset: Int, defaultValue: Any): Column
def last(e: Column): Column
def last(columnName: String): Column
def last(e: Column, ignoreNulls: Boolean): Column
def last_value(e: Column, ignoreNulls: Column): Column
def lcase(e: Column): Column
def lead(e: Column, offset: Int): Column
def lead(columnName: String, offset: Int): Column
def lead(e: Column, offset: Int, defaultValue: Any): Column
def least(exprs: Column*): Column
def length(e: Column): Column
def levenshtein(l: Column, r: Column, threshold: Int): Column
def lit(literal: Any): Column

Creates a Column of literal value.

def ln(e: Column): Column
def locate(substr: String, str: Column): Column
def locate(substr: String, str: Column, pos: Int): Column
def log(e: Column): Column
def log(columnName: String): Column
def log(base: Double, e: Column): Column
def log(base: Double, columnName: String): Column
def log10(e: Column): Column
def log10(columnName: String): Column
def log1p(e: Column): Column
def log1p(columnName: String): Column
def log2(e: Column): Column
def log2(columnName: String): Column
def lower(e: Column): Column
def lpad(str: Column, len: Int, pad: String): Column
def ltrim(e: Column): Column
def ltrim(e: Column, trimString: String): Column
def make_date(year: Column, month: Column, day: Column): Column
def make_timestamp(years: Column, months: Column, days: Column, hours: Column, mins: Column, secs: Column): Column
def map(cols: Column*): Column
def map_concat(cols: Column*): Column
def map_contains_key(column: Column, key: Any): Column
def map_filter(expr: Column, f: (Column, Column) => Column): Column

Filters a map keeping entries for which the (key, value) predicate holds.
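
As a sketch, the following keeps only the entries of a map whose value is positive. The map column stock (item name to quantity) is hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Keep entries of the hypothetical `stock` map whose quantity is above zero.
val inStock: Column =
  map_filter(col("stock"), (item: Column, qty: Column) => qty > 0)

def onlyInStock(df: DataFrame): DataFrame =
  df.select(inStock.as("in_stock"))
```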

def map_from_arrays(keys: Column, values: Column): Column
def map_zip_with(left: Column, right: Column, f: (Column, Column, Column) => Column): Column

Merges two maps by key using the (key, value1, value2) function.
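
The sketch below adds up the quantities of two maps key by key. The map columns stock_a and stock_b are hypothetical. It assumes Spark's behavior of passing null for a key that is present in only one map, which is why each value is wrapped in coalesce.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Sum quantities per key; a key missing from one side is treated as 0.
val combined: Column = map_zip_with(
  col("stock_a"),
  col("stock_b"),
  (item: Column, a: Column, b: Column) =>
    coalesce(a, lit(0)) + coalesce(b, lit(0))
)

def mergeStock(df: DataFrame): DataFrame =
  df.select(combined.as("combined_stock"))
```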

def max(e: Column): Column
def max(columnName: String): Column
def max_by(e: Column, ord: Column): Column
def md5(e: Column): Column
def mean(e: Column): Column
def mean(columnName: String): Column
def median(e: Column): Column
def min(e: Column): Column
def min(columnName: String): Column
def min_by(e: Column, ord: Column): Column
def minute(e: Column): Column
def mode(e: Column): Column
def month(e: Column): Column
def months_between(end: Column, start: Column): Column
def months_between(end: Column, start: Column, roundOff: Boolean): Column
def named_struct(cols: Column*): Column
def nanvl(col1: Column, col2: Column): Column
def negate(e: Column): Column
def next_day(date: Column, dayOfWeek: String): Column
def next_day(date: Column, dayOfWeek: Column): Column
def now(): Column
def nth_value(e: Column, offset: Int): Column
def nth_value(e: Column, offset: Int, ignoreNulls: Boolean): Column
def ntile(n: Int): Column
def nullif(col1: Column, col2: Column): Column
def nvl(col1: Column, col2: Column): Column
def nvl2(col1: Column, col2: Column, col3: Column): Column
def overlay(src: Column, replace: Column, pos: Column): Column
def overlay(src: Column, replace: Column, pos: Column, len: Column): Column
def percentile(e: Column, percentage: Column): Column
def percentile(e: Column, percentage: Column, frequency: Column): Column
def percentile_approx(e: Column, percentage: Column, accuracy: Column): Column
def pmod(dividend: Column, divisor: Column): Column
def pow(l: Column, r: Column): Column
def pow(l: Column, r: Double): Column
def pow(l: Double, r: Column): Column
def pow(l: Column, rightName: String): Column
def pow(leftName: String, r: Column): Column
def pow(leftName: String, rightName: String): Column
def power(l: Column, r: Column): Column
def product(e: Column): Column
def quarter(e: Column): Column
def radians(e: Column): Column
def radians(columnName: String): Column
def rand(): Column
def rand(seed: Long): Column
def randn(): Column
def randn(seed: Long): Column
def rank(): Column
def regexp_count(e: Column, pattern: Column): Column
def regexp_extract(e: Column, exp: String, groupIdx: Int): Column
def regexp_extract_all(e: Column, exp: Column, groupIdx: Column): Column
def regexp_instr(e: Column, pattern: Column): Column
def regexp_like(e: Column, pattern: Column): Column
def regexp_replace(e: Column, pattern: String, replacement: String): Column
def regexp_replace(e: Column, pattern: Column, replacement: Column): Column
def regexp_substr(e: Column, pattern: Column): Column
def repeat(str: Column, n: Int): Column
def replace(src: Column, search: Column): Column
def replace(src: Column, search: Column, replace: Column): Column
def reverse(e: Column): Column
def rint(e: Column): Column
def rint(columnName: String): Column
def round(e: Column): Column
def round(e: Column, scale: Int): Column
def rpad(str: Column, len: Int, pad: String): Column
def rtrim(e: Column): Column
def rtrim(e: Column, trimString: String): Column
def schema_of_csv(csv: String): Column
def schema_of_json(json: String): Column
def sec(e: Column): Column
def second(e: Column): Column
def sentences(string: Column): Column
def sentences(string: Column, language: Column, country: Column): Column
def sequence(start: Column, stop: Column): Column
def sequence(start: Column, stop: Column, step: Column): Column
def session_window(timeColumn: Column, gapDuration: String): Column
def sha1(e: Column): Column
def sha2(e: Column, numBits: Int): Column
def shiftleft(e: Column, numBits: Int): Column
def shiftright(e: Column, numBits: Int): Column
def shiftrightunsigned(e: Column, numBits: Int): Column
def shuffle(e: Column): Column
def signum(e: Column): Column
def signum(columnName: String): Column
def sin(e: Column): Column
def sin(columnName: String): Column
def sinh(e: Column): Column
def sinh(columnName: String): Column
def size(e: Column): Column
def skewness(columnName: String): Column
def slice(x: Column, start: Int, length: Int): Column
def slice(x: Column, start: Column, length: Column): Column
def some(e: Column): Column
def sort_array(e: Column, asc: Boolean): Column
def soundex(e: Column): Column
def split(str: Column, pattern: String): Column
def split(str: Column, pattern: String, limit: Int): Column
def split_part(str: Column, delimiter: Column, partNum: Column): Column
def sqrt(e: Column): Column
def sqrt(colName: String): Column
def startswith(str: Column, prefix: Column): Column
def stddev(e: Column): Column
def stddev(columnName: String): Column
def stddev_pop(columnName: String): Column
def stddev_samp(columnName: String): Column
def struct(cols: Column*): Column
def struct(colName: String, colNames: String*): Column
def substring(str: Column, pos: Int, len: Int): Column
def substring(str: Column, pos: Column, len: Column): Column
def substring_index(str: Column, delim: String, count: Int): Column
def sum(e: Column): Column
def sum(columnName: String): Column
def sumDistinct(columnName: String): Column
def tan(e: Column): Column
def tan(columnName: String): Column
def tanh(e: Column): Column
def tanh(columnName: String): Column
def to_char(e: Column, format: Column): Column
def to_csv(e: Column): Column
def to_date(e: Column): Column
def to_date(e: Column, fmt: String): Column
def to_json(e: Column): Column
def to_number(e: Column, format: Column): Column
def to_timestamp(e: Column, fmt: String): Column
def to_utc_timestamp(ts: Column, tz: String): Column
def to_varchar(e: Column, format: Column): Column
def transform(column: Column, f: Column => Column): Column

Transforms elements of an array using the given function.

def transform(column: Column, f: (Column, Column) => Column): Column

Transforms elements of an array using the (element, index) function.
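
Both transform overloads in one sketch. The array column values is hypothetical, and the lambda parameters are annotated explicitly to make the chosen overload unambiguous.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Element-only form: double every element.
val doubled: Column =
  transform(col("values"), (x: Column) => x + x)

// (element, index) form: add each element's position to it.
val offsetByIndex: Column =
  transform(col("values"), (x: Column, i: Column) => x + i)

def transformed(df: DataFrame): DataFrame =
  df.select(doubled.as("doubled"), offsetByIndex.as("offset_by_index"))
```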

def transform_keys(expr: Column, f: (Column, Column) => Column): Column

Applies a function to every (key, value) entry of a map and returns a map with the transformed keys.

def transform_values(expr: Column, f: (Column, Column) => Column): Column

Applies a function to every (key, value) entry of a map and returns a map with the transformed values.

def translate(src: Column, matchingString: String, replaceString: String): Column
def trim(e: Column): Column
def trim(e: Column, trimString: String): Column
def trunc(date: Column, format: String): Column
def typeof(e: Column): Column
def ucase(e: Column): Column
def unhex(column: Column): Column
def unix_timestamp(s: Column, p: String): Column
def upper(e: Column): Column
def uuid(): Column
def var_pop(e: Column): Column
def var_pop(columnName: String): Column
def var_samp(columnName: String): Column
def variance(columnName: String): Column
def version(): Column
def weekday(e: Column): Column
def when(condition: Column, value: Any): Column

Evaluates a list of conditions and returns one of multiple possible result expressions.
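
Conditions are usually chained and closed with a default. The sketch below assumes that Column provides when and otherwise methods as in Spark; the salary column and the thresholds are hypothetical.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Conditions are checked in order; the first match wins.
// `Column.when` and `Column.otherwise` are assumed to exist as in Spark.
val band: Column =
  when(col("salary") >= 100000, "high")
    .when(col("salary") >= 50000, "medium")
    .otherwise("low")

def withBand(df: DataFrame): DataFrame =
  df.select(col("salary"), band.as("band"))
```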

def window(timeColumn: Column, windowDuration: String): Column
def window(timeColumn: Column, windowDuration: String, slideDuration: String): Column
def window(timeColumn: Column, windowDuration: String, slideDuration: String, startTime: String): Column
def xxhash64(cols: Column*): Column
def year(e: Column): Column
def zip_with(left: Column, right: Column, f: (Column, Column) => Column): Column

Merges two arrays element-wise using the given function.
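
A minimal sketch that multiplies two parallel arrays position by position. The array columns prices and quantities are hypothetical, and the * operator on Column is assumed to match Spark's.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions._

// Pair up elements by position and multiply them.
val lineTotals: Column =
  zip_with(col("prices"), col("quantities"), (p: Column, q: Column) => p * q)

def withLineTotals(df: DataFrame): DataFrame =
  df.select(lineTotals.as("line_totals"))
```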
