Module: SparkConnect::Functions
Overview
The standard Spark SQL function library, mirroring PySpark's
pyspark.sql.functions. Functions return a Column unless noted otherwise (broadcast, for example, returns a DataFrame).
Available both as SparkConnect::Functions and the shorthand
SparkConnect::F. All methods are module functions.
Following PySpark's convention, a String argument denotes a column name
for most functions (e.g. F.sum("salary") aggregates the salary column),
while functions whose parameters are genuinely literal (regex patterns, date
formats, JSON paths, ...) treat their String arguments as literal values.
Constant Summary
- Proto =
SparkConnect::Proto
- UNIFORM =
Generated uniform functions: functions whose arguments are all ColumnOrName (a String denotes a column name). These, like the NO_ARG functions, are defined programmatically to keep the surface complete and compact. @!method directives document them so they appear in the API reference; each returns a Column.
%w[ sum avg mean max min first last stddev stddev_samp stddev_pop variance var_samp var_pop skewness kurtosis collect_list collect_set first_value last_value max_by min_by corr covar_pop covar_samp median mode any_value every some bit_and bit_or bit_xor bool_and bool_or product count_if grouping abs acos acosh asin asinh atan atanh atan2 bin cbrt ceil ceiling cos cosh cot csc degrees exp expm1 factorial floor hypot ln log log2 log10 log1p negative negate positive pow power radians rint sec signum sin sinh sqrt tan tanh hex unhex pmod isnan isnull positive upper lower ltrim rtrim trim length char_length character_length octet_length bit_length reverse ascii base64 unbase64 initcap soundex crc32 md5 sha1 sha ucase lcase size cardinality array_distinct array_max array_min array_compact flatten explode explode_outer posexplode posexplode_outer inline inline_outer map_keys map_values map_entries map_from_entries array_sort shuffle arrays_zip map_concat concat greatest least hash xxhash64 array_union array_intersect array_except arrays_overlap year quarter month dayofmonth day dayofweek dayofyear hour minute second weekofyear last_day weekday unix_date unix_micros unix_millis unix_seconds timestamp_seconds timestamp_millis timestamp_micros date_from_unix_date bitwise_not bit_count typeof ].uniq.freeze
- NO_ARG =
No-argument functions.
%w[ current_date current_timestamp now current_timezone current_user current_catalog current_database current_schema monotonically_increasing_id spark_partition_id input_file_name input_file_block_start input_file_block_length version uuid row_number rank dense_rank percent_rank cume_dist ].freeze
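The generation code itself is not shown on this page, so the following is only a minimal sketch of how functions could be defined from two name lists. Everything in it is an assumption made for illustration: `MiniFunctions`, the `Call` struct, and the shortened name lists are hypothetical stand-ins, and `define_singleton_method` is just one plausible mechanism.

```ruby
# Hypothetical stand-in for a Spark function-call expression.
Call = Struct.new(:name, :args)

module MiniFunctions
  # Shortened, illustrative name lists. The duplicate "positive" shows why
  # the real UNIFORM list ends in .uniq.
  UNIFORM = %w[sum avg abs positive upper positive].uniq.freeze
  NO_ARG  = %w[current_date uuid row_number].freeze

  # Uniform functions accept any number of column-or-name arguments.
  UNIFORM.each do |fn|
    define_singleton_method(fn) { |*cols| Call.new(fn, cols) }
  end

  # No-argument functions take nothing and still return an expression.
  NO_ARG.each do |fn|
    define_singleton_method(fn) { Call.new(fn, []) }
  end
end

p MiniFunctions.sum("salary")
p MiniFunctions.uuid
```

Driving the definitions from a frozen word list keeps each new function a one-word change, at the cost of every generated function sharing the same signature.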
Class Attribute Summary
- .lambda_counter ⇒ Object private
Instance Method Summary
- #_col(value) ⇒ Object private
  ColumnOrName coercion: String/Symbol -> column reference, Column -> itself, everything else -> literal.
- #_lambda(block) ⇒ Object private
  Build a Column wrapping a LambdaFunction from a Ruby block.
- #_lit_or_col(value) ⇒ Object private
- #abs(*cols) ⇒ Column
  The Spark SQL abs function.
- #acos(*cols) ⇒ Column
  The Spark SQL acos function.
- #acosh(*cols) ⇒ Column
  The Spark SQL acosh function.
- #add_months(col, months) ⇒ Object
- #aggregate(col, initial, merge, finish = nil) ⇒ Column
  Aggregate (fold) an array.
- #any_value(*cols) ⇒ Column
  The Spark SQL any_value function.
- #approx_count_distinct(col, rsd = nil) ⇒ Column
  Approximate distinct count (optionally with relative standard deviation).
- #array(*cols) ⇒ Column
  An array from the given columns.
- #array_append(col, value) ⇒ Object
- #array_compact(*cols) ⇒ Column
  The Spark SQL array_compact function.
- #array_contains(col, value) ⇒ Object
- #array_distinct(*cols) ⇒ Column
  The Spark SQL array_distinct function.
- #array_except(*cols) ⇒ Column
  The Spark SQL array_except function.
- #array_insert(col, pos, value) ⇒ Object
- #array_intersect(*cols) ⇒ Column
  The Spark SQL array_intersect function.
- #array_join(col, delimiter, null_replacement = nil) ⇒ Object
- #array_max(*cols) ⇒ Column
  The Spark SQL array_max function.
- #array_min(*cols) ⇒ Column
  The Spark SQL array_min function.
- #array_position(col, value) ⇒ Object
- #array_prepend(col, value) ⇒ Object
- #array_remove(col, element) ⇒ Object
- #array_repeat(col, count) ⇒ Object
- #array_sort(*cols) ⇒ Column
  The Spark SQL array_sort function.
- #array_union(*cols) ⇒ Column
  The Spark SQL array_union function.
- #arrays_overlap(*cols) ⇒ Column
  The Spark SQL arrays_overlap function.
- #arrays_zip(*cols) ⇒ Column
  The Spark SQL arrays_zip function.
- #asc(col) ⇒ Column
  An ascending sort order for the named/given column.
- #asc_nulls_first(col) ⇒ Object
- #asc_nulls_last(col) ⇒ Object
- #ascii(*cols) ⇒ Column
  The Spark SQL ascii function.
- #asin(*cols) ⇒ Column
  The Spark SQL asin function.
- #asinh(*cols) ⇒ Column
  The Spark SQL asinh function.
- #atan(*cols) ⇒ Column
  The Spark SQL atan function.
- #atan2(*cols) ⇒ Column
  The Spark SQL atan2 function.
- #atanh(*cols) ⇒ Column
  The Spark SQL atanh function.
- #avg(*cols) ⇒ Column
  The Spark SQL avg function.
- #base64(*cols) ⇒ Column
  The Spark SQL base64 function.
- #bin(*cols) ⇒ Column
  The Spark SQL bin function.
- #bit_and(*cols) ⇒ Column
  The Spark SQL bit_and function.
- #bit_count(*cols) ⇒ Column
  The Spark SQL bit_count function.
- #bit_length(*cols) ⇒ Column
  The Spark SQL bit_length function.
- #bit_or(*cols) ⇒ Column
  The Spark SQL bit_or function.
- #bit_xor(*cols) ⇒ Column
  The Spark SQL bit_xor function.
- #bitwise_not(*cols) ⇒ Column
  The Spark SQL bitwise_not function.
- #bool_and(*cols) ⇒ Column
  The Spark SQL bool_and function.
- #bool_or(*cols) ⇒ Column
  The Spark SQL bool_or function.
- #broadcast(df) ⇒ DataFrame
  Mark a DataFrame for broadcast (map-side) join.
- #bround(col, scale = 0) ⇒ Column
  HALF_EVEN ("banker's") rounding to scale places.
- #cardinality(*cols) ⇒ Column
  The Spark SQL cardinality function.
- #cbrt(*cols) ⇒ Column
  The Spark SQL cbrt function.
- #ceil(*cols) ⇒ Column
  The Spark SQL ceil function.
- #ceiling(*cols) ⇒ Column
  The Spark SQL ceiling function.
- #char_length(*cols) ⇒ Column
  The Spark SQL char_length function.
- #character_length(*cols) ⇒ Column
  The Spark SQL character_length function.
- #coalesce(*cols) ⇒ Column
  First non-null among the given columns.
- #col(name) ⇒ Column (also: #column)
  A column reference by name.
- #collect_list(*cols) ⇒ Column
  The Spark SQL collect_list function.
- #collect_set(*cols) ⇒ Column
  The Spark SQL collect_set function.
- #concat(*cols) ⇒ Column
  The Spark SQL concat function.
- #concat_ws(sep, *cols) ⇒ Column
  Concatenation of columns separated by literal sep.
- #conv(col, from_base, to_base) ⇒ Column
  Convert a number string from from_base to to_base.
- #corr(*cols) ⇒ Column
  The Spark SQL corr function.
- #cos(*cols) ⇒ Column
  The Spark SQL cos function.
- #cosh(*cols) ⇒ Column
  The Spark SQL cosh function.
- #cot(*cols) ⇒ Column
  The Spark SQL cot function.
- #count(col) ⇒ Column
  Count of rows (or non-null values of a column).
- #count_distinct(*cols) ⇒ Column (also: #countDistinct)
  Count of distinct combinations of the given columns.
- #count_if(*cols) ⇒ Column
  The Spark SQL count_if function.
- #covar_pop(*cols) ⇒ Column
  The Spark SQL covar_pop function.
- #covar_samp(*cols) ⇒ Column
  The Spark SQL covar_samp function.
- #crc32(*cols) ⇒ Column
  The Spark SQL crc32 function.
- #create_map(*cols) ⇒ Column
  A map from alternating key/value columns.
- #csc(*cols) ⇒ Column
  The Spark SQL csc function.
- #cume_dist ⇒ Column
  The Spark SQL cume_dist function (takes no arguments).
- #current_catalog ⇒ Column
  The Spark SQL current_catalog function (takes no arguments).
- #current_database ⇒ Column
  The Spark SQL current_database function (takes no arguments).
- #current_date ⇒ Column
  The Spark SQL current_date function (takes no arguments).
- #current_schema ⇒ Column
  The Spark SQL current_schema function (takes no arguments).
- #current_timestamp ⇒ Column
  The Spark SQL current_timestamp function (takes no arguments).
- #current_timezone ⇒ Column
  The Spark SQL current_timezone function (takes no arguments).
- #current_user ⇒ Column
  The Spark SQL current_user function (takes no arguments).
- #date_add(col, days) ⇒ Object
- #date_format(col, fmt) ⇒ Object
- #date_from_unix_date(*cols) ⇒ Column
  The Spark SQL date_from_unix_date function.
- #date_sub(col, days) ⇒ Object
- #date_trunc(fmt, col) ⇒ Object
- #datediff(end_col, start_col) ⇒ Object
- #day(*cols) ⇒ Column
  The Spark SQL day function.
- #dayofmonth(*cols) ⇒ Column
  The Spark SQL dayofmonth function.
- #dayofweek(*cols) ⇒ Column
  The Spark SQL dayofweek function.
- #dayofyear(*cols) ⇒ Column
  The Spark SQL dayofyear function.
- #degrees(*cols) ⇒ Column
  The Spark SQL degrees function.
- #dense_rank ⇒ Column
  The Spark SQL dense_rank function (takes no arguments).
- #desc(col) ⇒ Object
- #desc_nulls_first(col) ⇒ Object
- #desc_nulls_last(col) ⇒ Object
- #element_at(col, extraction) ⇒ Object
- #every(*cols) ⇒ Column
  The Spark SQL every function.
- #exists(col, &block) ⇒ Object
- #exp(*cols) ⇒ Column
  The Spark SQL exp function.
- #explode(*cols) ⇒ Column
  The Spark SQL explode function.
- #explode_outer(*cols) ⇒ Column
  The Spark SQL explode_outer function.
- #expm1(*cols) ⇒ Column
  The Spark SQL expm1 function.
- #expr(sql) ⇒ Column
  Parse a SQL expression string into a Column.
- #factorial(*cols) ⇒ Column
  The Spark SQL factorial function.
- #filter(col, &block) ⇒ Object
- #first(*cols) ⇒ Column
  The Spark SQL first function.
- #first_value(*cols) ⇒ Column
  The Spark SQL first_value function.
- #flatten(*cols) ⇒ Column
  The Spark SQL flatten function.
- #floor(*cols) ⇒ Column
  The Spark SQL floor function.
- #forall(col, &block) ⇒ Object
- #format_number(col, d) ⇒ Column
  Number formatted to d decimal places.
- #format_string(fmt, *cols) ⇒ Column
  Printf-style formatting using literal fmt.
- #from_json(col, schema, options = {}) ⇒ Object
- #from_unixtime(col, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
- #from_utc_timestamp(col, tz) ⇒ Object
- #get_json_object(col, path) ⇒ Object
- #greatest(*cols) ⇒ Column
  The Spark SQL greatest function.
- #grouping(*cols) ⇒ Column
  The Spark SQL grouping function.
- #hash(*cols) ⇒ Column
  The Spark SQL hash function.
- #hex(*cols) ⇒ Column
  The Spark SQL hex function.
- #hour(*cols) ⇒ Column
  The Spark SQL hour function.
- #hypot(*cols) ⇒ Column
  The Spark SQL hypot function.
- #initcap(*cols) ⇒ Column
  The Spark SQL initcap function.
- #inline(*cols) ⇒ Column
  The Spark SQL inline function.
- #inline_outer(*cols) ⇒ Column
  The Spark SQL inline_outer function.
- #input_file_block_length ⇒ Column
  The Spark SQL input_file_block_length function (takes no arguments).
- #input_file_block_start ⇒ Column
  The Spark SQL input_file_block_start function (takes no arguments).
- #input_file_name ⇒ Column
  The Spark SQL input_file_name function (takes no arguments).
- #instr(col, substr) ⇒ Column
  1-based position of literal substr within col (0 if absent).
- #isnan(*cols) ⇒ Column
  The Spark SQL isnan function.
- #isnull(*cols) ⇒ Column
  The Spark SQL isnull function.
- #json_tuple(col, *fields) ⇒ Object
- #kurtosis(*cols) ⇒ Column
  The Spark SQL kurtosis function.
- #lag(col, offset = 1, default = nil) ⇒ Object
- #last(*cols) ⇒ Column
  The Spark SQL last function.
- #last_day(*cols) ⇒ Column
  The Spark SQL last_day function.
- #last_value(*cols) ⇒ Column
  The Spark SQL last_value function.
- #lcase(*cols) ⇒ Column
  The Spark SQL lcase function.
- #lead(col, offset = 1, default = nil) ⇒ Object
- #least(*cols) ⇒ Column
  The Spark SQL least function.
- #length(*cols) ⇒ Column
  The Spark SQL length function.
- #lit(value) ⇒ Column
  A literal value column.
- #ln(*cols) ⇒ Column
  The Spark SQL ln function.
- #locate(substr, col, pos = 1) ⇒ Column
  1-based position of substr in col at/after pos.
- #log(*cols) ⇒ Column
  The Spark SQL log function.
- #log10(*cols) ⇒ Column
  The Spark SQL log10 function.
- #log1p(*cols) ⇒ Column
  The Spark SQL log1p function.
- #log2(*cols) ⇒ Column
  The Spark SQL log2 function.
- #lower(*cols) ⇒ Column
  The Spark SQL lower function.
- #lpad(col, len, pad) ⇒ Column
  Left-padded string.
- #ltrim(*cols) ⇒ Column
  The Spark SQL ltrim function.
- #make_date(year, month, day) ⇒ Object
- #map_concat(*cols) ⇒ Column
  The Spark SQL map_concat function.
- #map_contains_key(col, key) ⇒ Object
- #map_entries(*cols) ⇒ Column
  The Spark SQL map_entries function.
- #map_filter(col, &block) ⇒ Object
- #map_from_arrays(keys, values) ⇒ Column
  A map from two array columns (keys, values).
- #map_from_entries(*cols) ⇒ Column
  The Spark SQL map_from_entries function.
- #map_keys(*cols) ⇒ Column
  The Spark SQL map_keys function.
- #map_values(*cols) ⇒ Column
  The Spark SQL map_values function.
- #map_zip_with(c1, c2, &block) ⇒ Object
- #max(*cols) ⇒ Column
  The Spark SQL max function.
- #max_by(*cols) ⇒ Column
  The Spark SQL max_by function.
- #md5(*cols) ⇒ Column
  The Spark SQL md5 function.
- #mean(*cols) ⇒ Column
  The Spark SQL mean function.
- #median(*cols) ⇒ Column
  The Spark SQL median function.
- #min(*cols) ⇒ Column
  The Spark SQL min function.
- #min_by(*cols) ⇒ Column
  The Spark SQL min_by function.
- #minute(*cols) ⇒ Column
  The Spark SQL minute function.
- #mode(*cols) ⇒ Column
  The Spark SQL mode function.
- #monotonically_increasing_id ⇒ Column
  The Spark SQL monotonically_increasing_id function (takes no arguments).
- #month(*cols) ⇒ Column
  The Spark SQL month function.
- #months_between(d1, d2, round_off = true) ⇒ Object
- #named_struct(*cols) ⇒ Column
  A named struct from alternating name/value arguments.
- #nanvl(col1, col2) ⇒ Column
  col2 if col1 is NaN, else col1.
- #negate(*cols) ⇒ Column
  The Spark SQL negate function.
- #negative(*cols) ⇒ Column
  The Spark SQL negative function.
- #next_day(col, day_of_week) ⇒ Object
- #now ⇒ Column
  The Spark SQL now function (takes no arguments).
- #nth_value(col, offset, ignore_nulls = false) ⇒ Object
- #ntile(n) ⇒ Object
- #octet_length(*cols) ⇒ Column
  The Spark SQL octet_length function.
- #overlay(col, replace, pos, len = -1) ⇒ Column
  Overlay replace into col at pos for len chars.
- #percent_rank ⇒ Column
  The Spark SQL percent_rank function (takes no arguments).
- #pmod(*cols) ⇒ Column
  The Spark SQL pmod function.
- #posexplode(*cols) ⇒ Column
  The Spark SQL posexplode function.
- #posexplode_outer(*cols) ⇒ Column
  The Spark SQL posexplode_outer function.
- #positive(*cols) ⇒ Column
  The Spark SQL positive function.
- #pow(*cols) ⇒ Column
  The Spark SQL pow function.
- #power(*cols) ⇒ Column
  The Spark SQL power function.
- #product(*cols) ⇒ Column
  The Spark SQL product function.
- #quarter(*cols) ⇒ Column
  The Spark SQL quarter function.
- #radians(*cols) ⇒ Column
  The Spark SQL radians function.
- #rand(seed = nil) ⇒ Object
- #randn(seed = nil) ⇒ Object
- #rank ⇒ Column
  The Spark SQL rank function (takes no arguments).
- #regexp_count(col, pattern) ⇒ Object
- #regexp_extract(col, pattern, idx = 0) ⇒ Column
  The idx-th group of pattern matched in col.
- #regexp_extract_all(col, pattern, idx = 1) ⇒ Column
  All matches of group idx of pattern.
- #regexp_like(col, pattern) ⇒ Column
  Whether col matches pattern.
- #regexp_replace(col, pattern, replacement) ⇒ Column
  col with pattern replaced by replacement.
- #regexp_substr(col, pattern) ⇒ Object
- #repeat(col, n) ⇒ Column
  The string repeated n times.
- #reverse(*cols) ⇒ Column
  The Spark SQL reverse function.
- #rint(*cols) ⇒ Column
  The Spark SQL rint function.
- #round(col, scale = 0) ⇒ Column
  HALF_UP rounding to scale decimal places.
- #row_number ⇒ Column
  The Spark SQL row_number function (takes no arguments).
- #rpad(col, len, pad) ⇒ Column
  Right-padded string.
- #rtrim(*cols) ⇒ Column
  The Spark SQL rtrim function.
- #schema_of_json(json, options = {}) ⇒ Object
- #sec(*cols) ⇒ Column
  The Spark SQL sec function.
- #second(*cols) ⇒ Column
  The Spark SQL second function.
- #sequence(start, stop, step = nil) ⇒ Object
- #sha(*cols) ⇒ Column
  The Spark SQL sha function.
- #sha1(*cols) ⇒ Column
  The Spark SQL sha1 function.
- #sha2(col, num_bits) ⇒ Column
  SHA-2 hash with the given bit length (224/256/384/512).
- #shiftleft(col, num_bits) ⇒ Column
  Left shift by a literal number of bits.
- #shiftright(col, num_bits) ⇒ Object
- #shiftrightunsigned(col, num_bits) ⇒ Object
- #shuffle(*cols) ⇒ Column
  The Spark SQL shuffle function.
- #signum(*cols) ⇒ Column
  The Spark SQL signum function.
- #sin(*cols) ⇒ Column
  The Spark SQL sin function.
- #sinh(*cols) ⇒ Column
  The Spark SQL sinh function.
- #size(*cols) ⇒ Column
  The Spark SQL size function.
- #skewness(*cols) ⇒ Column
  The Spark SQL skewness function.
- #slice(col, start, length) ⇒ Object
- #some(*cols) ⇒ Column
  The Spark SQL some function.
- #sort_array(col, asc = true) ⇒ Object
- #soundex(*cols) ⇒ Column
  The Spark SQL soundex function.
- #spark_partition_id ⇒ Column
  The Spark SQL spark_partition_id function (takes no arguments).
- #split(col, pattern, limit = -1) ⇒ Column
  Split col by the literal regex pattern.
- #sqrt(*cols) ⇒ Column
  The Spark SQL sqrt function.
- #stddev(*cols) ⇒ Column
  The Spark SQL stddev function.
- #stddev_pop(*cols) ⇒ Column
  The Spark SQL stddev_pop function.
- #stddev_samp(*cols) ⇒ Column
  The Spark SQL stddev_samp function.
- #struct(*cols) ⇒ Column
  A struct from the given columns.
- #substring(col, pos, len) ⇒ Column
  Substring of length len from 1-based pos.
- #substring_index(col, delim, count) ⇒ Column
  Substring before the count-th occurrence of delim.
- #sum(*cols) ⇒ Column
  The Spark SQL sum function.
- #sum_distinct(col) ⇒ Column
  Sum of distinct values.
- #tan(*cols) ⇒ Column
  The Spark SQL tan function.
- #tanh(*cols) ⇒ Column
  The Spark SQL tanh function.
- #timestamp_micros(*cols) ⇒ Column
  The Spark SQL timestamp_micros function.
- #timestamp_millis(*cols) ⇒ Column
  The Spark SQL timestamp_millis function.
- #timestamp_seconds(*cols) ⇒ Column
  The Spark SQL timestamp_seconds function.
- #to_date(col, fmt = nil) ⇒ Object
- #to_json(col, options = {}) ⇒ Object
- #to_timestamp(col, fmt = nil) ⇒ Object
- #to_utc_timestamp(col, tz) ⇒ Object
- #transform(col) {|element| ... } ⇒ Column
  Transform each element of an array.
- #transform_keys(col, &block) ⇒ Object
- #transform_values(col, &block) ⇒ Object
- #translate(col, matching, replace) ⇒ Column
  Characters of col that appear in matching, replaced by the corresponding characters of replace.
- #trim(*cols) ⇒ Column
  The Spark SQL trim function.
- #trunc(col, fmt) ⇒ Object
- #typeof(*cols) ⇒ Column
  The Spark SQL typeof function.
- #ucase(*cols) ⇒ Column
  The Spark SQL ucase function.
- #udf ⇒ Object
  UDFs require a server-side execution environment (Python/Scala) and are not supported by the pure-Ruby client.
- #unbase64(*cols) ⇒ Column
  The Spark SQL unbase64 function.
- #unhex(*cols) ⇒ Column
  The Spark SQL unhex function.
- #unix_date(*cols) ⇒ Column
  The Spark SQL unix_date function.
- #unix_micros(*cols) ⇒ Column
  The Spark SQL unix_micros function.
- #unix_millis(*cols) ⇒ Column
  The Spark SQL unix_millis function.
- #unix_seconds(*cols) ⇒ Column
  The Spark SQL unix_seconds function.
- #unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
- #upper(*cols) ⇒ Column
  The Spark SQL upper function.
- #uuid ⇒ Column
  The Spark SQL uuid function (takes no arguments).
- #var_pop(*cols) ⇒ Column
  The Spark SQL var_pop function.
- #var_samp(*cols) ⇒ Column
  The Spark SQL var_samp function.
- #variance(*cols) ⇒ Column
  The Spark SQL variance function.
- #version ⇒ Column
  The Spark SQL version function (takes no arguments).
- #weekday(*cols) ⇒ Column
  The Spark SQL weekday function.
- #weekofyear(*cols) ⇒ Column
  The Spark SQL weekofyear function.
- #when(condition, value) ⇒ Column
  Start a CASE WHEN expression.
- #xxhash64(*cols) ⇒ Column
  The Spark SQL xxhash64 function.
- #year(*cols) ⇒ Column
  The Spark SQL year function.
- #zip_with(left, right, &block) ⇒ Object
Class Attribute Details
.lambda_counter ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/spark_connect/functions.rb', line 880
def lambda_counter
  @lambda_counter
end
Instance Method Details
#_col(value) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
ColumnOrName coercion: String/Symbol -> column reference, Column -> itself, everything else -> literal.
# File 'lib/spark_connect/functions.rb', line 863
def _col(value)
  case value
  when Column then value
  when String, Symbol then col(value.to_s)
  else lit(value)
  end
end
#_lambda(block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 886
def _lambda(block)
  arity = block.arity.negative? ? 1 : [block.arity, 1].max
  Functions.lambda_counter += 1
  names = (0...arity).map { |i| "x_#{Functions.lambda_counter}_#{i}" }
  vars = names.map do |n|
    Proto::Expression::UnresolvedNamedLambdaVariable.new(name_parts: [n])
  end
  cols = vars.map { |v| Column.new(Proto::Expression.new(unresolved_named_lambda_variable: v)) }
  body = block.call(*cols)
  Column.new(Proto::Expression.new(
    lambda_function: Proto::Expression::LambdaFunction.new(function: body.to_expr, arguments: vars)
  ))
end
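The arity handling and variable naming in `_lambda` can be tried in isolation. The sketch below copies only those two lines and uses plain strings in place of the proto messages; `lambda_var_names` and its explicit `counter` argument are hypothetical names introduced here, not part of the library.

```ruby
# Same arity rule as _lambda: a block with optional or splat parameters
# (negative arity) or with no parameters gets exactly one lambda variable;
# otherwise one variable is created per declared block parameter.
def lambda_var_names(block, counter)
  arity = block.arity.negative? ? 1 : [block.arity, 1].max
  (0...arity).map { |i| "x_#{counter}_#{i}" }
end

p lambda_var_names(proc { |x| x }, 1)        # => ["x_1_0"]
p lambda_var_names(proc { |acc, x| acc }, 2) # => ["x_2_0", "x_2_1"]
p lambda_var_names(proc { |*xs| xs }, 3)     # => ["x_3_0"]
p lambda_var_names(proc { 0 }, 4)            # => ["x_4_0"]
```

Because the counter is part of every name, two lambdas built one after the other never share a variable name, even when both declare a single parameter.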
#_lit_or_col(value) ⇒ Object
This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.
# File 'lib/spark_connect/functions.rb', line 872
def _lit_or_col(value)
  value.is_a?(Column) ? value : lit(value)
end
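The two coercion helpers differ only in how they treat a String, which is what separates "column name" parameters from "literal" parameters. A minimal sketch of that difference, assuming hypothetical stand-in types (`ColumnRef` and `Literal` replace the library's Column, and `col_or_name` / `lit_or_col` are illustrative re-implementations rather than the library's methods):

```ruby
# Hypothetical stand-ins for the library's expression types.
ColumnRef = Struct.new(:name)
Literal   = Struct.new(:value)

# Same branching as _col: existing expressions pass through, names become
# column references, anything else becomes a literal.
def col_or_name(value)
  case value
  when ColumnRef, Literal then value
  when String, Symbol then ColumnRef.new(value.to_s)
  else Literal.new(value)
  end
end

# Same branching as _lit_or_col: only existing expressions pass through, so a
# String here is a literal value rather than a column name.
def lit_or_col(value)
  (value.is_a?(ColumnRef) || value.is_a?(Literal)) ? value : Literal.new(value)
end

p col_or_name("salary") # a column reference named "salary"
p lit_or_col("salary")  # the literal string "salary"
p col_or_name(:salary)  # Symbols are treated like Strings
p col_or_name(42)       # non-name values become literals
```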
#abs(*cols) ⇒ Column
The Spark SQL abs function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
UNIFORM = %w[
  sum avg mean max min first last stddev stddev_samp stddev_pop variance var_samp var_pop
  skewness kurtosis collect_list collect_set first_value last_value max_by min_by
  corr covar_pop covar_samp median mode any_value every some bit_and bit_or bit_xor
  bool_and bool_or product count_if grouping
  abs acos acosh asin asinh atan atanh atan2 bin cbrt ceil ceiling cos cosh cot csc
  degrees exp expm1 factorial floor hypot ln log log2 log10 log1p negative negate
  positive pow power radians rint sec signum sin sinh sqrt tan tanh hex unhex pmod
  isnan isnull positive
  upper lower ltrim rtrim trim length char_length character_length octet_length
  bit_length reverse ascii base64 unbase64 initcap soundex crc32 md5 sha1 sha ucase lcase
  size cardinality array_distinct array_max array_min array_compact flatten
  explode explode_outer posexplode posexplode_outer inline inline_outer
  map_keys map_values map_entries map_from_entries array_sort shuffle arrays_zip
  map_concat concat greatest least hash xxhash64
  array_union array_intersect array_except arrays_overlap
  year quarter month dayofmonth day dayofweek dayofyear hour minute second weekofyear
  last_day weekday unix_date unix_micros unix_millis unix_seconds
  timestamp_seconds timestamp_millis timestamp_micros date_from_unix_date
  bitwise_not bit_count typeof
].uniq.freeze
#acos(*cols) ⇒ Column
The Spark SQL acos function. String arguments are treated as column names.
#acosh(*cols) ⇒ Column
The Spark SQL acosh function. String arguments are treated as column names.
#add_months(col, months) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 159
def add_months(col, months) = Column.invoke("add_months", _col(col), lit(months))
#aggregate(col, initial, merge, finish = nil) ⇒ Column
Aggregate (fold) an array. merge combines accumulator and element;
optional finish post-processes the result.
# File 'lib/spark_connect/functions.rb', line 258
def aggregate(col, initial, merge, finish = nil)
  args = [_col(col), _col(initial), _lambda(merge)]
  args << _lambda(finish) if finish
  Column.invoke("aggregate", *args)
end
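The fold that `aggregate` describes has the same shape as a local `reduce` followed by an optional post-processing step. The sketch below is only a plain-Ruby analogy run on an in-memory Array; the real function builds an expression that Spark evaluates, and `fold_array` is a hypothetical helper introduced here.

```ruby
# Local analogy for aggregate(col, initial, merge, finish): fold the elements
# into an accumulator, then optionally post-process the final accumulator.
def fold_array(elements, initial, merge, finish = nil)
  acc = elements.reduce(initial) { |memo, element| merge.call(memo, element) }
  finish ? finish.call(acc) : acc
end

values  = [1, 2, 3, 4]
add     = ->(acc, x) { acc + x }
total   = fold_array(values, 0, add)
average = fold_array(values, 0, add, ->(acc) { acc.fdiv(values.size) })

p total   # => 10
p average # => 2.5
```

As in the signature above, `merge` and `finish` are passed as ordinary positional arguments (procs or lambdas) rather than as a trailing block, which is what allows two callables in one call.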
#any_value(*cols) ⇒ Column
The Spark SQL any_value function. String arguments are treated as column names.
#approx_count_distinct(col, rsd = nil) ⇒ Column
Returns the approximate distinct count, optionally with a maximum relative standard deviation (rsd).
# File 'lib/spark_connect/functions.rb', line 70
def approx_count_distinct(col, rsd = nil)
  rsd.nil? ? Column.invoke("approx_count_distinct", _col(col)) : Column.invoke("approx_count_distinct", _col(col), lit(rsd))
end
#array(*cols) ⇒ Column
Returns an array from the given columns.
# File 'lib/spark_connect/functions.rb', line 96
def array(*cols) = Column.invoke("array", *cols.map { |c| _col(c) })
#array_append(col, value) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 201
def array_append(col, value) = Column.invoke("array_append", _col(col), lit(value))
#array_compact(*cols) ⇒ Column
The Spark SQL array_compact function. String arguments are treated as column names.
#array_contains(col, value) ⇒ Object
Whether the array in col contains value. The value argument is wrapped as a literal, so a String here is not read as a column name.
# File 'lib/spark_connect/functions.rb', line 197
def array_contains(col, value) = Column.invoke("array_contains", _col(col), lit(value))
#array_distinct(*cols) ⇒ Column
The Spark SQL array_distinct function. String arguments are treated as column names.
#array_except(*cols) ⇒ Column
The Spark SQL array_except function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#array_insert(col, pos, value) ⇒ Column
The Spark SQL array_insert function. col is a Column or column name; pos and value are wrapped with lit.
# File 'lib/spark_connect/functions.rb', line 203
def array_insert(col, pos, value) = Column.invoke("array_insert", _col(col), lit(pos), lit(value))
#array_intersect(*cols) ⇒ Column
The Spark SQL array_intersect function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#array_join(col, delimiter, null_replacement = nil) ⇒ Column
The Spark SQL array_join function. col is a Column or column name; delimiter and null_replacement are wrapped with lit. When null_replacement is nil, the third argument is omitted from the call.
# File 'lib/spark_connect/functions.rb', line 205
def array_join(col, delimiter, null_replacement = nil)
  if null_replacement.nil?
    Column.invoke("array_join", _col(col), lit(delimiter))
  else
    Column.invoke("array_join", _col(col), lit(delimiter), lit(null_replacement))
  end
end
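The optional third argument of array_join can be modelled in isolation. The sketch below mirrors the branch in the source above with toy stand-ins (ToyArrayFns, ColumnRef, Literal and Invocation are assumptions made for this example, not the gem's classes) and shows that null_replacement is only sent when it is given.

```ruby
# Toy model only: it imitates the shape of the source above, not its internals.
module ToyArrayFns
  ColumnRef = Struct.new(:name)    # stand-in for a column reference
  Literal = Struct.new(:value)     # stand-in for a literal expression
  Invocation = Struct.new(:fn, :args)

  module_function

  # A String names a column; anything else is assumed to be a column already.
  def _col(col)
    col.is_a?(String) ? ColumnRef.new(col) : col
  end

  # Wrap a plain Ruby value as a literal.
  def lit(value)
    Literal.new(value)
  end

  # Build a two- or three-argument call depending on null_replacement.
  def array_join(col, delimiter, null_replacement = nil)
    args = [_col(col), lit(delimiter)]
    args << lit(null_replacement) unless null_replacement.nil?
    Invocation.new("array_join", args)
  end
end

two = ToyArrayFns.array_join("tags", ",")
three = ToyArrayFns.array_join("tags", ",", "n/a")
puts two.args.length   # 2
puts three.args.length # 3
```

Note that in this model the delimiter "," becomes a literal while "tags" becomes a column reference, which is the distinction the value-argument functions rely on.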
#array_max(*cols) ⇒ Column
The Spark SQL array_max function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#array_min(*cols) ⇒ Column
The Spark SQL array_min function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#array_position(col, value) ⇒ Column
The Spark SQL array_position function. col is a Column or column name; value is wrapped with lit.
# File 'lib/spark_connect/functions.rb', line 198
def array_position(col, value) = Column.invoke("array_position", _col(col), lit(value))
#array_prepend(col, value) ⇒ Column
The Spark SQL array_prepend function. col is a Column or column name; value is wrapped with lit.
# File 'lib/spark_connect/functions.rb', line 202
def array_prepend(col, value) = Column.invoke("array_prepend", _col(col), lit(value))
#array_remove(col, element) ⇒ Column
The Spark SQL array_remove function. col is a Column or column name; element is wrapped with lit.
# File 'lib/spark_connect/functions.rb', line 199
def array_remove(col, element) = Column.invoke("array_remove", _col(col), lit(element))
#array_repeat(col, count) ⇒ Column
The Spark SQL array_repeat function. col is a Column or column name; count is wrapped with lit.
# File 'lib/spark_connect/functions.rb', line 200
def array_repeat(col, count) = Column.invoke("array_repeat", _col(col), lit(count))
#array_sort(*cols) ⇒ Column
The Spark SQL array_sort function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#array_union(*cols) ⇒ Column
The Spark SQL array_union function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#arrays_overlap(*cols) ⇒ Column
The Spark SQL arrays_overlap function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#arrays_zip(*cols) ⇒ Column
The Spark SQL arrays_zip function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#asc(col) ⇒ Column
Returns an ascending sort order for the named/given column.
# File 'lib/spark_connect/functions.rb', line 42
def asc(col) = _col(col).asc
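#asc_nulls_first(col) ⇒ Column
Returns an ascending sort order for the named/given column, with nulls placed first.
# File 'lib/spark_connect/functions.rb', line 44
def asc_nulls_first(col) = _col(col).asc_nulls_first
#asc_nulls_last(col) ⇒ Column
Returns an ascending sort order for the named/given column, with nulls placed last.
# File 'lib/spark_connect/functions.rb', line 45
def asc_nulls_last(col) = _col(col).asc_nulls_last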
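Assuming these helpers follow Spark SQL's usual semantics, where an ascending sort places nulls first unless told otherwise, the difference between the two null placements can be illustrated in plain Ruby. NullOrderDemo and sort_asc are hypothetical names used only here; no Spark session is involved.

```ruby
# Plain-Ruby illustration of null placement in an ascending sort.
# This is not SparkConnect code; it only models the ordering.
module NullOrderDemo
  module_function

  # Sort non-nil values ascending and place nils at the requested end.
  def sort_asc(values, nulls: :first)
    nils, present = values.partition(&:nil?)
    sorted = present.sort
    nulls == :first ? nils + sorted : sorted + nils
  end
end

values = [3, nil, 1, 2]
p NullOrderDemo.sort_asc(values, nulls: :first) # => [nil, 1, 2, 3]
p NullOrderDemo.sort_asc(values, nulls: :last)  # => [1, 2, 3, nil]
```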
#ascii(*cols) ⇒ Column
The Spark SQL ascii function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#asin(*cols) ⇒ Column
The Spark SQL asin function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#asinh(*cols) ⇒ Column
The Spark SQL asinh function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#atan(*cols) ⇒ Column
The Spark SQL atan function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#atan2(*cols) ⇒ Column
The Spark SQL atan2 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#atanh(*cols) ⇒ Column
The Spark SQL atanh function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#avg(*cols) ⇒ Column
The Spark SQL avg function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#base64(*cols) ⇒ Column
The Spark SQL base64 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#bin(*cols) ⇒ Column
The Spark SQL bin function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#bit_and(*cols) ⇒ Column
The Spark SQL bit_and function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#bit_count(*cols) ⇒ Column
The Spark SQL bit_count function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#bit_length(*cols) ⇒ Column
The Spark SQL bit_length function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#bit_or(*cols) ⇒ Column
The Spark SQL bit_or function. String arguments are treated as column names.
#bit_xor(*cols) ⇒ Column
The Spark SQL bit_xor function. String arguments are treated as column names.
#bitwise_not(*cols) ⇒ Column
The Spark SQL bitwise_not function. String arguments are treated as column names.
#bool_and(*cols) ⇒ Column
The Spark SQL bool_and function. String arguments are treated as column names.
#bool_or(*cols) ⇒ Column
The Spark SQL bool_or function. String arguments are treated as column names.
#broadcast(df) ⇒ DataFrame
Marks a DataFrame for a broadcast (map-side) join.
# File 'lib/spark_connect/functions.rb', line 269
def broadcast(df) = df.hint("broadcast")
#bround(col, scale = 0) ⇒ Column
Rounds col to scale decimal places using HALF_EVEN ("banker's") rounding. scale is a literal and defaults to 0.
# File 'lib/spark_connect/functions.rb', line 82
def bround(col, scale = 0) = Column.invoke("bround", _col(col), lit(scale))
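HALF_EVEN rounding sends an exact tie to the nearest even neighbour instead of always rounding up. The rule itself can be tried in plain Ruby without a Spark session; half_even below is a hypothetical local helper built on Float#round, not a SparkConnect call.

```ruby
# Plain-Ruby illustration of the HALF_EVEN rule; not a SparkConnect function.
def half_even(value, scale = 0)
  value.round(scale, half: :even)
end

half_even(0.5) # tie between 0 and 1 resolves to the even neighbour, 0
half_even(1.5) # tie between 1 and 2 resolves to the even neighbour, 2
half_even(2.5) # tie between 2 and 3 resolves to the even neighbour, 2
```

Because ties alternate between rounding down and rounding up, this mode avoids the upward bias that accumulates with plain half-up rounding.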
#cardinality(*cols) ⇒ Column
The Spark SQL cardinality function. String arguments are treated as column names.
#cbrt(*cols) ⇒ Column
The Spark SQL cbrt function. String arguments are treated as column names.
#ceil(*cols) ⇒ Column
The Spark SQL ceil function. String arguments are treated as column names.
#ceiling(*cols) ⇒ Column
The Spark SQL ceiling function. String arguments are treated as column names.
#char_length(*cols) ⇒ Column
The Spark SQL char_length function. String arguments are treated as column names.
#character_length(*cols) ⇒ Column
The Spark SQL character_length function. String arguments are treated as column names.
#coalesce(*cols) ⇒ Column
Returns the first non-null value among the given columns.
# File 'lib/spark_connect/functions.rb', line 87
def coalesce(*cols) = Column.invoke("coalesce", *cols.map { |c| _col(c) })
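For a single row, coalesce behaves like scanning the candidate values left to right and stopping at the first one that is not null. A plain-Ruby sketch of that per-row rule, where first_non_null is a hypothetical helper and nil stands in for SQL NULL:

```ruby
# Per-row illustration only; the real function builds a Column expression.
def first_non_null(*values)
  values.find { |v| !v.nil? }
end

first_non_null(nil, nil, 3, 4) # picks 3, the first value that is not nil
```

Only nil is skipped in this sketch, so a value such as false is still returned; if every value is nil the result is nil.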
#col(name) ⇒ Column Also known as: column
A column reference by name. "*" selects all columns.
# File 'lib/spark_connect/functions.rb', line 28
def col(name) = Column.from_name(name.to_s)
#collect_list(*cols) ⇒ Column
The Spark SQL collect_list function. String arguments are treated as column names.
#collect_set(*cols) ⇒ Column
The Spark SQL collect_set function. String arguments are treated as column names.
#concat(*cols) ⇒ Column
The Spark SQL concat function. String arguments are treated as column names.
#concat_ws(sep, *cols) ⇒ Column
Returns the concatenation of the given columns, separated by the literal string sep.
# File 'lib/spark_connect/functions.rb', line 107
def concat_ws(sep, *cols) = Column.invoke("concat_ws", lit(sep), *cols.map { |c| _col(c) })
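In the source above, sep is passed through lit while the remaining arguments go through _col, so only the separator is a literal. The per-row effect can be sketched in plain Ruby, assuming Spark's usual concat_ws behaviour of skipping null inputs; concat_ws_sketch is a hypothetical helper and nil stands in for SQL NULL.

```ruby
# Per-row illustration only; assumes null inputs are skipped, as in Spark SQL.
def concat_ws_sketch(sep, *values)
  values.compact.join(sep)
end

concat_ws_sketch("-", "a", nil, "b") # joins "a" and "b" and drops the nil
```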
#conv(col, from_base, to_base) ⇒ Column
Converts a number string from base from_base to base to_base. Both bases are literals.
# File 'lib/spark_connect/functions.rb', line 145
def conv(col, from_base, to_base) = Column.invoke("conv", _col(col), lit(from_base), lit(to_base))
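The conversion conv performs on each value is a parse in one base followed by a print in another. A plain-Ruby sketch of that arithmetic for non-negative inputs and bases 2 to 36 follows. conv_sketch is a hypothetical helper; the upper-casing of digits is an assumption about Spark's output, and invalid input raises here, which need not match Spark's handling.

```ruby
# Per-value illustration only; covers non-negative numbers in bases 2..36.
def conv_sketch(digits, from_base, to_base)
  # Parse the digits in from_base, then render the integer in to_base.
  Integer(digits, from_base).to_s(to_base).upcase
end

conv_sketch("ff", 16, 2)  # hexadecimal ff rendered in binary
conv_sketch("100", 2, 10) # binary 100 rendered in decimal
```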
#corr(*cols) ⇒ Column
The Spark SQL corr function. String arguments are treated as column names.
#cos(*cols) ⇒ Column
The Spark SQL cos function. String arguments are treated as column names.
#cosh(*cols) ⇒ Column
The Spark SQL cosh function. String arguments are treated as column names.
#cot(*cols) ⇒ Column
The Spark SQL cot function. String arguments are treated as column names.
#count(col) ⇒ Column
Returns the number of rows, or the number of non-null values when given a column.
Passing "*" counts all rows.
# File 'lib/spark_connect/functions.rb', line 59
def count(col)
  col.to_s == "*" ? Column.invoke("count", lit(1)) : Column.invoke("count", _col(col))
end
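The two branches in the source correspond to two different counts: "*" counts every row, while a column name counts only rows where that column is not null. A plain-Ruby sketch over an array of hashes, where count_sketch and the sample rows are hypothetical and nil stands in for SQL NULL:

```ruby
# Illustration of the two counting rules; not a SparkConnect call.
def count_sketch(rows, col)
  # "*" counts rows regardless of their contents.
  return rows.size if col.to_s == "*"

  # A column name counts only the rows where that column is present.
  rows.count { |row| !row[col].nil? }
end

rows = [{ "salary" => 10 }, { "salary" => nil }, { "salary" => 30 }]
count_sketch(rows, "*")      # counts all three rows
count_sketch(rows, "salary") # counts the two rows with a salary
```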
#count_distinct(*cols) ⇒ Column Also known as: countDistinct
Returns the number of distinct combinations of the given columns.
# File 'lib/spark_connect/functions.rb', line 64
def count_distinct(*cols)
  Column.invoke("count", *cols.map { |c| _col(c) }, is_distinct: true)
end
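With several columns, the distinct count is taken over tuples of values rather than over each column separately. A plain-Ruby sketch of that idea follows, assuming Spark's usual rule that a tuple containing a null is not counted. count_distinct_sketch and the sample rows are hypothetical, and nil stands in for SQL NULL.

```ruby
# Illustration of counting distinct tuples; not a SparkConnect call.
def count_distinct_sketch(rows, *cols)
  rows.map { |row| row.values_at(*cols) }   # one tuple per row
      .reject { |tuple| tuple.any?(&:nil?) } # assumed: tuples with nulls are skipped
      .uniq
      .size
end

rows = [
  { "dept" => "a", "city" => "x" },
  { "dept" => "a", "city" => "x" },
  { "dept" => "a", "city" => "y" },
  { "dept" => nil, "city" => "y" }
]
count_distinct_sketch(rows, "dept", "city") # two distinct non-null pairs
```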
#count_if(*cols) ⇒ Column
The Spark SQL count_if function. String arguments are treated as column names.
#covar_pop(*cols) ⇒ Column
The Spark SQL covar_pop function. String arguments are treated as column names.
#covar_samp(*cols) ⇒ Column
The Spark SQL covar_samp function. String arguments are treated as column names.
#crc32(*cols) ⇒ Column
The Spark SQL crc32 function. String arguments are treated as column names.
#create_map(*cols) ⇒ Column
Returns a map from alternating key/value columns.
# File 'lib/spark_connect/functions.rb', line 98
def create_map(*cols) = Column.invoke("map", *cols.map { |c| _col(c) })
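As a rough illustration of the alternating key/value layout, the sketch below rebuilds create_map on a stand-in column type. MapCol and SketchMap are hypothetical; only the invoked Spark function name (map) and the column-name treatment of Strings come from the source above.

```ruby
# Hypothetical stand-in for Column: it records the call as text.
MapCol = Struct.new(:expr)

module SketchMap
  extend self

  # A String is read as a column name; a MapCol passes through.
  def to_col(arg)
    arg.is_a?(MapCol) ? arg : MapCol.new(arg.to_s)
  end

  # Arguments alternate key, value, key, value, ... and are forwarded
  # in that order to the Spark SQL `map` function.
  def create_map(*cols)
    MapCol.new("map(#{cols.map { |c| to_col(c).expr }.join(', ')})")
  end
end

args = %w[name_key name_val city_key city_val]
puts SketchMap.create_map(*args).expr  # map(name_key, name_val, city_key, city_val)

# The same arguments viewed as key/value pairs.
p args.each_slice(2).to_a
```

Because every String here names a column, a constant key would presumably have to be wrapped as a literal first (a lit helper appears in the source of other functions on this page); that usage is an assumption rather than something this entry documents.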
#csc(*cols) ⇒ Column
The Spark SQL csc function. String arguments are treated as column names.
#cume_dist ⇒ Column
The Spark SQL cume_dist function (takes no arguments).
#current_catalog ⇒ Column
The Spark SQL current_catalog function (takes no arguments).
#current_database ⇒ Column
The Spark SQL current_database function (takes no arguments).
#current_date ⇒ Column
The Spark SQL current_date function (takes no arguments).
#current_schema ⇒ Column
The Spark SQL current_schema function (takes no arguments).
#current_timestamp ⇒ Column
The Spark SQL current_timestamp function (takes no arguments).
#current_timezone ⇒ Column
The Spark SQL current_timezone function (takes no arguments).
#current_user ⇒ Column
The Spark SQL current_user function (takes no arguments).
#date_add(col, days) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 156
def date_add(col, days) = Column.invoke("date_add", _col(col), lit(days))
#date_format(col, fmt) ⇒ Object
The Spark SQL date_format function. col is a column or column name; fmt is a literal date format pattern.
# File 'lib/spark_connect/functions.rb', line 153
def date_format(col, fmt) = Column.invoke("date_format", _col(col), lit(fmt))
#date_from_unix_date(*cols) ⇒ Column
The Spark SQL date_from_unix_date function. String arguments are treated as column names.
#date_sub(col, days) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 157
def date_sub(col, days) = Column.invoke("date_sub", _col(col), lit(days))
#date_trunc(fmt, col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 163
def date_trunc(fmt, col) = Column.invoke("date_trunc", lit(fmt), _col(col))
#datediff(end_col, start_col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 158
def datediff(end_col, start_col) = Column.invoke("datediff", _col(end_col), _col(start_col))
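The date functions above mix two kinds of argument: column references (_col) and literals (lit). The sketch below mirrors that split on a stand-in expression type so that the argument order of each function is visible. DateExpr, SketchDates, and the string rendering are hypothetical; the real helpers build Spark Connect expressions, and whether the real lit passes an existing Column through unchanged is an assumption.

```ruby
# Hypothetical stand-in for Column: it renders the call as SQL-like text.
DateExpr = Struct.new(:sql)

module SketchDates
  extend self

  # Column reference: a String names a column.
  def col(arg)
    arg.is_a?(DateExpr) ? arg : DateExpr.new(arg.to_s)
  end

  # Literal: a String is quoted, other values are rendered as-is.
  # (Passing an existing DateExpr through is an assumption.)
  def lit(value)
    return value if value.is_a?(DateExpr)
    DateExpr.new(value.is_a?(String) ? "'#{value}'" : value.to_s)
  end

  def invoke(name, *args)
    DateExpr.new("#{name}(#{args.map(&:sql).join(', ')})")
  end

  # Column first, literal second.
  def date_add(c, days)
    invoke("date_add", col(c), lit(days))
  end

  def date_format(c, fmt)
    invoke("date_format", col(c), lit(fmt))
  end

  # Literal first, column second.
  def date_trunc(fmt, c)
    invoke("date_trunc", lit(fmt), col(c))
  end

  # Both arguments are columns, end date first.
  def datediff(end_col, start_col)
    invoke("datediff", col(end_col), col(start_col))
  end
end

puts SketchDates.date_add("hired_on", 30).sql            # date_add(hired_on, 30)
puts SketchDates.date_format("hired_on", "yyyy-MM").sql  # date_format(hired_on, 'yyyy-MM')
puts SketchDates.date_trunc("month", "hired_on").sql     # date_trunc('month', hired_on)
puts SketchDates.datediff("left_on", "hired_on").sql     # datediff(left_on, hired_on)
```

Note that date_trunc takes the format before the column, while date_format takes the column first, and that both arguments of datediff are read as column names.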
#day(*cols) ⇒ Column
The Spark SQL day function. String arguments are treated as column names.
#dayofmonth(*cols) ⇒ Column
The Spark SQL dayofmonth function. String arguments are treated as column names.
#dayofweek(*cols) ⇒ Column
The Spark SQL dayofweek function. String arguments are treated as column names.
#dayofyear(*cols) ⇒ Column
The Spark SQL dayofyear function. String arguments are treated as column names.
#degrees(*cols) ⇒ Column
The Spark SQL degrees function. String arguments are treated as column names.
#dense_rank ⇒ Column
The Spark SQL dense_rank function (takes no arguments).
#desc(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 43
def desc(col) = _col(col).desc
#desc_nulls_first(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 46
def desc_nulls_first(col) = _col(col).desc_nulls_first
#desc_nulls_last(col) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 47
def desc_nulls_last(col) = _col(col).desc_nulls_last
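The three desc helpers are thin wrappers: each resolves its argument to a column and delegates to the method of the same name on it. A minimal sketch of that delegation follows, using a hypothetical SortCol stand-in that renders the ordering as SQL-like text; the real Column methods return sort expressions rather than strings.

```ruby
# Hypothetical stand-in for Column with the three descending-order methods.
SortCol = Struct.new(:expr) do
  def desc
    SortCol.new("#{expr} DESC")
  end

  def desc_nulls_first
    SortCol.new("#{expr} DESC NULLS FIRST")
  end

  def desc_nulls_last
    SortCol.new("#{expr} DESC NULLS LAST")
  end
end

module SketchSort
  extend self

  # A String is read as a column name; a SortCol passes through.
  def to_col(arg)
    arg.is_a?(SortCol) ? arg : SortCol.new(arg.to_s)
  end

  # Each helper resolves the column, then delegates to the column method.
  def desc(col)
    to_col(col).desc
  end

  def desc_nulls_first(col)
    to_col(col).desc_nulls_first
  end

  def desc_nulls_last(col)
    to_col(col).desc_nulls_last
  end
end

puts SketchSort.desc("salary").expr              # salary DESC
puts SketchSort.desc_nulls_first("salary").expr  # salary DESC NULLS FIRST
puts SketchSort.desc_nulls_last("salary").expr   # salary DESC NULLS LAST
```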
#element_at(col, extraction) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 214
def element_at(col, extraction) = Column.invoke("element_at", _col(col), lit(extraction))
#every(*cols) ⇒ Column
The Spark SQL every function. String arguments are treated as column names.
#exists(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 246
def exists(col, &block) = Column.invoke("exists", _col(col), _lambda(block))
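exists takes a Ruby block that describes the predicate applied to each array element. How _lambda converts the block is not shown on this page; the sketch below assumes it calls the block once with a placeholder variable and captures the expression the block returns. LambdaExpr, SketchLambda, and to_lambda are hypothetical names, and the text rendering stands in for the real expression tree.

```ruby
# Hypothetical stand-in: an expression node that renders itself as SQL-like text.
LambdaExpr = Struct.new(:sql) do
  # Comparison builds a new expression instead of evaluating anything.
  def >(other)
    LambdaExpr.new("(#{sql} > #{other})")
  end
end

module SketchLambda
  extend self

  # Assumed behaviour: call the block once with a placeholder variable
  # and use the expression it returns as the lambda body.
  def to_lambda(block)
    var = LambdaExpr.new("x")
    LambdaExpr.new("#{var.sql} -> #{block.call(var).sql}")
  end

  # The first argument is read as a column name.
  def exists(col_name, &block)
    LambdaExpr.new("exists(#{col_name}, #{to_lambda(block).sql})")
  end
end

puts SketchLambda.exists("scores") { |x| x > 90 }.sql
# exists(scores, x -> (x > 90))
```

Under this assumption the block never sees actual row data: it runs once on the client to describe the predicate, and Spark evaluates the resulting lambda against each array element.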
#exp(*cols) ⇒ Column
The Spark SQL exp function. String arguments are treated as column names.
#explode(*cols) ⇒ Column
The Spark SQL explode function. String arguments are treated as column names.
#explode_outer(*cols) ⇒ Column
The Spark SQL explode_outer function. String arguments are treated as column names.
#expm1(*cols) ⇒ Column
The Spark SQL expm1 function. String arguments are treated as column names.
#expr(sql) ⇒ Column
Parse a SQL expression string into a Column.
# File 'lib/spark_connect/functions.rb', line 37
def expr(sql)
  Column.from_expr(Proto::Expression.new(expression_string: Proto::Expression::ExpressionString.new(expression: sql)))
end
#factorial(*cols) ⇒ Column
The Spark SQL factorial function. String arguments are treated as column names.
#filter(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 248
def filter(col, &block) = Column.invoke("filter", _col(col), _lambda(block))
#first(*cols) ⇒ Column
The Spark SQL first function. String arguments are treated as column names.
#first_value(*cols) ⇒ Column
The Spark SQL first_value function. String arguments are treated as column names.
#flatten(*cols) ⇒ Column
The Spark SQL flatten function. String arguments are treated as column names.
#floor(*cols) ⇒ Column
The Spark SQL floor function. String arguments are treated as column names.
#forall(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 247
def forall(col, &block) = Column.invoke("forall", _col(col), _lambda(block))
#format_number(col, d) ⇒ Column
Returns the number in col formatted to d decimal places; d is passed as a literal.
# File 'lib/spark_connect/functions.rb', line 111
def format_number(col, d) = Column.invoke("format_number", _col(col), lit(d))
#format_string(fmt, *cols) ⇒ Column
Returns a printf-style formatted string. fmt is a literal format string; the remaining String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 109
def format_string(fmt, *cols) = Column.invoke("format_string", lit(fmt), *cols.map { |c| _col(c) })
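The two entries above mix literal and column arguments in one call. The sketch below tags each argument so the split is visible. ArgSketch and its lit and _col helpers are hypothetical stand-ins that only label their input; they are not the library's helpers.

```ruby
# Hypothetical stand-ins; each helper only tags its argument.
module ArgSketch
  module_function

  # A literal value is sent as-is.
  def lit(value)
    [:literal, value]
  end

  # A String here denotes a column name.
  def _col(name)
    [:column, name.to_s]
  end

  # Same argument shape as format_string: literal fmt, then column names.
  def format_string(fmt, *cols)
    [lit(fmt), *cols.map { |c| _col(c) }]
  end

  # Same argument shape as format_number: column name, then literal d.
  def format_number(col, d)
    [_col(col), lit(d)]
  end
end

p ArgSketch.format_string("%s-%d", "name", "age")
# prints [[:literal, "%s-%d"], [:column, "name"], [:column, "age"]]
p ArgSketch.format_number("price", 2)
# prints [[:column, "price"], [:literal, 2]]
```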
#from_json(col, schema, options = {}) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 180
def from_json(col, schema, options = {})
  schema_col = schema.is_a?(Types::DataType) ? lit(schema.json) : lit(schema.to_s)
  args = [_col(col), schema_col] + options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] }
  Column.invoke("from_json", *args)
end
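The options hash is flattened into alternating key and value arguments, each converted to a String. A small sketch of that step in isolation follows. flatten_options is a hypothetical helper name, the option names are only example input, and plain strings stand in for the lit(...) wrappers used by the real method.

```ruby
# Hypothetical helper mirroring the flat_map step in from_json; it returns
# plain strings where the real method wraps each one in lit(...).
def flatten_options(options = {})
  options.flat_map { |k, v| [k.to_s, v.to_s] }
end

p flatten_options({ mode: "PERMISSIVE", allowComments: true })
# prints ["mode", "PERMISSIVE", "allowComments", "true"]
p flatten_options
# prints []
```

Because both keys and values go through to_s, Symbol keys and non-String values such as true or 10 can be passed directly.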
#from_unixtime(col, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Object
# File 'lib/spark_connect/functions.rb', line 164
def from_unixtime(col, fmt = "yyyy-MM-dd HH:mm:ss") = Column.invoke("from_unixtime", _col(col), lit(fmt))
#from_utc_timestamp(col, tz) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 170
def from_utc_timestamp(col, tz) = Column.invoke("from_utc_timestamp", _col(col), lit(tz))
#get_json_object(col, path) ⇒ Object
Extracts the value at the literal JSON path from the JSON string in col.
# File 'lib/spark_connect/functions.rb', line 176
def get_json_object(col, path) = Column.invoke("get_json_object", _col(col), lit(path))
#greatest(*cols) ⇒ Column
The Spark SQL greatest function. String arguments are treated as column names.
#grouping(*cols) ⇒ Column
The Spark SQL grouping function. String arguments are treated as column names.
#hash(*cols) ⇒ Column
The Spark SQL hash function. String arguments are treated as column names.
#hex(*cols) ⇒ Column
The Spark SQL hex function. String arguments are treated as column names.
#hour(*cols) ⇒ Column
The Spark SQL hour function. String arguments are treated as column names.
#hypot(*cols) ⇒ Column
The Spark SQL hypot function. String arguments are treated as column names.
#initcap(*cols) ⇒ Column
The Spark SQL initcap function. String arguments are treated as column names.
#inline(*cols) ⇒ Column
The Spark SQL inline function. String arguments are treated as column names.
#inline_outer(*cols) ⇒ Column
The Spark SQL inline_outer function. String arguments are treated as column names.
#input_file_block_length ⇒ Column
The Spark SQL input_file_block_length function (takes no arguments).
#input_file_block_start ⇒ Column
The Spark SQL input_file_block_start function (takes no arguments).
#input_file_name ⇒ Column
The Spark SQL input_file_name function (takes no arguments).
#instr(col, substr) ⇒ Column
Returns the 1-based position of the literal substr within col (0 if absent).
# File 'lib/spark_connect/functions.rb', line 117
def instr(col, substr) = Column.invoke("instr", _col(col), lit(substr))
#isnan(*cols) ⇒ Column
The Spark SQL isnan function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
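Uniform functions such as isnan are described as generated programmatically from the UNIFORM name list. The generator itself is not shown in this reference, so the following is only a minimal sketch of how such generation could look in plain Ruby. SketchColumn, SketchFunctions and the three-name SKETCH_UNIFORM list are hypothetical stand-ins, not the library's code.

```ruby
# Hypothetical stand-in for SparkConnect::Column; it only records what was requested.
SketchColumn = Struct.new(:kind, :name, :args) do
  def self.col(name)
    new(:column, name, [])
  end

  def self.invoke(function_name, *args)
    new(:call, function_name, args)
  end
end

module SketchFunctions
  # A tiny excerpt standing in for the full UNIFORM list.
  SKETCH_UNIFORM = %w[isnan isnull lower].freeze

  # ColumnOrName coercion: anything that is not already a column is a column name.
  def self._col(value)
    value.is_a?(SketchColumn) ? value : SketchColumn.col(value.to_s)
  end

  # Define one module-level method per name in the list.
  SKETCH_UNIFORM.each do |function_name|
    define_singleton_method(function_name) do |*cols|
      SketchColumn.invoke(function_name, *cols.map { |c| _col(c) })
    end
  end
end

call = SketchFunctions.isnan("price")
puts call.name            # isnan
puts call.args.first.name # price
```

Generating the methods from a word list keeps every uniform function consistent: each one coerces its arguments the same way and forwards them under its own name.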
#isnull(*cols) ⇒ Column
The Spark SQL isnull function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
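#json_tuple(col, *fields) ⇒ Object
Extracts the named top-level fields from the JSON string in col. The field names are passed as literal values, not column names.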
# File 'lib/spark_connect/functions.rb', line 177
def json_tuple(col, *fields) = Column.invoke("json_tuple", _col(col), *fields.map { |f| lit(f) })
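As a rough illustration of what json_tuple computes for a single row, here is a pure-Ruby sketch using the standard json library. The helper json_tuple_row is hypothetical and only approximates Spark's behaviour (string results, nil for missing fields or malformed input). It does not call the library.

```ruby
require "json"

# Pure-Ruby illustration of json_tuple semantics for one row; not library code.
def json_tuple_row(json_text, *fields)
  parsed = JSON.parse(json_text)
  return Array.new(fields.size) unless parsed.is_a?(Hash)

  fields.map do |field|
    value = parsed[field]
    next nil if value.nil?

    # Strings pass through; other values are rendered back to JSON text.
    value.is_a?(String) ? value : value.to_json
  end
rescue JSON::ParserError
  Array.new(fields.size) # malformed input yields nil for every field
end

p json_tuple_row('{"id": 7, "name": "ada"}', "id", "name", "missing")
# ["7", "ada", nil]
```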
#kurtosis(*cols) ⇒ Column
The Spark SQL kurtosis function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#lag(col, offset = 1, default = nil) ⇒ Object
Window function: returns the value of col from offset rows before the current row, or default when no such row exists.
# File 'lib/spark_connect/functions.rb', line 225
def lag(col, offset = 1, default = nil) = Column.invoke("lag", _col(col), lit(offset), lit(default))
#last(*cols) ⇒ Column
The Spark SQL last function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#last_day(*cols) ⇒ Column
The Spark SQL last_day function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#last_value(*cols) ⇒ Column
The Spark SQL last_value function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#lcase(*cols) ⇒ Column
The Spark SQL lcase function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
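#lead(col, offset = 1, default = nil) ⇒ Object
Window function: returns the value of col from offset rows after the current row, or default when no such row exists.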
# File 'lib/spark_connect/functions.rb', line 226
def lead(col, offset = 1, default = nil) = Column.invoke("lead", _col(col), lit(offset), lit(default))
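The effect of lag and lead on one ordered window partition can be pictured with plain Ruby arrays. The helpers lag_values and lead_values are hypothetical and only emulate the offset and default semantics. They do not build Spark expressions.

```ruby
# Pure-Ruby illustration of lag/lead over one ordered window partition.
def lag_values(rows, offset = 1, default = nil)
  rows.each_index.map do |i|
    j = i - offset
    # Rows that fall outside the partition yield the default.
    j.between?(0, rows.size - 1) ? rows[j] : default
  end
end

# lead looks forward instead of backward, i.e. a lag with a negated offset.
def lead_values(rows, offset = 1, default = nil)
  lag_values(rows, -offset, default)
end

salaries = [100, 200, 300]
p lag_values(salaries)        # [nil, 100, 200]
p lead_values(salaries, 1, 0) # [200, 300, 0]
```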
#least(*cols) ⇒ Column
The Spark SQL least function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#length(*cols) ⇒ Column
The Spark SQL length function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#lit(value) ⇒ Column
A literal value column. See Column.lit for supported Ruby types.
# File 'lib/spark_connect/functions.rb', line 33
def lit(value) = Column.lit(value)
#ln(*cols) ⇒ Column
The Spark SQL ln function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#locate(substr, col, pos = 1) ⇒ Column
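Returns the 1-based position of the literal substr in col, searching from position pos onward (0 if absent). Note the argument order: substr comes first, unlike instr.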
# File 'lib/spark_connect/functions.rb', line 119
def locate(substr, col, pos = 1) = Column.invoke("locate", lit(substr), _col(col), lit(pos))
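The 1-based positions reported by instr and locate can be checked against plain Ruby strings. The helpers instr_value and locate_value are hypothetical and work on a single string rather than a column. The sketch assumes pos is at least 1.

```ruby
# Pure-Ruby illustration of the 1-based positions instr and locate report.
def instr_value(text, substr)
  index = text.index(substr)
  index ? index + 1 : 0
end

# Assumes pos >= 1; pos is the 1-based position at which the search starts.
def locate_value(substr, text, pos = 1)
  index = text.index(substr, pos - 1)
  index ? index + 1 : 0
end

p instr_value("foobarbar", "bar")     # 4
p locate_value("bar", "foobarbar", 5) # 7
p locate_value("baz", "foobarbar")    # 0
```

The two helpers mirror the argument order of the documented signatures: instr takes the column first, while locate takes the substring first.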
#log(*cols) ⇒ Column
The Spark SQL log function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#log10(*cols) ⇒ Column
The Spark SQL log10 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#log1p(*cols) ⇒ Column
The Spark SQL log1p function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#log2(*cols) ⇒ Column
The Spark SQL log2 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#lower(*cols) ⇒ Column
The Spark SQL lower function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#lpad(col, len, pad) ⇒ Column
Returns col left-padded with the literal pad to a total length of len.
# File 'lib/spark_connect/functions.rb', line 121
def lpad(col, len, pad) = Column.invoke("lpad", _col(col), lit(len), lit(pad))
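For a single value, the padding behaviour of lpad can be approximated in plain Ruby. The helper lpad_value is hypothetical and assumes a non-empty pad string and a non-negative len. It also reflects Spark's behaviour of shortening values that are already longer than len.

```ruby
# Pure-Ruby illustration of lpad for one value; assumes a non-empty pad string.
def lpad_value(text, len, pad)
  # Values at or beyond the target length are cut down to len characters.
  return text[0, len] if text.length >= len

  text.rjust(len, pad)
end

p lpad_value("hi", 5, "ab")   # "abahi"
p lpad_value("hello", 3, "x") # "hel"
```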
#ltrim(*cols) ⇒ Column
The Spark SQL ltrim function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
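#make_date(year, month, day) ⇒ Object
Builds a date from year, month and day. All three arguments are treated as columns, so a String denotes a column name.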
# File 'lib/spark_connect/functions.rb', line 172
def make_date(year, month, day) = Column.invoke("make_date", _col(year), _col(month), _col(day))
#map_concat(*cols) ⇒ Column
The Spark SQL map_concat function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#map_contains_key(col, key) ⇒ Object
Returns whether the map in col contains key. The key is passed as a literal value.
# File 'lib/spark_connect/functions.rb', line 221
def map_contains_key(col, key) = Column.invoke("map_contains_key", _col(col), lit(key))
#map_entries(*cols) ⇒ Column
The Spark SQL map_entries function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#map_filter(col, &block) ⇒ Object
Returns the entries of the map in col for which the block's predicate holds.
# File 'lib/spark_connect/functions.rb', line 252
def map_filter(col, &block) = Column.invoke("map_filter", _col(col), _lambda(block))
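How _lambda presents arguments to the block is not shown in this reference. Assuming the block receives a key and a value, mirroring Spark's two-argument map_filter lambda, the per-row effect is analogous to Hash#select in plain Ruby. The data below is made up for illustration.

```ruby
# Pure-Ruby analogue of map_filter for one row's map value; not library code.
scores = { "alice" => 91, "bob" => 58, "carol" => 77 }

# Keep only the entries whose value satisfies the predicate.
passing = scores.select { |_name, score| score >= 60 }

p passing.keys # ["alice", "carol"]
```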
#map_from_arrays(keys, values) ⇒ Column
Returns a map from two array columns (keys, values).
# File 'lib/spark_connect/functions.rb', line 100
def map_from_arrays(keys, values) = Column.invoke("map_from_arrays", _col(keys), _col(values))
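For one row, map_from_arrays pairs the elements of the keys array with the elements of the values array by position. A pure-Ruby sketch of that pairing follows. The helper map_from_arrays_value is hypothetical, and the length check is an assumption about how mismatched arrays should be treated.

```ruby
# Pure-Ruby illustration of map_from_arrays for one row; not library code.
def map_from_arrays_value(keys, values)
  unless keys.size == values.size
    raise ArgumentError, "keys and values must have the same length"
  end

  # Pair elements by position and build a Hash from the pairs.
  keys.zip(values).to_h
end

row = map_from_arrays_value(%w[a b], [1, 2])
p row["b"] # 2
```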
#map_from_entries(*cols) ⇒ Column
The Spark SQL map_from_entries function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#map_keys(*cols) ⇒ Column
The Spark SQL map_keys function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#map_values(*cols) ⇒ Column
The Spark SQL map_values function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#map_zip_with(c1, c2, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 253
def map_zip_with(c1, c2, &block) = Column.invoke("map_zip_with", _col(c1), _col(c2), _lambda(block))
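The merge that map_zip_with describes can be pictured with a plain-Ruby model over two Hashes. This sketch assumes Spark's usual behavior of visiting the union of keys and passing NULL for a side that lacks the key. `zip_maps` is a hypothetical helper, and how the library translates the block into a server-side lambda is not shown here.

```ruby
# Plain-Ruby model of map_zip_with for a single row.
# `zip_maps` is a hypothetical helper, not part of SparkConnect.
def zip_maps(m1, m2)
  (m1.keys | m2.keys).to_h do |key|
    # A key missing from one side contributes nil, mirroring SQL NULL.
    [key, yield(key, m1[key], m2[key])]
  end
end

merged = zip_maps({ "a" => 1, "b" => 2 }, { "b" => 10, "c" => 20 }) do |_key, v1, v2|
  (v1 || 0) + (v2 || 0)
end
```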
#max(*cols) ⇒ Column
The Spark SQL max function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#max_by(*cols) ⇒ Column
The Spark SQL max_by function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#md5(*cols) ⇒ Column
The Spark SQL md5 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#mean(*cols) ⇒ Column
The Spark SQL mean function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#median(*cols) ⇒ Column
The Spark SQL median function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#min(*cols) ⇒ Column
The Spark SQL min function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#min_by(*cols) ⇒ Column
The Spark SQL min_by function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#minute(*cols) ⇒ Column
The Spark SQL minute function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#mode(*cols) ⇒ Column
The Spark SQL mode function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#monotonically_increasing_id ⇒ Column
The Spark SQL monotonically_increasing_id function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#month(*cols) ⇒ Column
The Spark SQL month function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#months_between(d1, d2, round_off = true) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 160
def months_between(d1, d2, round_off = true) = Column.invoke("months_between", _col(d1), _col(d2), lit(round_off))
#named_struct(*cols) ⇒ Column
Returns a named struct from alternating name/value arguments.
# File 'lib/spark_connect/functions.rb', line 102
def named_struct(*cols) = Column.invoke("named_struct", *cols.map { |c| _col(c) })
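The "alternating name/value" shape can be modelled in plain Ruby by slicing the argument list into pairs. This is a sketch of the argument layout only, and `pair_up` is a hypothetical helper. Note that the source above passes every argument through _col, so a bare String is presumably read as a column name; literal field names would then likely need wrapping with lit, which is an assumption rather than something this page confirms.

```ruby
# Plain-Ruby model of how alternating name/value arguments pair up.
# `pair_up` is a hypothetical helper, not part of SparkConnect.
def pair_up(*args)
  raise ArgumentError, "expected an even number of arguments" if args.length.odd?

  # Consecutive arguments form [name, value] pairs.
  args.each_slice(2).to_h
end

person = pair_up("name", "Ada", "age", 36)
```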
#nanvl(col1, col2) ⇒ Column
Returns col1 if it is not NaN, otherwise col2.
# File 'lib/spark_connect/functions.rb', line 89
def nanvl(col1, col2) = Column.invoke("nanvl", _col(col1), _col(col2))
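The selection rule is small enough to model on local Floats. The sketch below assumes Spark's documented nanvl semantics (first value unless it is NaN) and uses a hypothetical helper name, `nanvl_model`; it does not call the library.

```ruby
# Plain-Ruby model of nanvl applied to one pair of values.
# `nanvl_model` is a hypothetical helper, not part of SparkConnect.
def nanvl_model(first, second)
  # Only a Float can be NaN; any other value passes through unchanged.
  first.is_a?(Float) && first.nan? ? second : first
end

kept     = nanvl_model(1.5, 0.0)
replaced = nanvl_model(Float::NAN, 0.0)
```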
#negate(*cols) ⇒ Column
The Spark SQL negate function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#negative(*cols) ⇒ Column
The Spark SQL negative function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#next_day(col, day_of_week) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 161
def next_day(col, day_of_week) = Column.invoke("next_day", _col(col), lit(day_of_week))
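For intuition, next_day can be modelled with Ruby's Date class. The sketch assumes Spark's documented behavior of returning the first date strictly later than the input that falls on the requested weekday, and it accepts only three-letter day abbreviations. `next_day_model` and `DAY_INDEX` are hypothetical names.

```ruby
require "date"

# Weekday abbreviations mapped to Date#wday values (Sunday is 0).
DAY_INDEX = { "Sun" => 0, "Mon" => 1, "Tue" => 2, "Wed" => 3,
              "Thu" => 4, "Fri" => 5, "Sat" => 6 }.freeze

# Plain-Ruby model of next_day for a single date.
# `next_day_model` is a hypothetical helper, not part of SparkConnect.
def next_day_model(date, day_of_week)
  target = DAY_INDEX.fetch(day_of_week)
  offset = (target - date.wday) % 7
  offset = 7 if offset.zero? # the result is strictly later than the input
  date + offset
end

result = next_day_model(Date.new(2015, 7, 27), "Sun")
```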
#now ⇒ Column
The Spark SQL now function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#nth_value(col, offset, ignore_nulls = false) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 228
def nth_value(col, offset, ignore_nulls = false) = Column.invoke("nth_value", _col(col), lit(offset), lit(ignore_nulls))
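The effect of offset and ignore_nulls can be modelled on a window frame held as a Ruby Array. This sketch assumes Spark's semantics of a 1-based offset and a NULL result when the frame is too short; `nth_value_model` is a hypothetical helper and does not involve the library.

```ruby
# Plain-Ruby model of nth_value over one window frame.
# `nth_value_model` is a hypothetical helper, not part of SparkConnect.
def nth_value_model(frame, offset, ignore_nulls = false)
  raise ArgumentError, "offset must be positive" unless offset.positive?

  # With ignore_nulls, nil entries are skipped before counting.
  candidates = ignore_nulls ? frame.compact : frame
  candidates[offset - 1] # nil when the frame has fewer than `offset` rows
end

frame = [nil, 10, 20]
with_nulls    = nth_value_model(frame, 2)
without_nulls = nth_value_model(frame, 2, true)
```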
#ntile(n) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 227
def ntile(n) = Column.invoke("ntile", lit(n))
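How ntile spreads rows over buckets can be sketched in plain Ruby. The model assumes the standard SQL rule that bucket sizes differ by at most one, with the earlier buckets taking the extra rows. `ntile_model` is a hypothetical helper that returns the bucket number for each row of an ordered partition.

```ruby
# Plain-Ruby model of ntile bucket assignment for an ordered partition.
# `ntile_model` is a hypothetical helper, not part of SparkConnect.
def ntile_model(row_count, n)
  base, extra = row_count.divmod(n)
  (1..n).flat_map do |bucket|
    # The first `extra` buckets each receive one additional row.
    size = base + (bucket <= extra ? 1 : 0)
    [bucket] * size
  end
end

buckets = ntile_model(5, 2)
```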
#octet_length(*cols) ⇒ Column
The Spark SQL octet_length function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#overlay(col, replace, pos, len = -1) ⇒ Column
Overlays replace onto col starting at position pos, replacing len characters.
# File 'lib/spark_connect/functions.rb', line 141
def overlay(col, replace, pos, len = -1) = Column.invoke("overlay", _col(col), _col(replace), lit(pos), lit(len))
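The string arithmetic behind overlay can be modelled on plain Ruby Strings. The sketch assumes Spark's semantics: pos is 1-based, and a negative len (the default) means "use the length of replace". `overlay_model` is a hypothetical helper. In the library call itself, replace goes through _col, so a bare String there is presumably a column name rather than literal text.

```ruby
# Plain-Ruby model of overlay on a single string value.
# `overlay_model` is a hypothetical helper, not part of SparkConnect.
def overlay_model(input, replace, pos, len = -1)
  # Assumption: a negative length falls back to the replacement's length.
  len = replace.length if len.negative?

  head = input[0, pos - 1] || ""      # characters before the 1-based position
  tail = input[(pos - 1 + len)..] || "" # characters after the replaced span
  head + replace + tail
end

default_len = overlay_model("Spark SQL", "CORE", 7)
explicit    = overlay_model("Spark SQL", "tructured", 2, 4)
insert_only = overlay_model("Spark SQL", "ANSI ", 7, 0)
```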
#percent_rank ⇒ Column
The Spark SQL percent_rank function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#pmod(*cols) ⇒ Column
The Spark SQL pmod function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#posexplode(*cols) ⇒ Column
The Spark SQL posexplode function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated from the UNIFORM list)
#posexplode_outer(*cols) ⇒ Column
The Spark SQL posexplode_outer function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#positive(*cols) ⇒ Column
The Spark SQL positive function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#pow(*cols) ⇒ Column
The Spark SQL pow function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#power(*cols) ⇒ Column
The Spark SQL power function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#product(*cols) ⇒ Column
The Spark SQL product function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#quarter(*cols) ⇒ Column
The Spark SQL quarter function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#radians(*cols) ⇒ Column
The Spark SQL radians function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#rand(seed = nil) ⇒ Column
Randomness. Returns random values drawn uniformly from [0, 1); seed, when given, is passed as a literal.
# File 'lib/spark_connect/functions.rb', line 236
def rand(seed = nil) = seed.nil? ? Column.invoke("rand") : Column.invoke("rand", lit(seed))
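#randn(seed = nil) ⇒ Column
Returns random values drawn from the standard normal distribution; seed, when given, is passed as a literal.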
# File 'lib/spark_connect/functions.rb', line 237
def randn(seed = nil) = seed.nil? ? Column.invoke("randn") : Column.invoke("randn", lit(seed))
#rank ⇒ Column
The Spark SQL rank function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#regexp_count(col, pattern) ⇒ Column
Returns the number of times pattern matches in col.
# File 'lib/spark_connect/functions.rb', line 138
def regexp_count(col, pattern) = Column.invoke("regexp_count", _col(col), lit(pattern))
#regexp_extract(col, pattern, idx = 0) ⇒ Column
Returns the idx-th group of pattern matched in col.
# File 'lib/spark_connect/functions.rb', line 131
def regexp_extract(col, pattern, idx = 0) = Column.invoke("regexp_extract", _col(col), lit(pattern), lit(idx))
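To make the group index concrete, here is a minimal plain-Ruby sketch of the indexing convention. It does not call SparkConnect: extract_group is a hypothetical helper, and Ruby's Regexp stands in for the Java regex engine that Spark uses, so pattern syntax may differ in edge cases. Index 0 is the whole match, index n is the n-th capture group, and a non-match yields an empty string, which is how Spark documents regexp_extract.

```ruby
# Hypothetical helper (not part of SparkConnect) mirroring the
# group-index convention of regexp_extract on a single Ruby string.
def extract_group(str, pattern, idx = 0)
  m = Regexp.new(pattern).match(str)
  return "" if m.nil?  # no match at all -> empty string
  m[idx] || ""         # a group that did not participate -> empty string
end

whole  = extract_group("order-1234-us", "(\\d+)-(\\w+)")     # => "1234-us"
first  = extract_group("order-1234-us", "(\\d+)-(\\w+)", 1)  # => "1234"
second = extract_group("order-1234-us", "(\\d+)-(\\w+)", 2)  # => "us"
none   = extract_group("no digits here", "(\\d+)", 1)        # => ""
```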
#regexp_extract_all(col, pattern, idx = 1) ⇒ Column
Returns all matches of group idx of pattern.
# File 'lib/spark_connect/functions.rb', line 133
def regexp_extract_all(col, pattern, idx = 1) = Column.invoke("regexp_extract_all", _col(col), lit(pattern), lit(idx))
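The difference from regexp_extract is that every match contributes one element. The following plain-Ruby sketch illustrates that on a single string; extract_all_groups is a hypothetical helper, it does not call SparkConnect, and Ruby's regex engine is only an approximation of the Java engine on the server.

```ruby
# Hypothetical helper (not part of SparkConnect): collect capture
# group `idx` from every match of `pattern` in `str`.
def extract_all_groups(str, pattern, idx = 1)
  matches = []
  # String#scan with a block sets the match data for each match in turn.
  str.scan(Regexp.new(pattern)) { matches << Regexp.last_match(idx) }
  matches
end

keys   = extract_all_groups("a=1, b=22, c=333", "(\\w)=(\\d+)")     # => ["a", "b", "c"]
values = extract_all_groups("a=1, b=22, c=333", "(\\w)=(\\d+)", 2)  # => ["1", "22", "333"]
```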
#regexp_like(col, pattern) ⇒ Column
Returns whether col matches pattern.
# File 'lib/spark_connect/functions.rb', line 137
def regexp_like(col, pattern) = Column.invoke("regexp_like", _col(col), lit(pattern))
#regexp_replace(col, pattern, replacement) ⇒ Column
Returns col with pattern replaced by replacement.
# File 'lib/spark_connect/functions.rb', line 135
def regexp_replace(col, pattern, replacement) = Column.invoke("regexp_replace", _col(col), lit(pattern), lit(replacement))
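The regexp functions all follow the same argument convention: the first argument goes through _col (a String names a column), while the pattern and replacement go through lit (a String is a literal value). The sketch below reproduces that shape with stand-in classes so it runs without the gem. SketchColumn and SketchF are hypothetical, and the behavior of _col and lit is assumed from the module overview rather than copied from the library.

```ruby
# Stand-ins for illustration only; the real Column class lives in SparkConnect.
SketchColumn = Struct.new(:kind, :value, :args) do
  def self.invoke(name, *args)
    new(:call, name, args)
  end
end

module SketchF
  module_function

  # Assumed behavior: a String names a column; a column passes through unchanged.
  def _col(c)
    c.is_a?(SketchColumn) ? c : SketchColumn.new(:column, c, [])
  end

  # Assumed behavior: any Ruby value becomes a literal expression.
  def lit(v)
    SketchColumn.new(:literal, v, [])
  end

  # Same shape as the regexp_replace definition above.
  def regexp_replace(col, pattern, replacement)
    SketchColumn.invoke("regexp_replace", _col(col), lit(pattern), lit(replacement))
  end
end

expr  = SketchF.regexp_replace("phone", "[^0-9]", "")
kinds = expr.args.map(&:kind)  # => [:column, :literal, :literal]
```

Here "phone" is read as a column name, whereas "[^0-9]" and "" are sent as literal strings.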
#regexp_substr(col, pattern) ⇒ Column
Returns the substring of col that matches pattern, or null if there is no match.
# File 'lib/spark_connect/functions.rb', line 139
def regexp_substr(col, pattern) = Column.invoke("regexp_substr", _col(col), lit(pattern))
#repeat(col, n) ⇒ Column
Returns the string repeated n times.
# File 'lib/spark_connect/functions.rb', line 125
def repeat(col, n) = Column.invoke("repeat", _col(col), lit(n))
#reverse(*cols) ⇒ Column
The Spark SQL reverse function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#rint(*cols) ⇒ Column
The Spark SQL rint function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#round(col, scale = 0) ⇒ Column
Returns col rounded to scale decimal places using HALF_UP rounding.
# File 'lib/spark_connect/functions.rb', line 80
def round(col, scale = 0) = Column.invoke("round", _col(col), lit(scale))
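HALF_UP differs from HALF_EVEN ("banker's") rounding only on exact ties. As a small illustration in plain Ruby (not a call into SparkConnect), Float#round accepts a half: keyword that selects the same two tie-breaking rules:

```ruby
ties = [0.5, 1.5, 2.5, -2.5]

# HALF_UP: ties move away from zero.
half_up = ties.map { |x| x.round(half: :up) }      # => [1, 2, 3, -3]

# HALF_EVEN: ties move to the nearest even neighbor.
half_even = ties.map { |x| x.round(half: :even) }  # => [0, 2, 2, -2]
```

Values that are not exact ties round the same way under both rules.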
#row_number ⇒ Column
The Spark SQL row_number function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#rpad(col, len, pad) ⇒ Column
Returns col right-padded with pad to a length of len.
# File 'lib/spark_connect/functions.rb', line 123
def rpad(col, len, pad) = Column.invoke("rpad", _col(col), lit(len), lit(pad))
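One detail worth knowing is that Spark's rpad also shortens values that are already longer than len. A rough plain-Ruby approximation on a single string, using the hypothetical helper rpad_sketch (not part of SparkConnect), might look like this:

```ruby
# Hypothetical helper approximating rpad on one Ruby string:
# pad on the right up to `len`, or truncate when already longer.
def rpad_sketch(str, len, pad)
  str.length >= len ? str[0, len] : str.ljust(len, pad)
end

padded    = rpad_sketch("hi", 5, "xy")     # => "hixyx"
truncated = rpad_sketch("hello", 3, "xy")  # => "hel"
```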
#rtrim(*cols) ⇒ Column
The Spark SQL rtrim function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
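#schema_of_json(json, options = {}) ⇒ Column
Returns the schema inferred from a JSON string, in DDL format. Each option is passed as a pair of string literals.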
# File 'lib/spark_connect/functions.rb', line 191
def schema_of_json(json, options = {})
  Column.invoke("schema_of_json", _lit_or_col(json), *options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] })
end
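The options hash is flattened into alternating key and value arguments, each converted to a string first. The plain-Ruby sketch below shows only that flattening step; it leaves out the lit wrapping and does not call SparkConnect. The option names are ordinary Spark JSON reader options chosen for illustration.

```ruby
options = { allowComments: true, "primitivesAsString" => false }

# Symbol or String keys, and non-String values, all end up as strings.
flat = options.flat_map { |k, v| [k.to_s, v.to_s] }
# flat => ["allowComments", "true", "primitivesAsString", "false"]
```

Because of the to_s calls, symbol keys and boolean values are accepted alongside plain strings.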
#sec(*cols) ⇒ Column
The Spark SQL sec function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#second(*cols) ⇒ Column
The Spark SQL second function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sequence(start, stop, step = nil) ⇒ Column
Returns an array of values from start to stop (inclusive), incrementing by step when given.
# File 'lib/spark_connect/functions.rb', line 217
def sequence(start, stop, step = nil)
  step.nil? ? Column.invoke("sequence", _col(start), _col(stop)) : Column.invoke("sequence", _col(start), _col(stop), _col(step))
end
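When step is omitted, only two arguments are sent and Spark chooses the step itself: 1 if start is less than or equal to stop, otherwise -1. The range includes stop. The plain-Ruby sketch below mimics those semantics for integers; sequence_sketch is a hypothetical helper and does not call SparkConnect.

```ruby
# Hypothetical helper mimicking Spark's sequence semantics for Integers.
def sequence_sketch(start, stop, step = nil)
  step ||= start <= stop ? 1 : -1  # default direction follows the bounds
  start.step(stop, step).to_a      # inclusive of `stop` when it is reached
end

ascending  = sequence_sketch(1, 5)      # => [1, 2, 3, 4, 5]
descending = sequence_sketch(5, 1)      # => [5, 4, 3, 2, 1]
stepped    = sequence_sketch(1, 10, 3)  # => [1, 4, 7, 10]
```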
#sha(*cols) ⇒ Column
The Spark SQL sha function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sha1(*cols) ⇒ Column
The Spark SQL sha1 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sha2(col, num_bits) ⇒ Column
Returns the SHA-2 hash of col with the given bit length (224, 256, 384, or 512).
# File 'lib/spark_connect/functions.rb', line 143
def sha2(col, num_bits) = Column.invoke("sha2", _col(col), lit(num_bits))
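For a sense of what the four bit lengths produce, the following plain-Ruby sketch computes the same family of digests locally with Ruby's OpenSSL bindings. sha2_hex is a hypothetical helper and does not call SparkConnect; it is shown only because Spark's sha2 likewise returns the digest as a hex string.

```ruby
require "openssl"

# Hypothetical local equivalent of sha2 for a single Ruby string.
def sha2_hex(data, num_bits)
  unless [224, 256, 384, 512].include?(num_bits)
    raise ArgumentError, "unsupported bit length: #{num_bits}"
  end
  OpenSSL::Digest.new("SHA#{num_bits}").hexdigest(data)
end

digest256 = sha2_hex("abc", 256)
# => "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

# Each hex character encodes 4 bits, so the string length is num_bits / 4.
lengths = [224, 256, 384, 512].map { |bits| sha2_hex("abc", bits).length }
# => [56, 64, 96, 128]
```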
#shiftleft(col, num_bits) ⇒ Column
Returns col shifted left by num_bits, a literal bit count.
# File 'lib/spark_connect/functions.rb', line 147
def shiftleft(col, num_bits) = Column.invoke("shiftleft", _col(col), lit(num_bits))
#shiftright(col, num_bits) ⇒ Column
Returns col shifted right (signed) by num_bits, a literal bit count.
# File 'lib/spark_connect/functions.rb', line 148
def shiftright(col, num_bits) = Column.invoke("shiftright", _col(col), lit(num_bits))
#shiftrightunsigned(col, num_bits) ⇒ Column
Returns col shifted right (unsigned, zero-filling) by num_bits, a literal bit count.
# File 'lib/spark_connect/functions.rb', line 149
def shiftrightunsigned(col, num_bits) = Column.invoke("shiftrightunsigned", _col(col), lit(num_bits))
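The signed and unsigned right shifts differ only for negative inputs. The plain-Ruby sketch below shows the difference for a single value. It assumes a 64-bit (bigint) column, so a 32-bit int column would give a different unsigned result. shift_right_unsigned_64 and MASK64 are hypothetical names, and nothing here calls SparkConnect.

```ruby
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

# Hypothetical helper: reinterpret the value as an unsigned 64-bit
# integer, then shift, so zeros are shifted in from the left.
def shift_right_unsigned_64(value, num_bits)
  (value & MASK64) >> num_bits
end

left     = 8 << 2                           # => 32
signed   = -8 >> 1                          # => -4 (sign bit preserved)
unsigned = shift_right_unsigned_64(-8, 1)   # => 9223372036854775804
```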
#shuffle(*cols) ⇒ Column
The Spark SQL shuffle function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#signum(*cols) ⇒ Column
The Spark SQL signum function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sin(*cols) ⇒ Column
The Spark SQL sin function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sinh(*cols) ⇒ Column
The Spark SQL sinh function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#size(*cols) ⇒ Column
The Spark SQL size function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#skewness(*cols) ⇒ Column
The Spark SQL skewness function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#slice(col, start, length) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 215
def slice(col, start, length) = Column.invoke("slice", _col(col), _lit_or_col(start), _lit_or_col(length))
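The indexing rules of the underlying Spark SQL slice function are easy to get wrong, so here is a plain-Ruby model of them. It is only a sketch: slice_model is a hypothetical helper that works on ordinary Ruby arrays and does not call SparkConnect, and the edge-case handling (zero start, out-of-range start) reflects my understanding of Spark's behaviour rather than anything stated on this page.

```ruby
# Hypothetical model of Spark SQL slice semantics on a plain Ruby array.
# start is 1-based; a negative start counts back from the end of the array.
def slice_model(arr, start, length)
  raise ArgumentError, "start must not be 0 (indices are 1-based)" if start.zero?
  raise ArgumentError, "length must not be negative" if length.negative?

  index = start.positive? ? start - 1 : arr.length + start
  return [] if index.negative? || index >= arr.length

  arr[index, length]
end

slice_model([1, 2, 3, 4], 2, 2)   # => [2, 3]
slice_model([1, 2, 3, 4], -2, 2)  # => [3, 4]
```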
#some(*cols) ⇒ Column
The Spark SQL some function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#sort_array(col, asc = true) ⇒ Object
---- Sorting helpers ---------------------------------------------------
# File 'lib/spark_connect/functions.rb', line 232
def sort_array(col, asc = true) = Column.invoke("sort_array", _col(col), lit(asc))
#soundex(*cols) ⇒ Column
The Spark SQL soundex function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#spark_partition_id ⇒ Column
The Spark SQL spark_partition_id function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822
#split(col, pattern, limit = -1) ⇒ Column
Returns col split around matches of the regex pattern (pattern is a literal value, not a column name).
# File 'lib/spark_connect/functions.rb', line 127
def split(col, pattern, limit = -1) = Column.invoke("split", _col(col), lit(pattern), lit(limit))
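As a rough illustration of what the limit argument does, the sketch below models split on plain Ruby strings. split_model is a hypothetical helper, not part of SparkConnect, and Ruby's regex engine stands in for the Java regex engine Spark uses, so the correspondence is an assumption that should hold for simple patterns without capture groups.

```ruby
# Hypothetical model of Spark SQL split semantics using Ruby's String#split.
# A positive limit caps the number of fields, with the last field holding the
# unsplit remainder; a limit of zero or less splits as often as possible and
# keeps trailing empty strings.
def split_model(str, pattern, limit = -1)
  str.split(Regexp.new(pattern), limit.positive? ? limit : -1)
end

split_model("oneAtwoBthreeC", "[ABC]")     # => ["one", "two", "three", ""]
split_model("oneAtwoBthreeC", "[ABC]", 2)  # => ["one", "twoBthreeC"]
```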
#sqrt(*cols) ⇒ Column
The Spark SQL sqrt function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#stddev(*cols) ⇒ Column
The Spark SQL stddev function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#stddev_pop(*cols) ⇒ Column
The Spark SQL stddev_pop function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#stddev_samp(*cols) ⇒ Column
The Spark SQL stddev_samp function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#struct(*cols) ⇒ Column
Returns a struct from the given columns.
# File 'lib/spark_connect/functions.rb', line 94
def struct(*cols) = Column.invoke("struct", *cols.map { |c| _col(c) })
#substring(col, pos, len) ⇒ Column
Returns substring of length len from 1-based pos.
# File 'lib/spark_connect/functions.rb', line 113
def substring(col, pos, len) = Column.invoke("substring", _col(col), lit(pos), lit(len))
#substring_index(col, delim, count) ⇒ Column
Returns substring before the count-th occurrence of delim.
# File 'lib/spark_connect/functions.rb', line 115
def substring_index(col, delim, count) = Column.invoke("substring_index", _col(col), lit(delim), lit(count))
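To make the "count-th occurrence" rule concrete, here is a plain-Ruby model of how Spark SQL's substring_index behaves, including the negative-count case that the one-line description leaves out. It is an illustration only: substring_index_model is a hypothetical helper that works on ordinary Ruby strings rather than on a Column.

```ruby
# Hypothetical model of Spark SQL substring_index semantics.
# It does not call SparkConnect; it mirrors the behaviour on a Ruby String.
def substring_index_model(str, delim, count)
  return "" if count.zero? || delim.empty?

  # Escape the delimiter so it is matched literally; -1 keeps empty fields.
  parts = str.split(Regexp.new(Regexp.escape(delim)), -1)
  if count.positive?
    parts.first(count).join(delim)  # text left of the count-th delimiter
  else
    parts.last(-count).join(delim)  # negative count: count from the right
  end
end

substring_index_model("www.apache.org", ".", 2)   # => "www.apache"
substring_index_model("www.apache.org", ".", -1)  # => "org"
```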
#sum(*cols) ⇒ Column
The Spark SQL sum function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
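Functions such as sum are not written out by hand but generated from the UNIFORM name list. The sketch below shows one way such generation could work. Everything in it is a stand-in chosen for illustration (SketchColumn, SketchFunctions, the shortened name list, and this version of _col); it is not the library's actual implementation.

```ruby
# Stand-in for the library's Column class, for illustration only.
SketchColumn = Struct.new(:name, :args) do
  def self.invoke(name, *args)
    new(name, args)
  end
end

module SketchFunctions
  # A shortened, hypothetical version of the UNIFORM list.
  UNIFORM = %w[sum avg max].freeze

  # A String denotes a column name; anything else passes through unchanged.
  def self._col(arg)
    arg.is_a?(String) ? SketchColumn.new("col", [arg]) : arg
  end

  # Define one module-level method per name in the list.
  UNIFORM.each do |fn|
    define_singleton_method(fn) do |*cols|
      SketchColumn.invoke(fn, *cols.map { |c| _col(c) })
    end
  end
end

total = SketchFunctions.sum("salary")
total.name  # => "sum"
```

Driving the definitions from a single list keeps the surface complete with very little code, at the cost of every generated method sharing the same generic `*cols` signature.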
#sum_distinct(col) ⇒ Column
Returns sum of distinct values.
# File 'lib/spark_connect/functions.rb', line 75
def sum_distinct(col) = Column.invoke("sum", _col(col), is_distinct: true)
#tan(*cols) ⇒ Column
The Spark SQL tan function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#tanh(*cols) ⇒ Column
The Spark SQL tanh function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#timestamp_micros(*cols) ⇒ Column
The Spark SQL timestamp_micros function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#timestamp_millis(*cols) ⇒ Column
The Spark SQL timestamp_millis function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#timestamp_seconds(*cols) ⇒ Column
The Spark SQL timestamp_seconds function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#to_date(col, fmt = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 154
def to_date(col, fmt = nil) = fmt ? Column.invoke("to_date", _col(col), lit(fmt)) : Column.invoke("to_date", _col(col))
#to_json(col, options = {}) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 186
def to_json(col, options = {})
  args = [_col(col)] + options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] }
  Column.invoke("to_json", *args)
end
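The listing above shows that to_json flattens its options hash into alternating key and value literals that follow the column argument, with both keys and values stringified. The snippet below isolates just that flattening step. The lit defined here is a stand-in that only tags its value, not the library's real helper, and the option names are examples.

```ruby
# Stand-in for the library's lit helper, for illustration only:
# it tags the value instead of building a literal Column.
def lit(value)
  [:lit, value]
end

options = { timestampFormat: "dd/MM/yyyy", pretty: true }

# Each pair becomes two literal arguments; keys and values are stringified.
extra_args = options.flat_map { |k, v| [lit(k.to_s), lit(v.to_s)] }
# => [[:lit, "timestampFormat"], [:lit, "dd/MM/yyyy"], [:lit, "pretty"], [:lit, "true"]]
```

Because values go through to_s, a boolean such as true reaches Spark as the string "true".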
#to_timestamp(col, fmt = nil) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 155
def to_timestamp(col, fmt = nil) = fmt ? Column.invoke("to_timestamp", _col(col), lit(fmt)) : Column.invoke("to_timestamp", _col(col))
#to_utc_timestamp(col, tz) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 171
def to_utc_timestamp(col, tz) = Column.invoke("to_utc_timestamp", _col(col), lit(tz))
#transform(col) {|element| ... } ⇒ Column
# File 'lib/spark_connect/functions.rb', line 245
def transform(col, &block) = Column.invoke("transform", _col(col), _lambda(block))
#transform_keys(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 250
def transform_keys(col, &block) = Column.invoke("transform_keys", _col(col), _lambda(block))
#transform_values(col, &block) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 251
def transform_values(col, &block) = Column.invoke("transform_values", _col(col), _lambda(block))
#translate(col, matching, replace) ⇒ Column
Returns col with each character that appears in matching replaced by the character at the same position in replace.
# File 'lib/spark_connect/functions.rb', line 129
def translate(col, matching, replace) = Column.invoke("translate", _col(col), lit(matching), lit(replace))
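The character-by-character mapping is easier to see on a concrete value. Below is a plain-Ruby model of Spark SQL's translate semantics. translate_model is a hypothetical helper that works on ordinary strings, and the handling of a replace string shorter than matching (the unmatched characters are dropped) reflects my understanding of Spark's behaviour rather than anything stated on this page.

```ruby
# Hypothetical model of Spark SQL translate semantics on a Ruby String.
def translate_model(str, matching, replace)
  mapping = {}
  matching.each_char.with_index do |ch, i|
    # First occurrence wins; a missing counterpart (nil) means "delete".
    mapping[ch] = replace[i] unless mapping.key?(ch)
  end
  str.each_char.map { |ch| mapping.key?(ch) ? mapping[ch].to_s : ch }.join
end

translate_model("AaBbCc", "abc", "123")  # => "A1B2C3"
translate_model("AaBbCc", "abc", "1")    # => "A1BC"
```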
#trim(*cols) ⇒ Column
The Spark SQL trim function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#trunc(col, fmt) ⇒ Object
# File 'lib/spark_connect/functions.rb', line 162
def trunc(col, fmt) = Column.invoke("trunc", _col(col), lit(fmt))
#typeof(*cols) ⇒ Column
The Spark SQL typeof function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822
#ucase(*cols) ⇒ Column
The Spark SQL ucase function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#udf ⇒ Object
UDFs require a server-side execution environment (Python/Scala) and are not supported by the pure-Ruby client.
# File 'lib/spark_connect/functions.rb', line 273
def udf(*)
  raise NotImplementedError, "User-defined functions are not supported by the Ruby Spark Connect client"
end
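One practical consequence: in Ruby, NotImplementedError descends from ScriptError rather than StandardError, so a bare `rescue` will not catch it. The sketch below copies the method body from the listing above into a hypothetical `SketchFunctions` module (a stand-in, so the example runs without the gem) and rescues the error by name. `udf_supported?` is an illustrative helper, not part of the library.

```ruby
module SketchFunctions
  # Body copied from the listing above; the module itself is a stand-in.
  def self.udf(*)
    raise NotImplementedError, "User-defined functions are not supported by the Ruby Spark Connect client"
  end
end

# Hypothetical helper: probes for UDF support without crashing the caller.
def udf_supported?
  SketchFunctions.udf(->(x) { x })
  true
rescue NotImplementedError
  # Must be named explicitly; a bare `rescue` only covers StandardError.
  false
end

puts udf_supported?
```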
#unbase64(*cols) ⇒ Column
The Spark SQL unbase64 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#unhex(*cols) ⇒ Column
The Spark SQL unhex function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#unix_date(*cols) ⇒ Column
The Spark SQL unix_date function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#unix_micros(*cols) ⇒ Column
The Spark SQL unix_micros function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#unix_millis(*cols) ⇒ Column
The Spark SQL unix_millis function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#unix_seconds(*cols) ⇒ Column
The Spark SQL unix_seconds function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
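#unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss") ⇒ Column
With no col, invokes the Spark SQL unix_timestamp function without arguments. Otherwise col (a Column or column name) is passed together with fmt, which is treated as a literal pattern. fmt is ignored when col is nil.
# File 'lib/spark_connect/functions.rb', line 166
def unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss")
  col.nil? ? Column.invoke("unix_timestamp") : Column.invoke("unix_timestamp", _col(col), lit(fmt))
end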
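The two call shapes can be illustrated with a small stand-in. This is a sketch under assumptions: `SketchColumn`, `SketchFunctions`, and the bodies of `lit` and `_col` are hypothetical and only record what would be sent; the `unix_timestamp` body follows the listing above.

```ruby
# Hypothetical stand-in for SparkConnect::Column.
SketchColumn = Struct.new(:fn, :args) do
  def self.invoke(fn, *args)
    new(fn, args)
  end
end

module SketchFunctions
  # Assumed: wraps a Ruby value as a literal expression.
  def self.lit(value)
    SketchColumn.invoke("lit", value)
  end

  # Assumed: a String denotes a column name, anything else passes through.
  def self._col(value)
    value.is_a?(String) ? SketchColumn.invoke("col", value) : value
  end

  # Same branching as the documented method.
  def self.unix_timestamp(col = nil, fmt = "yyyy-MM-dd HH:mm:ss")
    col.nil? ? SketchColumn.invoke("unix_timestamp") : SketchColumn.invoke("unix_timestamp", _col(col), lit(fmt))
  end
end

no_args = SketchFunctions.unix_timestamp
parsed  = SketchFunctions.unix_timestamp("created_at", "yyyy-MM-dd")

puts no_args.args.length            # zero-argument call
puts parsed.args.map(&:fn).inspect  # column reference, then literal format
```

Note that the format string is wrapped with `lit`, while the first argument goes through `_col`, so `"created_at"` is read as a column name and `"yyyy-MM-dd"` as a literal.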
#upper(*cols) ⇒ Column
The Spark SQL upper function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#uuid ⇒ Column
The Spark SQL uuid function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#var_pop(*cols) ⇒ Column
The Spark SQL var_pop function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#var_samp(*cols) ⇒ Column
The Spark SQL var_samp function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#variance(*cols) ⇒ Column
The Spark SQL variance function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#version ⇒ Column
The Spark SQL version function (takes no arguments).
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically)
#weekday(*cols) ⇒ Column
The Spark SQL weekday function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#weekofyear(*cols) ⇒ Column
The Spark SQL weekofyear function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#when(condition, value) ⇒ Column
Starts a CASE WHEN expression. Add further branches with Column#when and finish with Column#otherwise.
# File 'lib/spark_connect/functions.rb', line 51
def when(condition, value)
  Column.invoke("when", condition, value)
end
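Column#when and Column#otherwise are not listed on this page, so the following is only a sketch of how such a chain might accumulate branches. It assumes each call appends to the argument list of the original "when" invocation. `SketchColumn` and `SketchFunctions` are hypothetical stand-ins, and the plain strings stand in for the condition and value expressions a real call would take.

```ruby
# Hypothetical stand-in for SparkConnect::Column with assumed chaining methods.
SketchColumn = Struct.new(:fn, :args) do
  def self.invoke(fn, *args)
    new(fn, args)
  end

  # Assumed: appends another condition/value branch.
  def when(condition, value)
    self.class.new(fn, args + [condition, value])
  end

  # Assumed: appends the fallback value that closes the expression.
  def otherwise(value)
    self.class.new(fn, args + [value])
  end
end

module SketchFunctions
  # Same shape as the documented method.
  def self.when(condition, value)
    SketchColumn.invoke("when", condition, value)
  end
end

# Strings are placeholders for real condition and value expressions.
expr = SketchFunctions
  .when("age < 13", "child")
  .when("age < 20", "teen")
  .otherwise("adult")

puts expr.fn
puts expr.args.inspect
```

`when` is a reserved word in Ruby, but it is still valid as a method name when called with an explicit receiver, as in `F.when(...)` or `column.when(...)`.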
#xxhash64(*cols) ⇒ Column
The Spark SQL xxhash64 function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)
#year(*cols) ⇒ Column
The Spark SQL year function. String arguments are treated as column names.
# File 'lib/spark_connect/functions.rb', line 822 (generated programmatically from the UNIFORM list)