Class: SparkConnect::GroupedData
- Inherits:
-
Object
- Object
- SparkConnect::GroupedData
- Defined in:
- lib/spark_connect/grouped_data.rb
Overview
The result of DataFrame#group_by / DataFrame#rollup / DataFrame#cube. Call an aggregate (#agg, #count, #sum, #avg, #max, #min, ...) to produce a new DataFrame, optionally after #pivot.
Constant Summary collapse
- Proto =
SparkConnect::Proto
Instance Method Summary collapse
-
#agg(*exprs) ⇒ DataFrame
Compute aggregate expressions.
-
#avg(*cols) ⇒ DataFrame
(also: #mean)
Mean of each numeric column.
-
#count ⇒ DataFrame
Count rows per group.
-
#initialize(df, grouping, group_type, pivot_col: nil, pivot_values: nil) ⇒ GroupedData
constructor
A new instance of GroupedData.
-
#max(*cols) ⇒ DataFrame
Maximum of each column.
-
#min(*cols) ⇒ DataFrame
Minimum of each column.
-
#pivot(pivot_col, values = nil) ⇒ GroupedData
Pivot a column into multiple output columns.
-
#sum(*cols) ⇒ DataFrame
Sum of each numeric column (or all numeric columns when none given).
Constructor Details
#initialize(df, grouping, group_type, pivot_col: nil, pivot_values: nil) ⇒ GroupedData
Returns a new instance of GroupedData.
19 20 21 22 23 24 25 |
# File 'lib/spark_connect/grouped_data.rb', line 19 def initialize(df, grouping, group_type, pivot_col: nil, pivot_values: nil) @df = df @grouping = grouping @group_type = group_type @pivot_col = pivot_col @pivot_values = pivot_values end |
Instance Method Details
#agg(*columns) ⇒ DataFrame #agg(hash) ⇒ DataFrame
Compute aggregate expressions.
35 36 37 38 39 40 41 42 43 |
# File 'lib/spark_connect/grouped_data.rb', line 35 def agg(*exprs) agg_exprs = if exprs.size == 1 && exprs.first.is_a?(Hash) exprs.first.map { |col, fn| Column.invoke(fn.to_s, Functions.col(col.to_s)).to_expr } else exprs.flatten.map { |c| Column.to_col(c).to_expr } end build(agg_exprs) end |
#avg(*cols) ⇒ DataFrame Also known as: mean
Mean of each numeric column.
57 |
# File 'lib/spark_connect/grouped_data.rb', line 57 def avg(*cols) = numeric_agg("avg", cols) |
#count ⇒ DataFrame
Count rows per group.
47 48 49 |
# File 'lib/spark_connect/grouped_data.rb', line 47 def count build([Column.invoke("count", Column.lit(1)).alias("count").to_expr]) end |
#max(*cols) ⇒ DataFrame
Maximum of each column.
62 |
# File 'lib/spark_connect/grouped_data.rb', line 62 def max(*cols) = numeric_agg("max", cols) |
#min(*cols) ⇒ DataFrame
Minimum of each column.
66 |
# File 'lib/spark_connect/grouped_data.rb', line 66 def min(*cols) = numeric_agg("min", cols) |
#pivot(pivot_col, values = nil) ⇒ GroupedData
Pivot a column into multiple output columns.
73 74 75 76 77 |
# File 'lib/spark_connect/grouped_data.rb', line 73 def pivot(pivot_col, values = nil) GroupedData.new(@df, @grouping, :GROUP_TYPE_PIVOT, pivot_col: Column.to_col(pivot_col.is_a?(String) ? Functions.col(pivot_col) : pivot_col), pivot_values: values) end |
#sum(*cols) ⇒ DataFrame
Sum of each numeric column (or all numeric columns when none given).
53 |
# File 'lib/spark_connect/grouped_data.rb', line 53 def sum(*cols) = numeric_agg("sum", cols) |