Class: SparkConnect::DataFrameNaFunctions
- Inherits: Object
- Defined in: lib/spark_connect/na_functions.rb
Overview
Missing-data helpers, returned by SparkConnect::DataFrame#na. Mirrors PySpark's
DataFrame.na (DataFrameNaFunctions).
Constant Summary
- Proto = SparkConnect::Proto
Instance Method Summary
- #drop(how: :any, thresh: nil, subset: nil) ⇒ DataFrame
  Drop rows containing null values.
- #fill(value, subset: nil) ⇒ DataFrame
  Replace null values.
- #initialize(df) ⇒ DataFrameNaFunctions (constructor)
  A new instance of DataFrameNaFunctions.
- #replace(to_replace, value = nil, subset: nil) ⇒ DataFrame
  Replace specific values with others.
Constructor Details
#initialize(df) ⇒ DataFrameNaFunctions
Returns a new instance of DataFrameNaFunctions.
# File 'lib/spark_connect/na_functions.rb', line 16

    def initialize(df)
      @df = df
    end
Instance Method Details
#drop(how: :any, thresh: nil, subset: nil) ⇒ DataFrame
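Drop rows containing null values. With how: :all, a row is kept as long as it has at least one non-null value; otherwise every column named in subset must be non-null. An explicit thresh (the minimum number of non-null values a row needs in order to be kept) takes precedence over how. subset restricts the check to the named columns.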
# File 'lib/spark_connect/na_functions.rb', line 27

    def drop(how: :any, thresh: nil, subset: nil)
      cols = Array(subset).map(&:to_s)
      min_non_nulls = thresh || (if how.to_sym == :all
                                   1
                                 else
                                   (cols.empty? ? nil : cols.size)
                                 end)
      nd = Proto::NADrop.new(input: @df.relation, cols: cols)
      nd.min_non_nulls = min_non_nulls if min_non_nulls
      @df.build(drop_na: nd)
    end
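The way #drop picks min_non_nulls is easier to follow when pulled out of the method. The sketch below is only an illustration: resolve_min_non_nulls is a hypothetical helper name, not part of the library, and it reproduces just the threshold selection from the listing above without building any Proto message.

```ruby
# Hypothetical standalone helper (not part of SparkConnect) that mirrors
# the min_non_nulls selection performed inside #drop.
def resolve_min_non_nulls(how: :any, thresh: nil, subset: nil)
  cols = Array(subset).map(&:to_s)

  # An explicit threshold always wins over `how`.
  return thresh if thresh

  if how.to_sym == :all
    1          # keep rows that have at least one non-null value
  elsif cols.empty?
    nil        # nothing is set on the message in this case
  else
    cols.size  # every listed column must be non-null
  end
end

p resolve_min_non_nulls(how: :all)                      # => 1
p resolve_min_non_nulls(how: :any, subset: %i[a b c])   # => 3
p resolve_min_non_nulls(how: :any)                      # => nil
p resolve_min_non_nulls(how: :all, thresh: 2)           # => 2
```

When the result is nil, the listing leaves min_non_nulls unset on the NADrop message, so the behaviour in that case is whatever the server applies by default.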
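#fill(value, subset: nil) ⇒ DataFrame
#fill(value_map) ⇒ DataFrame
Replace null values. Pass a single replacement value, optionally restricted to the columns in subset, or a Hash that maps column names to per-column replacement values.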
# File 'lib/spark_connect/na_functions.rb', line 46

    def fill(value, subset: nil)
      cols, values = if value.is_a?(Hash)
        [value.keys.map(&:to_s), value.values]
      else
        [Array(subset).map(&:to_s),
         Array(subset).empty? ? [value] : Array(subset).map { value }]
      end
      nf = Proto::NAFill.new(
        input: @df.relation,
        cols: cols,
        values: values.map { |v| na_literal(v) }
      )
      @df.build(fill_na: nf)
    end
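The two call shapes of #fill both reduce to a pair of parallel arrays, cols and values. The following sketch isolates that normalization step so it can be tried without a Spark session; normalize_fill_args is a hypothetical name introduced here, and the literal conversion done by na_literal is deliberately left out.

```ruby
# Hypothetical standalone helper (not part of SparkConnect) that mirrors
# how #fill turns its arguments into [cols, values].
def normalize_fill_args(value, subset: nil)
  if value.is_a?(Hash)
    # Hash form: keys are column names, values are per-column fills.
    [value.keys.map(&:to_s), value.values]
  else
    cols = Array(subset).map(&:to_s)
    # No subset: a single value and an empty column list.
    # With subset: the same value repeated once per column.
    values = cols.empty? ? [value] : cols.map { value }
    [cols, values]
  end
end

p normalize_fill_args(0)
# => [[], [0]]
p normalize_fill_args("n/a", subset: %i[city country])
# => [["city", "country"], ["n/a", "n/a"]]
p normalize_fill_args({ "age" => 0, "name" => "unknown" })
# => [["age", "name"], [0, "unknown"]]
```

Note that subset is ignored when a Hash is passed, since the Hash keys already name the target columns.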
#replace(to_replace, value = nil, subset: nil) ⇒ DataFrame
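Replace specific values with others. to_replace is either a Hash of old => new pairs, or a single value or Array that is paired positionally with value. subset restricts the replacement to the named columns.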
# File 'lib/spark_connect/na_functions.rb', line 67

    def replace(to_replace, value = nil, subset: nil)
      mapping = if to_replace.is_a?(Hash)
        to_replace
      else
        Array(to_replace).zip(Array(value)).to_h
      end
      replacements = mapping.map do |old, new_value|
        Proto::NAReplace::Replacement.new(
          old_value: na_literal(old),
          new_value: na_literal(new_value)
        )
      end
      nr = Proto::NAReplace.new(
        input: @df.relation,
        cols: Array(subset).map(&:to_s),
        replacements: replacements
      )
      @df.build(replace: nr)
    end
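The old => new mapping that #replace builds depends on plain Array#zip semantics, which matters when to_replace and value differ in length. The sketch below reproduces only that step; build_replacement_mapping is a hypothetical name used for illustration and no Proto messages are constructed.

```ruby
# Hypothetical standalone helper (not part of SparkConnect) that mirrors
# how #replace derives its old => new mapping.
def build_replacement_mapping(to_replace, value = nil)
  # A Hash is used as-is.
  return to_replace if to_replace.is_a?(Hash)

  # Otherwise old and new values are paired by position.
  Array(to_replace).zip(Array(value)).to_h
end

p build_replacement_mapping({ "N/A" => nil })
# => {"N/A"=>nil}
p build_replacement_mapping(%w[a b], %w[x y])
# => {"a"=>"x", "b"=>"y"}
p build_replacement_mapping(-1, 0)
# => {-1=>0}

# A scalar `value` is NOT broadcast across a list: zip pads with nil.
p build_replacement_mapping(%w[a b], "x")
# => {"a"=>"x", "b"=>nil}
```

As the last call shows, with the listing as written a list of old values combined with a single new value maps every entry after the first to nil, so callers who want one replacement for several values would presumably pass an Array of equal length or a Hash.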