Class: SparkConnect::Observation

Inherits:
Object
  • Object
show all
Defined in:
lib/spark_connect/observation.rb

Overview

Captures named aggregate metrics computed while a DataFrame is being materialised, without an extra pass over the data. Pair with DataFrame#observe.

Examples:

obs = SparkConnect::Observation.new("metrics")
df.observe(obs, F.count(F.lit(1)).alias("rows"), F.max("id").alias("max_id")).collect
obs.get  #=> {"rows"=>100, "max_id"=>99}

Class Attribute Summary collapse

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(name = nil) ⇒ Observation

Returns a new instance of Observation.

Parameters:

  • name (String, nil) (defaults to: nil)

    a unique name (auto-generated when omitted).



23
24
25
26
27
# File 'lib/spark_connect/observation.rb', line 23

def initialize(name = nil)
  Observation.counter += 1
  @name = name || "observation_#{Observation.counter}"
  @df = nil
end

Class Attribute Details

.counterObject

This method is part of a private API. You should avoid using this method if possible, as it may be removed or be changed in the future.



19
20
21
# File 'lib/spark_connect/observation.rb', line 19

def counter
  @counter
end

Instance Attribute Details

#nameString (readonly)

Returns the observation name.

Returns:

  • (String)

    the observation name.



14
15
16
# File 'lib/spark_connect/observation.rb', line 14

def name
  @name
end

Instance Method Details

#bind(df) ⇒ Object



30
31
32
33
# File 'lib/spark_connect/observation.rb', line 30

def bind(df)
  @df = df
  self
end

#getHash{String=>Object}

The observed metric values (forces execution if not yet materialised).

Returns:

  • (Hash{String=>Object})

Raises:



38
39
40
41
42
# File 'lib/spark_connect/observation.rb', line 38

def get
  raise IllegalArgumentError, "Observation has not been attached to a DataFrame yet" unless @df

  @metrics ||= fetch_metrics
end