Module: SparkConnect

Defined in:
lib/spark_connect.rb,
lib/spark_connect/row.rb,
lib/spark_connect/conf.rb,
lib/spark_connect/plan.rb,
lib/spark_connect/arrow.rb,
lib/spark_connect/types.rb,
lib/spark_connect/client.rb,
lib/spark_connect/column.rb,
lib/spark_connect/errors.rb,
lib/spark_connect/reader.rb,
lib/spark_connect/window.rb,
lib/spark_connect/writer.rb,
lib/spark_connect/catalog.rb,
lib/spark_connect/session.rb,
lib/spark_connect/version.rb,
lib/spark_connect/functions.rb,
lib/spark_connect/pipelines.rb,
lib/spark_connect/streaming.rb,
lib/spark_connect/data_frame.rb,
lib/spark_connect/observation.rb,
lib/spark_connect/grouped_data.rb,
lib/spark_connect/na_functions.rb,
lib/spark_connect/stat_functions.rb,
lib/spark_connect/channel_builder.rb

Overview

spark-connect is a pure-Ruby client for Apache Spark Connect, the gRPC-based decoupled client-server protocol for Apache Spark.

The public surface mirrors PySpark closely: a SparkSession is the entry point, DataFrame is the lazy, immutable relation builder, Column represents column expressions, and Functions (aliased as F) provides the standard function library.
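A "lazy, immutable relation builder" records transformations instead of running them. The toy model below shows that idea in plain Ruby. It is a hypothetical sketch, not how spark-connect is implemented: ToyFrame, its where and collect methods, and the local evaluation are illustrative assumptions. Each transformation appends a step to a frozen plan and returns a new object. Nothing runs until an action is called. In Spark Connect, the recorded plan is sent to the server over gRPC for execution instead of being evaluated locally.

```ruby
# Hypothetical toy model of a lazy, immutable relation builder.
# NOT the spark-connect implementation: it only records a plan
# and evaluates it locally when an action is called.
class ToyFrame
  attr_reader :plan

  def initialize(plan)
    @plan = plan.freeze # each frame owns a frozen, never-mutated plan
  end

  # Source relation: ids 0...n, loosely modeled on spark.range(n)
  def self.range(n)
    new([[:range, n]])
  end

  # Transformation: returns a NEW frame with one more plan step
  def select(&fn)
    ToyFrame.new(plan + [[:map, fn]])
  end

  # Transformation: returns a NEW frame with a filter step
  def where(&pred)
    ToyFrame.new(plan + [[:filter, pred]])
  end

  # Action: only here is the recorded plan actually executed
  def collect
    plan.reduce(nil) do |rows, (op, arg)|
      case op
      when :range  then (0...arg).to_a
      when :map    then rows.map(&arg)
      when :filter then rows.select(&arg)
      end
    end
  end
end

base    = ToyFrame.range(5)
doubled = base.select { |id| id * 2 }
evens   = doubled.where { |x| x % 4 == 0 }

p base.plan.length # => 1 (base was not modified)
p evens.collect    # => [0, 4, 8]
```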

Examples:

Connect and run a query

require "spark-connect"

spark = SparkConnect::SparkSession.builder
                                  .remote("sc://localhost:15002")
                                  .get_or_create
df = spark.range(10).select(SparkConnect::F.col("id") * 2)
df.show
spark.stop
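The argument to remote is a Spark Connect connection string. Its general shape is sc://host:port, optionally followed by /; and semicolon-separated key=value parameters (for example token or use_ssl), with 15002 as the default port. The sketch below parses that shape with Ruby's standard URI library. parse_spark_connect_url and DEFAULT_SPARK_CONNECT_PORT are hypothetical names, not the gem's ChannelBuilder, and the helper does not validate parameter names.

```ruby
require "uri"

# Hypothetical default, matching the port used in the example above.
DEFAULT_SPARK_CONNECT_PORT = 15002

# Hypothetical helper (not part of spark-connect): split a connection string
# such as "sc://host:15002/;token=abc;use_ssl=true" into host, port, and params.
def parse_spark_connect_url(url)
  uri = URI.parse(url)
  raise ArgumentError, "expected sc:// scheme, got #{uri.scheme.inspect}" unless uri.scheme == "sc"

  # Parameters follow "/;" in the path and are separated by ";"
  params = uri.path.delete_prefix("/").split(";").reject(&:empty?).to_h do |pair|
    key, value = pair.split("=", 2) # value is nil for a bare key
    [key, value]
  end

  # URI::Generic has no default port for "sc", so fall back explicitly
  { host: uri.host, port: uri.port || DEFAULT_SPARK_CONNECT_PORT, params: params }
end
```

For "sc://localhost:15002" this yields host "localhost", port 15002, and an empty parameter hash.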

Defined Under Namespace

Modules: ArrowConverter, Functions, PlanBuilder, Types, Window
Classes: AnalysisError, Catalog, ChannelBuilder, Column, ConnectionError, DataFrame, DataFrameNaFunctions, DataFrameReader, DataFrameStatFunctions, DataFrameWriter, DataFrameWriterV2, DataStreamReader, DataStreamWriter, Error, GroupedData, IllegalArgumentError, NotImplementedError, Observation, OperationInterruptedError, ParseError, Pipeline, PipelineEvent, RetriesExceededError, Row, RuntimeConfig, SparkConnectClient, SparkConnectError, SparkSession, StreamingQuery, StreamingQueryManager, WindowSpec

Constant Summary

VERSION = "0.2.0"

The released version of the spark-connect gem.

SPARK_VERSION = "4.1.0"

The Apache Spark version whose Spark Connect protocol definitions this client is generated against. The client aims to be wire-compatible with Spark Connect servers on the same major.minor release line (4.1.x) and newer.

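The compatibility rule stated for SPARK_VERSION can be expressed as a comparison of major.minor pairs. A minimal sketch, assuming plain dotted version strings: CLIENT_SPARK_VERSION and server_in_supported_range? are hypothetical names, and passing the check only reflects the stated aim, not a guarantee that every API works against a given server.

```ruby
require "rubygems" # provides Gem::Version (already loaded in most Ruby setups)

# Hypothetical constant mirroring SparkConnect::SPARK_VERSION as documented above.
CLIENT_SPARK_VERSION = "4.1.0"

# Hypothetical helper, not part of spark-connect: true when the server's
# major.minor release line is the same as or newer than the client's.
def server_in_supported_range?(server_version, client_version = CLIENT_SPARK_VERSION)
  major_minor = ->(v) { Gem::Version.new(v).segments.first(2) }
  (major_minor.call(server_version) <=> major_minor.call(client_version)) >= 0
end

server_in_supported_range?("4.1.3") # => true  (same 4.1 line)
server_in_supported_range?("4.2.0") # => true  (newer line)
server_in_supported_range?("4.0.1") # => false (older line)
```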
F = Functions

Short alias for Functions: SparkConnect::F.col("x") is the same call as SparkConnect::Functions.col("x").
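Because F is assigned the Functions module itself, it names the same object rather than a copy. The sketch below reproduces that constant-alias pattern with a hypothetical FnDemo module whose toy col only returns a string; the real Functions module is far larger.

```ruby
# Hypothetical stand-in module showing how a constant alias works.
module FnDemo
  module Functions
    # Toy col: returns a description string instead of a Column
    def self.col(name)
      "col(#{name})"
    end
  end

  # Assigning a module to a second constant creates an alias, not a copy
  F = Functions
end

p FnDemo::F.equal?(FnDemo::Functions) # => true
p FnDemo::F.col("x")                  # => "col(x)"
```

The module keeps the name of the first constant it was assigned to, so FnDemo::F.name still reports "FnDemo::Functions".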

Class Method Summary

Class Method Details

.builder ⇒ SparkConnect::SparkSession::Builder

Convenience shortcut for SparkConnect::SparkSession.builder.



# File 'lib/spark_connect.rb', line 54

def builder
  SparkSession.builder
end
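YARD lists builder as a class method (the leading dot), while the listing shows only def builder. That is consistent with the definition sitting inside a class << self block, but the enclosing lines are not shown, so this is an assumption. The sketch below reproduces the delegation with hypothetical stand-ins: ShortcutDemo and a toy Builder that only supports remote and url.

```ruby
# Hypothetical stand-ins (not the spark-connect source) showing a
# module-level shortcut that delegates to a class method.
module ShortcutDemo
  class SparkSession
    # Stand-in for SparkSession::Builder
    class Builder
      attr_reader :url

      def remote(url)
        @url = url
        self # return self so calls can be chained
      end
    end

    def self.builder
      Builder.new
    end
  end

  class << self
    # Plain delegation: ShortcutDemo.builder and
    # ShortcutDemo::SparkSession.builder return the same kind of object.
    def builder
      SparkSession.builder
    end
  end
end

b = ShortcutDemo.builder.remote("sc://localhost:15002")
p b.class # => ShortcutDemo::SparkSession::Builder
p b.url   # => "sc://localhost:15002"
```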