Installation
This page covers everything you need to get spark-connect running: the
prerequisites, installing the gem, verifying the install, and bringing up a
local Spark Connect server to connect to.
Prerequisites
Ruby >= 3.1
The gem requires Ruby 3.1 or newer (it is tested against 3.1, 3.2, 3.3, and 3.4). Check your version:
ruby --version
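The same check can be made from inside Ruby, for example at the top of a setup script. This is a minimal sketch: `ruby_supported?` and `MINIMUM_RUBY` are hypothetical names, not part of spark-connect.

```ruby
require "rubygems" # provides Gem::Version; loaded by default in most setups

# Hypothetical constant: the 3.1 minimum stated above.
MINIMUM_RUBY = Gem::Version.new("3.1")

# Hypothetical helper: true when the given version string meets the minimum.
def ruby_supported?(version = RUBY_VERSION)
  Gem::Version.new(version) >= MINIMUM_RUBY
end

puts "Ruby #{RUBY_VERSION} supported: #{ruby_supported?}"
```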
Apache Arrow GLib system libraries
spark-connect decodes query results using
red-arrow, which is a binding over the
Apache Arrow C++ / GLib libraries. Those native libraries must be installed
on your system before bundle install, and the installed Arrow version must
match the red-arrow gem version.
macOS (Homebrew)
brew install apache-arrow apache-arrow-glib
Ubuntu / Debian
These steps mirror the project’s CI (.github/workflows/ci.yml): they add the
official Apache Arrow APT repository and then install the GLib development
packages.
sudo apt-get update
sudo apt-get install -y -V ca-certificates lsb-release wget
wget "https://apache.jfrog.io/artifactory/arrow/$(lsb_release --id --short | tr 'A-Z' 'a-z')/apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb"
sudo apt-get install -y -V "./apache-arrow-apt-source-latest-$(lsb_release --codename --short).deb"
sudo apt-get update
sudo apt-get install -y -V libarrow-glib-dev libparquet-glib-dev
Matching versions matters.
red-arrow must be the same major version as the Arrow GLib libraries it loads. If you install a specific Arrow version locally, pin red-arrow to match. The project’s development Gemfile demonstrates this with a SPARK_CONNECT_RED_ARROW_VERSION override, for example SPARK_CONNECT_RED_ARROW_VERSION=22.0.0.
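To make the rule concrete, the sketch below compares major versions and shows one way an environment override could feed a Gemfile entry. Both helpers (`same_major?` and `red_arrow_gem_args`) are hypothetical, and the way the project's real development Gemfile consumes SPARK_CONNECT_RED_ARROW_VERSION is an assumption here and may differ.

```ruby
require "rubygems" # provides Gem::Version

# Hypothetical helper: true when two version strings share a major version.
def same_major?(a, b)
  Gem::Version.new(a).segments.first == Gem::Version.new(b).segments.first
end

# Hypothetical helper: arguments for a Gemfile `gem` call. Pins red-arrow
# only when the override variable is set (assumed behavior, not confirmed).
def red_arrow_gem_args(env = ENV)
  version = env["SPARK_CONNECT_RED_ARROW_VERSION"]
  version ? ["red-arrow", version] : ["red-arrow"]
end

p same_major?("22.0.0", "22.1.0")
p red_arrow_gem_args({ "SPARK_CONNECT_RED_ARROW_VERSION" => "22.0.0" })
```

In a Gemfile, a helper like this could be used as `gem(*red_arrow_gem_args)`.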
Installing the gem
Directly
gem install spark-connect
With Bundler
Add it to your Gemfile:
gem "spark-connect"
Then install:
bundle install
Verifying the install
A quick smoke test that the gem and its native Arrow dependency load:
require "spark-connect"
puts SparkConnect::VERSION # => the gem version, e.g. "0.1.0"
puts SparkConnect::SPARK_VERSION # => the Spark Connect protocol line, e.g. "4.1.0"
If require "spark-connect" raises a load error mentioning Arrow or GLib, the
native libraries above are missing or do not match the red-arrow gem version.
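If you would rather see a hint than a backtrace, the smoke test can be wrapped in a rescue. This is a sketch only: `try_require` is a hypothetical helper, and it assumes the failure surfaces as a LoadError; some native binding failures may raise other exception classes that this does not catch.

```ruby
# Hypothetical helper, not part of spark-connect: attempts to load a
# feature and reports a LoadError instead of letting it propagate.
def try_require(feature)
  require feature
  true
rescue LoadError => e
  warn "Could not load #{feature}: #{e.message}"
  warn "Check that Arrow GLib is installed and matches the red-arrow gem."
  false
end

# Only touch the gem's constants if the load succeeded.
if try_require("spark-connect")
  puts "spark-connect #{SparkConnect::VERSION} loaded"
end
```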
Starting a local Spark Connect server
To do anything beyond loading the gem, you need a Spark Connect server to talk to. Spark does not need to be installed on the machine running your client application, only on the machine where the server runs.
- Download a Spark distribution (3.4+; this client is tested against 3.5.x and 4.x). Pick a release from the Spark downloads page, or fetch one directly:
  mkdir -p ~/spark
  curl -fsSL "https://archive.apache.org/dist/spark/spark-4.0.0/spark-4.0.0-bin-hadoop3.tgz" \
    | tar xz -C ~/spark --strip-components=1
- Start the Connect server. The sbin/start-connect-server.sh script launches the gRPC endpoint, by default on port 15002. Pass the Spark Connect package that matches your Spark version (the example uses 4.1.0; for the 4.0.0 distribution downloaded in the previous step, use 4.0.0 instead):
  ~/spark/sbin/start-connect-server.sh \
    --packages "org.apache.spark:spark-connect_2.13:4.1.0" \
    --conf spark.log.level=WARN
- Confirm it is listening on the gRPC port:
  bash -c "</dev/tcp/localhost/15002" && echo "Spark Connect is up"
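The same probe can be written in Ruby, which helps on shells without /dev/tcp support. This is a minimal sketch using only the standard library; `port_open?` is a hypothetical helper, and it only shows that something accepts TCP connections on the port, not that it is a healthy Spark Connect server.

```ruby
require "socket"

# Hypothetical helper: true if a TCP connection to host:port succeeds
# within `timeout` seconds. 15002 is the default port mentioned above.
def port_open?(host = "localhost", port = 15002, timeout: 1)
  Socket.tcp(host, port, connect_timeout: timeout) { true }
rescue SystemCallError, SocketError, IOError
  # Refused, timed out, or unresolvable: treat all as "not listening".
  false
end

up = port_open?
puts(up ? "Spark Connect is up" : "Nothing is listening on port 15002")
```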
To stop it later:
~/spark/sbin/stop-connect-server.sh
You are now ready to connect. Continue to Getting started.