Packaging & release
How the pyspark_connect_web wheel is built, how it is installed in the browser
via micropip, and the release checklist.
What ships
The distributable is a pure-Python wheel: pyspark_connect_web-<version>-py3-none-any.whl.
- No compiled extensions - it must import under Pyodide/WASM, so it is py3-none-any and depends on nothing native. In particular it does not depend on grpcio: dependencies = [] in pyproject.toml, and pyspark, pyarrow, pandas, and protobuf come from the Pyodide environment.
- The JS glue (worker/*.js, jupyterlite/*) ships inside the wheel as package data so the JupyterLite build can reference it.
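The py3-none-any requirement above can be checked mechanically from the wheel filename. A minimal sketch, assuming the standard wheel filename layout (name-version[-build]-python-abi-platform.whl); the helper names wheel_tags and is_pure_python_wheel are illustrative and not part of the project:

```python
from pathlib import PurePath


def wheel_tags(filename: str) -> tuple[str, str, str]:
    """Return the (python, abi, platform) tags from a wheel filename."""
    name = PurePath(filename).name
    if not name.endswith(".whl"):
        raise ValueError(f"not a wheel: {filename}")
    # The last three dash-separated fields are the compatibility tags,
    # even when an optional build tag is present.
    parts = name[: -len(".whl")].split("-")
    if len(parts) < 5:
        raise ValueError(f"unexpected wheel filename: {filename}")
    return parts[-3], parts[-2], parts[-1]


def is_pure_python_wheel(filename: str) -> bool:
    """True when the wheel is tagged py3-none-any."""
    return wheel_tags(filename) == ("py3", "none", "any")
```

A wheel carrying a platform or ABI tag (for example a cpXY build of a native package) fails this check, which is the property Pyodide needs.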
py.typed (PEP 561)
The package is fully type-hinted (typed Protocols in _contract.py, etc.) but
does not yet ship a py.typed marker, so downstream type-checkers treat it
as untyped. To publish the type information:
- Add an empty pyspark_connect_web/py.typed (PEP 561 marker).
- Ensure it is packaged. With setuptools + [tool.setuptools.packages.find] this needs package-data inclusion in pyproject.toml, for example via a [tool.setuptools.package-data] entry.
pyspark_connect_web/ is owned by the components / the integrator, so the marker is not added unilaterally here. This is flagged in CONTRIBUTING.md for the owner
to land. (Even without the marker, the JS/notebook package data above should still be
declared so the wheel is complete - confirm with the integrator.)
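Once the marker is added, its presence in the built wheel can be confirmed by inspecting the archive. A minimal sketch, assuming the marker lives at pyspark_connect_web/py.typed inside the wheel; the function name wheel_has_marker is hypothetical:

```python
import zipfile


def wheel_has_marker(wheel_path: str, package: str = "pyspark_connect_web") -> bool:
    """Return True if the wheel archive contains <package>/py.typed."""
    # A wheel is a zip archive, so the member list can be read directly.
    with zipfile.ZipFile(wheel_path) as wheel:
        return f"{package}/py.typed" in wheel.namelist()
```

The same member-list approach could be extended to the JS/notebook package data, given the exact paths the integrator expects.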
Build the wheel
# in a dev venv (the browser does NOT use this venv)
python -m pip install build==1.2.2
python -m build --wheel --outdir dist # -> dist/pyspark_connect_web-*.whl
# or: make wheel
Validate the wheel is importable + grpcio-free before publishing:
python -m pip install dist/pyspark_connect_web-*.whl
python -c "import pyspark_connect_web; print(pyspark_connect_web.__name__)"
# grpcio must NOT appear in the wheel's declared requirements:
python - <<'PY'
import importlib.metadata as m
reqs = m.requires("pyspark-connect-web") or []
assert not any("grpcio" in r for r in reqs), reqs
print("OK: no grpcio in wheel requirements")
PY
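The same check can be run against the built wheel file without installing it first. A minimal sketch, assuming the wheel contains exactly one *.dist-info/METADATA file; the helper names wheel_requirements and forbidden_requirements are illustrative:

```python
import re
import zipfile
from email.parser import Parser


def wheel_requirements(wheel_path: str) -> list[str]:
    """Return the Requires-Dist entries declared in a wheel's METADATA file."""
    with zipfile.ZipFile(wheel_path) as wheel:
        metadata_names = [
            name for name in wheel.namelist()
            if name.endswith(".dist-info/METADATA")
        ]
        if len(metadata_names) != 1:
            raise ValueError(f"expected one METADATA file, found {metadata_names}")
        text = wheel.read(metadata_names[0]).decode("utf-8")
    # Wheel metadata uses RFC 822-style headers, so the email parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


def forbidden_requirements(requirements: list[str], prefix: str = "grpcio") -> list[str]:
    """Return the requirements whose distribution name starts with `prefix`."""
    hits = []
    for requirement in requirements:
        # The distribution name is the leading run of name characters.
        match = re.match(r"[A-Za-z0-9._-]+", requirement.strip())
        name = match.group(0).lower().replace("_", "-") if match else ""
        if name.startswith(prefix):
            hits.append(requirement)
    return hits
```

Matching on the parsed distribution name rather than a substring also catches grpcio-status while avoiding false positives from unrelated text in a requirement string.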
Install in the browser (Pyodide / JupyterLite)
In the Pyodide worker (see worker/worker_bootstrap.js), micropip installs the
wheel by URL alongside the pinned runtime deps:
import micropip
await micropip.install("protobuf>=7")
await micropip.install("googleapis-common-protos>=1.56.4")
# Slim Spark Connect client; deps=False (grpcio/grpcio-status are shimmed).
await micropip.install("https://<your-lite-origin>/pyspark_client-4.1.2-py3-none-any.whl", deps=False)
await micropip.install("https://<your-lite-origin>/pyspark_connect_web-<version>-py3-none-any.whl")
scripts/build_site.sh copies the freshly built wheel into the JupyterLite
output root so it is served from the same (cross-origin-isolated) origin as the
page. This matters under COEP credentialless: the worker cannot import a
cross-origin CDN wheel, so the wheel must be served same-origin. worker_bootstrap.js
reads the wheel URL from self.PCW_WHEEL_URL (default: the wheel served at the
site root).
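The install order and the same-origin wheel URL can be expressed as plain data, which makes them easy to test outside the browser. A minimal sketch, assuming both wheels are served at the site root; the helpers wheel_url and install_steps are hypothetical and are not what worker_bootstrap.js actually runs:

```python
def wheel_url(origin: str, version: str) -> str:
    """Build the same-origin URL of the project wheel at the site root."""
    return f"{origin.rstrip('/')}/pyspark_connect_web-{version}-py3-none-any.whl"


def install_steps(origin: str, version: str) -> list[tuple[str, bool]]:
    """Return (requirement, resolve_deps) pairs in install order."""
    base = origin.rstrip("/")
    return [
        ("protobuf>=7", True),
        ("googleapis-common-protos>=1.56.4", True),
        # deps=False: grpcio/grpcio-status are shimmed, so they must not be resolved.
        (f"{base}/pyspark_client-4.1.2-py3-none-any.whl", False),
        (wheel_url(base, version), True),
    ]
```

Inside the Pyodide worker, each pair would be passed on as `await micropip.install(requirement, deps=resolve_deps)`.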
Then, in a notebook cell:
import pyspark_connect_web as pcw
pcw.install() # idempotent; monkey-patches the Connect stub
from pyspark.sql import SparkSession
spark = SparkSession.builder.remote("sc://<host>:8081/;transport=grpcweb").getOrCreate()
Version pins (the load-bearing ones)
| Thing | Pin | Why |
|---|---|---|
| pyspark (browser + dev) | >=4.0 | reattachable execute present; install() raises outside the range |
| Pyodide | >=0.28 / Python 3.13 | CONTRIBUTING.md; provides pyarrow>=22, pandas, numpy, protobuf>=7 |
| build | ==1.2.2 | wheel build (scripts/build_site.sh, Makefile) |
| jupyterlite-core | ==0.6.4 | jupyter lite CLI (scripts/build_site.sh) |
| jupyterlite-pyodide-kernel | ==0.6.1 | Pyodide kernel for the lite site |
| Envoy | envoyproxy/envoy:v1.31-latest | grpc_web filter + v3 HttpProtocolOptions |
| Spark Connect server | apache/spark:4.1.2 | matches the pyspark range |
Release checklist
Pre-release:
- [ ] make test green (unit; no browser, no grpcio).
- [ ] make validate-deploy green (YAML parse + COOP/COEP + no wildcard prod CORS).
- [ ] make lint green (ruff check + format).
- [ ] Bump version in pyproject.toml (drop the .devN suffix for a real release).
- [ ] appVersion in jupyterlite/jupyter-lite.json and PCW_WHEEL_URL default in worker/worker_bootstrap.js reference the same version.
- [ ] make wheel + the wheel-import / no-grpcio check above pass.
- [ ] make site builds _output with the wheel + _headers present.
- [ ] e2e against a live stack: E2E_REQUIRE_STACK=1 make e2e green (the full v0 matrix from the design notes). Skip-only runs do not count as a release gate.
Publish:
- [ ] Tag the release (vX.Y.Z); CI builds the wheel on the tag.
- [ ] (If publishing to an index) twine check dist/* then upload. The package name on the index is pyspark-connect-web; the import name is pyspark_connect_web.
- [ ] Publish the built _output site to the cross-origin-isolated host (Envoy/static, or a Pages host that honours _headers).
- [ ] Smoke-test the published site: open it, confirm crossOriginIsolated === true, run the demo notebook end to end against a reachable Connect server.
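Before the browser smoke test, the _headers file in the built site can be checked offline for the two headers that cross-origin isolation depends on. A minimal sketch, assuming _headers uses the common Pages-style layout (an unindented path pattern followed by indented "Name: value" lines); the parser and function names are hypothetical and this is not the project's validate-deploy implementation:

```python
def parse_headers_file(text: str) -> dict[str, dict[str, str]]:
    """Parse a `_headers` file into {path_pattern: {header_name_lower: value}}."""
    rules: dict[str, dict[str, str]] = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if not raw[0].isspace():
            # An unindented line starts a new path rule.
            current = rules.setdefault(raw.strip(), {})
        elif current is not None and ":" in raw:
            # Indented "Name: value" lines belong to the current rule.
            name, value = raw.split(":", 1)
            current[name.strip().lower()] = value.strip()
    return rules


def isolates(headers: dict[str, str]) -> bool:
    """True if the headers opt a page into cross-origin isolation."""
    coop = headers.get("cross-origin-opener-policy")
    coep = headers.get("cross-origin-embedder-policy")
    return coop == "same-origin" and coep in {"require-corp", "credentialless"}
```

This only confirms what the file declares; the in-browser crossOriginIsolated check remains the real gate, since a host may ignore _headers.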
Post-release:
- [ ] Update README.md status if the milestone changed.
- [ ] File a project-notes note for anything that surprised you.