JupyterLite hosting¶
The JupyterLite site runs the PySpark Connect client in Pyodide. The single hard
requirement for any host is cross-origin isolation - without it,
SharedArrayBuffer is unavailable and the blocking .collect() bridge cannot
work.
Why cross-origin isolation is mandatory¶
SharedArrayBuffer and Atomics.wait (the backbone of the blocking bridge) only
exist when the page is cross-origin isolated, which requires it be served
with:
The demo notebook asserts crossOriginIsolated === true before importing, so a
misconfigured host fails loud and early instead of hanging on a non-shared
buffer. This is a hard invariant (the design notes #4).
Hosting matrix - which host needs what¶
| Host | Can set headers? | What to do |
|---|---|---|
Envoy / docker compose (local e2e) |
yes | Serves COOP/COEP directly (deploy/). Nothing extra. |
| Netlify / Cloudflare Pages | yes (via _headers) |
Ship _headers (already provided). No service worker needed. |
| GitHub Pages | no | Use coi-serviceworker.js - include it as a <script> before everything; it injects COOP/COEP via a service worker and reloads once so crossOriginIsolated becomes true. |
python -m http.server (dev) |
no | Use the serve_coi.py snippet below, or coi-serviceworker.js. |
Two different GitHub Pages sites
This documentation site is built with MkDocs and deployed to GitHub Pages by
.github/workflows/docs.yml - that is a plain static docs site and needs no
isolation headers. The JupyterLite app is a separate artifact; if you host
it on GitHub Pages, it needs the coi-serviceworker.js shim described above
because GitHub Pages cannot set COOP/COEP headers.
COEP caveat (all isolated hosts)¶
We serve Cross-Origin-Embedder-Policy: credentialless, which keeps the page
isolated while letting the cross-origin grpc-web fetch to Envoy through as a
no-credentials request (Envoy replies with Cross-Origin-Resource-Policy:
cross-origin). However, the Web Worker still cannot import Pyodide or the
wheels from a cross-origin CDN under COEP, so the build vendors them
same-origin: Pyodide into /pyodide/ and the wheels into the site root. The
worker reads self.PCW_PYODIDE_INDEX_URL (default /pyodide/) and
self.PCW_WHEEL_URL / self.PCW_PYSPARK_WHEEL_URL; only override them with
same-origin URLs.
Local dev server with the right headers¶
jupyter lite serve and python -m http.server do not set COOP/COEP by
default. For local dev, use a server that honours _headers, or this tiny
helper:
# serve_coi.py - python http server that sets COOP/COEP
import http.server, functools
class H(http.server.SimpleHTTPRequestHandler):
def end_headers(self):
self.send_header("Cross-Origin-Opener-Policy", "same-origin")
self.send_header("Cross-Origin-Embedder-Policy", "credentialless")
super().end_headers()
http.server.test(HandlerClass=H, port=8000)
How the bridge integrates with the kernel (no fork)¶
The @jupyterlite/pyodide-kernel runs Pyodide in its own ES-module Web
Worker, which cannot be replaced. Integration is non-invasive, in two halves:
- Page side -
pcw_kernel_bridge.js. Loaded before the JupyterLite app bundle, it wraps the globalWorkerconstructor so every kernel worker the app spawns gets aBridgeattached. The Bridge does the real cross-originfetchand writes response windows back into the SAB. It reacts only to the namespaced envelope{__pcw__:{...}}and ignores the kernel's own message framing, so the two coexist. - Worker side -
pyspark_connect_web.worker.kernel_bootstrap. Imported once inside the kernel (the demo'simport pyspark_connect_web; pcw.install()is enough).SabSyncChannelauto-detects it is in a kernel worker and usestransport="kernel": it allocates the SAB, posts the namespaced envelope, then parks onAtomics.wait. No notebook code beyondpcw.install()is needed.
Load order¶
pcw_kernel_bridge.js (and, on header-less hosts, coi-serviceworker.js) MUST
run before JupyterLite reads Worker off the global scope. Inject them as
<script> tags in the JupyterLite index.html template head, before the app
bundle:
<head>
<script src="./coi-serviceworker.js"></script> <!-- header-less hosts only -->
<script type="module" src="./pcw_kernel_bridge.js"></script>
<!-- JupyterLite app bundle loads after these -->
</head>
Building the site¶
See Packaging & release for the full jupyter lite build
flow (build the wheel, build the lite site, copy _headers + the wheel into the
output, serve with COOP/COEP).