API reference
The public surface of pyspark_connect_web is intentionally tiny: you
install() the patch once, then use ordinary PySpark. Everything else
(DataFrame, Column, functions, SparkSession) is upstream PySpark, documented by
the PySpark project.
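```python
import pyspark_connect_web as pcw

# Patch the Connect client once, before any session is created.
pcw.install()

from pyspark.sql import SparkSession

# Ordinary PySpark from here on; the connection string selects the grpc-web transport.
spark = SparkSession.builder.remote("sc://localhost:8081/;transport=grpcweb").getOrCreate()
```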
All names below are importable both from the top-level package
(pyspark_connect_web) and from pyspark_connect_web.patch.
Functions
install() -> None
Monkey-patch pyspark.sql.connect so the Connect client speaks grpc-web over the
blocking browser bridge instead of grpcio. Idempotent: calling it more than
once is a no-op. Raises UnsupportedPySparkError if
the installed PySpark is outside SUPPORTED_PYSPARK_RANGE.
uninstall() -> None
Restore the original (unpatched) pyspark.sql.connect symbols. Idempotent.
is_installed() -> bool
Return True if the patch is currently installed.
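Because install() and uninstall() are both idempotent and is_installed() reports the current state, the three functions behave like a save-and-restore monkey-patch. The sketch below shows that general pattern on a throwaway namespace; it is not the package's implementation, and the names `target`, `make_channel` and `_grpc_web_channel` are hypothetical.

```python
import types

# Hypothetical stand-in for the patched module. The real target is
# pyspark.sql.connect; this sketch patches a throwaway namespace instead.
target = types.SimpleNamespace(make_channel=lambda: "grpcio channel")

# Saved original symbols; a non-empty dict means "patch installed".
_originals: dict[str, object] = {}


def _grpc_web_channel() -> str:
    # Hypothetical replacement symbol installed by the patch.
    return "grpc-web channel"


def install() -> None:
    """Save the original symbol and swap in the replacement; repeat calls are no-ops."""
    if _originals:
        return
    _originals["make_channel"] = target.make_channel
    target.make_channel = _grpc_web_channel


def uninstall() -> None:
    """Restore every saved original; a no-op when nothing is installed."""
    for name, original in _originals.items():
        setattr(target, name, original)
    _originals.clear()


def is_installed() -> bool:
    return bool(_originals)
```

The early return in install() is what makes it safe to call twice: without it, a second call would save the already-patched symbol as the "original", and uninstall() could never bring the real one back.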
set_stub_factory(factory: Optional[StubFactory]) -> None
Override the factory that builds the gRPC service stub the patched client uses.
Pass None to restore the default (the grpc-web stub). This is primarily a
hook for wiring in an alternate transport or injecting a fake stub in tests.
set_channel_factory(factory: Optional[ChannelFactory]) -> None
Override the factory that builds the SyncChannel
backing the stub. Pass None to restore the default (the SAB bridge
channel). Used to inject a fake/loopback channel in tests and integration harnesses.
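The "pass None to restore the default" convention shared by set_stub_factory() and set_channel_factory() can be sketched as a module-level override that is consulted each time a channel is built. This is an illustration under assumptions: the zero-argument factory signature, the `build_channel` helper and the string return values are hypothetical, not the package's actual ChannelFactory contract.

```python
from typing import Callable, Optional

# Hypothetical zero-argument factory type; the real ChannelFactory may differ.
ChannelFactory = Callable[[], object]


def _default_channel_factory() -> object:
    # Stand-in for the default (the SAB bridge channel in the real package).
    return "bridge channel"


# None means "no override": fall back to the default factory.
_channel_factory: Optional[ChannelFactory] = None


def set_channel_factory(factory: Optional[ChannelFactory]) -> None:
    global _channel_factory
    _channel_factory = factory


def build_channel() -> object:
    # Hypothetical helper: resolve the factory at call time so an override
    # (or a reset to None) affects every channel built afterwards.
    factory = _channel_factory if _channel_factory is not None else _default_channel_factory
    return factory()
```

In a test, wrapping the override in try/finally with set_channel_factory(None) in the finally clause keeps one test's fake from leaking into the next.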
check_pyspark_version(version: Optional[str] = None) -> tuple[int, int]
Validate a PySpark version string (by default, the installed PySpark's version)
against SUPPORTED_PYSPARK_RANGE and return it as (major, minor).
Raises UnsupportedPySparkError if the version is out of range.
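A check of this kind can be approximated by comparing a (major, minor) pair against the lower bound in SUPPORTED_PYSPARK_RANGE. The sketch below assumes that approximation; unlike the real function, it requires an explicit version string instead of defaulting to the installed PySpark, and it defines a local stand-in for UnsupportedPySparkError. Looking only at major.minor is also more lenient than a full PEP 440 specifier check, which by default rejects pre-releases such as 4.1.0.dev0.

```python
import re

# Lower bound mirroring SUPPORTED_PYSPARK_RANGE = ">=4.0".
MINIMUM = (4, 0)


class UnsupportedPySparkError(RuntimeError):
    """Local stand-in for the package's exception of the same name."""


def check_pyspark_version(version: str) -> tuple[int, int]:
    # Take the leading "major.minor" and ignore patch and pre-release parts,
    # so "4.0.1" -> (4, 0) and "4.1.0.dev0" -> (4, 1).
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise UnsupportedPySparkError(f"cannot parse PySpark version {version!r}")
    major_minor = (int(match.group(1)), int(match.group(2)))
    # Tuples compare element by element, so (3, 5) < (4, 0) < (4, 1).
    if major_minor < MINIMUM:
        raise UnsupportedPySparkError(
            f"PySpark {version} is outside the supported range >=4.0"
        )
    return major_minor
```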
Constants
SUPPORTED_PYSPARK_RANGE
str - the supported PySpark version specifier, currently ">=4.0".
__version__
str - the installed pyspark-connect-web version.
Exceptions
UnsupportedPySparkError
Subclass of RuntimeError, raised by install() / check_pyspark_version()
when the running PySpark is outside the supported range.
Protocols (transport seam)
These describe the seam the factories above plug into; see
Architecture and the transport contract for the full contract.
- SyncChannel: a blocking byte transport with unary(...) and server_stream(...) methods (implemented by the SAB bridge in the browser, or by a loopback in tests).
- StubFactory / ChannelFactory: callables that build the service stub and the SyncChannel, respectively.
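Since the seam is described as a protocol, a loopback test double only needs methods with matching names and shapes; it does not have to inherit from anything. The sketch below assumes argument lists of (method, request) with raw bytes in and out. The real unary(...) and server_stream(...) signatures are elided above, so treat this SyncChannel Protocol, LoopbackChannel and its handler dictionaries as hypothetical.

```python
from typing import Callable, Dict, Iterator, List, Protocol


class SyncChannel(Protocol):
    """Hypothetical shape of the seam; the real argument lists may differ."""

    def unary(self, method: str, request: bytes) -> bytes: ...

    def server_stream(self, method: str, request: bytes) -> Iterator[bytes]: ...


class LoopbackChannel:
    """In-process fake: routes each call to a Python handler, no network involved."""

    def __init__(self) -> None:
        self.unary_handlers: Dict[str, Callable[[bytes], bytes]] = {}
        self.stream_handlers: Dict[str, Callable[[bytes], List[bytes]]] = {}
        self.calls: List[str] = []  # method names in call order, for assertions

    def unary(self, method: str, request: bytes) -> bytes:
        self.calls.append(method)
        return self.unary_handlers[method](request)

    def server_stream(self, method: str, request: bytes) -> Iterator[bytes]:
        # Recorded eagerly, before any chunk is consumed.
        self.calls.append(method)
        return iter(self.stream_handlers[method](request))
```

server_stream() returns an iterator instead of being written as a generator function; a generator body would not run until the first chunk is requested, so the call would be recorded late.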