A unified analytics engine for large-scale data processing.
projects
Open-source work I'm directly involved in. Star counts reflect a snapshot; see GitHub for current numbers.
Universal columnar format and multi-language toolbox for fast data interchange.
Enables Python programs to dynamically access arbitrary Java objects.
pandas API on Apache Spark. Co-led the project; it was merged upstream into PySpark as the pandas API on Spark in Spark 3.2.
XML data source for Spark SQL and DataFrames.