projects

Open-source work I'm directly involved in. Star counts reflect a snapshot; see GitHub for current numbers.

Apache Spark: a unified analytics engine for large-scale data processing.

Apache Arrow: a universal columnar format and multi-language toolbox for fast data interchange.

Py4J: enables Python programs to dynamically access arbitrary Java objects.

Koalas: the pandas API on Apache Spark. Co-led; merged upstream into PySpark as the pandas API on Spark (pyspark.pandas) in Spark 3.2.