A beta release of a new ADBC driver for Apache Spark is now available via dbc. Run dbc install spark --pre to try it out today.

The driver supports query execution, bulk ingestion, and catalog metadata retrieval. It can connect via the HiveServer2 Thrift protocol (over TCP or HTTP/HTTPS), Spark Connect, or Apache Livy. Documentation can be found at docs.adbc-drivers.org. This is a preview release, and more features are in active development, so stay tuned.

The driver was developed by the ADBC Driver Foundry and is implemented in Go.

To get started, provide a connection URI with a spark:// scheme:

spark://host:port/?api=connect&auth=token
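If the URI is assembled from configuration instead of being hard-coded, a small helper can keep the query string well-formed. This is a minimal sketch, not part of the driver: the function name build_spark_uri is hypothetical, and the option names (api, auth_type) are simply the ones that appear in this post's examples. Check the driver documentation for the options it actually accepts.

```python
from urllib.parse import urlencode


def build_spark_uri(host: str, port: int, **options: str) -> str:
    """Assemble a spark:// connection URI from a host, a port, and query options.

    Hypothetical helper: the option names are passed through unchanged,
    so they must match whatever the driver documents.
    """
    base = f"spark://{host}:{port}/"
    # urlencode percent-escapes values and joins the pairs with "&".
    query = urlencode(options)
    return f"{base}?{query}" if query else base


# Options taken from the examples in this post (assumed, not exhaustive).
uri = build_spark_uri("localhost", 15002, api="connect", auth_type="none")
print(uri)
```

Building the query with urlencode means values containing characters such as & or = are escaped instead of silently corrupting the URI.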

The driver can then be used like any other driver. For example, load it in Python with adbc-driver-manager:

from adbc_driver_manager import dbapi

with (
    dbapi.connect("spark://localhost:15002?auth_type=none&api=connect") as con,
    con.cursor() as cursor,
):
    cursor.execute("SELECT * FROM mytable")
    table = cursor.fetch_arrow_table()
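
Bulk ingestion goes through the same kind of cursor. The following is a rough sketch that assumes the Spark driver accepts the driver manager's standard adbc_ingest call; the helper name ingest_table is hypothetical, and whether a given mode (such as "create" or "append") works depends on the driver and the target catalog.

```python
def ingest_table(cursor, table_name, data, mode="create"):
    """Bulk-load Arrow data into table_name via the cursor's adbc_ingest method.

    cursor is expected to be an adbc_driver_manager.dbapi cursor, and data
    Arrow data such as a pyarrow.Table. Support for each mode in the Spark
    driver is an assumption here, not something this sketch verifies.
    """
    # adbc_ingest is the driver manager's DB-API extension for bulk loads;
    # it returns the row count reported by the driver.
    return cursor.adbc_ingest(table_name, data, mode=mode)
```

With a cursor like the one above, ingest_table(cursor, "mytable_copy", table) would attempt to write the fetched table back under a new, illustrative name.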

Bug reports and feature requests are welcome through GitHub Issues in the ADBC Driver Foundry's spark repository. You can also start a Discussion on GitHub or join the Columnar Community Slack.