Commit 278a33e

feat: add user defined table function support (#1113)
* Initial commit of udtf work
* Add table provider capsule to PyTableProvider
* working through functions to register udtf
* Moving ffi-table-provider to datafusion-ffi-example since we will cover more examples than just table providers
* Update example naming and split into different files for table provider and table function
* Update udtf to handle python based table functions
* Add python table function example and test
* Update CI workflow for new file location
* Add documentation
* Add table decorator and unit test
* pin arrow to 55.0.0
* Allow large error in order to match datafusion trait this is called for
* Ruff formatting
1 parent 1e7494b commit 278a33e

19 files changed, +976 -275 lines changed


.github/workflows/test.yaml

Lines changed: 2 additions & 2 deletions
@@ -91,9 +91,9 @@ jobs:
 
       - name: FFI unit tests
         run: |
-          cd examples/ffi-table-provider
+          cd examples/datafusion-ffi-example
           uv run --no-project maturin develop --uv
-          uv run --no-project pytest python/tests/_test_table_provider.py
+          uv run --no-project pytest python/tests/_test*.py
 
       - name: Cache the generated dataset
         id: cache-tpch-dataset

docs/source/user-guide/common-operations/udf-and-udfa.rst

Lines changed: 32 additions & 0 deletions
@@ -242,3 +242,35 @@ determine which evaluate functions are called.
     })
 
     df.select("a", exp_smooth(col("a")).alias("smooth_a")).show()
+
+Table Functions
+---------------
+
+User Defined Table Functions are slightly different than the other functions
+described here. These functions take any number of `Expr` arguments, but only
+literal expressions are supported. Table functions must return a Table
+Provider as described in the :ref:`io_custom_table_provider` page.
+
+Once you have a table function, you can register it with the session context
+by using :py:func:`datafusion.context.SessionContext.register_udtf`.
+
+There are examples of both Rust-backed and Python-based table functions in the
+examples folder of the repository. If you have a Rust-backed table function
+that you wish to expose via PyO3, you need to expose it as a ``PyCapsule``.
+
+.. code-block:: rust
+
+    #[pymethods]
+    impl MyTableFunction {
+        fn __datafusion_table_function__<'py>(
+            &self,
+            py: Python<'py>,
+        ) -> PyResult<Bound<'py, PyCapsule>> {
+            let name = cr"datafusion_table_function".into();
+
+            let func = self.clone();
+            let provider = FFI_TableFunction::new(Arc::new(func), None);
+
+            PyCapsule::new(py, provider, Some(name))
+        }
+    }
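
The documentation added above describes Python-based table functions without showing one, so the following is a minimal sketch of how such a function might look, not the example shipped in the repository. The names `literal_ints`, `CountingTableFunction`, `counting`, and `demo` are hypothetical. It assumes that literal integer arguments can be unwrapped with `to_variant().value_i64()`, and that the object returned by `DataFrame.into_view()` is accepted as a table provider; the second assumption in particular may not hold in every datafusion-python release.

```python
"""Sketch: a Python-based table function registered with register_udtf."""


def literal_ints(*exprs):
    """Unwrap literal Expr arguments into Python ints.

    Assumes each argument exposes to_variant().value_i64(), which is how
    literal integer expressions are read in this sketch.
    """
    return [expr.to_variant().value_i64() for expr in exprs]


class CountingTableFunction:
    """Hypothetical table function: counting(n) yields one column, 0..n-1."""

    def __call__(self, n):
        (count,) = literal_ints(n)

        # Imported lazily so the helper above can be used without datafusion.
        from datafusion import SessionContext

        df = SessionContext().from_pydict({"value": list(range(count))})
        # Assumption: the view returned here is usable as a table provider.
        return df.into_view()


def demo():
    """Register the table function and query it (requires datafusion)."""
    from datafusion import SessionContext, udtf

    ctx = SessionContext()
    ctx.register_udtf(udtf(CountingTableFunction(), "counting"))
    ctx.sql("SELECT * FROM counting(3)").show()
```

Calling `demo()` requires a datafusion-python build that includes this commit. Only literal arguments such as `counting(3)` are expected to work, since column references are not supported as table function arguments.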
