Commit bd0e820

Doc/cross reference (#791)
* Update docstrings so that cross references work in online docs. Also switch from autosummary to autoapi in sphinx for building API reference documents
* Update documentation to cross reference
* Correct class names and internal attr
* Revert changes that will end up coming in via PR #782
* Add autoapi to requirements file
* Add git ignore for files retrieved during local site building
* Remove unused portions of doc config
* Reset substrait capitalization that was reverted during rebase
* Small example changes
1 parent 1d61548 commit bd0e820

34 files changed: +370 -517 lines changed

docs/.gitignore

Lines changed: 2 additions & 0 deletions
@@ -0,0 +1,2 @@
+pokemon.csv
+yellow_trip_data.parquet

docs/requirements.txt

Lines changed: 2 additions & 1 deletion
@@ -22,4 +22,5 @@ maturin
 jinja2
 ipython
 pandas
-pickleshare
+pickleshare
+sphinx-autoapi

docs/source/api.rst

Lines changed: 0 additions & 31 deletions
This file was deleted.

docs/source/api/dataframe.rst

Lines changed: 0 additions & 27 deletions
This file was deleted.

docs/source/api/execution_context.rst

Lines changed: 0 additions & 29 deletions
This file was deleted.

docs/source/api/expression.rst

Lines changed: 0 additions & 27 deletions
This file was deleted.

docs/source/api/functions.rst

Lines changed: 0 additions & 27 deletions
This file was deleted.

docs/source/api/object_store.rst

Lines changed: 0 additions & 27 deletions
This file was deleted.

docs/source/conf.py

Lines changed: 26 additions & 30 deletions
@@ -46,15 +46,11 @@
 # extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
 # ones.
 extensions = [
-    "sphinx.ext.autodoc",
-    "sphinx.ext.autosummary",
-    "sphinx.ext.doctest",
-    "sphinx.ext.ifconfig",
     "sphinx.ext.mathjax",
-    "sphinx.ext.viewcode",
     "sphinx.ext.napoleon",
     "myst_parser",
     "IPython.sphinxext.ipython_directive",
+    "autoapi.extension",
 ]
 
 source_suffix = {
@@ -70,33 +66,35 @@
 # This pattern also affects html_static_path and html_extra_path.
 exclude_patterns = []
 
-# Show members for classes in .. autosummary
-autodoc_default_options = {
-    "members": None,
-    "undoc-members": None,
-    "show-inheritance": None,
-    "inherited-members": None,
-}
-
-autosummary_generate = True
-
+autoapi_dirs = ["../../python"]
+autoapi_ignore = ["*tests*"]
+autoapi_member_order = "groupwise"
+suppress_warnings = ["autoapi.python_import_resolution"]
+autoapi_python_class_content = "both"
 
-def autodoc_skip_member(app, what, name, obj, skip, options):
-    exclude_functions = "__init__"
-    exclude_classes = ("Expr", "DataFrame")
 
-    class_name = ""
-    if hasattr(obj, "__qualname__"):
-        if obj.__qualname__ is not None:
-            class_name = obj.__qualname__.split(".")[0]
+def autoapi_skip_member_fn(app, what, name, obj, skip, options):
+    skip_contents = [
+        # Re-exports
+        ("class", "datafusion.DataFrame"),
+        ("class", "datafusion.SessionContext"),
+        ("module", "datafusion.common"),
+        # Deprecated
+        ("class", "datafusion.substrait.serde"),
+        ("class", "datafusion.substrait.plan"),
+        ("class", "datafusion.substrait.producer"),
+        ("class", "datafusion.substrait.consumer"),
+        ("method", "datafusion.context.SessionContext.tables"),
+        ("method", "datafusion.dataframe.DataFrame.unnest_column"),
+    ]
+    if (what, name) in skip_contents:
+        skip = True
 
-    should_exclude = name in exclude_functions and class_name in exclude_classes
+    return skip
 
-    return True if should_exclude else None
 
-
-def setup(app):
-    app.connect("autodoc-skip-member", autodoc_skip_member)
+def setup(sphinx):
+    sphinx.connect("autoapi-skip-member", autoapi_skip_member_fn)
 
 
 # -- Options for HTML output -------------------------------------------------
@@ -106,9 +104,7 @@ def setup(app):
 #
 html_theme = "pydata_sphinx_theme"
 
-html_theme_options = {
-    "use_edit_page_button": True,
-}
+html_theme_options = {"use_edit_page_button": False, "show_toc_level": 2}
 
 html_context = {
     "github_user": "apache",
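
Reviewer note: the new skip handler in docs/source/conf.py matches on an exact (kind, fully qualified name) pair instead of inspecting `__qualname__`. The following is a minimal standalone sketch of that behaviour, not the committed code: the list is shortened to two entries from the diff, a set replaces the list, and `None` stands in for the Sphinx `app`, `obj`, and `options` arguments, which the handler does not read.

```python
# Standalone sketch of the skip predicate; runs without Sphinx installed.
# SKIP_CONTENTS is an abbreviated copy of the list in the diff (assumption:
# a set is used here only for illustration; the commit uses a list).
SKIP_CONTENTS = {
    ("class", "datafusion.DataFrame"),  # re-export
    ("method", "datafusion.context.SessionContext.tables"),  # deprecated
}


def autoapi_skip_member_fn(app, what, name, obj, skip, options):
    """Force a skip for listed members; otherwise keep the incoming decision."""
    if (what, name) in SKIP_CONTENTS:
        skip = True
    return skip


# The re-exported top-level name is skipped ...
reexport = autoapi_skip_member_fn(
    None, "class", "datafusion.DataFrame", None, False, None
)
# ... while the canonical definition keeps the incoming value (False here).
canonical = autoapi_skip_member_fn(
    None, "class", "datafusion.dataframe.DataFrame", None, False, None
)
print(reexport, canonical)
```

Because the match is on the full dotted name, only the re-exported alias is hidden; the class documented under its defining module is left alone, and a member that arrives with `skip` already set stays skipped.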

docs/source/index.rst

Lines changed: 0 additions & 2 deletions
@@ -104,5 +104,3 @@ Example
    :hidden:
    :maxdepth: 1
    :caption: API
-
-   api

docs/source/user-guide/basics.rst

Lines changed: 8 additions & 6 deletions
@@ -15,6 +15,8 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
+.. _user_guide_concepts:
+
 Concepts
 ========
 
@@ -52,7 +54,7 @@ The first statement group:
     # create a context
     ctx = datafusion.SessionContext()
 
-creates a :code:`SessionContext`, that is, the main interface for executing queries with DataFusion. It maintains the state
+creates a :py:class:`~datafusion.context.SessionContext`, that is, the main interface for executing queries with DataFusion. It maintains the state
 of the connection between a user and an instance of the DataFusion engine. Additionally it provides the following functionality:
 
 - Create a DataFrame from a CSV or Parquet data source.
@@ -72,9 +74,9 @@ The second statement group creates a :code:`DataFrame`,
     df = ctx.create_dataframe([[batch]])
 
 A DataFrame refers to a (logical) set of rows that share the same column names, similar to a `Pandas DataFrame <https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html>`_.
-DataFrames are typically created by calling a method on :code:`SessionContext`, such as :code:`read_csv`, and can then be modified by
-calling the transformation methods, such as :meth:`.DataFrame.filter`, :meth:`.DataFrame.select`, :meth:`.DataFrame.aggregate`,
-and :meth:`.DataFrame.limit` to build up a query definition.
+DataFrames are typically created by calling a method on :py:class:`~datafusion.context.SessionContext`, such as :code:`read_csv`, and can then be modified by
+calling the transformation methods, such as :py:func:`~datafusion.dataframe.DataFrame.filter`, :py:func:`~datafusion.dataframe.DataFrame.select`, :py:func:`~datafusion.dataframe.DataFrame.aggregate`,
+and :py:func:`~datafusion.dataframe.DataFrame.limit` to build up a query definition.
 
 The third statement uses :code:`Expressions` to build up a query definition.
 
@@ -85,5 +87,5 @@ The third statement uses :code:`Expressions` to build up a query definition.
         col("a") - col("b"),
     )
 
-Finally the :code:`collect` method converts the logical plan represented by the DataFrame into a physical plan and execute it,
-collecting all results into a list of `RecordBatch <https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html>`_.
+Finally the :py:func:`~datafusion.dataframe.DataFrame.collect` method converts the logical plan represented by the DataFrame into a physical plan and execute it,
+collecting all results into a list of `RecordBatch <https://arrow.apache.org/docs/python/generated/pyarrow.RecordBatch.html>`_.
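
Reviewer note: the user-guide edits in this commit follow one mechanical pattern, replacing short relative roles such as :meth:`.DataFrame.filter` with fully qualified :py:func: roles whose leading `~` shortens the rendered link text. The sketch below illustrates that rewrite for the DataFrame case only. The helper `qualify_dataframe_refs` is hypothetical and not part of the commit; nothing in the diff indicates the changes were scripted.

```python
import re

# Hypothetical helper (not in the commit): rewrite :meth:`.DataFrame.<name>`
# into the fully qualified form used by the updated docs.
_METH_ROLE = re.compile(r":meth:`\.DataFrame\.(\w+)`")


def qualify_dataframe_refs(text: str) -> str:
    """Return text with relative DataFrame method roles fully qualified."""
    return _METH_ROLE.sub(
        r":py:func:`~datafusion.dataframe.DataFrame.\1`", text
    )


before = "Use :meth:`.DataFrame.limit` to view the top rows of the frame:"
after = qualify_dataframe_refs(before)
print(after)
```

Applied to the sample line, this yields the same text as the corresponding added line in docs/source/user-guide/common-operations/basic-info.rst.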

docs/source/user-guide/common-operations/aggregations.rst

Lines changed: 1 addition & 1 deletion
@@ -19,7 +19,7 @@ Aggregation
 ============
 
 An aggregate or aggregation is a function where the values of multiple rows are processed together to form a single summary value.
-For performing an aggregation, DataFusion provides the :meth:`.DataFrame.aggregate`
+For performing an aggregation, DataFusion provides the :py:func:`~datafusion.dataframe.DataFrame.aggregate`
 
 .. ipython:: python
 
docs/source/user-guide/common-operations/basic-info.rst

Lines changed: 4 additions & 4 deletions
@@ -34,26 +34,26 @@ In this section, you will learn how to display essential details of DataFrames u
     })
     df
 
-Use :meth:`.DataFrame.limit` to view the top rows of the frame:
+Use :py:func:`~datafusion.dataframe.DataFrame.limit` to view the top rows of the frame:
 
 .. ipython:: python
 
     df.limit(2)
 
-Display the columns of the DataFrame using :meth:`.DataFrame.schema`:
+Display the columns of the DataFrame using :py:func:`~datafusion.dataframe.DataFrame.schema`:
 
 .. ipython:: python
 
     df.schema()
 
-The method :meth:`.DataFrame.to_pandas` uses pyarrow to convert to pandas DataFrame, by collecting the batches,
+The method :py:func:`~datafusion.dataframe.DataFrame.to_pandas` uses pyarrow to convert to pandas DataFrame, by collecting the batches,
 passing them to an Arrow table, and then converting them to a pandas DataFrame.
 
 .. ipython:: python
 
     df.to_pandas()
 
-:meth:`.DataFrame.describe` shows a quick statistic summary of your data:
+:py:func:`~datafusion.dataframe.DataFrame.describe` shows a quick statistic summary of your data:
 
 .. ipython:: python
 
docs/source/user-guide/common-operations/expressions.rst

Lines changed: 7 additions & 5 deletions
@@ -15,6 +15,8 @@
 .. specific language governing permissions and limitations
 .. under the License.
 
+.. _expressions:
+
 Expressions
 ===========
 
@@ -26,16 +28,16 @@ concept shared across most compilers and databases.
 Column
 ------
 
-The first expression most new users will interact with is the Column, which is created by calling :func:`col`.
-This expression represents a column within a DataFrame. The function :func:`col` takes as in input a string
+The first expression most new users will interact with is the Column, which is created by calling :py:func:`~datafusion.col`.
+This expression represents a column within a DataFrame. The function :py:func:`~datafusion.col` takes as in input a string
 and returns an expression as it's output.
 
 Literal
 -------
 
 Literal expressions represent a single value. These are helpful in a wide range of operations where
-a specific, known value is of interest. You can create a literal expression using the function :func:`lit`.
-The type of the object passed to the :func:`lit` function will be used to convert it to a known data type.
+a specific, known value is of interest. You can create a literal expression using the function :py:func:`~datafusion.lit`.
+The type of the object passed to the :py:func:`~datafusion.lit` function will be used to convert it to a known data type.
 
 In the following example we create expressions for the column named `color` and the literal scalar string `red`.
 The resultant variable `red_units` is itself also an expression.
@@ -62,7 +64,7 @@ Functions
 ---------
 
 As mentioned before, most functions in DataFusion return an expression at their output. This allows us to create
-a wide variety of expressions built up from other expressions. For example, :func:`.alias` is a function that takes
+a wide variety of expressions built up from other expressions. For example, :py:func:`~datafusion.expr.Expr.alias` is a function that takes
 as it input a single expression and returns an expression in which the name of the expression has changed.
 
 The following example shows a series of expressions that are built up from functions operating on expressions.
