Add Result.to_df to export records as pandas DataFrame #663

Merged: 3 commits, Mar 31, 2022
14 changes: 9 additions & 5 deletions docs/source/api.rst
Expand Up @@ -804,14 +804,14 @@ A :class:`neo4j.Result` is attached to an active connection, through a :class:`n

.. automethod:: graph

**This is experimental.** (See :ref:`filter-warnings-ref`)

.. automethod:: value

.. automethod:: values

.. automethod:: data

.. automethod:: to_df

.. automethod:: closed

See https://neo4j.com/docs/python-manual/current/cypher-workflow/#python-driver-type-mapping for more about type mapping.
Expand Down Expand Up @@ -987,7 +987,7 @@ Path :class:`neo4j.graph.Path`
Node
====

.. autoclass:: neo4j.graph.Node()
.. autoclass:: neo4j.graph.Node

.. describe:: node == other

Expand Down Expand Up @@ -1022,6 +1022,8 @@ Node

.. autoattribute:: id

.. autoattribute:: element_id

.. autoattribute:: labels

.. automethod:: get
Expand All @@ -1036,7 +1038,7 @@ Node
Relationship
============

.. autoclass:: neo4j.graph.Relationship()
.. autoclass:: neo4j.graph.Relationship

.. describe:: relationship == other

Expand Down Expand Up @@ -1076,6 +1078,8 @@ Relationship

.. autoattribute:: id

.. autoattribute:: element_id

.. autoattribute:: nodes

.. autoattribute:: start_node
Expand All @@ -1097,7 +1101,7 @@ Relationship
Path
====

.. autoclass:: neo4j.graph.Path()
.. autoclass:: neo4j.graph.Path

.. describe:: path == other

Expand Down
4 changes: 2 additions & 2 deletions docs/source/async_api.rst
Expand Up @@ -511,14 +511,14 @@ A :class:`neo4j.AsyncResult` is attached to an active connection, through a :cla

.. automethod:: graph

**This is experimental.** (See :ref:`filter-warnings-ref`)

.. automethod:: value

.. automethod:: values

.. automethod:: data

.. automethod:: to_df

.. automethod:: closed

See https://neo4j.com/docs/python-manual/current/cypher-workflow/#python-driver-type-mapping for more about type mapping.
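The documentation hunks above point readers of ``to_df`` to :ref:`filter-warnings-ref`, because the method is wrapped in the ``experimental`` decorator. The following is a minimal sketch of that pattern, not the driver's code: ``ExperimentalWarning``, ``experimental`` and ``to_df`` are stand-ins defined here so the example runs without the driver or pandas installed. It shows a decorated call warning once per call, and a caller silencing only that warning category for a single block.

```python
import functools
import warnings


class ExperimentalWarning(Warning):
    """Stand-in for the driver's warning category (assumption of this sketch)."""


def experimental(message):
    """Hypothetical decorator: emit a warning on every call, then delegate."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # stacklevel=2 attributes the warning to the caller's line.
            warnings.warn(message, category=ExperimentalWarning, stacklevel=2)
            return func(*args, **kwargs)
        return wrapper
    return decorator


@experimental("pandas support is experimental and might be changed or "
              "removed in future versions")
def to_df():
    # Placeholder return value; the real method returns a DataFrame.
    return "dataframe placeholder"


def call_quietly():
    # Ignore only the experimental category, and only inside this block.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=ExperimentalWarning)
        return to_df()
```

Scoping the filter with ``warnings.catch_warnings()`` keeps other warnings visible and restores the previous filters when the block exits.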
129 changes: 128 additions & 1 deletion neo4j/_async/work/result.py
Expand Up @@ -20,11 +20,15 @@
from warnings import warn

from ..._async_compat.util import AsyncUtil
from ...data import DataDehydrator
from ...data import (
    DataDehydrator,
    RecordTableRowExporter,
)
from ...exceptions import (
    ResultConsumedError,
    ResultNotSingleError,
)
from ...meta import experimental
from ...work import ResultSummary
from ..io import ConnectionErrorHandler

Expand Down Expand Up @@ -455,6 +459,8 @@ async def graph(self):
            was obtained has been closed or the Result has been explicitly
            consumed.

        **This is experimental.** (See :ref:`filter-warnings-ref`)

        .. versionchanged:: 5.0
            Can raise :exc:`ResultConsumedError`.
        """
Expand Down Expand Up @@ -519,6 +525,127 @@ async def data(self, *keys):
"""
        return [record.data(*keys) async for record in self]

    @experimental("pandas support is experimental and might be changed or "
                  "removed in future versions")
    async def to_df(self, expand=False):
        r"""Convert (the rest of) the result to a pandas DataFrame.

        This method is only available if the `pandas` library is installed.

        ::

            res = await tx.run("UNWIND range(1, 10) AS n RETURN n, n+1 as m")
            df = await res.to_df()

        for instance, returns a DataFrame with two columns (``n`` and ``m``)
        and 10 rows.

        :param expand: if :const:`True`, some structures in the result will be
            recursively expanded (flattened out into multiple columns) like so
            (everything inside ``<...>`` is a placeholder):

            * :class:`.Node` objects under any variable ``<n>`` will be
              expanded into columns (the recursion stops here)

              * ``<n>().prop.<property_name>`` (any) for each property of the
                node.
              * ``<n>().element_id`` (str) the node's element id.
                See :attr:`.Node.element_id`.
              * ``<n>().labels`` (frozenset of str) the node's labels.
                See :attr:`.Node.labels`.

            * :class:`.Relationship` objects under any variable ``<r>``
              will be expanded into columns (the recursion stops here)

              * ``<r>->.prop.<property_name>`` (any) for each property of the
                relationship.
              * ``<r>->.element_id`` (str) the relationship's element id.
                See :attr:`.Relationship.element_id`.
              * ``<r>->.start.element_id`` (str) the relationship's
                start node's element id.
                See :attr:`.Relationship.start_node`.
              * ``<r>->.end.element_id`` (str) the relationship's
                end node's element id.
                See :attr:`.Relationship.end_node`.
              * ``<r>->.type`` (str) the relationship's type.
                See :attr:`.Relationship.type`.

            * :const:`list` objects under any variable ``<l>`` will be expanded
              into

              * ``<l>[].0`` (any) the 1st list element
              * ``<l>[].1`` (any) the 2nd list element
              * ...

            * :const:`dict` objects under any variable ``<d>`` will be expanded
              into

              * ``<d>{}.<key1>`` (any) the value under the dict's 1st key
              * ``<d>{}.<key2>`` (any) the value under the dict's 2nd key
              * ...

            * :const:`list` and :const:`dict` objects are expanded recursively.
              Example::

                variable x: [{"foo": "bar", "baz": [42, 0]}, "foobar"]

              will be expanded to::

                {
                    "x[].0{}.foo": "bar",
                    "x[].0{}.baz[].0": 42,
                    "x[].0{}.baz[].1": 0,
                    "x[].1": "foobar"
                }

            * Everything else (including :class:`.Path` objects) will not
              be flattened.

            :const:`dict` keys and variable names that contain ``.`` or ``\``
            will be escaped with a backslash (``\.`` and ``\\`` respectively).
        :type expand: bool

        :rtype: :py:class:`pandas.DataFrame`
        :raises ImportError: if `pandas` library is not available.
        :raises ResultConsumedError: if the transaction from which this result
            was obtained has been closed or the Result has been explicitly
            consumed.

        **This is experimental.**
        ``pandas`` support might be changed or removed in future versions
        without warning. (See :ref:`filter-warnings-ref`)
        """
        import pandas as pd

        if not expand:
            return pd.DataFrame(await self.values(), columns=self._keys)
        else:
            df_keys = None
            rows = []
            async for record in self:
                row = RecordTableRowExporter().transform(dict(record.items()))
                if df_keys == row.keys():
                    rows.append(row.values())
                elif df_keys is None:
                    df_keys = row.keys()
                    rows.append(row.values())
                elif df_keys is False:
                    rows.append(row)
                else:
                    # The rows have different keys. We need to pass a list
                    # of dicts to pandas
                    rows = [{k: v for k, v in zip(df_keys, r)} for r in rows]
                    df_keys = False
                    rows.append(row)
            if df_keys is False:
                return pd.DataFrame(rows)
            else:
                columns = df_keys or [
                    # Escape backslashes before dots; the reverse order
                    # would double the backslash added for ".".
                    k.replace("\\", "\\\\").replace(".", "\\.")
                    for k in self._keys
                ]
                return pd.DataFrame(rows, columns=columns)

    def closed(self):
        """Return True if the result has been closed.

Expand Down
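The column naming scheme described in the ``to_df`` docstring can be illustrated with a small standalone sketch. This is not the driver's ``RecordTableRowExporter``: ``escape_key`` and ``flatten`` are hypothetical helpers that only cover plain ``list`` and ``dict`` values (no ``Node`` or ``Relationship`` handling), and keeping empty containers as plain values is a choice of this sketch, not documented driver behaviour.

```python
def escape_key(key):
    # Escape backslashes first so the backslash added for "." is not doubled.
    return str(key).replace("\\", "\\\\").replace(".", "\\.")


def flatten(record):
    """Flatten nested lists and dicts of a record into a single-level dict."""
    flat = {}

    def walk(prefix, value):
        if isinstance(value, list) and value:
            # Lists contribute "[].<index>" to the column name.
            for index, item in enumerate(value):
                walk(f"{prefix}[].{index}", item)
        elif isinstance(value, dict) and value:
            # Dicts contribute "{}.<escaped key>" to the column name.
            for key, item in value.items():
                walk(f"{prefix}{{}}.{escape_key(key)}", item)
        else:
            # Scalars (and, in this sketch, empty containers) end the recursion.
            flat[prefix] = value

    for name, value in record.items():
        walk(escape_key(name), value)
    return flat
```

Calling ``flatten({"x": [{"foo": "bar", "baz": [42, 0]}, "foobar"]})`` yields the four keys ``x[].0{}.foo``, ``x[].0{}.baz[].0``, ``x[].0{}.baz[].1`` and ``x[].1``, matching the docstring's example.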
129 changes: 128 additions & 1 deletion neo4j/_sync/work/result.py
Expand Up @@ -20,11 +20,15 @@
from warnings import warn

from ..._async_compat.util import Util
from ...data import DataDehydrator
from ...data import (
    DataDehydrator,
    RecordTableRowExporter,
)
from ...exceptions import (
    ResultConsumedError,
    ResultNotSingleError,
)
from ...meta import experimental
from ...work import ResultSummary
from ..io import ConnectionErrorHandler

Expand Down Expand Up @@ -455,6 +459,8 @@ def graph(self):
            was obtained has been closed or the Result has been explicitly
            consumed.

        **This is experimental.** (See :ref:`filter-warnings-ref`)

        .. versionchanged:: 5.0
            Can raise :exc:`ResultConsumedError`.
        """
Expand Down Expand Up @@ -519,6 +525,127 @@ def data(self, *keys):
"""
        return [record.data(*keys) for record in self]

    @experimental("pandas support is experimental and might be changed or "
                  "removed in future versions")
    def to_df(self, expand=False):
        r"""Convert (the rest of) the result to a pandas DataFrame.

        This method is only available if the `pandas` library is installed.

        ::

            res = tx.run("UNWIND range(1, 10) AS n RETURN n, n+1 as m")
            df = res.to_df()

        for instance, returns a DataFrame with two columns (``n`` and ``m``)
        and 10 rows.

        :param expand: if :const:`True`, some structures in the result will be
            recursively expanded (flattened out into multiple columns) like so
            (everything inside ``<...>`` is a placeholder):

            * :class:`.Node` objects under any variable ``<n>`` will be
              expanded into columns (the recursion stops here)

              * ``<n>().prop.<property_name>`` (any) for each property of the
                node.
              * ``<n>().element_id`` (str) the node's element id.
                See :attr:`.Node.element_id`.
              * ``<n>().labels`` (frozenset of str) the node's labels.
                See :attr:`.Node.labels`.

            * :class:`.Relationship` objects under any variable ``<r>``
              will be expanded into columns (the recursion stops here)

              * ``<r>->.prop.<property_name>`` (any) for each property of the
                relationship.
              * ``<r>->.element_id`` (str) the relationship's element id.
                See :attr:`.Relationship.element_id`.
              * ``<r>->.start.element_id`` (str) the relationship's
                start node's element id.
                See :attr:`.Relationship.start_node`.
              * ``<r>->.end.element_id`` (str) the relationship's
                end node's element id.
                See :attr:`.Relationship.end_node`.
              * ``<r>->.type`` (str) the relationship's type.
                See :attr:`.Relationship.type`.

            * :const:`list` objects under any variable ``<l>`` will be expanded
              into

              * ``<l>[].0`` (any) the 1st list element
              * ``<l>[].1`` (any) the 2nd list element
              * ...

            * :const:`dict` objects under any variable ``<d>`` will be expanded
              into

              * ``<d>{}.<key1>`` (any) the value under the dict's 1st key
              * ``<d>{}.<key2>`` (any) the value under the dict's 2nd key
              * ...

            * :const:`list` and :const:`dict` objects are expanded recursively.
              Example::

                variable x: [{"foo": "bar", "baz": [42, 0]}, "foobar"]

              will be expanded to::

                {
                    "x[].0{}.foo": "bar",
                    "x[].0{}.baz[].0": 42,
                    "x[].0{}.baz[].1": 0,
                    "x[].1": "foobar"
                }

            * Everything else (including :class:`.Path` objects) will not
              be flattened.

            :const:`dict` keys and variable names that contain ``.`` or ``\``
            will be escaped with a backslash (``\.`` and ``\\`` respectively).
        :type expand: bool

        :rtype: :py:class:`pandas.DataFrame`
        :raises ImportError: if `pandas` library is not available.
        :raises ResultConsumedError: if the transaction from which this result
            was obtained has been closed or the Result has been explicitly
            consumed.

        **This is experimental.**
        ``pandas`` support might be changed or removed in future versions
        without warning. (See :ref:`filter-warnings-ref`)
        """
        import pandas as pd

        if not expand:
            return pd.DataFrame(self.values(), columns=self._keys)
        else:
            df_keys = None
            rows = []
            for record in self:
                row = RecordTableRowExporter().transform(dict(record.items()))
                if df_keys == row.keys():
                    rows.append(row.values())
                elif df_keys is None:
                    df_keys = row.keys()
                    rows.append(row.values())
                elif df_keys is False:
                    rows.append(row)
                else:
                    # The rows have different keys. We need to pass a list
                    # of dicts to pandas
                    rows = [{k: v for k, v in zip(df_keys, r)} for r in rows]
                    df_keys = False
                    rows.append(row)
            if df_keys is False:
                return pd.DataFrame(rows)
            else:
                columns = df_keys or [
                    # Escape backslashes before dots; the reverse order
                    # would double the backslash added for ".".
                    k.replace("\\", "\\\\").replace(".", "\\.")
                    for k in self._keys
                ]
                return pd.DataFrame(rows, columns=columns)

    def closed(self):
        """Return True if the result has been closed.

Expand Down
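The row-accumulation strategy in the ``expand=True`` branch of ``to_df`` can be sketched without pandas or the driver. ``collect_rows`` below is a hypothetical helper, not part of the PR: it keeps rows as plain value lists while every row has the same keys, and falls back to a list of dicts as soon as one row differs. Unlike the diff, which compares ``dict.keys()`` views, this sketch also treats a different key order as a mismatch; that is an assumption made here for simplicity.

```python
def collect_rows(rows):
    """Return (data, columns) that could be passed on as
    pandas.DataFrame(data, columns=columns)."""
    columns = None  # None: no row seen yet; False: rows disagree on keys
    data = []
    for row in rows:
        if columns is False:
            # Already in fallback mode: keep rows as dicts.
            data.append(row)
        elif columns is None:
            # First row fixes the column order.
            columns = list(row)
            data.append(list(row.values()))
        elif columns == list(row):
            # Same keys in the same order: store values only.
            data.append(list(row.values()))
        else:
            # Keys differ: convert what was collected so far to dicts.
            data = [dict(zip(columns, values)) for values in data]
            data.append(row)
            columns = False
    return data, (columns or None)
```

With uniform rows the result is a compact list of lists plus one shared column list; with mixed rows, pandas would be left to align the dicts and fill missing cells.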