
Commit c375533

ENH: Enhancement for spss kwargs (#56434)
* Bug fix for spss kwargs
* Update to whatsnew with bug fix details
* Black formatting fix
* Fix for whatsnew rst pre-commit
* Sort whatsnew alphabetically
* Fixing alphabetical sorting to whatsnew
* Sorting whatsnew
* Fixing kwargs
* Fixing tests
* Test fixes
* Minor change in kwargs
* Resolving PR comments
* Resolving mypy issue
* Docstring update for kwargs mypy
* Updates based on PR comments
* Fixing kwargs types
* Finalizing updates based on suggestions
* Updating to try to fix other random test fails
* Unrelated test fix
* Test fixes for dims/sizes
* Updates to whatsnew
* Case fix
* Update to reword to enhancements
1 parent 600bbd5 commit c375533


3 files changed: +30, -2 lines changed


doc/source/whatsnew/v3.0.0.rst

Lines changed: 1 addition & 0 deletions
@@ -32,6 +32,7 @@ Other enhancements
 - :class:`pandas.api.typing.SASReader` is available for typing the output of :func:`read_sas` (:issue:`55689`)
 - :func:`DataFrame.to_excel` now raises an ``UserWarning`` when the character count in a cell exceeds Excel's limitation of 32767 characters (:issue:`56954`)
 - :func:`pandas.merge` now validates the ``how`` parameter input (merge type) (:issue:`59435`)
+- :func:`read_spss` now supports kwargs to be passed to pyreadstat (:issue:`56356`)
 - :func:`read_stata` now returns ``datetime64`` resolutions better matching those natively stored in the stata format (:issue:`55642`)
 - :meth:`DataFrame.agg` called with ``axis=1`` and a ``func`` which relabels the result index now raises a ``NotImplementedError`` (:issue:`58807`).
 - :meth:`Index.get_loc` now accepts also subclasses of ``tuple`` as keys (:issue:`57922`)
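You can tell whether an installed pandas forwards extra keywords by checking its signature for a `**kwargs` (VAR_KEYWORD) parameter. Below is a minimal sketch that uses only the standard library. The names `accepts_extra_kwargs`, `old_style` and `new_style` are hypothetical, and the two stubs only mimic the before/after shape of `read_spss`.

```python
import inspect
from typing import Any, Callable


def accepts_extra_kwargs(func: Callable[..., Any]) -> bool:
    """Return True if func declares a **kwargs (VAR_KEYWORD) parameter."""
    return any(
        param.kind is inspect.Parameter.VAR_KEYWORD
        for param in inspect.signature(func).parameters.values()
    )


# Hypothetical stubs shaped like read_spss before and after this commit.
def old_style(path, usecols=None, convert_categoricals=True):
    ...


def new_style(path, usecols=None, convert_categoricals=True, **kwargs):
    ...


print(accepts_extra_kwargs(old_style))  # False
print(accepts_extra_kwargs(new_style))  # True
```

Applied to `pd.read_spss`, the check should return True on a pandas build that includes this commit (documented as `versionadded:: 3.0`) and False on earlier releases. This assumes `read_spss` is not wrapped in a way that hides its signature.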

pandas/io/spss.py

Lines changed: 13 additions & 2 deletions
@@ -1,6 +1,9 @@
 from __future__ import annotations
 
-from typing import TYPE_CHECKING
+from typing import (
+    TYPE_CHECKING,
+    Any,
+)
 
 from pandas._libs import lib
 from pandas.compat._optional import import_optional_dependency
@@ -24,6 +27,7 @@ def read_spss(
     usecols: Sequence[str] | None = None,
     convert_categoricals: bool = True,
     dtype_backend: DtypeBackend | lib.NoDefault = lib.no_default,
+    **kwargs: Any,
 ) -> DataFrame:
     """
     Load an SPSS file from the file path, returning a DataFrame.
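With `**kwargs: Any` in the signature, any keyword that does not match a named parameter is collected into a dict. Note that the `Any` annotation types each value, not the dict itself. Below is a minimal sketch with a hypothetical stand-in, `read_spss_sketch`, that keeps only part of the real signature and does no I/O.

```python
from __future__ import annotations

from collections.abc import Sequence
from typing import Any


def read_spss_sketch(
    path: str,
    usecols: Sequence[str] | None = None,
    convert_categoricals: bool = True,
    **kwargs: Any,
) -> dict[str, Any]:
    """Hypothetical stand-in: report how the arguments were bound."""
    # Named parameters bind as usual; unmatched keywords land in kwargs.
    return {
        "path": path,
        "usecols": usecols,
        "convert_categoricals": convert_categoricals,
        "extra": kwargs,
    }


bound = read_spss_sketch("data.sav", convert_categoricals=False, row_limit=1)
print(bound["extra"])  # {'row_limit': 1}
```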
@@ -47,6 +51,10 @@ def read_spss(
         nullable :class:`ArrowDtype` :class:`DataFrame`
 
         .. versionadded:: 2.0
+    **kwargs
+        Additional keyword arguments that can be passed to :func:`pyreadstat.read_sav`.
+
+        .. versionadded:: 3.0
 
     Returns
     -------
@@ -74,7 +82,10 @@ def read_spss(
         usecols = list(usecols)  # pyreadstat requires a list
 
     df, metadata = pyreadstat.read_sav(
-        stringify_path(path), usecols=usecols, apply_value_formats=convert_categoricals
+        stringify_path(path),
+        usecols=usecols,
+        apply_value_formats=convert_categoricals,
+        **kwargs,
     )
     df.attrs = metadata.__dict__
     if dtype_backend is not lib.no_default:
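The call above translates the wrapper's own options explicitly and passes everything else through unchanged. Below is a sketch of that forwarding pattern. It assumes a hypothetical stub, `fake_read_sav`, in place of `pyreadstat.read_sav`; the stub exposes only `usecols`, `apply_value_formats`, `row_limit` and `row_offset`.

```python
from __future__ import annotations

from typing import Any


def fake_read_sav(
    path: str,
    usecols: list[str] | None = None,
    apply_value_formats: bool = False,
    row_limit: int = 0,
    row_offset: int = 0,
) -> dict[str, Any]:
    """Hypothetical stand-in for pyreadstat.read_sav: echoes its arguments."""
    return {
        "path": path,
        "usecols": usecols,
        "apply_value_formats": apply_value_formats,
        "row_limit": row_limit,
        "row_offset": row_offset,
    }


def read_spss_sketch(
    path: str,
    usecols: list[str] | None = None,
    convert_categoricals: bool = True,
    **kwargs: Any,
) -> dict[str, Any]:
    # Same call shape as the patched read_spss: the wrapper's own options are
    # mapped explicitly, and any remaining keywords are passed through as-is.
    return fake_read_sav(
        path,
        usecols=usecols,
        apply_value_formats=convert_categoricals,
        **kwargs,
    )


print(read_spss_sketch("data.sav", row_offset=1))

try:
    # apply_value_formats is already supplied from convert_categoricals,
    # so passing it again through **kwargs is a duplicate keyword.
    read_spss_sketch("data.sav", apply_value_formats=False)
except TypeError as exc:
    print(exc)
```

One consequence of this design is that `usecols` and `apply_value_formats` are always set by the wrapper. Passing either again through `**kwargs` therefore raises `TypeError` for a duplicate keyword rather than overriding the wrapper's value, as the stub demonstrates.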

pandas/tests/io/test_spss.py

Lines changed: 16 additions & 0 deletions
@@ -65,6 +65,22 @@ def test_spss_labelled_str(datapath):
     tm.assert_frame_equal(df, expected)
 
 
+@pytest.mark.filterwarnings("ignore::pandas.errors.ChainedAssignmentError")
+@pytest.mark.filterwarnings("ignore:ChainedAssignmentError:FutureWarning")
+def test_spss_kwargs(datapath):
+    # test file from the Haven project (https://haven.tidyverse.org/)
+    # Licence at LICENSES/HAVEN_LICENSE, LICENSES/HAVEN_MIT
+    fname = datapath("io", "data", "spss", "labelled-str.sav")
+
+    df = pd.read_spss(fname, convert_categoricals=True, row_limit=1)
+    expected = pd.DataFrame({"gender": ["Male"]}, dtype="category")
+    tm.assert_frame_equal(df, expected)
+
+    df = pd.read_spss(fname, convert_categoricals=False, row_offset=1)
+    expected = pd.DataFrame({"gender": ["F"]})
+    tm.assert_frame_equal(df, expected)
+
+
 @pytest.mark.filterwarnings("ignore::pandas.errors.ChainedAssignmentError")
 @pytest.mark.filterwarnings("ignore:ChainedAssignmentError:FutureWarning")
 def test_spss_umlauts(datapath):
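The new test relies on two `pyreadstat.read_sav` options. `row_offset` skips leading rows, and `row_limit` caps how many rows are read (0, the default, means no limit). Below is a pure-Python sketch of those assumed semantics on a hypothetical two-row column. The raw code `"M"` and the label `"Female"` are made up, chosen only so the results line up with the test's expectations (`["Male"]` and `["F"]`).

```python
from __future__ import annotations

# Hypothetical in-memory stand-in for the one-column test file: raw codes
# plus a value-label mapping. Only "Male" and the raw "F" come from the test.
RAW_GENDER = ["M", "F"]
VALUE_LABELS = {"M": "Male", "F": "Female"}


def read_rows(
    apply_value_formats: bool = False,
    row_limit: int = 0,
    row_offset: int = 0,
) -> list[str]:
    """Sketch of the assumed row_offset / row_limit semantics."""
    # Skip the first row_offset rows.
    rows = RAW_GENDER[row_offset:]
    # A row_limit of 0 means "no limit"; otherwise keep at most row_limit rows.
    if row_limit > 0:
        rows = rows[:row_limit]
    # Replace raw codes with their value labels when requested.
    if apply_value_formats:
        rows = [VALUE_LABELS.get(code, code) for code in rows]
    return rows


print(read_rows(apply_value_formats=True, row_limit=1))  # ['Male']
print(read_rows(apply_value_formats=False, row_offset=1))  # ['F']
```

In the real test, `convert_categoricals=True` also makes pandas return the labelled column with `category` dtype, which this sketch does not model.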
