Skip to content

Na scalar string #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TomAugspurger
Copy link

With this 6 tests are failing in test_string.py

pandas/tests/test_strings.py::test_string_array[count] FAILED                                                                                                                [ 16%]
pandas/tests/test_strings.py::test_string_array[find] FAILED                                                                                                                 [ 33%]
pandas/tests/test_strings.py::test_string_array[index] FAILED                                                                                                                [ 50%]
pandas/tests/test_strings.py::test_string_array[rfind] FAILED                                                                                                                [ 66%]
pandas/tests/test_strings.py::test_string_array[rindex] FAILED                                                                                                               [ 83%]
pandas/tests/test_strings.py::test_string_array[len] FAILED                                                                                                                  [100%]

They're all of the form

(Pdb) pp result
0      1
1      0
2    NaN
3      0
dtype: object
(Pdb) pp expected
0    1.0
1    0.0
2    NaN
3    0.0
dtype: float64

@TomAugspurger
Copy link
Author

I'll try to fix those up before we merge this into your PR.

@TomAugspurger
Copy link
Author

TomAugspurger commented Nov 13, 2019

Fixed those failures in 004e42f, though we may want to update the test to change behavior (and return an IntegerArray there).

In [6]: a = pd.Series(['a', 'bb', None], dtype="string")

In [7]: a.str.count('a')
Out[7]:
0    1.0
1    0.0
2    NaN
dtype: float64

so that would be an Int64.

Copy link
Owner

@jorisvandenbossche jorisvandenbossche left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool!

@@ -1500,7 +1500,7 @@ cdef class Validator:
f'must define is_value_typed')

cdef bint is_valid_null(self, object value) except -1:
return value is None or util.is_nan(value)
return value is None or value is C_NA or util.is_nan(value)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is to have inferring and the validation in StringArray working with NA?

One thing I have been thinking about is that it could be an option to let pd.NA play actually a somewhat different role than np.nan or None in construction / type inference. Eg so that if someone does pd.Series([1, 2, pd.NA]) it automatically becomes a nullable integer dtype instead of float (or object). Since pd.NA is new, we can actually do this without breaking backwards compatibility.
(now, not sure if that idea relates to the code here, and it can also be done later)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this is to have inferring and the validation in StringArray working with NA?

Correct.

let pd.NA play actually a somewhat different role than np.nan or None in construction / type inference.

That may be in infer_dtype

arr = self._ndarray.copy()
mask = self.isna()
arr[mask] = -1
return arr, -1

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can't specify pd.NA here as the indicator? (since that is already in the values)
Or does the algo code does not like that?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Algos didn't like it. Somewhere in there we do a value == na_value, which raises.

We may need / want to rewrite things to be masked based.

@jorisvandenbossche
Copy link
Owner

BTW, should probably merge so can be reviewed together?

@jorisvandenbossche
Copy link
Owner

, though we may want to update the test to change behavior (and return an IntegerArray there).

+1

@TomAugspurger
Copy link
Author

BTW, should probably merge so can be reviewed together?

I think so.

I added a failing test case for .str ops returning numeric values. We can discuss it in the main PR I think. I can probably implement it today or tomorrow if there's agreement.

@TomAugspurger
Copy link
Author

I think this can be merged in anytime. There are probably a few things to fix up, but this gives a good idea of what this will look like.

@jorisvandenbossche jorisvandenbossche merged commit c72e3ee into jorisvandenbossche:NA-scalar Nov 14, 2019
jorisvandenbossche pushed a commit that referenced this pull request Mar 13, 2023
* Removed PLR5501

* 7 elif changed

* 8 more

* more fixess

* moreeeeeeeee changes

* fix python_parser.py (try #1)

* try no. 2

* fix python_parser.py (try no. 3)

* v2.1.0 (fix try 1)

* try no. 2

* Removed extra lines
jorisvandenbossche pushed a commit that referenced this pull request Oct 10, 2024
* second item in tuple is no longer truncated at first colon

pandas-dev#59623

* added testcase for maybe_convert_css_to_tuples

pandas-dev#59623

* maybe_convert_css_to_tuples() raises on strings without ":"

* fixed implicit str concatination

* Fixed raise on empty string

* Update test_style.py

* attr:; -> ("attr","")

Same behavior as before patch

* add test for "attr:;", ie empty value

* str concatenation in the test broke mypy

* revert explicit str concat

* Invalidarg patch black (#1)

* black test_style

* Update style_render.py

---------

Co-authored-by: Matthew Roeschke <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants