-
Notifications
You must be signed in to change notification settings - Fork 0
Na scalar string #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Na scalar string #1
Conversation
I'll try to fix those up before we merge this into your PR. |
Fixed those failures in 004e42f, though we may want to update the test to change behavior (and return an IntegerArray there). In [6]: a = pd.Series(['a', 'bb', None], dtype="string")
In [7]: a.str.count('a')
Out[7]:
0 1.0
1 0.0
2 NaN
dtype: float64 so that would be an Int64. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool!
@@ -1500,7 +1500,7 @@ cdef class Validator: | |||
f'must define is_value_typed') | |||
|
|||
cdef bint is_valid_null(self, object value) except -1: | |||
return value is None or util.is_nan(value) | |||
return value is None or value is C_NA or util.is_nan(value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is to have inferring and the validation in StringArray working with NA?
One thing I have been thinking about is that it could be an option to let pd.NA
play actually a somewhat different role than np.nan or None in construction / type inference. Eg so that if someone does pd.Series([1, 2, pd.NA])
it automatically becomes a nullable integer dtype instead of float (or object). Since pd.NA is new, we can actually do this without breaking backwards compatibility.
(now, not sure if that idea relates to the code here, and it can also be done later)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose this is to have inferring and the validation in StringArray working with NA?
Correct.
let pd.NA play actually a somewhat different role than np.nan or None in construction / type inference.
That may be in infer_dtype
arr = self._ndarray.copy() | ||
mask = self.isna() | ||
arr[mask] = -1 | ||
return arr, -1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can't specify pd.NA here as the indicator? (since that is already in the values)
Or does the algo code does not like that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Algos didn't like it. Somewhere in there we do a value == na_value
, which raises.
We may need / want to rewrite things to be masked based.
BTW, should probably merge so can be reviewed together? |
+1 |
I think so. I added a failing test case for |
I think this can be merged in anytime. There are probably a few things to fix up, but this gives a good idea of what this will look like. |
* Removed PLR5501 * 7 elif changed * 8 more * more fixess * moreeeeeeeee changes * fix python_parser.py (try #1) * try no. 2 * fix python_parser.py (try no. 3) * v2.1.0 (fix try 1) * try no. 2 * Removed extra lines
* second item in tuple is no longer truncated at first colon pandas-dev#59623 * added testcase for maybe_convert_css_to_tuples pandas-dev#59623 * maybe_convert_css_to_tuples() raises on strings without ":" * fixed implicit str concatination * Fixed raise on empty string * Update test_style.py * attr:; -> ("attr","") Same behavior as before patch * add test for "attr:;", ie empty value * str concatenation in the test broke mypy * revert explicit str concat * Invalidarg patch black (#1) * black test_style * Update style_render.py --------- Co-authored-by: Matthew Roeschke <[email protected]>
With this 6 tests are failing in
test_string.py
They're all of the form