Na scalar string #1

TomAugspurger · 2019-11-13T22:03:19Z

With this, 6 tests are failing in test_strings.py:

pandas/tests/test_strings.py::test_string_array[count] FAILED    [ 16%]
pandas/tests/test_strings.py::test_string_array[find] FAILED     [ 33%]
pandas/tests/test_strings.py::test_string_array[index] FAILED    [ 50%]
pandas/tests/test_strings.py::test_string_array[rfind] FAILED    [ 66%]
pandas/tests/test_strings.py::test_string_array[rindex] FAILED   [ 83%]
pandas/tests/test_strings.py::test_string_array[len] FAILED      [100%]

They're all of the form

(Pdb) pp result
0      1
1      0
2    NaN
3      0
dtype: object
(Pdb) pp expected
0    1.0
1    0.0
2    NaN
3    0.0
dtype: float64

TomAugspurger · 2019-11-13T22:06:59Z

I'll try to fix those up before we merge this into your PR.

TomAugspurger · 2019-11-13T22:17:58Z

Fixed those failures in 004e42f, though we may want to update the test to change behavior (and return an IntegerArray there).

In [6]: a = pd.Series(['a', 'bb', None], dtype="string")

In [7]: a.str.count('a')
Out[7]:
0    1.0
1    0.0
2    NaN
dtype: float64

so that would be an Int64.
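For illustration, a minimal sketch of the two result representations being compared here. The values are built by hand rather than through `.str.count`, because the dtype that method returns depends on the pandas version; `as_float` and `as_int` are names made up for this example.

```python
import numpy as np
import pandas as pd

# Behaviour shown above: a missing value forces the counts into float64.
as_float = pd.Series([1.0, 0.0, np.nan], dtype="float64")

# Proposed behaviour: a nullable integer dtype keeps the counts as integers
# and represents the missing entry with a mask instead of NaN.
as_int = pd.Series([1, 0, None], dtype="Int64")

print(as_float.dtype, as_int.dtype)
print(as_int.isna().tolist())
```

The nullable variant avoids the int-to-float cast while still marking the third element as missing.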

jorisvandenbossche

Cool!

jorisvandenbossche · 2019-11-14T07:47:19Z

pandas/_libs/lib.pyx

@@ -1500,7 +1500,7 @@ cdef class Validator:
                                  f'must define is_value_typed')

    cdef bint is_valid_null(self, object value) except -1:
-        return value is None or util.is_nan(value)
+        return value is None or value is C_NA or util.is_nan(value)


I suppose this is to have inferring and the validation in StringArray working with NA?

One thing I have been thinking about is that it could be an option to let pd.NA play actually a somewhat different role than np.nan or None in construction / type inference. E.g., if someone does pd.Series([1, 2, pd.NA]), it would automatically become a nullable integer dtype instead of float (or object). Since pd.NA is new, we can do this without breaking backwards compatibility.
(now, not sure if that idea relates to the code here, and it can also be done later)
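A small sketch of what that inference idea looks like from the user side, assuming a pandas version that ships `pd.NA` and `pd.array` (1.0 or later). Whether `pd.Series` infers the nullable dtype on its own is exactly the open question here, so the example requests it explicitly; `inferred` and `explicit` are illustrative names.

```python
import pandas as pd

# pd.array infers a nullable integer array from integers plus pd.NA.
inferred = pd.array([1, 2, pd.NA])

# With pd.Series, the nullable dtype can be requested explicitly,
# independent of what default inference does.
explicit = pd.Series([1, 2, pd.NA], dtype="Int64")

print(inferred.dtype, explicit.dtype)
```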

> I suppose this is to have inferring and the validation in StringArray working with NA?

Correct.
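As a rough pure-Python approximation of the check in the diff above (the real code is Cython and uses `C_NA` and `util.is_nan`; the helper name below is hypothetical and the NaN test is a simplified stand-in):

```python
import math

import pandas as pd


def is_valid_null_sketch(value) -> bool:
    """Approximate Validator.is_valid_null after the change (hypothetical helper)."""
    # None and the pd.NA singleton are both accepted as nulls.
    if value is None or value is pd.NA:
        return True
    # Simplified stand-in for util.is_nan: only Python floats are checked.
    return isinstance(value, float) and math.isnan(value)


print([is_valid_null_sketch(v) for v in ["a", None, pd.NA, float("nan")]])
```

Identity comparison (`is`) is used for `pd.NA` because comparing with `==` would itself return `pd.NA` rather than a boolean.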

> let pd.NA play actually a somewhat different role than np.nan or None in construction / type inference.

That may be in infer_dtype

jorisvandenbossche · 2019-11-14T07:51:58Z

pandas/core/arrays/string_.py

+        arr = self._ndarray.copy()
+        mask = self.isna()
+        arr[mask] = -1
+        return arr, -1


You can't specify pd.NA here as the indicator? (since that is already in the values)
Or does the algo code not like that?

Algos didn't like it. Somewhere in there we do a value == na_value, which raises.

We may need / want to rewrite things to be mask-based.
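To show why an equality check against `pd.NA` is a problem and what a mask-based alternative might look like, here is a minimal sketch. It is not the algos code; it only mirrors the sentinel approach from the diff on a plain object array, and the variable names are made up for the example.

```python
import numpy as np
import pandas as pd

values = np.array(["a", pd.NA, "b"], dtype=object)

# Comparing with pd.NA yields pd.NA, not a boolean...
comparison = pd.NA == pd.NA

# ...and using that result in a boolean context raises TypeError.
try:
    bool(comparison)
    raised = False
except TypeError:
    raised = True

# Mask-based alternative: locate missing values with isna, then fill a sentinel.
mask = pd.isna(values)
filled = values.copy()
filled[mask] = -1

print(raised, mask.tolist(), filled.tolist())
```

Working from the mask means no code path ever has to evaluate `value == na_value`.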

jorisvandenbossche · 2019-11-14T07:55:48Z

BTW, should probably merge so can be reviewed together?

jorisvandenbossche · 2019-11-14T07:56:32Z

> , though we may want to update the test to change behavior (and return an IntegerArray there).

+1

TomAugspurger · 2019-11-14T13:45:15Z

> BTW, should probably merge so can be reviewed together?

I think so.

I added a failing test case for .str ops returning numeric values. We can discuss it in the main PR I think. I can probably implement it today or tomorrow if there's agreement.

TomAugspurger · 2019-11-14T13:46:33Z

I think this can be merged in anytime. There are probably a few things to fix up, but this gives a good idea of what this will look like.

TomAugspurger added 3 commits November 13, 2019 15:32

wip

266603f

Merge remote-tracking branch 'jorisvandenbossche/NA-scalar' into NA-s…

bf12ba8

…calar-string

update

e83a8a9

fixup

004e42f

jorisvandenbossche reviewed Nov 14, 2019

TomAugspurger added 2 commits November 14, 2019 07:39

simplify

1660769

add note and failing test

c357f8c

jorisvandenbossche merged commit c72e3ee into jorisvandenbossche:NA-scalar Nov 14, 2019
