Skip to content

fix: type checking #993

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 8, 2025
Merged

fix: type checking #993

merged 10 commits into from
Mar 8, 2025

Conversation

chenkovsky
Copy link
Contributor

@chenkovsky chenkovsky commented Jan 12, 2025

Which issue does this PR close?

No

Rationale for this change

pyo3 native code has no python type, without type hint, it's error prone.

What changes are included in this PR?

  • fix several bugs found by type hint.

Are there any user-facing changes?

The original Rust-wrapped table class was directly exposed. This PR changes it to expose a wrapper class instead of the Rust-wrapped one.

Copy link
Contributor

@kylebarron kylebarron left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a massive amount of new code that will only serve to present additional overhead when developing. The internal API is not public. A decision was made in #750 to use Python wrapper classes as the method for public-facing type checking.

@chenkovsky
Copy link
Contributor Author

This is a massive amount of new code that will only serve to present additional overhead when developing. The internal API is not public. A decision was made in #750 to use Python wrapper classes as the method for public-facing type checking.

internal api type annotation is not for user of this library,it's useful for developer of this library. automated tools can be used to gurarantee code quality. i annotated the intetnal api,then i can easily find several bugs via pyright.

@kylebarron
Copy link
Contributor

There's no guarantee that those internal types will match the actual runtime types from Rust. So in effect this adds another layer of types that must be manually kept up to date on every PR. In that way, I don't see where the benefit comes in. We already have developer type checking on the Rust side.

@chenkovsky
Copy link
Contributor Author

There's no guarantee that those internal types will match the actual runtime types from Rust. So in effect this adds another layer of types that must be manually kept up to date on every PR. In that way, I don't see where the benefit comes in. We already have developer type checking on the Rust side.

ok. if you dont think its necessary, i will remove the annotation and only commit the bug fix.

@kylebarron
Copy link
Contributor

ok. if you dont think its necessary, i will remove the annotation and only commit the bug fix.

Let's let another developer chime in as well. This is just my opinion.

What are the bug fix(es)?

@@ -66,9 +66,10 @@ def __init__(self, table: df_internal.Table) -> None:
"""This constructor is not typically called by the end user."""
self.table = table

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is changing the user-facing API, no?

Copy link
Contributor Author

@chenkovsky chenkovsky Jan 13, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i checked the test,i think we should use @ property, but previously we didnt do wrapping somewhere, raw table is returned that's why test passed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've become convinced that the current code is broken and that this is the right change. Thank you @chenkovsky for doing this. I do think we need to put some notes to that effect in the section of user facing changes of the PR description.

@chenkovsky
Copy link
Contributor Author

most bugs are wrapping or unwrapping missing,since we have wrap classes.

@chenkovsky
Copy link
Contributor Author

@kylebarron should I remove type annotation, and review bug fix first?

@kylebarron
Copy link
Contributor

kylebarron commented Feb 12, 2025

Let's cc @timsaucer for thoughts. My opinion is that it would be simpler to first review a PR that contains only the bug fixes.

@timsaucer
Copy link
Contributor

It would be nice for separation of concerns to do bug fix in one PR and type checking in another. Overall the size of the PR isn't that unmanageable but it does touch a lot of points that I'd want to try out also with my internal tools that build on top of datafusion-python. I'll try to do this tomorrow or the day after.

@chenkovsky
Copy link
Contributor Author

chenkovsky commented Feb 16, 2025

Let's cc @timsaucer for thoughts. My opinion is that it would be simpler to first review a PR that contains only the bug fixes.

@kylebarron I have removed pyi files. please review it again

Copy link
Contributor

@timsaucer timsaucer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a few questions, but overall this looks like a very good improvement. Thank you for putting in the effort. It is much appreciated!

@@ -66,9 +66,10 @@ def __init__(self, table: df_internal.Table) -> None:
"""This constructor is not typically called by the end user."""
self.table = table

@property
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've become convinced that the current code is broken and that this is the right change. Thank you @chenkovsky for doing this. I do think we need to put some notes to that effect in the section of user facing changes of the PR description.

@@ -783,7 +783,9 @@ def register_parquet(
file_extension,
skip_metadata,
schema,
file_sort_order,
[[expr.raw_sort for expr in exprs] for exprs in file_sort_order]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use the same helper function sort_list_to_raw_sort_list that you use below to clean it up a tiny bit?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

updated

@@ -182,7 +182,7 @@ class AggregateUDF:

def __init__(
self,
name: Optional[str],
name: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why remove the Optional[str] if we still have the logic below to allow None?

Copy link
Contributor Author

@chenkovsky chenkovsky Feb 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems that in rust binding, it's not optional

name: &str,

@@ -158,7 +158,7 @@ def state(self) -> List[pyarrow.Scalar]:
pass

@abstractmethod
def update(self, *values: pyarrow.Array) -> None:
def update(self, values: pyarrow.Array) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks wrong to me. I think we want/need to allow a variable number of values passed for udafs that have multiple state elements.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, My mistake

@chenkovsky
Copy link
Contributor Author

@timsaucer could you please review it again?

@timsaucer timsaucer merged commit 9027b4d into apache:main Mar 8, 2025
15 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants