Skip to content

add unit tests for expression functions #1121

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
May 17, 2025

Conversation

timsaucer
Copy link
Contributor

Which issue does this PR close?

Follow on to #1116

Rationale for this change

We don't have code coverage for these expression functions

What changes are included in this PR?

Unit tests

Are there any user-facing changes?

None

@timsaucer
Copy link
Contributor Author

FYI @deanm0000

@timsaucer
Copy link
Contributor Author

@mesejo @kosiew @chenkovsky @kevinjqliu Would any of you mind doing a review? We haven't released datafusion-python in a while and this is one of the PRs waiting to go in. 3/3

Copy link
Contributor

@kosiew kosiew left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

I added some suggestions.

Comment on lines 719 to 720
def test_expr_functions(ctx, function, expected_result):
ctx = SessionContext()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second line can be deleted since ctx is available as a fixture

),
pytest.param(
# Valid values of acosh must be >= 1.0
functions.round((col("a").abs() + lit(1.0)).abs().acosh(), lit(4)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Second abs() is redundant since lit(1.0)) is already ≥ 0.
This can be simplified to:
functions.round((col("a").abs() + lit(1.0)).acosh(), lit(4))

result = df.collect()

assert len(result) == 1
assert result[0].column(0) == expected_result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directly using assert result[0].column(0) == expected_result can be brittle for NaN or infinite values. Prefer Arrow’s built-in equality check:

assert result[0].column(0).equals(expected_result)

Copy link
Contributor

@kevinjqliu kevinjqliu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

],
)
def test_expr_functions(ctx, function, expected_result):
ctx = SessionContext()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ctx = SessionContext()

result = df.collect()

assert len(result) == 1
assert result[0].column(0) == expected_result
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
assert result[0].column(0) == expected_result
assert result[0].column(0).equals(expected_result)

@timsaucer
Copy link
Contributor Author

Thank you all! Great suggestions.

@timsaucer timsaucer merged commit 1e7494b into apache:main May 17, 2025
16 of 17 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants