Add remaining non-wrapped functions #767

Open
timsaucer opened this issue Jul 22, 2024 · 4 comments
Labels
enhancement New feature or request

Comments

@timsaucer
Contributor

Is your feature request related to a problem or challenge? Please describe what you are trying to do.

A few modules still lack wrappers, namely datafusion.object_store and datafusion.common. Additionally, datafusion.substrait references LogicalPlan, which is not exposed.

It is also worth reviewing the excellent PR #751 to see how it fits in with the updated Python wrappers.

Describe the solution you'd like
Add missing wrappers and validate namespace corrections

Describe alternatives you've considered
None

Additional context
This is follow-on work to #750.

@timsaucer timsaucer added the enhancement New feature or request label Jul 22, 2024
@Michael-J-Ward
Contributor

Question: Have you ever used, or do you know of, a tool for running queries over Python / Rust codebases?

It would be nice if we could generate a concrete report of what is not exposed.

@timsaucer
Contributor Author

timsaucer commented Jul 26, 2024

No, but I did write a small script to check, and this is what I see missing:

Missing attribute. Object name: datafusion, Attribute name: Catalog
Missing attribute. Object name: datafusion, Attribute name: Database
Missing attribute. Object name: datafusion, Attribute name: ExecutionPlan
Missing attribute. Object name: datafusion, Attribute name: LogicalPlan
Missing attribute. Object name: datafusion, Attribute name: RecordBatch
Missing attribute. Object name: datafusion, Attribute name: RecordBatchStream
Missing attribute. Object name: datafusion, Attribute name: Table
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: runtime
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: Catalog
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: Database
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: Table
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: AggregateUDF
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: LogicalPlan
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: ExecutionPlan
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: RecordBatch
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: RecordBatchStream
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: common
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: expr
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: functions
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: object_store
Missing value in list. Object name: datafusion, Attribute name: __all__, Value: substrait
Missing attribute. Object name: datafusion.common, Attribute name: DFSchema
Missing attribute. Object name: datafusion.common, Attribute name: DataType
Missing attribute. Object name: datafusion.common, Attribute name: DataTypeMap
Missing attribute. Object name: datafusion.common, Attribute name: NullTreatment
Missing attribute. Object name: datafusion.common, Attribute name: PythonType
Missing attribute. Object name: datafusion.common, Attribute name: RexType
Missing attribute. Object name: datafusion.common, Attribute name: SqlFunction
Missing attribute. Object name: datafusion.common, Attribute name: SqlSchema
Missing attribute. Object name: datafusion.common, Attribute name: SqlStatistics
Missing attribute. Object name: datafusion.common, Attribute name: SqlTable
Missing attribute. Object name: datafusion.common, Attribute name: SqlType
Missing attribute. Object name: datafusion.common, Attribute name: SqlView
Missing attribute. Object name: datafusion.common, Attribute name: __all__
Missing attribute. Object name: datafusion.expr, Attribute name: EmptyRelation
Missing attribute. Object name: Expr, Attribute name: __radd__
Missing attribute. Object name: Expr, Attribute name: __rand__
Missing attribute. Object name: Expr, Attribute name: __rmod__
Missing attribute. Object name: Expr, Attribute name: __rmul__
Missing attribute. Object name: Expr, Attribute name: __ror__
Missing attribute. Object name: Expr, Attribute name: __rsub__
Missing attribute. Object name: Expr, Attribute name: __rtruediv__
Missing attribute. Object name: datafusion.expr, Attribute name: IsNull
Missing attribute. Object name: datafusion.expr, Attribute name: Unnest
Missing attribute. Object name: datafusion.expr, Attribute name: Window
Missing attribute. Object name: datafusion.expr, Attribute name: __all__
Missing attribute. Object name: datafusion.functions, Attribute name: __all__
Missing attribute. Object name: datafusion.object_store, Attribute name: AmazonS3
Missing attribute. Object name: datafusion.object_store, Attribute name: GoogleCloud
Missing attribute. Object name: datafusion.object_store, Attribute name: LocalFileSystem
Missing attribute. Object name: datafusion.object_store, Attribute name: MicrosoftAzure
Missing attribute. Object name: datafusion.object_store, Attribute name: __all__
Missing attribute. Object name: datafusion, Attribute name: runtime
Missing attribute. Object name: datafusion.substrait, Attribute name: __all__

Code to generate:

import datafusion
import datafusion.functions
import datafusion.object_store
import datafusion.substrait


def missing_exports(internal_obj, wrapped_obj):
    """Print every attribute of internal_obj that wrapped_obj does not expose."""
    wrapped_attrs = set(dir(wrapped_obj))
    # Not every object has __name__ (e.g. instances), so fall back to repr().
    name = getattr(wrapped_obj, "__name__", repr(wrapped_obj))
    for attr in dir(internal_obj):
        if attr not in wrapped_attrs:
            print(f"Missing attribute. Object name: {name}, Attribute name: {attr}")
            continue
        internal_attr = getattr(internal_obj, attr)
        wrapped_attr = getattr(wrapped_obj, attr)
        if internal_attr is not None and wrapped_attr is None:
            print(f"Attribute exists but is None. Object name: {name}, Attribute name: {attr}")

        # Skip back-references so the recursion does not revisit the owner or its type.
        if attr in ["__self__", "__class__"]:
            continue
        if isinstance(internal_attr, list):
            # Lists such as __all__: every internal entry should also appear in the wrapper's list.
            for val in internal_attr:
                if val not in wrapped_attr:
                    print(f"Missing value in list. Object name: {name}, Attribute name: {attr}, Value: {val}")
        elif hasattr(internal_attr, "__dict__"):
            # Modules and classes: compare their members recursively.
            missing_exports(internal_attr, wrapped_attr)


missing_exports(datafusion._internal, datafusion)

I can work on adding these tomorrow morning and I can also add this code as a unit test.
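
One way to turn the script into a unit test is to collect the findings in a list instead of printing them, so that a test can assert on the result. The sketch below is an assumption about how that might look, not code from the repository: `collect_missing_exports` is a hypothetical name, dunder attributes are ignored for brevity, and the two `SimpleNamespace` objects are stand-ins for `datafusion._internal` and `datafusion` so the example runs without the package installed.

```python
from types import SimpleNamespace


def collect_missing_exports(internal_obj, wrapped_obj, name="root"):
    """Return (object name, attribute name) pairs missing from wrapped_obj."""
    missing = []
    for attr in dir(internal_obj):
        if attr.startswith("__"):
            # Simplification for this sketch: ignore dunder attributes.
            continue
        if not hasattr(wrapped_obj, attr):
            missing.append((name, attr))
            continue
        internal_attr = getattr(internal_obj, attr)
        if isinstance(internal_attr, SimpleNamespace):
            # Recurse into nested namespaces (stand-ins for submodules).
            nested_wrapped = getattr(wrapped_obj, attr)
            missing.extend(
                collect_missing_exports(internal_attr, nested_wrapped, f"{name}.{attr}")
            )
    return missing


# Hypothetical stand-ins for datafusion._internal and datafusion.
internal = SimpleNamespace(
    Catalog=object,
    common=SimpleNamespace(DFSchema=object, SqlTable=object),
)
wrapped = SimpleNamespace(common=SimpleNamespace(DFSchema=object))


def test_reports_missing_exports():
    # Against the real modules, the assertion would instead expect an empty list.
    assert collect_missing_exports(internal, wrapped) == [
        ("root", "Catalog"),
        ("root.common", "SqlTable"),
    ]
```

Returning data rather than printing also makes it straightforward to filter the findings before asserting on them.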

@timsaucer
Contributor Author

FWIW I don't know if all of these need to be exported. It's probably worth looking through each one.
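
If some of these turn out to be intentionally unexported, one option is to keep an explicit allowlist and report only what is not on it. This is a minimal sketch, assuming findings are represented as (object name, attribute name) pairs. `INTENTIONALLY_UNEXPORTED` and `unexpected_missing` are hypothetical names, and the entries are placeholders for illustration, not a decision about which names should be exported.

```python
# Hypothetical allowlist; the entries below are placeholders for illustration.
INTENTIONALLY_UNEXPORTED = {
    ("datafusion", "runtime"),
    ("datafusion.expr", "__all__"),
}


def unexpected_missing(findings, allowlist=INTENTIONALLY_UNEXPORTED):
    """Keep only findings that have not been explicitly marked as acceptable."""
    return [finding for finding in findings if finding not in allowlist]


findings = [
    ("datafusion", "runtime"),
    ("datafusion", "Catalog"),
    ("datafusion.expr", "__all__"),
]
print(unexpected_missing(findings))  # [('datafusion', 'Catalog')]
```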

@Spaarsh
Contributor

Spaarsh commented Mar 6, 2025

Just an update on this one. Upon running the script, this is the output:

Missing attribute. Object name: DataFrame, Attribute name: format_column_name
Missing attribute. Object name: NullTreatment, Attribute name: __delattr__
Missing attribute. Object name: NullTreatment, Attribute name: __dir__
Missing attribute. Object name: NullTreatment, Attribute name: __eq__
Missing attribute. Object name: NullTreatment, Attribute name: __format__
Missing attribute. Object name: NullTreatment, Attribute name: __ge__
Missing attribute. Object name: NullTreatment, Attribute name: __getattribute__
Missing attribute. Object name: NullTreatment, Attribute name: __gt__
Missing attribute. Object name: NullTreatment, Attribute name: __hash__
Missing attribute. Object name: NullTreatment, Attribute name: __init__
Missing attribute. Object name: NullTreatment, Attribute name: __init_subclass__
Missing attribute. Object name: NullTreatment, Attribute name: __int__
Missing attribute. Object name: NullTreatment, Attribute name: __le__
Missing attribute. Object name: NullTreatment, Attribute name: __lt__
Missing attribute. Object name: NullTreatment, Attribute name: __ne__
Missing attribute. Object name: NullTreatment, Attribute name: __new__
Missing attribute. Object name: NullTreatment, Attribute name: __reduce__
Missing attribute. Object name: NullTreatment, Attribute name: __reduce_ex__
Missing attribute. Object name: NullTreatment, Attribute name: __repr__
Missing attribute. Object name: NullTreatment, Attribute name: __setattr__
Missing attribute. Object name: NullTreatment, Attribute name: __sizeof__
Missing attribute. Object name: NullTreatment, Attribute name: __str__
Missing attribute. Object name: NullTreatment, Attribute name: __subclasshook__
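
The `NullTreatment` entries look more like false positives than missing wrappers. One plausible explanation, assuming the wrapped `NullTreatment` is a Python `enum.Enum` subclass, is that enum classes customize `dir()` to list only a short set of names. In that case the script's `attr not in dir(wrapped_obj)` check flags methods such as `__eq__` that are in fact available. The sketch below uses a hypothetical stand-in enum, `Treatment`, to show the difference between `dir()` and `hasattr()`.

```python
from enum import Enum


class Treatment(Enum):
    """Hypothetical stand-in for a wrapped enum such as NullTreatment."""

    RESPECT_NULLS = 0
    IGNORE_NULLS = 1


def is_exposed(obj, attr):
    """Check attribute availability without relying on dir()."""
    return hasattr(obj, attr)


# dir() on an enum class omits many inherited dunder methods...
listed_in_dir = "__eq__" in dir(Treatment)
# ...but the attribute is still reachable through normal lookup.
reachable = is_exposed(Treatment, "__eq__")
print(listed_in_dir, reachable)
```

Swapping the `dir()` membership test for `hasattr()` in the script would likely remove most of these entries. Names that the enum genuinely lacks, such as `__int__`, could still be reported.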

3 participants