CLN: Use dedup_names in all instances where duplicate column names are renamed #50371
take |
Hi @muddi900, how are you going with this issue? |
You can take over if the maintainers allow. |
take |
Hi @RhysJohnLewis, how are you going with this issue? |
@shteken please do. I have not had the time. |
take |
Hello @datapythonista. Are there any other files with duplicate renaming methods to change, as you mentioned in #50370? Could you name a file for me to start working on? |
It's possible that the other function is the only one that does that. You can have a look at pandas functions that read data with possibly duplicated columns, or try to find similar functions with grep. But I'd start by just unifying those two functions for now. |
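As a rough stand-in for the grep suggestion, the Python sketch below walks a source tree and reports lines that look like duplicate-renaming code. The search pattern is an assumption: it only matches double-quoted f-strings of the form f"{name}.{count}", so implementations built with string concatenation, % formatting or single quotes would need their own patterns. find_candidates is a hypothetical helper, not part of pandas.

```python
import re
from pathlib import Path

# Assumed pattern: a double-quoted f-string that appends ".<something>" to a name,
# e.g. f"{col}.{cur_count}". Extend it for other renaming styles you want to catch.
SUFFIX_PATTERN = re.compile(r'f"\{[^}]+\}\.\{[^}]+\}"')


def find_candidates(root: str, pattern: re.Pattern = SUFFIX_PATTERN) -> list[tuple[str, int, str]]:
    """Return (path, line_number, stripped_line) for every .py line under root matching pattern."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if pattern.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Illustrative usage from the root of a pandas checkout:
# for path, lineno, line in find_candidates("pandas/io"):
#     print(f"{path}:{lineno}: {line}")
```

Since this is plain text matching, every hit still needs a manual read to confirm it really renames duplicate columns.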
Ok. I'll search, and if I find any functions, I'll post them here so we can check whether it's ok to change them. |
I took a look at the pandas/io folder and I think this is where I should be looking. Excel and XML files may not have duplicate columns, but when importing data from other formats, some duplicates may be found. Am I heading in the right direction? |
Can you simply unify the two existing implementations identified in the issue description into one for now? There is probably nothing else to do for this issue, but even if another implementation exists, we can leave it for later. |
I read the code and comments in #50370. |
I reviewed some parts of the code and debugged 'test_frame_non_unique_columns' and Should I look at other functions and check if |
Sorry, the description is not clear enough; feel free to ask anything you need. If you check the diff in https://github.com/pandas-dev/pandas/pull/50370/files you will see a There may be small differences between the two implementations (or maybe not); we can discuss them after you give it a try and see the exact problems or differences, if any. |
I added a sample test and checked the code. I haven't pushed yet, because I want to clean up my code first and ask you whether it is ok.
I also commented out the lines you mentioned before, checked the results, and added a call to
|
Can I take a stab at this? @datapythonista @hamedgibago |
Certainly, no problem. I commented out the code above and added a new line, as you can see in the first line. The current tests passed, but other tests failed. I need to spend some time debugging. |
Thanks @hamedgibago , I'll try independently when I get some time :) |
@datapythonista the two implementations are definitely different. One approach names the columns [col, col.1, col.1.1], while the other names them [col, col.1, col.2]. I need your input. Should we change all the tests, or change the implementation of dedup_names?
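To make the difference concrete, here is a minimal standalone sketch of both naming schemes. dedup_counter is a simplified counter-based renamer that yields [col, col.1, col.2]; dedup_append is a hypothetical append-based renamer that yields [col, col.1, col.1.1]. Neither is copied from pandas, and the sketch does not claim which real implementation matches which scheme. diff_schemes (also hypothetical) lists the positions where the two disagree, which can help spot the test expectations that would change if one scheme replaced the other.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def dedup_counter(names: Sequence[Hashable]) -> list[Hashable]:
    """Counter-based scheme: ['col', 'col', 'col'] -> ['col', 'col.1', 'col.2']."""
    result = list(names)
    counts: defaultdict = defaultdict(int)  # name -> next suffix to try
    for i, col in enumerate(result):
        cur_count = counts[col]
        # Bump the numeric suffix until we reach a name not used so far.
        while cur_count > 0:
            counts[col] = cur_count + 1
            col = f"{col}.{cur_count}"
            cur_count = counts[col]
        result[i] = col
        counts[col] = cur_count + 1
    return result


def dedup_append(names: Sequence[Hashable]) -> list[Hashable]:
    """Hypothetical append-based scheme: ['col', 'col', 'col'] -> ['col', 'col.1', 'col.1.1']."""
    seen: set = set()
    result = []
    for col in names:
        # Append ".1" until the name differs from every name produced so far.
        while col in seen:
            col = f"{col}.1"
        seen.add(col)
        result.append(col)
    return result


def diff_schemes(names: Sequence[Hashable]) -> list[tuple[int, Hashable, Hashable]]:
    """Return (position, counter_name, append_name) wherever the two schemes disagree."""
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(dedup_counter(names), dedup_append(names)))
        if a != b
    ]


print(dedup_counter(["col", "col", "col"]))  # ['col', 'col.1', 'col.2']
print(dedup_append(["col", "col", "col"]))   # ['col', 'col.1', 'col.1.1']
print(diff_schemes(["col", "col", "col"]))   # [(2, 'col.2', 'col.1.1')]
```

With only two copies of a name both schemes agree; they diverge from the third duplicate onwards, so tests with three or more identical column names are the ones most likely to be affected.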
As far as I know, we should not make any changes to existing tests unless we find a bug and report it to the maintainers. After changing the code, we can add new tests and also make sure all the other tests still pass. |
@hamedgibago I think it really depends. Some existing tests check the output of the custom method and not |
@datapythonista What is your idea? |
@hamedgibago @datapythonista Is this still in the works? Is this free? |
It's been a long time since I worked on it. I have to check. |
Merry Christmas everyone
|
@hamedgibago @datapythonista Is this issue available to be taken? This would be my first issue to work on. It looks like the function was created, but it still needs to be called in place of other code elsewhere that attempts to do the same thing, as noted by the 'TODO' in #50370 |
In #50370 the function dedup_names has been moved to pandas.io.common so it can be reused by any reader dealing with duplicate column names. The function can be expanded in the future to allow custom renaming patterns, so it should be used by every reader, both to keep the behavior consistent and to avoid duplicate code. There is at least one instance identified in #50370 where a different implementation is used to rename the duplicate columns. We should call dedup_names instead, and if other alternative implementations exist, find them and also call dedup_names.
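The sketch below illustrates the intended refactor under stated assumptions; it is a standalone approximation, not the pandas source. dedup_names_sketch mirrors a shared helper with an is_potential_multiindex flag (assumed here to match the pandas function's argument; check pandas/io/common.py before relying on it) and adds a hypothetical pattern parameter to show what the "custom renaming patterns" extension mentioned above might look like. read_something is a hypothetical reader that delegates to the shared helper instead of carrying its own renaming loop.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def dedup_names_sketch(
    names: Sequence[Hashable],
    is_potential_multiindex: bool = False,
    pattern: str = "{name}.{count}",  # hypothetical: custom renaming pattern
) -> list[Hashable]:
    """Rename duplicate names using a configurable suffix pattern (illustrative only)."""
    result = list(names)
    counts: defaultdict = defaultdict(int)  # name -> next suffix to try
    for i, col in enumerate(result):
        cur_count = counts[col]
        while cur_count > 0:
            counts[col] = cur_count + 1
            if is_potential_multiindex:
                # Tuple names (MultiIndex columns): suffix only the last level.
                col = col[:-1] + (pattern.format(name=col[-1], count=cur_count),)
            else:
                col = pattern.format(name=col, count=cur_count)
            cur_count = counts[col]
        result[i] = col
        counts[col] = cur_count + 1
    return result


def read_something(header: list[str]) -> list[Hashable]:
    """Hypothetical reader: calls the shared helper rather than its own renaming code."""
    return dedup_names_sketch(header)


print(read_something(["a", "a", "b", "a"]))  # ['a', 'a.1', 'b', 'a.2']
```

Keeping the renaming in one helper means a future pattern option only has to be implemented, documented and tested once, and every reader picks it up automatically.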