BUG: Using pd.concat(axis="columns") on differently sized MultiIndexed DataFrames with a datetime index level containing exclusively NaT values causes the level in the returned DataFrame to be a float instead of a datetime #44900

Xnot · 2021-12-15T14:06:24Z

I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
(optional) I have confirmed this bug exists on the master branch of pandas.

import pandas as pd

print(pd.__version__)
df_a = pd.DataFrame({"a": range(5), "idx1": range(5), "idx2": [pd.NaT] * 5}).set_index(["idx1", "idx2"])
df_b = pd.DataFrame({"b": range(6), "idx1": range(6), "idx2": [pd.NaT] * 6}).set_index(["idx1", "idx2"])
df = pd.concat([df_a, df_b], axis="columns")
df.reset_index().info()

On pandas 1.3.5:

1.3.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   idx1    6 non-null      int64  
 1   idx2    0 non-null      float64
 2   a       5 non-null      float64
 3   b       6 non-null      int64  
dtypes: float64(2), int64(2)
memory usage: 320.0 bytes

On pandas 1.2.5:

1.2.5
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 6 entries, 0 to 5
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   idx1    6 non-null      int64         
 1   idx2    0 non-null      datetime64[ns]
 2   a       5 non-null      float64       
 3   b       6 non-null      int64         
dtypes: datetime64[ns](1), float64(1), int64(2)
memory usage: 320.0 bytes

1.2.5 has the correct, expected output. In 1.3.5, the all-NaT datetime index level has been converted to float NaN values. The bug only occurs if the DataFrames have different lengths and the index level in question contains only NaT values.
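To check whether a given pandas install is affected, the reproducer can be wrapped so it reports the dtype of the `idx2` level directly instead of going through `reset_index().info()`. This is a sketch, not code from the thread: the helper name `concat_all_nat_level` is hypothetical, and `is_datetime64_any_dtype` is used so the check does not depend on the exact datetime unit.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype


def concat_all_nat_level(n_a: int = 5, n_b: int = 6) -> pd.DataFrame:
    """Hypothetical helper rebuilding the reproducer above: two frames of
    different lengths whose "idx2" index level holds only NaT."""
    df_a = pd.DataFrame(
        {"a": range(n_a), "idx1": range(n_a), "idx2": [pd.NaT] * n_a}
    ).set_index(["idx1", "idx2"])
    df_b = pd.DataFrame(
        {"b": range(n_b), "idx1": range(n_b), "idx2": [pd.NaT] * n_b}
    ).set_index(["idx1", "idx2"])
    return pd.concat([df_a, df_b], axis="columns")


df = concat_all_nat_level()
idx2 = df.index.get_level_values("idx2")

# The values are all missing in every version; only the dtype differs.
print(pd.__version__, idx2.dtype, idx2.isna().all())

# True on 1.2.5 (datetime64[ns]); False on affected 1.3.x releases (float64).
level_kept_datetime = is_datetime64_any_dtype(idx2.dtype)
print("idx2 kept datetime dtype:", level_kept_datetime)
```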

simonjayhawkins · 2021-12-16T12:25:16Z

1.2.5 has the correct expected output.

first bad commit: [0b671be] REF: unify casting logic in Categorical.__init__ (#40097)

phofl mentioned this issue Dec 15, 2021

Bug in concat casting all na levels to float #44902 (merged)

rhshadrach added the Bug, Regression (functionality that used to work in a prior pandas version), and Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) labels Dec 16, 2021

rhshadrach added this to the 1.4 milestone Dec 16, 2021

simonjayhawkins added a commit to simonjayhawkins/pandas that referenced this issue Dec 16, 2021: "code sample for pandas-dev#44900" (05b8503)

jreback closed this as completed in #44902 Dec 16, 2021
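For anyone still on an affected 1.3.x release, one possible workaround (not proposed in the thread, and only a sketch) is to round-trip the index through columns and coerce the level back with `pd.to_datetime`, which maps float NaN to NaT. The helper `restore_datetime_level` is hypothetical and assumes every index level is named; the float-level frame below merely simulates the 1.3.x result.

```python
import pandas as pd


def restore_datetime_level(df: pd.DataFrame, level: str) -> pd.DataFrame:
    """Hypothetical workaround helper (not part of pandas): coerce one
    MultiIndex level back to datetime64 after it was cast to float.

    Assumes every index level is named, so reset_index/set_index round-trip.
    """
    names = list(df.index.names)
    flat = df.reset_index()
    # pd.to_datetime maps float NaN to NaT; a column that is already
    # datetime64 passes through unchanged.
    flat[level] = pd.to_datetime(flat[level])
    return flat.set_index(names)


# Simulate the 1.3.x result: an all-NaN "idx2" level stored as float.
buggy = pd.DataFrame(
    {"a": [0.0, 1.0], "idx1": [0, 1], "idx2": [float("nan")] * 2}
).set_index(["idx1", "idx2"])
fixed = restore_datetime_level(buggy, "idx2")
print(buggy.index.get_level_values("idx2").dtype)
print(fixed.index.get_level_values("idx2").dtype)
```

Applied to the report's `df`, `restore_datetime_level(df, "idx2")` should yield a datetime level; on releases that already include #44902 the call should leave the dtype unchanged. Going through `reset_index` rather than editing `MultiIndex.levels` directly keeps the helper independent of how the empty float level is represented internally.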
