Here are two rows from my DataFrame:
>>> test.loc[test.index.year == 2009]
0 1 2 3 4 \
date
2009-01-01 252.855283 353.6261 556.295659 439.558188 432.936844
5 6 employment
date
2009-01-01 439.437132 433.269903 64.116667
>>> test.loc[test.index.year == 2007]
0 1 2 3 4 \
date
2007-01-01 269.277757 380.608002 401.765546 491.893821 433.864499
5 6 employment
date
2007-01-01 492.396073 489.260588 69.1375
When I try divide, I get:
>>> test.loc[test.index.year == 2009].divide(test.loc[test.index.year == 2007])
0 1 2 3 4 5 6 employment
date
2007-01-01 NaN NaN NaN NaN NaN NaN NaN NaN
2009-01-01 NaN NaN NaN NaN NaN NaN NaN NaN
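This happens because pandas matches rows by their index labels before dividing. However, none of the options for axis= helped. I can get the correct result with: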
test.loc[test.index.year == 2009].values / test.loc[test.index.year == 2007].values
array([[ 0.93901288, 0.92910842, 1.38462759, 0.8936038 , 0.99786188,
0.89244646, 0.88556061, 0.92737902]])
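Is there a better way to do this? I would like to keep the index that corresponds to the 2007-01-01 record. Of course, I could reattach it to the values, but usually when I try something like this there is the way I hack it together and then there is the right way. So: what else can I do?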
Answer
If you want to keep the index from 2007, I think you can do this:
df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009].values
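The reason that
df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009]
or
df.loc[df.index.year == 2007].divide(df.loc[df.index.year == 2009])
does not work is that pandas tries to align the data on the index. Here the 2007 row is matched against a row labeled 2007 in the other operand, and the 2009 selection has no such row. The same holds the other way around for the 2009 label, so the result contains both labels with every value NaN. That is why you get two rows of NaN rather than one.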
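So we need to convert one of the operands to its underlying np.array for this to work:
df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009].values
Because the numerator's index is left untouched, it is preserved in the result.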
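The alignment behavior described above can be reproduced on a small frame. This is a minimal sketch with made-up data (the column names a and b and all values are illustrative, not the asker's); to_numpy() is used as the current pandas spelling of .values. The last step is an assumption about what the question wanted: the 2009-over-2007 ratio labeled with the 2007 index.

```python
import numpy as np
import pandas as pd

# Illustrative data only: two yearly rows with made-up values.
df = pd.DataFrame(
    {"a": [10.0, 20.0], "b": [4.0, 8.0]},
    index=pd.to_datetime(["2007-01-01", "2009-01-01"]),
)

row_2007 = df.loc[df.index.year == 2007]
row_2009 = df.loc[df.index.year == 2009]

# Frame / frame aligns on the index first. The two labels never match,
# so the result carries both labels and every cell is NaN.
aligned = row_2007 / row_2009

# Dropping the index from the denominator makes the division positional;
# the numerator's index (2007-01-01) is kept.
positional = row_2007 / row_2009.to_numpy()

# Assumed variant: the ratio from the question (2009 over 2007),
# explicitly labeled with the 2007 index.
ratio_2009_over_2007 = pd.DataFrame(
    row_2009.to_numpy() / row_2007.to_numpy(),
    index=row_2007.index,
    columns=df.columns,
)
```

Building the result frame explicitly is more verbose, but it makes the choice of index visible instead of relying on which operand happens to keep its labels.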
@EdChum, I don't think this is a bug. I think it is the expected behavior for boolean indexing, considering the following:
df.iloc[df.index.year>=2007]/df.loc[df.index.year == 2007]
0 1 2 3 4 5 6 employment
date
2007-01-01 1 1 1 1 1 1 1 1
2009-01-01 NaN NaN NaN NaN NaN NaN NaN NaN
But you should use this approach with caution, because boolean indexing may return more than one row. See the following two examples:
In [128]:
print(df)
0 1 2 3 4 \
2007-12-31 252.855283 353.626100 556.295659 439.558188 432.936844
2008-12-31 269.277757 380.608002 401.765546 491.893821 433.864499
2009-12-31 269.277757 380.608002 401.765546 491.893821 433.864499
5 6 7
2007-12-31 439.437132 433.269903 64.116667
2008-12-31 492.396073 489.260588 69.137500
2009-12-31 492.396073 489.260588 69.137500
In [130]:
print(df.iloc[df.index.year==2007]/df.loc[df.index.year >= 2007])
# Divide one row by 3 rows? Dimension mismatch? No, it runs without error: pandas aligns on the index and fills the unmatched rows with NaN.
0 1 2 3 4 5 6 7
2007-12-31 1 1 1 1 1 1 1 1
2008-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
2009-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
In [131]:
df.iloc[df.index.year==2007]/df.loc[df.index.year >= 2007].values
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
**************
ValueError: Shape of passed values is (8, 3), indices imply (8, 1)
#basically won't work due to dimension mismatch
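One way to guard against the multi-row pitfall shown above is to check that each boolean mask selects exactly one row before dropping the index. The following is only a sketch: ratio_between_years is a hypothetical helper name, and the frame is made-up data rather than the one printed above.

```python
import pandas as pd


def ratio_between_years(frame, num_year, den_year):
    """Divide the single row for num_year by the single row for den_year.

    Hypothetical helper: the result keeps the index of the num_year row.
    """
    num = frame.loc[frame.index.year == num_year]
    den = frame.loc[frame.index.year == den_year]
    if len(num) != 1 or len(den) != 1:
        raise ValueError("each year must select exactly one row")
    # to_numpy() drops the denominator's index, so the division is positional.
    return num / den.to_numpy()


# Illustrative data only: one row per year with made-up values.
df = pd.DataFrame(
    {"a": [10.0, 20.0, 40.0], "b": [4.0, 8.0, 16.0]},
    index=pd.to_datetime(["2007-12-31", "2008-12-31", "2009-12-31"]),
)

result = ratio_between_years(df, 2009, 2007)
```

Failing early with an explicit message is a design choice here; it avoids both the silent NaN rows from index alignment and the less readable shape error raised after .values.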