pandas: dividing two rows by each other - python

Here are two rows from my DataFrame:

>>> test.loc[test.index.year == 2009]
                     0         1           2           3           4  \
date                                                                   
2009-01-01  252.855283  353.6261  556.295659  439.558188  432.936844   

                     5           6  employment  
date                                            
2009-01-01  439.437132  433.269903   64.116667 

>>> test.loc[test.index.year == 2007]
                     0           1           2           3           4  \
date                                                                     
2007-01-01  269.277757  380.608002  401.765546  491.893821  433.864499   

                     5           6  employment  
date                                            
2007-01-01  492.396073  489.260588     69.1375

When I try divide, I get:

>>> test.loc[test.index.year == 2009].divide(test.loc[test.index.year == 2007])
             0   1   2   3   4   5   6  employment
date                                              
2007-01-01 NaN NaN NaN NaN NaN NaN NaN         NaN
2009-01-01 NaN NaN NaN NaN NaN NaN NaN         NaN

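This comes from pandas trying to divide only the cells whose index labels match. However, none of the options for axis= have helped me. I can get the right result with: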

test.loc[test.index.year == 2009].values / test.loc[test.index.year == 2007].values
array([[ 0.93901288,  0.92910842,  1.38462759,  0.8936038 ,  0.99786188,
         0.89244646,  0.88556061,  0.92737902]])

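Is there a better way to do this? I'd like to keep the index that corresponds to the 2007-01-01 record. Of course I can reattach it to the values, but it is usually when I try something like this that there's my way, and then there's the right way. So: what else can I do?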

Answer

If you want to keep the 2007 index, I think you can do this:

df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009].values

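The reason df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009] and df.loc[df.index.year == 2007].divide(df.loc[df.index.year == 2009]) don't work is that pandas tries to align the data by index. Here, the 2007 row is divided by whatever the other frame holds under the label 2007, and that frame has no such row; the same goes for 2009. The result covers the union of both indexes, which is why you get 2 rows of NaN instead of just 1.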

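So we need to convert one of the two operands to its underlying np.array to make this work (df.loc[df.index.year == 2007]/df.loc[df.index.year == 2009].values). Because the numerator's index is left untouched, it is preserved in the result.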
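To make the difference concrete, here is a minimal sketch. It assumes a small made-up frame (the column names and numbers are illustrative, not the asker's data) and contrasts label-aligned division with division by a bare array:

```python
import pandas as pd

# Hypothetical two-row frame standing in for the question's data.
idx = pd.to_datetime(["2007-01-01", "2009-01-01"])
df = pd.DataFrame({"a": [2.0, 4.0], "employment": [10.0, 5.0]}, index=idx)

row_2007 = df.loc[df.index.year == 2007]
row_2009 = df.loc[df.index.year == 2009]

# DataFrame / DataFrame aligns on row labels first. No label appears in
# both operands, so the result has both labels and every cell is NaN.
aligned = row_2007 / row_2009

# DataFrame / ndarray skips alignment and divides position by position.
# The numerator's index (2007-01-01) is carried over to the result.
positional = row_2007 / row_2009.values

print(aligned)
print(positional)
```

Swapping the operands keeps the 2009 label instead, so whichever side should supply the index is the side to leave as a DataFrame.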

@EdChum, I don't think this is a bug. Considering the following, I think it is the expected behavior with boolean indexing:

df.loc[df.index.year >= 2007] / df.loc[df.index.year == 2007]
             0   1   2   3   4   5   6  employment
date                                              
2007-01-01   1   1   1   1   1   1   1           1
2009-01-01 NaN NaN NaN NaN NaN NaN NaN         NaN

But you should be careful with this approach, because boolean indexing may return more than one row. See the following two examples:

In [128]:

print(df)
                     0           1           2           3           4  \
2007-12-31  252.855283  353.626100  556.295659  439.558188  432.936844   
2008-12-31  269.277757  380.608002  401.765546  491.893821  433.864499   
2009-12-31  269.277757  380.608002  401.765546  491.893821  433.864499   

                     5           6          7  
2007-12-31  439.437132  433.269903  64.116667  
2008-12-31  492.396073  489.260588  69.137500  
2009-12-31  492.396073  489.260588  69.137500  
In [130]:

print(df.loc[df.index.year == 2007] / df.loc[df.index.year >= 2007])
# Divide one row by 3 rows? Dimension mismatch? No, it will work just fine.
             0   1   2   3   4   5   6   7
2007-12-31   1   1   1   1   1   1   1   1
2008-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
2009-12-31 NaN NaN NaN NaN NaN NaN NaN NaN
In [131]:

df.loc[df.index.year == 2007] / df.loc[df.index.year >= 2007].values
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
**************
ValueError: Shape of passed values is (8, 3), indices imply (8, 1)
#basically won't work due to dimension mismatch
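One way to guard against the multi-row case is to check the denominator explicitly and divide by a single row taken as a Series. The sketch below is an illustration, not part of the original answer: divide_by_year is a hypothetical helper name, and the sample frame is made up.

```python
import pandas as pd

def divide_by_year(df, num_year, den_year):
    """Divide the rows for num_year by the single row for den_year.

    Hypothetical helper: raises if the denominator year does not select
    exactly one row, instead of silently misaligning or failing on shape.
    """
    num = df.loc[df.index.year == num_year]
    den = df.loc[df.index.year == den_year]
    if len(den) != 1:
        raise ValueError(
            f"expected exactly one row for {den_year}, got {len(den)}"
        )
    # den.iloc[0] is a Series indexed by column name. With axis="columns"
    # it is matched against the columns of num, so row labels never clash
    # and the index of num is kept.
    return num.div(den.iloc[0], axis="columns")

# Made-up data with one row per year.
idx = pd.to_datetime(["2007-12-31", "2008-12-31", "2009-12-31"])
df = pd.DataFrame({"a": [2.0, 4.0, 8.0], "b": [1.0, 2.0, 4.0]}, index=idx)

ratio = divide_by_year(df, 2009, 2007)
print(ratio)
```

Because the denominator is matched by column name rather than by position, this also works when the numerator year selects several rows.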
