Calculate the grouped median when you have numeric intervals - python

This is my DataFrame, which has numbered class intervals.

    import pandas as pd

    df = pd.DataFrame({'Class': [1,2,3,4,5,6,7,8,9,10,11],
                       'Class Interval': ['16.25-18.75', '18.75-21.25', '21.25-23.75',
                                          '23.75-26.25', '26.25-28.75', '28.75-31.25',
                                          '31.25-33.75', '33.75-36.25', '36.25-38.75',
                                          '38.75-41.25', '41.25-43.75'],
                       'fi': [2,7,7,14,17,24,11,11,3,3,1],
                       'Cumulative fi': [2,9,16,30,47,71,82,93,96,99,100],
                       'fi/n': [.02,.07,.07,.14,.17,.24,.11,.11,.03,.03,.01],
                       'Cumulative fi/n': [.02,.09,.16,.30,.47,.71,.82,.93,.96,.99,1.00]})
    df

        Class  Class Interval  fi  Cumulative fi  fi/n  Cumulative fi/n
    0       1     16.25-18.75   2              2  0.02             0.02
    1       2     18.75-21.25   7              9  0.07             0.09
    2       3     21.25-23.75   7             16  0.07             0.16
    3       4     23.75-26.25  14             30  0.14             0.30
    4       5     26.25-28.75  17             47  0.17             0.47
    5       6     28.75-31.25  24             71  0.24             0.71
    6       7     31.25-33.75  11             82  0.11             0.82
    7       8     33.75-36.25  11             93  0.11             0.93
    8       9     36.25-38.75   3             96  0.03             0.96
    9      10     38.75-41.25   3             99  0.03             0.99
    10     11     41.25-43.75   1            100  0.01             1.00

Question: how can I calculate the grouped median of this DataFrame with Python?

It can be done by hand, and the result is 29.06.

I tried 'median_grouped':

    # importing median_grouped from the statistics module 
    from statistics import median_grouped

    # printing median_grouped for the set 
    print("Grouped Median is %s" %(median_grouped(df['Class Interval'])))

But I got this error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-491000133032> in <module>
      4 
      5 # printing median_grouped for the set
----> 6 print("Grouped Median is %s" %(median_grouped(df['Class Interval'])))

~\Anaconda3\ANACONDA\lib\statistics.py in median_grouped(data, interval)
    463     for obj in (x, interval):
    464         if isinstance(obj, (str, bytes)):
--> 465             raise TypeError('expected number but got %r' % obj)
    466     try:
    467         L = x - interval/2  # The lower limit of the median interval.

TypeError: expected number but got '28.75-31.25'

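Then I tried making two columns (one with the lower bounds, one with the upper bounds), but that only gave me the median of the lower bounds (28.75) or of the upper bounds (31.25). I also tried with just the lower bounds, but of course that gave me 28.75 as well.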

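I don't have the values inside the intervals, so I can't recreate a list of values to cut with pd.cut and do it properly that way (I don't want to guess). I also tried manually turning the class intervals into bins (e.g. 16.25-18.25 becomes (16.25, 18.25]), but then I got the error message: TypeError: unorderable types: Interval() < float()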

Is it possible to turn the interval column from strings into numbers, so that the grouped median can be calculated automatically in Python?

Reference solution

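You can recreate a list of artificial data points that carries the same statistical information (the midpoint of each interval, repeated fi times for that interval) and then run the median_grouped function on it: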

    from statistics import median_grouped

    # Split each 'lower-upper' string once, then derive numeric bounds and the midpoint
    bounds = df['Class Interval'].str.split('-', expand=True).astype(float)
    df['lower'] = bounds[0]
    df['upper'] = bounds[1]
    df['middle'] = (df['lower'] + df['upper']) / 2

    # Frequency column (the fi counts): the third column of df in the question
    freq_col = df.columns[2]

    # Generate an artificial list of values: each midpoint repeated fi times
    flat_list = [mid for mid, n in zip(df['middle'], df[freq_col]) for _ in range(int(n))]

    # Calculate the median with the statistics.median_grouped function
    median_grouped(flat_list, interval=2.5)   # Attention to the interval size!
    # => 29.0625
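
As a cross-check, the same number can be obtained without expanding the table into artificial data points, by applying the textbook grouped-median formula L + (n/2 - cf) / f * h directly to the frequency table. Here L is the lower bound of the median class, cf the cumulative frequency below it, f its frequency and h its width. The following is only a minimal sketch: the helper name `grouped_median` is made up for illustration, and it assumes the classes are sorted and contiguous, as they are in the question.

```python
from itertools import accumulate


def grouped_median(lower_bounds, upper_bounds, frequencies):
    """Grouped median via L + (n/2 - cf) / f * h (hypothetical helper)."""
    n = sum(frequencies)
    if n <= 0:
        raise ValueError("frequencies must sum to a positive count")
    half = n / 2
    # Walk the cumulative frequencies until the class containing n/2 is reached
    for i, cum in enumerate(accumulate(frequencies)):
        if cum >= half:
            cf = cum - frequencies[i]                     # observations below the median class
            width = upper_bounds[i] - lower_bounds[i]     # class width h
            return lower_bounds[i] + (half - cf) / frequencies[i] * width


# Class intervals and frequencies copied from the question
intervals = ['16.25-18.75', '18.75-21.25', '21.25-23.75', '23.75-26.25',
             '26.25-28.75', '28.75-31.25', '31.25-33.75', '33.75-36.25',
             '36.25-38.75', '38.75-41.25', '41.25-43.75']
freq = [2, 7, 7, 14, 17, 24, 11, 11, 3, 3, 1]

# Parse each 'lower-upper' string into two floats
bounds = [tuple(float(part) for part in s.split('-')) for s in intervals]
lower = [lo for lo, _ in bounds]
upper = [hi for _, hi in bounds]

result = grouped_median(lower, upper, freq)
print(result)  # 29.0625
```

With n = 100 the median class is 28.75-31.25 (cf = 47, f = 24, h = 2.5), so the result is 28.75 + (50 - 47) / 24 * 2.5 = 29.0625, which matches both the manual calculation and the median_grouped output above.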
