Flexibly determining a metric in pandas - python

Suppose we have dataframes with different structures in pandas:

import pandas as pd

# creating the first dataframe
df1 = pd.DataFrame({
  "width": [1, 5], 
  "height": [5, 8]})

# creating second dataframe
df2 = pd.DataFrame({
  "a": [7, 8], 
  "b": [11, 23],
  "c": [1, 3]})

# creating the third dataframe
df3 = pd.DataFrame({
  "radius": [7, 8], 
  "height": [11, 23]})

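In general, there may be more than these dataframes. I want to create logic that maps sets of column names to specific functions in order to create a new column "metric" (think of it as an area for two columns and a volume for three columns). I want to specify the column name ensembles like this: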

# The functions must be defined before the dictionary that refers to them.
def area(width, height):
    return width * height

def volume_cube(a, b, c):
    return a * b * c

def volume_cylinder(radius, height):
    return (3.14159 * radius ** 2) * height

column_name_ensembles = {
    "1": {
       "ensemble": ['height', 'width'],
       "method": area},
    "2": {
       "ensemble": ['a', 'b', 'c'],
       "method": volume_cube},
    "3": {
       "ensemble": ['radius', 'height'],
       "method": volume_cylinder}}

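Here, the area function should create a new column for df1 as df1['metric'] = df1['height'] * df1['width'], and the volume_cube function should create a new column for df2 as df2['metric'] = df2['a'] * df2['b'] * df2['c']. Note that the functions can have any form, but each one takes its ensemble as parameters. The desired function metric(df, column_name_ensembles) should take an arbitrary dataframe as input and decide which function to apply by inspecting the column names.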

Example of the input/output behavior:

df1_with_metric = metric(df1, column_name_ensembles)
print(df1_with_metric)
# output
#    width height metric
#  0 1     5      5 
#  1 5     8      40
df2_with_metric = metric(df2, column_name_ensembles)
print(df2_with_metric)
# output
#    a  b  c  metric
#  0 7  11 1  77
#  1 8  23 3  552
df3_with_metric = metric(df3, column_name_ensembles)
print(df3_with_metric)
# output
#    radius  height  metric
#  0 7       11      1693.31701
#  1 8       23      4624.42048

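The ideal solution is a function that takes the dataframe and column_name_ensembles as parameters and returns the dataframe with the appropriate "metric" column added.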

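I know this can be done with a chain of if and else statements, but that does not seem like the smartest solution. Maybe there is a design pattern that solves this, but I am not an expert in design patterns.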

Thanks for reading my question! I look forward to your answers.

Answer

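You can use the inspect module to extract the parameter names automatically, and then map a frozenset of those parameter names directly to each metric function: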

import inspect

metrics = {
    frozenset(inspect.signature(f).parameters): f
    for f in (area, volume_cube, volume_cylinder)
}
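To make the shape of this mapping concrete, here is a small self-contained sketch that repeats the three functions from the question and prints each key next to the function it maps to. Sorting the names is only for readable printing; a frozenset itself has no order.

```python
import inspect

def area(width, height):
    return width * height

def volume_cube(a, b, c):
    return a * b * c

def volume_cylinder(radius, height):
    return (3.14159 * radius ** 2) * height

# Key: the set of parameter names; value: the function that takes them.
metrics = {
    frozenset(inspect.signature(f).parameters): f
    for f in (area, volume_cube, volume_cylinder)
}

for names, func in metrics.items():
    # sorted() is used only to get a stable, readable listing
    print(sorted(names), '->', func.__name__)
```

Because the keys are sets, a lookup does not depend on column order: frozenset(['height', 'width']) and frozenset(['width', 'height']) find the same function.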

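Then, for a given dataframe, if all of its columns are guaranteed to be parameters of the relevant metric, you can simply look it up in that dictionary: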

def apply_metric(df, metrics):
    metric = metrics[frozenset(df.columns)]
    args = tuple(df[p] for p in inspect.signature(metric).parameters)
    df['metric'] = metric(*args)
    return df
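As a usage sketch of this exact-match version (the dataframe is the df1 from the question; only two of the functions are repeated to keep it short), note one consequence of writing the result into the input dataframe: once the 'metric' column exists, the column set no longer matches any key, so a second call raises KeyError.

```python
import inspect
import pandas as pd

def area(width, height):
    return width * height

def volume_cube(a, b, c):
    return a * b * c

metrics = {
    frozenset(inspect.signature(f).parameters): f
    for f in (area, volume_cube)
}

def apply_metric(df, metrics):
    # Exact match: the full set of column names must be a key of `metrics`.
    metric = metrics[frozenset(df.columns)]
    # Pass the columns in the order the function declares its parameters.
    args = tuple(df[p] for p in inspect.signature(metric).parameters)
    df['metric'] = metric(*args)
    return df

df1 = pd.DataFrame({"width": [1, 5], "height": [5, 8]})
result = apply_metric(df1, metrics)
print(result["metric"].tolist())  # [5, 40]

# df1 was modified in place, so its columns are now width, height, metric.
try:
    apply_metric(df1, metrics)
    second_call_failed = False
except KeyError:
    second_call_failed = True
print(second_call_failed)  # True
```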

If the input dataframe has more columns than the metric function needs, you can use set intersection to find the relevant metric:

def apply_metric(df, metrics):
    for parameters, metric in metrics.items():
        if parameters <= set(df.columns):  # all parameters are present as columns
            args = tuple(df[p] for p in inspect.signature(metric).parameters)
            df['metric'] = metric(*args)
            break
    else:
        raise ValueError(f'No metric found for columns {df.columns}')
    return df
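The question asked for a function with the signature metric(df, column_name_ensembles). The following is a minimal sketch of that variant, not part of the answer above. It assumes that every name in an "ensemble" list is also a parameter name of the corresponding "method", so the columns can be passed by keyword; this holds for the three functions in the question. It also returns a copy via DataFrame.assign instead of modifying the input, which is a design choice of this sketch.

```python
import pandas as pd

def area(width, height):
    return width * height

def volume_cube(a, b, c):
    return a * b * c

def volume_cylinder(radius, height):
    return (3.14159 * radius ** 2) * height

column_name_ensembles = {
    "1": {"ensemble": ["height", "width"], "method": area},
    "2": {"ensemble": ["a", "b", "c"], "method": volume_cube},
    "3": {"ensemble": ["radius", "height"], "method": volume_cylinder},
}

def metric(df, column_name_ensembles):
    columns = set(df.columns)
    for entry in column_name_ensembles.values():
        ensemble = entry["ensemble"]
        # Use the first ensemble whose names all appear among the columns.
        if set(ensemble) <= columns:
            # Assumption: column names equal the function's parameter names,
            # so the order of the ensemble list does not matter.
            kwargs = {name: df[name] for name in ensemble}
            return df.assign(metric=entry["method"](**kwargs))
    raise ValueError(f"No metric found for columns {list(df.columns)}")

df1 = pd.DataFrame({"width": [1, 5], "height": [5, 8]})
df2 = pd.DataFrame({"a": [7, 8], "b": [11, 23], "c": [1, 3]})
df3 = pd.DataFrame({"radius": [7, 8], "height": [11, 23]})

for df in (df1, df2, df3):
    print(metric(df, column_name_ensembles))
```

If a dataframe contains the columns of more than one ensemble (for example width, height and radius), this sketch applies whichever matching entry comes first in the dictionary, so the order of the entries acts as a priority.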
