Fastest way to iterate over a specific list? - python

Suppose I have a list:

list=['plu;ean;price;quantity','plu1;ean1;price1;quantity1']

I want to iterate over the list, split each item on ";" and apply an if clause, like this:

for item in list:
    split_item=item.split(";")
    if split_item[0] == "string_value" or split_item[1] == "string_value":
        pass  # do something with split_item

Is this the fastest way to do it? Assume my real list is much larger (many items). I tried a list comprehension:

item=[item.split(";") for item in list if item.split(";")[0] == "string_value" or item.split(";")[1] == "string_value"]

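But this actually turned out slower: the loop averages 90 ms, while the list comprehension averages 130 ms.
Am I doing something wrong with the list comprehension? Is there a faster solution?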

Answer

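Edit: it turns out the regex cache was being a bit unfair to the competition. My mistake. The regex is only a small percentage faster.

If you are looking for speed, hcwhsa's answer should be good enough. If you need slightly more, look to re.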

import re

lis = ['plu;ean;price;quantity'*1000, 'plu1;ean1;price1;quantity1'*100]*1000

matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match
[l.split(';') for l in lis if matcher(l)]
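To make the pattern easier to trust, here is a minimal sketch that checks the regex filter against two split-based filters on a small made-up list. The sample records and the three function names are assumptions for illustration, not part of the original post. It also shows that the comprehension from the question calls split() up to three times per item, which is a plausible (but here unmeasured) reason for it being slower than the plain loop.

```python
import re

# Small made-up sample; the records are illustrative, not from the original post.
sample = [
    'plu;ean;price;quantity',      # both fields match
    'plu;ean1;price1;quantity1',   # first field matches
    'plu1;ean;price1;quantity1',   # second field matches
    'plu1;ean1;price1;quantity1',  # no match
    'plus;eans;price;quantity',    # prefixes only, no match
]

matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match


def filter_triple_split(items):
    # Shape of the question's comprehension: split() runs up to three times per item.
    return [item.split(';') for item in items
            if item.split(';')[0] == 'plu' or item.split(';')[1] == 'ean']


def filter_single_split(items):
    # Split each item once, then test the first two fields.
    return [fields for fields in (item.split(';') for item in items)
            if fields[0] == 'plu' or fields[1] == 'ean']


def filter_regex(items):
    # Test with the regex first and split only the items that pass.
    return [item.split(';') for item in items if matcher(item)]


results = filter_regex(sample)
assert results == filter_single_split(sample) == filter_triple_split(sample)
print(len(results))
```

The three filters agree on this sample. They can differ on edge cases: the split-based versions raise IndexError on an item without any ";", while the regex simply does not match it.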

Timings, for mostly positive results (i.e. split is the main cause of slowness):

SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match

lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)] + ['plu;ean;price;quantity' for i in range(10000)]
"

python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"

The regex version is a bit faster:

10 loops, best of 3: 55 msec per loop
10 loops, best of 3: 49.5 msec per loop

For mostly negative results (most items are filtered out):

SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match

lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(1000)] + ['plu;ean;price;quantity' for i in range(10000)]
"

python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"

The lead is slightly larger here:

10 loops, best of 3: 40.9 msec per loop
10 loops, best of 3: 35.7 msec per loop
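The same comparison can also be run from inside Python rather than from the shell. The following is a sketch under a few assumptions: it uses a smaller data set than the shell runs so that it finishes quickly, it builds the non-matching items with str(i) instead of chr(i), and the dictionary names are made up for illustration. Absolute numbers will differ from the ones quoted above.

```python
import re
import timeit

matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match

# Smaller made-up data set than in the shell runs, so the sketch finishes quickly.
lis = (['plu1;ean1;price1;quantity1' + str(i) for i in range(1000)]
       + ['plu;ean;price;quantity' for _ in range(1000)])

statements = {
    'split': "[[x] + [y] + z.split(';') for x, y, z in "
             "(item.split(';', 2) for item in lis) if x == 'plu' or y == 'ean']",
    'regex': "[l.split(';') for l in lis if matcher(l)]",
}

timings = {}
for name, stmt in statements.items():
    # repeat() returns one total time per run; the minimum is the least noisy.
    runs = timeit.repeat(stmt, globals={'lis': lis, 'matcher': matcher},
                         repeat=3, number=10)
    timings[name] = min(runs) / 10  # seconds per loop

for name, seconds in timings.items():
    print(f'{name}: {seconds * 1000:.3f} msec per loop')
```

Passing the data through globals avoids rebuilding the list inside a setup string, so both statements are timed against exactly the same objects.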

If the result will always be unique, use

next([x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean')

or the faster regex version

next(filter(matcher, lis)).split(';')

(use itertools.ifilter on Python 2).
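One caveat: next() raises StopIteration when nothing matches. If a missing match is possible, a default can be passed as the second argument. A minimal sketch, assuming Python 3 and using a hypothetical helper named first_match that is not part of the original answer:

```python
import re

matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match


def first_match(items):
    """Return the fields of the first matching item, or None if nothing matches."""
    # filter() is lazy on Python 3, so scanning stops at the first match.
    hit = next(filter(matcher, items), None)
    return hit.split(';') if hit is not None else None


found = first_match(['plu1;ean1;price1;quantity1', 'plu;ean;price;quantity'])
missing = first_match(['plu1;ean1;price1;quantity1'])
print(found, missing)
```

Here found holds the split fields of the matching record and missing is None instead of an exception.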

Timings:

SETUP="
import re
from itertools import chain
matcher = re.compile('^(?:plu(?:;|$)|[^;]*;ean(?:;|$))').match

lis = ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)] + ['plu;ean;price;quantity'] + ['plu1;ean1;price1;quantity1'+chr(i) for i in range(10000)]
"

python -m timeit -s "$SETUP" "[[x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean']"
python -m timeit -s "$SETUP" "next([x] + [y] + z.split(';') for x, y, z in (item.split(';', 2) for item in lis) if x== 'plu' or y=='ean')"

python -m timeit -s "$SETUP" "[l.split(';') for l in lis if matcher(l)]"
python -m timeit -s "$SETUP" "next(filter(matcher, lis)).split(';')"

Results:

10 loops, best of 3: 31.3 msec per loop
100 loops, best of 3: 15.2 msec per loop
10 loops, best of 3: 28.8 msec per loop
100 loops, best of 3: 14.1 msec per loop

So stopping at the first match roughly halves the time for both methods.
