Why does converting words from plural to singular in a for loop take so long (Python 3)?

Here is my code, which reads text from a CSV file and converts every word in one column from plural to singular:

import pandas as pd
from textblob import TextBlob as tb
data = pd.read_csv(r'path\to\data.csv')

for i in range(len(data)):
    blob = tb(data['word'][i])
    singular = blob.words.singularize()  # This makes singular a list
    data['word'][i] = ''.join(singular)  # Converting the list back to a string

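But this code has now been running for several minutes (and might keep running for hours if I don't stop it?)! Why is that? When I check a few words one at a time, the conversion happens instantly and takes no noticeable time. The file has only 1060 rows (words to convert).

Edit: it finished running in about 10-12 minutes.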

Here is some sample data:

Input:

word
development
investment
funds
slow
company
commit
pay
claim
finances
customers
claimed
insurance
comment
rapid
bureaucratic
affairs
reports
policyholders
detailed

Output:

word
development
investment
fund
slow
company
commit
pay
claim
finance
customer
claimed
insurance
comment
rapid
bureaucratic
affair
report
policyholder
detailed

Answer

How about this?

In [1]: import pandas as pd

In [2]: from textblob import Word

In [3]: s = pd.read_csv('text', squeeze=True, memory_map=True)

In [4]: type(s)
Out[4]: pandas.core.series.Series

In [5]: s = s.apply(lambda w: Word(w).singularize())

In [6]: s
Out[6]:
0      development
1       investment
2             fund
3             slow
4          company
5           commit
6              pay
7            claim
8          finance
9         customer
10         claimed
11       insurance
12         comment
13           rapid
14    bureaucratic
15          affair
16          report
17    policyholder
18        detailed
Name: word, dtype: object

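I use squeeze here so that read_csv returns a Series instead of a DataFrame, because the word file has only one column. In addition, if the word file is large, you can use memory_map.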
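One caveat for newer pandas versions: the squeeze keyword of read_csv was deprecated in pandas 1.4 and removed in 2.0. A minimal sketch of an equivalent read follows, which squeezes the resulting DataFrame instead. The helper name read_single_column and the in-memory sample are assumptions made for illustration, not part of the original answer.

```python
import io

import pandas as pd


def read_single_column(source):
    """Read a one-column CSV and return it as a Series.

    Hypothetical helper: read_csv(..., squeeze=True) no longer works on
    pandas 2.x, so the one-column DataFrame is squeezed after reading.
    """
    return pd.read_csv(source).squeeze("columns")


# In-memory stand-in for the real file; a path such as 'text' works the same way.
sample = io.StringIO("word\nfunds\ncustomers\nreports\n")
s = read_single_column(sample)
print(type(s).__name__, s.name)
```

The resulting Series can then be passed through s.apply(lambda w: Word(w).singularize()) exactly as in the session above.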

Could you test the performance with your data?
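One way to run that comparison is to time both versions on the same data. The following is only a sketch using the standard library; time_call is a hypothetical helper name, and the workload shown is a placeholder rather than the real conversion.

```python
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls of func(*args)."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


# Placeholder workload; replace it with the code you want to measure.
elapsed = time_call(sorted, list(range(100_000, 0, -1)))
print(f"best of 3: {elapsed:.6f} s")
```

To measure the apply-based version, pass a zero-argument callable, for example time_call(lambda: s.apply(lambda w: Word(w).singularize())), and do the same for the original loop wrapped in a function.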
