How do you split an iterator in two without iterating twice or using extra memory to store all the data?
A solution that stores everything in memory:
l = [{'a': i, 'b': i * 2} for i in range(10)]

def a(iterator):
    for item in iterator:
        print(item)

def b(iterator):
    for item in iterator:
        print(item)

a([li['a'] for li in l])
b([li['b'] for li in l])
Or, if you can iterate over the source twice:
class SomeIterable(object):
    def __iter__(self):
        for i in range(10):
            yield {'a': i, 'b': i * 2}

def a(some_iterator):
    for item in some_iterator:
        print(item)

def b(some_iterator):
    for item in some_iterator:
        print(item)

s = SomeIterable()
a((si['a'] for si in s))
b((si['b'] for si in s))
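A quick check that this approach really does consume the source twice: the sketch below instruments `__iter__` with a pass counter (the `CountingIterable` name is hypothetical, added for illustration).

```python
class CountingIterable:
    """Like SomeIterable, but counts how many passes are made over it."""
    def __init__(self):
        self.passes = 0

    def __iter__(self):
        self.passes += 1
        for i in range(10):
            yield {'a': i, 'b': i * 2}

s = CountingIterable()
list(si['a'] for si in s)  # first full pass
list(si['b'] for si in s)  # second full pass
print(s.passes)  # 2
```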
But what if I only want to iterate once?
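For comparison, the standard library's `itertools.tee` splits one iterator into two, but it buffers every item that one branch has consumed and the other has not. If `a()` runs to completion before `b()` starts, that buffer grows to hold the entire stream, so it does not satisfy the "no extra memory" requirement here:

```python
import itertools

s = ({'a': i, 'b': i * 2} for i in range(10))  # one-shot generator

it1, it2 = itertools.tee(s)
print(sum(d['a'] for d in it1))  # 45 -- it2's buffer now holds all 10 items
print(sum(d['b'] for d in it2))  # 90
```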
Solution from a Python expert:
According to the comments, a and b are external library functions that you cannot rewrite, but whose execution can be interleaved. In that case, what you want is possible, but it essentially requires threads:
import multiprocessing.pool  # for ThreadPool, not multiprocessing
import queue

_endofinput = object()  # sentinel marking the end of the input stream

def _queueiter(q):
    while True:
        item = q.get()
        if item is _endofinput:
            break
        yield item

def parallel_execute(funcs, iterable, maxqueue):
    '''Interleaves the execution of funcs[0](iterable), funcs[1](iterable), etc.

    No function is allowed to lag more than maxqueue items behind another.
    (This will require adjustment if a function might return before consuming
    all input.)

    Makes only one pass over iterable.
    '''
    queues = [queue.Queue(maxsize=maxqueue) for _ in funcs]
    queueiters = [_queueiter(q) for q in queues]
    threadpool = multiprocessing.pool.ThreadPool(processes=len(funcs))
    # Python 3 lambdas cannot unpack a tuple argument, so index into the pair
    results = threadpool.map_async(lambda f_x: f_x[0](f_x[1]),
                                   list(zip(funcs, queueiters)))
    for item in iterable:
        for q in queues:
            q.put(item)  # blocks when a consumer is maxqueue items behind
    for q in queues:
        q.put(_endofinput)
    threadpool.close()
    return results.get()
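A usage sketch, with the function repeated so the snippet runs standalone: two stand-in consumers (here simple sums, since the real a and b print instead of returning) share a single pass over a one-shot generator.

```python
import multiprocessing.pool
import queue

_endofinput = object()

def _queueiter(q):
    while True:
        item = q.get()
        if item is _endofinput:
            break
        yield item

def parallel_execute(funcs, iterable, maxqueue):
    queues = [queue.Queue(maxsize=maxqueue) for _ in funcs]
    queueiters = [_queueiter(q) for q in queues]
    threadpool = multiprocessing.pool.ThreadPool(processes=len(funcs))
    results = threadpool.map_async(lambda f_x: f_x[0](f_x[1]),
                                   list(zip(funcs, queueiters)))
    for item in iterable:
        for q in queues:
            q.put(item)
    for q in queues:
        q.put(_endofinput)
    threadpool.close()
    return results.get()

def source():
    # a one-shot generator: it can only be consumed once
    for i in range(10):
        yield {'a': i, 'b': i * 2}

a = lambda it: sum(d['a'] for d in it)  # stand-ins for the external a and b
b = lambda it: sum(d['b'] for d in it)

print(parallel_execute([a, b], source(), maxqueue=4))  # [45, 90]
```

The maxqueue bound is the memory/throughput trade-off: at most maxqueue items are buffered per consumer, and a slow consumer stalls the producer rather than growing an unbounded buffer.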