Why does opening a subprocess with universal_newlines cause a Unicode decode exception? - python

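I am using the subprocess module to run child jobs and subprocess.PIPE to collect their output and error streams. To avoid deadlocks, I continuously read these streams in separate threads. This works, but sometimes the program crashes because of a decoding problem:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)

At a high level, I understand that Python is probably trying to convert the data to a string with the ASCII codec and that I need to call decode somewhere, but I am not sure where. When I create the subprocess job, I pass universal_newlines=True. I assumed this means stdout / stderr are returned as unicode rather than binary: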

self.p = subprocess.Popen(self.command, shell=self.shell, stdout=subprocess.PIPE, stderr=subprocess.PIPE, universal_newlines=True)

The crash happens in my reader thread function:

def standardOutHandler(standardOut):
    # Crash happens on the following line:
    for line in iter(standardOut.readline, ''):
       writerLock.acquire()
       stdout_file.write(line)
       if self.echoOutput:
           sys.stdout.write(line)
           sys.stdout.flush()
       writerLock.release()

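It is not clear to me why readline raises a decoding exception here. As I said, I thought that universal_newlines=True already gave me decoded data.

What is going on, and what can I do to fix it?

Here is the full traceback: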

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 920, in _bootstrap_inner
    self.run()
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/threading.py", line 868, in run
    self._target(*self._args, **self._kwargs)
  File "/Users/lzrd/my_process.py", line 61, in standardOutHandler
    for line in iter(standardOut.readline, ''):
  File "/Users/lzrd/Envs/my_env/bin/../lib/python3.4/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 483: ordinal not in range(128)

Solution

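If you use universal_newlines=True, the byte stream is decoded to Unicode using the character encoding returned by locale.getpreferredencoding(False), which should be utf-8 on your system (check the LANG, LC_CTYPE, and LC_ALL environment variables).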
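A quick way to see which encoding text-mode pipes will default to, and which locale variables may be steering it, is to print them from the same interpreter and environment that starts the job. This is only a minimal sketch; the helper name describe_text_encoding is hypothetical and not part of the original code.

```python
import locale
import os


def describe_text_encoding(environ=os.environ):
    """Collect the default text encoding and the locale variables that affect it."""
    return {
        # Encoding Python uses for text-mode streams when none is given explicitly.
        "preferred_encoding": locale.getpreferredencoding(False),
        # Locale variables mentioned above; None means the variable is unset.
        "LANG": environ.get("LANG"),
        "LC_CTYPE": environ.get("LC_CTYPE"),
        "LC_ALL": environ.get("LC_ALL"),
    }


if __name__ == "__main__":
    for name, value in describe_text_encoding().items():
        print(name, "=", value)
```

If preferred_encoding comes back as an ASCII alias (for example ANSI_X3.4-1968 or US-ASCII), the locale variables are the first thing to look at.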

If the exception persists, try the code with an empty loop body:

for line in standardOut: #NOTE: no need to use iter() idiom here on Python 3
    pass

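If you still get the exception, it may be a bug in Python. First check that locale.getpreferredencoding(False) does not return ascii when called right next to the Popen() call; it is important to use exactly the same environment for this check.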
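To see why an ASCII default would produce exactly the reported error, the failure can be reproduced without any subprocess. Byte 0xe2 is the first byte of the UTF-8 encoding of many punctuation characters, such as the right single quotation mark U+2019. The sample bytes and the helper name try_decode below are assumptions made for illustration, not the real job output.

```python
# Hypothetical child output: UTF-8 text containing U+2019 (bytes e2 80 99).
sample = "it\u2019s done\n".encode("utf-8")


def try_decode(data, encoding):
    """Decode data strictly; on failure, describe the offending byte instead."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        return "{}: byte {:#x} at position {}".format(
            type(exc).__name__, data[exc.start], exc.start)


if __name__ == "__main__":
    # ascii() keeps the printout safe even if the terminal itself is ASCII-only.
    print(ascii(try_decode(sample, "utf-8")))
    print(try_decode(sample, "ascii"))
```

The same bytes decode cleanly as utf-8 and fail as ascii, which matches a reader that was set up with an ASCII default encoding.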

I would understand it if the UnicodeDecodeError named utf-8 instead of ascii. In that case, you could try decoding the stream manually:

#!/usr/bin/env python3
import io
import locale
from subprocess import Popen, PIPE

# NOTE: bufsize=1 is omitted: line buffering is not supported for binary pipes
with Popen(['command', 'arg 1'], stdout=PIPE) as p:
    for line in io.TextIOWrapper(p.stdout,
                                 encoding=locale.getpreferredencoding(False),
                                 errors='strict'):
        print(line, end='')

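You can experiment with the encoding and errors parameters here, for example by setting encoding='ascii' or by choosing an error handler that makes the unsupported bytes visible (for debugging). Note that errors='namereplace', which substitutes \N{...} escape sequences, applies only when encoding text for output; when decoding a stream, errors='backslashreplace' (Python 3.5+) or errors='replace' serves the same purpose.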
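The effect of different encoding and errors settings can be compared with a small experiment. The sketch below assumes Python 3.6 or newer, where Popen() accepts encoding and errors directly; the traceback above comes from Python 3.4, where the io.TextIOWrapper approach is needed instead. The child command and the helper name read_lines are hypothetical stand-ins for the real job.

```python
import subprocess
import sys

# Hypothetical child: writes the UTF-8 bytes for "it’s" (e2 80 99 is U+2019).
CHILD_CODE = "import sys; sys.stdout.buffer.write(b'it\\xe2\\x80\\x99s\\n')"


def read_lines(encoding, errors):
    """Run the child and decode its stdout with the given codec and error handler."""
    with subprocess.Popen([sys.executable, "-c", CHILD_CODE],
                          stdout=subprocess.PIPE,
                          encoding=encoding, errors=errors) as p:
        return list(p.stdout)


if __name__ == "__main__":
    # ascii() keeps the printout readable on an ASCII-only terminal.
    print(ascii(read_lines("utf-8", "strict")))
    print(ascii(read_lines("ascii", "backslashreplace")))
    print(ascii(read_lines("ascii", "replace")))
```

Passing an explicit encoding makes the reader independent of the locale, which may be preferable when the child's output encoding is known in advance.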
