Strip \n \t \r in Scrapy - python

I am trying to strip \r \n \t characters with a Scrapy spider and then produce a JSON file.




def parse(self, response):
    for sel in response.xpath('//div[@class="d-grid-main"]'):
        item = xItem()
        item['TITLE'] = sel.xpath('xpath').extract()
        item['DESCRIPTION'] = map(unicode.strip, sel.xpath('//p[@class="class-name"]/text()').extract())


item['DESCRIPTION'] = str(sel.xpath('//p[@class="class-name"]/text()').extract()).strip()



unicode.strip only removes whitespace characters at the beginning and end of a string.



You can use a custom method to remove those characters from inside the strings (using the regular expression module), or even use XPath's normalize-space().


Example Python shell session:

>>> import scrapy
>>> text='''<html>
... <body>
... <div class="d-grid-main">
... <p class="class-name">
...  This is some text,
...  with some newlines \r
...  and some \t tabs \t too;
... <a href="http://example.com"> and a link too
...  </a>
... I think we're done here
... </p>
... </div>
... </body>
... </html>'''
>>> response = scrapy.Selector(text=text)
>>> response.xpath('//div[@class="d-grid-main"]')
[<Selector xpath='//div[@class="d-grid-main"]' data=u'<div class="d-grid-main">\n<p class="clas'>]
>>> div = response.xpath('//div[@class="d-grid-main"]')[0]
>>> # you'll want to use relative XPath expressions, starting with "./" or ".//"
>>> div.xpath('.//p[@class="class-name"]/text()').extract()
[u'\n\n This is some text,\n with some newlines \r\n and some \t tabs \t too;\n\n',
 u"\n\nI think we're done here\n\n"]
>>> # only leading and trailing whitespace is removed by strip()
>>> map(unicode.strip, div.xpath('.//p[@class="class-name"]/text()').extract())
[u'This is some text,\n with some newlines \r\n and some \t tabs \t too;', u"I think we're done here"]
>>> # normalize-space() will get you a single string on the whole element
>>> div.xpath('normalize-space(.//p[@class="class-name"])').extract()
[u"This is some text, with some newlines and some tabs too; and a link too I think we're done here"]
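
The session above shows strip() and normalize-space(), but not the regular-expression route. A minimal sketch of that custom method follows, assuming you want every run of whitespace collapsed into a single space. The helper name clean_whitespace and the sample strings are illustrative assumptions, not part of Scrapy; the samples only mimic the extract() output shown earlier.

```python
import re

# Hypothetical helper (not a Scrapy API): matches any run of whitespace,
# including spaces, \n, \r and \t.
_WHITESPACE_RUN = re.compile(r"\s+")


def clean_whitespace(text):
    """Collapse each whitespace run into one space, then trim both ends."""
    return _WHITESPACE_RUN.sub(" ", text).strip()


# Sample strings shaped like the extract() results in the session above.
raw_fragments = [
    "\n\n This is some text,\n with some newlines \r\n and some \t tabs \t too;\n\n",
    "\n\nI think we're done here\n\n",
]

cleaned = [clean_whitespace(fragment) for fragment in raw_fragments]
print(cleaned)
```

In a spider, the same helper could be applied to each extracted string, for example item['DESCRIPTION'] = [clean_whitespace(t) for t in sel.xpath('.//p[@class="class-name"]/text()').extract()]. Unlike normalize-space(), this keeps one cleaned string per text node instead of a single string for the whole element.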
