I'm parsing a simple XML document with a simple script I wrote (with a few tweaks). Here is the XML:
<?xml version="1.0" ?>
<library owner="James Wise">
<book>
<title>Sandman Volume 1: Preludes and Nocturnes</title>
<author>Neil Gaiman</author>
</book>
<book>
<title>Good Omens</title>
<author>Neil Gamain</author>
<author>Terry Pratchett</author>
</book>
<book>
<title>The Man And The Goat</title>
<author>Bubber Elderidge</author>
</book>
<book>
<title>Once Upon A Time in LA</title>
<author>Dr Dre</author>
</book>
<book>
<title>There Will Never Be Justice</title>
<author>IR Jury</author>
</book>
<book>
<title>Beginning Python</title>
<author>Peter Norton, et al</author>
</book>
</library>
Here is my Python script:
from xml.dom.minidom import parse
import xml.dom.minidom
import csv

def writeToCSV(myLibrary):
    csvfile = open('output.csv', 'w')
    fieldnames = ['title', 'author', 'author']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    books = myLibrary.getElementsByTagName("book")
    for book in books:
        titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
        for author in book.getElementsByTagName("author"):
            authorValue = author.childNodes[0].data
            # One row is written per author, so a book with two authors gets two rows
            writer.writerow({'title': titleValue, 'author': authorValue})

doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]
# Get book elements in Library
books = myLibrary.getElementsByTagName("book")
# Write each book's title and authors to output.csv
writeToCSV(myLibrary)
Here is my output:
title,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain
Good Omens,Terry Pratchett
The Man And The Goat,Bubber Elderidge
Once Upon A Time in LA,Dr Dre
There Will Never Be Justice,IR Jury
Beginning Python,"Peter Norton, et al"
Note that the book "Good Omens" has two authors, and they end up on two separate rows. What I actually want is this:
title,author,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman,,
Good Omens,Neil Gamain,Terry Pratchett
The Man And The Goat,Bubber Elderidge,,
Once Upon A Time in LA,Dr Dre,,
There Will Never Be Justice,IR Jury,,
Beginning Python,"Peter Norton, et al",,
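As you can see, there are 3 columns, so both authors appear on the same row. Books with only one author just get a blank entry, so two commas end up next to each other.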
Answer:
Here is another possible solution:
Code:
#!/usr/bin/env python3
from xml.dom.minidom import parse
import csv

def writeToCSV(myLibrary):
    # Python 3: open in text mode with newline='' so the csv module controls line endings
    with open('output.csv', 'w', newline='') as csvfile:
        writer = csv.writer(csvfile, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
        writer.writerow(['title', 'author', 'author'])
        books = myLibrary.getElementsByTagName("book")
        for book in books:
            titleValue = book.getElementsByTagName("title")[0].childNodes[0].data
            authors = []  # collect all of this book's authors in a list
            for author in book.getElementsByTagName("author"):
                authors.append(author.childNodes[0].data)
            writer.writerow([titleValue] + authors)  # one row per book

doc = parse('library.xml')
myLibrary = doc.getElementsByTagName("library")[0]
# Write one CSV row per book
writeToCSV(myLibrary)
Output:
title,author,author
Sandman Volume 1: Preludes and Nocturnes,Neil Gaiman
Good Omens,Neil Gamain,Terry Pratchett
The Man And The Goat,Bubber Elderidge
Once Upon A Time in LA,Dr Dre
There Will Never Be Justice,IR Jury
Beginning Python,"Peter Norton, et al"
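Note that the output above does not pad single-author rows with an empty entry, which the question asked for. If that matters, one possible variation is to collect all the books first, find the largest author count, and pad each row with empty strings up to that width. This is only a sketch: the names `SAMPLE_XML`, `books_to_rows`, and `rows_to_csv` are made up for illustration, the XML is a two-book excerpt of the library inlined so the snippet runs on its own, and the CSV is built in memory instead of being written to output.csv. With three columns, a single-author row ends in one trailing comma.

```python
import csv
import io
from xml.dom.minidom import parseString

# Hypothetical inline sample: two books copied from the library.xml in the question.
SAMPLE_XML = """<?xml version="1.0" ?>
<library owner="James Wise">
<book>
<title>Sandman Volume 1: Preludes and Nocturnes</title>
<author>Neil Gaiman</author>
</book>
<book>
<title>Good Omens</title>
<author>Neil Gamain</author>
<author>Terry Pratchett</author>
</book>
</library>"""


def books_to_rows(library):
    """Return (header, rows), with every row padded to the same number of columns."""
    records = []
    for book in library.getElementsByTagName("book"):
        title = book.getElementsByTagName("title")[0].childNodes[0].data
        authors = [a.childNodes[0].data for a in book.getElementsByTagName("author")]
        records.append((title, authors))

    # The widest book decides how many author columns the file needs.
    max_authors = max((len(authors) for _, authors in records), default=0)
    header = ["title"] + ["author"] * max_authors
    rows = [
        [title] + authors + [""] * (max_authors - len(authors))
        for title, authors in records
    ]
    return header, rows


def rows_to_csv(header, rows):
    """Serialize the header and rows to a CSV string held in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


library = parseString(SAMPLE_XML).getElementsByTagName("library")[0]
header, rows = books_to_rows(library)
print(rows_to_csv(header, rows), end="")
```

To write a real file instead, the same `header` and `rows` can be passed to a `csv.writer` opened on output.csv with `newline=''`.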
Kind regards,