<?xml version="1.0" encoding='UTF-8'?>
<painting>
  <img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
  <caption>This is Raphael's "Foligno" Madonna, painted in
    <date>1511</date>-<date>1512</date>.</caption>
</painting>
Tag
A markup construct that begins with "<" and ends with ">". Tags come in three flavors: start-tags (for example <section>), end-tags (for example </section>), and empty-element tags (for example <line-break/>).
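A quick way to see the tag flavors in practice is to parse them with xml.dom.minidom, the module discussed further down. This is an illustrative sketch, not part of the original notes: it checks that an empty-element tag and a start-tag immediately followed by its end-tag produce the same empty element.

```python
from xml.dom import minidom

# <line-break/> is an empty-element tag; <line-break></line-break> is a
# start-tag immediately followed by its matching end-tag.
empty_tag = minidom.parseString('<line-break/>').documentElement
tag_pair = minidom.parseString('<line-break></line-break>').documentElement

for elem in (empty_tag, tag_pair):
    # Same tag name, no child nodes; minidom serializes both as <line-break/>
    print(elem.tagName, len(elem.childNodes), elem.toxml())
```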
Element
A logical component of a document which either begins with a start-tag and ends with an end-tag, or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content, and may contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. Another is <line-break/>.
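The sketch below (an illustration, using the Greeting and caption examples from this post) shows what "content" and "child elements" look like once parsed: the caption's content is a mix of text nodes and <date> child elements.

```python
from xml.dom import minidom, Node

greeting = minidom.parseString('<Greeting>Hello, world.</Greeting>').documentElement
print(greeting.tagName, repr(greeting.firstChild.data))

caption = minidom.parseString(
    '<caption>This is Raphael\'s "Foligno" Madonna, painted in '
    '<date>1511</date>-<date>1512</date>.</caption>'
).documentElement

# Keep only the child *elements*; the text between them is separate text nodes
child_elements = [n for n in caption.childNodes if n.nodeType == Node.ELEMENT_NODE]
print([e.tagName for e in child_elements])
print([e.firstChild.data for e in child_elements])
```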
Attribute
A markup construct consisting of a name/value pair that exists within a start-tag or empty-element tag. In this example, the name of the attribute is "number" and the value is "3":
  <step number="3">Connect A to B.</step>
This element has two attributes, src and alt:
  <img src="madonna.jpg" alt='by Raphael'/>
An element must not have two attributes with the same name.
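As an illustrative sketch with minidom, attributes are read with getAttribute, and the "no duplicate attribute names" rule is enforced by the parser itself: such a document is not well-formed, and minidom's expat-based parser raises ExpatError.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

step = minidom.parseString('<step number="3">Connect A to B.</step>').documentElement
print(step.getAttribute('number'))  # attribute values always come back as strings

img = minidom.parseString("<img src=\"madonna.jpg\" alt='by Raphael'/>").documentElement
print(sorted(img.attributes.keys()))
print(img.getAttribute('alt'))

# Two attributes with the same name: the parser refuses the document
rejected = False
try:
    minidom.parseString('<step number="3" number="4"/>')
except ExpatError as err:
    rejected = True
    print('rejected:', err)
```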
XML in Python
There are two approaches: event-based SAX and object-based DOM.
For reference, see Python in a Nutshell.
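To make the SAX/DOM distinction concrete, here is a small sketch (an illustration, not taken from the book) that extracts the <date> values from a trimmed copy of the painting example both ways. The handler class name DateCollector is hypothetical. With SAX the parser calls back into the handler as it reads and keeps no tree; with DOM the whole document is first built as a tree of nodes, which is then queried.

```python
import xml.sax
from xml.dom import minidom

DATA = b"""<painting>
  <img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
  <caption>painted in <date>1511</date>-<date>1512</date>.</caption>
</painting>"""

# SAX (event-based): react to start/end/character events as they stream by
class DateCollector(xml.sax.ContentHandler):  # hypothetical handler name
    def __init__(self):
        super().__init__()
        self.dates = []
        self._in_date = False

    def startElement(self, name, attrs):
        if name == 'date':
            self._in_date = True
            self.dates.append('')

    def endElement(self, name):
        if name == 'date':
            self._in_date = False

    def characters(self, content):
        # characters() may fire several times for one run of text, so append
        if self._in_date:
            self.dates[-1] += content

handler = DateCollector()
xml.sax.parseString(DATA, handler)
sax_dates = handler.dates

# DOM (object-based): build the full tree, then query it
doc = minidom.parseString(DATA)
dom_dates = [d.firstChild.data for d in doc.getElementsByTagName('date')]

print(sax_dates, dom_dates)
```

SAX suits large files or single-pass extraction because memory stays flat; DOM is simpler when you need to navigate back and forth in the document.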
DOM:
The xml.dom.minidom Module
The following covers the DOM module.
The most important class is Node.
The document, elements, attributes, text content and so on are all nodes.
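A minimal sketch of that idea, reusing the <step> example from above: the document, the element, its attribute and its text are different minidom classes, but each is an instance of xml.dom.Node and carries a nodeType and nodeName.

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<step number="3">Connect A to B.</step>')
step = doc.documentElement                # element node
attr = step.getAttributeNode('number')    # attribute node
text = step.firstChild                    # text node

for node in (doc, step, attr, text):
    # Different concrete classes, one common Node interface
    print(type(node).__name__, isinstance(node, Node), node.nodeType, node.nodeName)
```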
<excerpt>
  <!-- Framespan 1:5030 -->
  <filename> MCTTR0902h.mov.deint.mpeg </filename>
  <begin>0.0</begin>
  <duration> 201.20 </duration>
  <sample_rate> 25 </sample_rate>
  <language> english </language>
  <source_type> surveillance </source_type>
</excerpt>
Note:
- Node.nodeType: an integer representing the node type. Symbolic constants for the types are on the Node object: ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, NOTATION_NODE.
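Checking nodeType matters for input like the <excerpt> above: minidom keeps the comment and the whitespace between tags as nodes of their own, and the text inside <filename> keeps its surrounding spaces (which is why the code below calls .strip()). A sketch on a shortened copy of the excerpt; the TYPE_NAMES lookup table is a hypothetical helper built from the constants on Node.

```python
from xml.dom import minidom, Node

EXCERPT = ('<excerpt> <!-- Framespan 1:5030 --> '
           '<filename> MCTTR0902h.mov.deint.mpeg </filename> '
           '<begin>0.0</begin> </excerpt>')

excerpt = minidom.parseString(EXCERPT).documentElement

# Map numeric nodeType values back to the symbolic constant names on Node
TYPE_NAMES = {getattr(Node, n): n for n in dir(Node) if n.endswith('_NODE')}

kinds = [TYPE_NAMES[child.nodeType] for child in excerpt.childNodes]
for child, kind in zip(excerpt.childNodes, kinds):
    print(kind, repr(child.nodeName))

filename = excerpt.getElementsByTagName('filename')[0]
print(repr(filename.firstChild.data))  # note the leading/trailing spaces
```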
Code
from xml.dom import minidom

xmldoc = minidom.parse('E:/eclipse/xml/expt_2009_retroED_EVAL09_ENG_s-camera_NIST_2.xml')
reflist = xmldoc.getElementsByTagName('excerpt')

# One tab-separated line per <excerpt>: filename, begin, duration, frame count
with open('video.txt', 'w') as of:
    for ref in reflist:
        for node in ref.getElementsByTagName('filename'):
            of.write(node.firstChild.nodeValue.strip())
            of.write('\t')
        for node in ref.getElementsByTagName('begin'):
            of.write(node.firstChild.nodeValue.strip())
            of.write('\t')
        for node in ref.getElementsByTagName('duration'):
            duration = node.firstChild.nodeValue.strip()
            of.write(duration)
            of.write('\t')
            # Frame count = duration in seconds * 25 fps (hard-coded sample rate)
            framespan = int(float(duration) * 25)
            of.write(str(framespan))
        of.write('\n')
Lessons learned:
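- Know the structure of the XML you are processing well.
- With getElementsByTagName, if you are sure there is exactly one matching element, you can take it directly with [0].
- If not, process the returned list.
- Use nodeType to check what kind of node you are dealing with.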
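The lessons above can be folded into two small helpers. This is a sketch, not the original author's code: text_of and single_child_text are hypothetical names, and reading the rate from <sample_rate> instead of hard-coding 25 is an assumption about how the input is meant to be used. text_of uses nodeType to collect only text (and CDATA) children; single_child_text applies the [0] shortcut only after confirming there is exactly one match.

```python
from xml.dom import minidom, Node

def text_of(element):
    """Join an element's text/CDATA children and strip surrounding whitespace."""
    return ''.join(
        child.data for child in element.childNodes
        if child.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)
    ).strip()

def single_child_text(parent, tag):
    """Text of the single <tag> under parent; raise if there is not exactly one."""
    matches = parent.getElementsByTagName(tag)
    if len(matches) != 1:
        raise ValueError(f'expected exactly one <{tag}>, found {len(matches)}')
    return text_of(matches[0])

EXCERPT = ('<excerpt> <!-- Framespan 1:5030 --> '
           '<filename> MCTTR0902h.mov.deint.mpeg </filename> <begin>0.0</begin> '
           '<duration> 201.20 </duration> <sample_rate> 25 </sample_rate> </excerpt>')

excerpt = minidom.parseString(EXCERPT).documentElement
duration = float(single_child_text(excerpt, 'duration'))
rate = int(single_child_text(excerpt, 'sample_rate'))  # assumption: fps lives here
frames = int(duration * rate)
print(single_child_text(excerpt, 'filename'), duration, frames)
```

Failing loudly on zero or multiple matches surfaces surprises in the input instead of silently writing a partial line, which is what the plain loops in the code above would do.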