<?xml version="1.0" encoding='UTF-8'?>
<painting>
  <img src="madonna.jpg" alt='Foligno Madonna, by Raphael'/>
  <caption>This is Raphael's "Foligno" Madonna, painted in <date>1511</date>-<date>1512</date>.</caption>
</painting>
Tag
A markup construct that begins with "<" and ends with ">". Tags come in three flavors: start-tags, for example <section>, end-tags, for example </section>, and empty-element tags, for example <line-break/>.
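An empty-element tag is shorthand for a start-tag followed immediately by its end-tag. One way to check this is to parse both spellings with minidom and serialize them again. This is a small sketch; the wrapper element `<a>` is made up for the example.

```python
from xml.dom import minidom

# The same element written as a start-tag/end-tag pair and as an empty-element tag
long_form = minidom.parseString('<a><line-break></line-break></a>')
short_form = minidom.parseString('<a><line-break/></a>')

# minidom writes any element without children as an empty-element tag,
# so both documents serialize to the same markup
print(long_form.documentElement.toxml())   # <a><line-break/></a>
print(short_form.documentElement.toxml())  # <a><line-break/></a>
```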
Element
A logical component of a document which either begins with a start-tag and ends with an end-tag, or consists only of an empty-element tag. The characters between the start- and end-tags, if any, are the element's content, and may contain markup, including other elements, which are called child elements. An example of an element is <Greeting>Hello, world.</Greeting>. Another is <line-break/>.
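The Foligno example above has mixed content: `<caption>` holds text plus two `<date>` child elements. A minimal minidom sketch (the XML declaration is left out of the string for brevity):

```python
from xml.dom import minidom

PAINTING = """<painting>
  <img src="madonna.jpg" alt="Foligno Madonna, by Raphael"/>
  <caption>This is Raphael's "Foligno" Madonna, painted in <date>1511</date>-<date>1512</date>.</caption>
</painting>"""

doc = minidom.parseString(PAINTING)
caption = doc.getElementsByTagName('caption')[0]

# The content of <caption> is a mix of text nodes and element nodes;
# keep only the element nodes to get the child elements
children = [n.tagName for n in caption.childNodes if n.nodeType == n.ELEMENT_NODE]
print(children)  # ['date', 'date']

# Each <date> element's content is a single text node
dates = [d.firstChild.data for d in caption.getElementsByTagName('date')]
print(dates)  # ['1511', '1512']
```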
Attribute
A markup construct consisting of a name/value pair that exists within a start-tag or empty-element tag. In this example, the name of the attribute is "number" and the value is "3": <step number="3">Connect A to B.</step> This element has two attributes, src and alt: <img src="madonna.jpg" alt='by Raphael'/> An element must not have two attributes with the same name.
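Reading the attributes from the two examples above with minidom might look like this. The last part shows that a repeated attribute name is rejected by the parser as not well-formed (expat raises `ExpatError`).

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError

step = minidom.parseString('<step number="3">Connect A to B.</step>').documentElement
# Attribute values always come back as strings
print(step.getAttribute('number'))  # 3

img = minidom.parseString("<img src=\"madonna.jpg\" alt='by Raphael'/>").documentElement
print(img.getAttribute('src'), img.getAttribute('alt'))  # madonna.jpg by Raphael
print(sorted(img.attributes.keys()))  # ['alt', 'src']

# Two attributes with the same name: the parser refuses the document
try:
    minidom.parseString('<step number="3" number="4"/>')
    duplicate_rejected = False
except ExpatError as err:
    duplicate_rejected = True
    print('rejected:', err)
```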
XML in Python
There are two approaches: event-based SAX and object-based DOM.
For details, see Python in a Nutshell.
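To make the difference concrete, here is a rough sketch that pulls the `<date>` values out of a trimmed version of the painting example both ways. `DateCollector` is a hypothetical name chosen for the illustration; `xml.sax.ContentHandler` and `minidom.parseString` are standard library.

```python
import xml.sax
from xml.dom import minidom

PAINTING = '<painting><caption>Painted in <date>1511</date>-<date>1512</date>.</caption></painting>'

class DateCollector(xml.sax.ContentHandler):
    """SAX: the parser calls these methods as it meets each event in the stream."""

    def __init__(self):
        super().__init__()
        self.in_date = False
        self.dates = []

    def startElement(self, name, attrs):
        if name == 'date':
            self.in_date = True
            self.dates.append('')

    def characters(self, content):
        # Text may arrive in several chunks, so append instead of assigning
        if self.in_date:
            self.dates[-1] += content

    def endElement(self, name):
        if name == 'date':
            self.in_date = False

handler = DateCollector()
xml.sax.parseString(PAINTING.encode('utf-8'), handler)
print(handler.dates)  # ['1511', '1512']

# DOM: the whole tree is built in memory first, then navigated freely
doc = minidom.parseString(PAINTING)
dom_dates = [d.firstChild.data for d in doc.getElementsByTagName('date')]
print(dom_dates)  # ['1511', '1512']
```

SAX never holds the whole document, which suits very large files; DOM is easier when the code needs to move around the tree.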
DOM:
The xml.dom.minidom Module
The following covers the DOM module.
The most important class is Node.
Documents, elements, attributes, text content, and so on are all nodes.
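A quick check that the document, an element, an attribute and a text node are all `Node` instances sharing the same `nodeName`/`nodeType`/`nodeValue` interface (a sketch reusing the `<step>` example):

```python
from xml.dom import minidom, Node

doc = minidom.parseString('<step number="3">Connect A to B.</step>')
element = doc.documentElement
attribute = element.getAttributeNode('number')  # an Attr node, not just a string
text = element.firstChild                        # a Text node

for node in (doc, element, attribute, text):
    # Every one of these is a Node subclass, so the same properties work on all of them
    print(type(node).__name__, isinstance(node, Node), node.nodeName, repr(node.nodeValue))
```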
<excerpt>
<!-- Framespan 1:5030 -->
<filename>
MCTTR0902h.mov.deint.mpeg
</filename>
<begin>0.0</begin>
<duration>
201.20
</duration>
<sample_rate>
25
</sample_rate>
<language>
english
</language>
<source_type>
surveillance
</source_type>
</excerpt>
Note:
- Node.nodeType: an integer representing the node type. Symbolic constants for the types are defined on the Node object: ELEMENT_NODE, ATTRIBUTE_NODE, TEXT_NODE, CDATA_SECTION_NODE, ENTITY_NODE, PROCESSING_INSTRUCTION_NODE, COMMENT_NODE, DOCUMENT_NODE, DOCUMENT_TYPE_NODE, NOTATION_NODE
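Printing the node types under a shortened copy of the `<excerpt>` above shows why nodeType matters. The line breaks between tags become TEXT_NODE children, the comment is a COMMENT_NODE, and the text inside `<filename>` still carries its surrounding newlines, which is why the code below calls `strip()`. `TYPE_NAMES` is a helper made up for this sketch.

```python
from xml.dom import minidom, Node

# Reverse lookup built from the constants on Node: integer nodeType -> symbolic name
TYPE_NAMES = {getattr(Node, name): name for name in dir(Node) if name.endswith('_NODE')}

EXCERPT = """<excerpt>
<!-- Framespan 1:5030 -->
<filename>
MCTTR0902h.mov.deint.mpeg
</filename>
<begin>0.0</begin>
</excerpt>"""

excerpt = minidom.parseString(EXCERPT).documentElement
kinds = [TYPE_NAMES[child.nodeType] for child in excerpt.childNodes]
print(kinds)

filename = excerpt.getElementsByTagName('filename')[0]
print(repr(filename.firstChild.nodeValue))  # '\nMCTTR0902h.mov.deint.mpeg\n'
```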
Code:
from xml.dom import minidom

SAMPLE_RATE = 25  # frames per second, hard-coded (matches <sample_rate> in the sample)

xmldoc = minidom.parse('E:/eclipse/xml/expt_2009_retroED_EVAL09_ENG_s-camera_NIST_2.xml')
reflist = xmldoc.getElementsByTagName('excerpt')

# Write one line per <excerpt>: filename<TAB>begin<TAB>duration<TAB>frame count
with open('video.txt', 'w') as of:
    for ref in reflist:
        for node in ref.getElementsByTagName('filename'):
            # Debug: print(node.firstChild.nodeValue, node.firstChild.nodeName, node.firstChild.nodeType)
            # strip() removes the newlines around the text inside the tag
            of.write(node.firstChild.nodeValue.strip())
            of.write('\t')
        for node in ref.getElementsByTagName('begin'):
            of.write(node.firstChild.nodeValue.strip())
            of.write('\t')
        for node in ref.getElementsByTagName('duration'):
            duration = node.firstChild.nodeValue.strip()
            of.write(duration)
            of.write('\t')
            # seconds * frames per second = number of frames (201.20 * 25 = 5030);
            # round() rather than int() so a float result like 5029.9999 is not truncated
            framespan = round(float(duration) * SAMPLE_RATE)
            of.write(str(framespan))
        of.write('\n')
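Since the script reads a file on a local disk, here is a self-contained sketch that packages the same extraction as a function (`excerpt_rows`, a hypothetical name) and runs it on the sample `<excerpt>` shown earlier, wrapped in an assumed `<excerpts>` root element. It assumes each excerpt has exactly one `<filename>`, `<begin>` and `<duration>`, so it indexes with [0].

```python
from xml.dom import minidom

def excerpt_rows(doc, sample_rate=25):
    """Yield one tab-separated row per <excerpt>: filename, begin, duration, frame count."""
    for ref in doc.getElementsByTagName('excerpt'):
        # Assumes exactly one of each tag per excerpt, hence [0]
        fields = [ref.getElementsByTagName(tag)[0].firstChild.nodeValue.strip()
                  for tag in ('filename', 'begin', 'duration')]
        frames = round(float(fields[2]) * sample_rate)
        yield '\t'.join(fields + [str(frames)])

SAMPLE = """<excerpts>
<excerpt>
<!-- Framespan 1:5030 -->
<filename>
MCTTR0902h.mov.deint.mpeg
</filename>
<begin>0.0</begin>
<duration>
201.20
</duration>
<sample_rate>
25
</sample_rate>
</excerpt>
</excerpts>"""

rows = list(excerpt_rows(minidom.parseString(SAMPLE)))
for row in rows:
    print(row)  # MCTTR0902h.mov.deint.mpeg<TAB>0.0<TAB>201.20<TAB>5030
```

The computed 5030 frames agree with the `Framespan 1:5030` comment in the sample.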
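Lessons learned:
- Know the structure of the XML you are processing well.
- With getElementsByTagName, if you are sure there is only one matching child, you can just take [0].
- If not, handle the returned list.
- Use nodeType to tell kinds of nodes apart.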
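The lessons above can be combined in a small helper. `child_text` is a hypothetical name; it takes [0] only after checking there is exactly one match, and it uses nodeType to keep text and CDATA while skipping comments. The `<language>` element here deliberately starts with a comment, which would make a plain `firstChild.nodeValue` return the comment text instead.

```python
from xml.dom import minidom, Node

def child_text(parent, tag):
    """Return the stripped text of the single <tag> element under parent."""
    matches = parent.getElementsByTagName(tag)
    if len(matches) != 1:
        raise ValueError(f'expected exactly one <{tag}>, found {len(matches)}')
    # Join only text and CDATA children; ignore comments and nested elements
    return ''.join(n.data for n in matches[0].childNodes
                   if n.nodeType in (Node.TEXT_NODE, Node.CDATA_SECTION_NODE)).strip()

doc = minidom.parseString(
    '<excerpt><language><!-- checked -->\nenglish\n</language>'
    '<date>1511</date><date>1512</date></excerpt>')

print(child_text(doc, 'language'))  # english

# <date> occurs twice, so it is handled as a list instead of with [0]
dates = [d.firstChild.data for d in doc.getElementsByTagName('date')]
print(dates)  # ['1511', '1512']
```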