Python modules: xml.etree.ElementTree




xml.etree.ElementTree is used to parse and build XML documents. The examples below use the following sample file (demo.xml):

<?xml version="1.0"?>
<data>
    <country name="Liechtenstein">
        <rank>1</rank>
        <year>2008</year>
        <gdppc>141100</gdppc>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <rank>4</rank>
        <year>2011</year>
        <gdppc>59900</gdppc>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <rank>68</rank>
        <year>2011</year>
        <gdppc>13600</gdppc>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>


Parsing an XML file


The parse() function reads an XML file and returns an ElementTree

from xml.etree.ElementTree import parse

tree = parse('demo.xml')  # get the ElementTree
root = tree.getroot()     # get the root element
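
When the XML is already in memory as a string rather than in a file, the standard library's fromstring() can be used instead of parse(). The following is a minimal sketch, assuming a shortened copy of the sample data; the xml_text variable is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the sample document (an assumption).
xml_text = """<data>
    <country name="Liechtenstein"><rank>1</rank></country>
    <country name="Singapore"><rank>4</rank></country>
</data>"""

# fromstring() parses a string and returns the root Element directly,
# whereas parse() reads a file and returns an ElementTree.
root = ET.fromstring(xml_text)
print(root.tag)   # data
print(len(root))  # 2
```

Because fromstring() already returns the root element, no getroot() call is needed in this case.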


Element.tag, Element.attrib and Element.text hold an element's tag name, its attributes (as a dict) and its text

In [6]: root.tag
Out[6]: 'data'

In [7]: root.attrib
Out[7]: {}

In [25]: root.text
Out[25]: '\n    '
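
The root's text above is only whitespace because the root contains child elements rather than character data. The same three attributes are more informative on a leaf element. A small sketch, assuming a one-country fragment of the sample data:

```python
import xml.etree.ElementTree as ET

# One-country fragment of the sample data (an assumption for brevity).
root = ET.fromstring(
    '<data><country name="Panama"><rank>68</rank></country></data>'
)
country = root[0]
rank = country[0]

print(country.tag, country.attrib)  # country {'name': 'Panama'}

# text is a string (or None for an empty element), so convert it explicitly.
rank_value = int(rank.text)
print(rank_value + 1)  # 69
```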


for child in root iterates over the direct child elements

In [8]: for child in root:
   ...:     print(child.tag, child.attrib)
   ...:
country {'name': 'Liechtenstein'}
country {'name': 'Singapore'}
country {'name': 'Panama'}


Element.get() returns the value of an attribute

In [27]: for child in root:
    ...:     print(child.tag, child.get('name'))
    ...:
country Liechtenstein
country Singapore
country Panama
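
Element.get() also accepts a default that is returned when the attribute is missing, which avoids the KeyError that indexing attrib directly would raise. A brief sketch; the "population" attribute is hypothetical and does not appear in the sample file:

```python
import xml.etree.ElementTree as ET

country = ET.fromstring('<country name="Singapore"/>')

print(country.get("name"))                   # Singapore
# "population" is a hypothetical attribute that the element does not have.
print(country.get("population"))             # None
print(country.get("population", "unknown"))  # unknown
```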


root.getchildren() returns the direct child elements (deprecated, and removed in Python 3.9; use list(root) instead)

In [21]: root.getchildren()
Out[21]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]
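
On Python 3.9 and later, getchildren() no longer exists and calling it raises AttributeError. Since an Element is iterable, list(root) gives the same list of direct children. A minimal sketch on a shortened copy of the data:

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the sample document.
root = ET.fromstring(
    "<data><country name='Liechtenstein'/><country name='Singapore'/></data>"
)

# list(root) collects the direct children, like getchildren() used to.
children = list(root)
names = [child.get("name") for child in children]
print(names)  # ['Liechtenstein', 'Singapore']
```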


root[0][1] looks up child elements by index

In [9]: root[0][1].text
Out[9]: '2008'

In [10]: root[1][0].text
Out[10]: '4'
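
Indexing works because an Element behaves like a sequence of its children, so len(), negative indices and IndexError behave as they do for a list. A short sketch, assuming a two-country excerpt of the sample data:

```python
import xml.etree.ElementTree as ET

# Two-country excerpt of the sample data (an assumption for brevity).
root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank><year>2008</year></country>"
    "<country name='Panama'><rank>68</rank><year>2011</year></country>"
    "</data>"
)

print(len(root))             # 2
print(root[-1].get("name"))  # Panama
print(root[0][1].text)       # 2008

# An out-of-range index raises IndexError, as with a list.
try:
    root[5]
    out_of_range = False
except IndexError:
    out_of_range = True
print(out_of_range)  # True
```

Positional access is brittle if the order of child elements can change, so looking elements up by tag is usually the safer choice.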


root.find(): given a bare tag name, searches the direct children and returns the first matching element (None if there is no match)

In [13]: root.find('country').attrib
Out[13]: {'name': 'Liechtenstein'}
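
Because find() returns None when nothing matches, chaining .attrib or .text onto the result can raise AttributeError. The sketch below checks for None and uses findtext(), which returns the matched element's text or a default. The "population" tag is hypothetical and not part of the sample file.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<data><country name='Singapore'><rank>4</rank></country></data>"
)
country = root.find("country")

# rank is a grandchild of root, so a bare tag name does not find it from root.
print(root.find("rank") is None)  # True

# "population" is a hypothetical tag that does not exist in the data.
missing = country.find("population")
print(missing is None)  # True

# findtext() returns the text of the first match, or the given default.
rank_text = country.findtext("rank")
fallback = country.findtext("population", default="n/a")
print(rank_text, fallback)  # 4 n/a
```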


root.findall(): given a bare tag name, searches the direct children and returns a list of all matching elements

In [16]: for country in root.findall('country'):
    ...:     print(country.attrib)
    ...:
{'name': 'Liechtenstein'}
{'name': 'Singapore'}
{'name': 'Panama'}
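
A common use of findall() is to turn repeated elements into a Python data structure. The sketch below, on a shortened copy of the data, builds a name-to-rank mapping and also shows that findall() returns an empty list rather than None when nothing matches:

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the sample document.
root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'><rank>1</rank></country>"
    "<country name='Singapore'><rank>4</rank></country>"
    "</data>"
)

# rank elements are grandchildren of root, so a bare tag finds none of them.
direct = root.findall("rank")
print(direct)  # []

# Map each country's name attribute to its rank as an integer.
ranks = {c.get("name"): int(c.findtext("rank")) for c in root.findall("country")}
print(ranks)  # {'Liechtenstein': 1, 'Singapore': 4}
```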


root.iterfind(): like findall(), but returns a generator that yields the matching elements instead of a list

In [22]: root.iterfind('country')
Out[22]: <generator object prepare_child.<locals>.select at 0x7f6736dccfc0>
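
The generator yields matches one at a time, which is handy when only the first few are needed. A minimal sketch on a shortened copy of the data:

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative copy of the sample document.
root = ET.fromstring(
    "<data>"
    "<country name='Liechtenstein'/>"
    "<country name='Singapore'/>"
    "<country name='Panama'/>"
    "</data>"
)

matches = root.iterfind("country")

# Take the first match without building a full list.
first = next(matches)
print(first.get("name"))  # Liechtenstein

# The generator continues from where it stopped.
remaining = [c.get("name") for c in matches]
print(remaining)  # ['Singapore', 'Panama']
```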


Supported XPath expressions (XML Path); ElementTree implements only a limited subset of XPath

In [19]: root.findall('.//rank')  # find elements at any depth
Out[19]:
[<Element 'rank' at 0x7f673581c8b8>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'rank' at 0x7f673581cc78>]

In [32]: root.findall('country/*')  # find grandchild elements
Out[32]:
[<Element 'rank' at 0x7f673581c8b8>,
 <Element 'year' at 0x7f673581cbd8>,
 <Element 'gdppc' at 0x7f673581c958>,
 <Element 'neighbor' at 0x7f673581c688>,
 <Element 'neighbor' at 0x7f673581cb38>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'year' at 0x7f673581c5e8>,
 <Element 'gdppc' at 0x7f673581c868>,
 <Element 'neighbor' at 0x7f673581cb88>,
 <Element 'rank' at 0x7f673581cc78>,
 <Element 'year' at 0x7f673581ccc8>,
 <Element 'gdppc' at 0x7f673581cd18>,
 <Element 'neighbor' at 0x7f673581cd68>,
 <Element 'neighbor' at 0x7f673581cdb8>]

In [33]: root.findall('.//rank/..')  # .. selects the parent element
Out[33]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]

In [34]: root.findall('country[@name]')  # country elements that have a name attribute
Out[34]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]

In [35]: root.findall('country[@name="Singapore"]')  # country elements whose name attribute is Singapore
Out[35]: [<Element 'country' at 0x7f673581ca98>]

In [36]: root.findall('country[rank]')  # country elements that have a rank child
Out[36]:
[<Element 'country' at 0x7f673581c728>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'country' at 0x7f673581cc28>]

In [37]: root.findall('country[rank="68"]')  # country elements with a rank child whose text is 68
Out[37]: [<Element 'country' at 0x7f673581cc28>]

In [38]: root.findall('country[1]')  # the first country
Out[38]: [<Element 'country' at 0x7f673581c728>]

In [39]: root.findall('country[last()]')  # the last country
Out[39]: [<Element 'country' at 0x7f673581cc28>]

In [40]: root.findall('country[last()-1]')  # the second-to-last country
Out[40]: [<Element 'country' at 0x7f673581ca98>]
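
The element reprs above only show memory addresses, so it can help to print an attribute of each match instead. The following sketch combines a few of the expressions on a trimmed copy of the sample data (rank and gdppc are omitted for brevity):

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample data; rank and gdppc are left out.
root = ET.fromstring("""<data>
    <country name="Liechtenstein">
        <year>2008</year>
        <neighbor name="Austria" direction="E"/>
        <neighbor name="Switzerland" direction="W"/>
    </country>
    <country name="Singapore">
        <year>2011</year>
        <neighbor name="Malaysia" direction="N"/>
    </country>
    <country name="Panama">
        <year>2011</year>
        <neighbor name="Costa Rica" direction="W"/>
        <neighbor name="Colombia" direction="E"/>
    </country>
</data>""")

# Countries whose year child has the text 2011.
recent = [c.get("name") for c in root.findall("country[year='2011']")]
print(recent)  # ['Singapore', 'Panama']

# Neighbors at any depth whose direction attribute is E.
east = [n.get("name") for n in root.findall(".//neighbor[@direction='E']")]
print(east)  # ['Austria', 'Colombia']

# Countries with a neighbor to the west: match the neighbor, then step up to its parent.
has_west = [c.get("name") for c in root.findall(".//neighbor[@direction='W']/..")]
print(has_west)  # ['Liechtenstein', 'Panama']
```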


root.iter() recursively iterates over the element itself and all of its descendants, or only over those with a given tag

In [29]: root.iter()
Out[29]: <_elementtree._element_iterator at 0x7f67355dd728>

In [30]: list(root.iter())
Out[30]:
[<Element 'data' at 0x7f673581c778>,
 <Element 'country' at 0x7f673581c728>,
 <Element 'rank' at 0x7f673581c8b8>,
 <Element 'year' at 0x7f673581cbd8>,
 <Element 'gdppc' at 0x7f673581c958>,
 <Element 'neighbor' at 0x7f673581c688>,
 <Element 'neighbor' at 0x7f673581cb38>,
 <Element 'country' at 0x7f673581ca98>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'year' at 0x7f673581c5e8>,
 <Element 'gdppc' at 0x7f673581c868>,
 <Element 'neighbor' at 0x7f673581cb88>,
 <Element 'country' at 0x7f673581cc28>,
 <Element 'rank' at 0x7f673581cc78>,
 <Element 'year' at 0x7f673581ccc8>,
 <Element 'gdppc' at 0x7f673581cd18>,
 <Element 'neighbor' at 0x7f673581cd68>,
 <Element 'neighbor' at 0x7f673581cdb8>]

In [31]: list(root.iter('rank'))
Out[31]:
[<Element 'rank' at 0x7f673581c8b8>,
 <Element 'rank' at 0x7f673581c6d8>,
 <Element 'rank' at 0x7f673581cc78>]
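
iter() walks the tree in document order and starts with the element it is called on, which is why 'data' appears first in the full listing. A small sketch, assuming a one-country excerpt of the sample data:

```python
import xml.etree.ElementTree as ET

# One-country excerpt of the sample data (an assumption for brevity).
root = ET.fromstring(
    "<data><country name='Singapore'>"
    "<rank>4</rank>"
    "<neighbor name='Malaysia' direction='N'/>"
    "</country></data>"
)

# Without an argument, iter() yields root first, then every descendant.
tags = [el.tag for el in root.iter()]
print(tags)  # ['data', 'country', 'rank', 'neighbor']

# With a tag, only matching elements are yielded, at any depth.
neighbors = [(n.get("name"), n.get("direction")) for n in root.iter("neighbor")]
print(neighbors)  # [('Malaysia', 'N')]
```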