
Python SimpleXML conversion to nested arrays of dictionaries


For a long time I was looking for a Python XML conversion tool that produces a structure of nested arrays of dictionaries, like the one Perl's XML::Simple module (SimpleXML) produces.

I found some attempts to convert the XML structure into nested objects. All of them have drawbacks: either data access is too complicated (it requires calling a function), or it is too restricted (when elements are exposed through object attributes, there is no way to distinguish between XML attributes, text and sub-elements).

Beautiful Soup (BS) is a good alternative, but it is most useful for HTML. Although it offers a features="xml" option, that makes little difference. The main drawback of BS is that its tree contains many elements that are virtually empty. If you are interested in XML only, what matters is the data within opening and closing tags (<tag>data</tag>), not all the newlines and spaces between a closing tag and the next opening tag (</tag1> <tag2>). Furthermore, BS does not allow tag sorting. That makes perfect sense for HTML, but in XML it is a nice feature, because XML files with sorted tags can easily be compared with any text comparison software. Perl's SimpleXML also provides this feature.

So I set out to develop it on my own.

Given a simple XML text file:

[Listing not preserved: sample XML file]

This is my Python code to convert the XML into a nested structure of arrays and dictionaries.

[Listing not preserved: Python conversion code]
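A minimal sketch of such a converter, not the author's code. It assumes a custom target for the standard library's xml.etree.ElementTree.XMLParser (the mention of a builder instance with a root attribute suggests something similar). The class name NestedBuilder, the helper parse_nested(), the key "content" for element text (borrowed from XML::Simple's default) and the sample document are illustrative assumptions. Attributes become plain strings, child elements are always collected in lists, and whitespace-only text between tags is dropped.

```python
import xml.etree.ElementTree as ET


class NestedBuilder:
    """Hypothetical parser target that builds nested lists of dicts.

    Assumptions: attributes become plain string values, every child
    element is collected in a list under its tag name, and non-blank
    text is stored under the key given by content_key.
    """

    def __init__(self, content_key="content"):
        self.content_key = content_key
        self.root = {}             # top-level container: {root_tag: [dict]}
        self._stack = [self.root]  # dicts of the currently open elements
        self._text = [[]]          # collected text fragments per open element

    def start(self, tag, attrib):
        node = dict(attrib)                      # attributes -> strings
        parent = self._stack[-1]
        parent.setdefault(tag, []).append(node)  # children -> list of dicts
        self._stack.append(node)
        self._text.append([])

    def data(self, data):
        self._text[-1].append(data)

    def end(self, tag):
        node = self._stack.pop()
        text = "".join(self._text.pop()).strip()
        if text:  # drop whitespace-only text between tags
            node[self.content_key] = text

    def close(self):
        return self.root


def parse_nested(xml_text):
    """Parse an XML string into the nested structure."""
    builder = NestedBuilder()
    parser = ET.XMLParser(target=builder)
    parser.feed(xml_text)
    parser.close()
    return builder.root


# Hypothetical sample document
SAMPLE = """<config logdir="/var/log/foo/">
  <server name="sahara">
    <address>10.0.0.101</address>
    <address>10.0.1.101</address>
  </server>
</config>"""

data = parse_nested(SAMPLE)
print(data)
```

For this sample, data["config"][0]["server"][0]["address"][1]["content"] is "10.0.1.101". The sketch assumes no attribute shares its name with a child tag or with the content key; such a clash would raise an error or overwrite a value. Text split around child elements is merged into one string.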

The data can be accessed through the root attribute of the builder instance. For the sample file, the result is (newlines were added manually for readability):

[Listing not preserved: converted Python structure]
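Instead of adding newlines by hand, the standard library's pprint module can lay out a nested structure of lists and dictionaries automatically. A small sketch using a hypothetical, hand-written structure:

```python
import ast
import pprint

# Hypothetical nested structure: each tag maps to a list of dicts
data = {
    "config": [
        {
            "logdir": "/var/log/foo/",
            "server": [
                {
                    "name": "sahara",
                    "address": [
                        {"content": "10.0.0.101"},
                        {"content": "10.0.1.101"},
                    ],
                }
            ],
        }
    ]
}

# pformat() breaks lines where the repr would exceed width;
# dict keys are sorted by default (sort_dicts=False keeps insertion
# order, available since Python 3.8)
text = pprint.pformat(data, width=60)
print(text)

# The result is still a valid Python literal
assert ast.literal_eval(text) == data
```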

This is virtually identical to the result of Perl's SimpleXML:

[Listing not preserved: Perl output]

The data can be accessed just like any other nested arrays and dictionaries.

[Listing not preserved: data access example]
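A short sketch of such access. It assumes a layout where each tag maps to a list of dicts, attributes are plain strings and element text sits under a "content" key; this is an assumption, not necessarily the author's exact layout. The structure and the optional timeout attribute are hypothetical.

```python
# Hypothetical structure in the assumed layout
data = {
    "config": [
        {
            "logdir": "/var/log/foo/",
            "server": [
                {
                    "name": "sahara",
                    "address": [
                        {"content": "10.0.0.101"},
                        {"content": "10.0.1.101"},
                    ],
                }
            ],
        }
    ]
}

config = data["config"][0]    # the root element is the only list entry
logdir = config["logdir"]     # attributes are plain strings
server = config["server"][0]  # child elements are lists of dicts

# Collect the text of all repeated <address> elements
addresses = [a["content"] for a in server["address"]]

# dict.get() supplies a default for an optional (hypothetical) attribute
timeout = config.get("timeout", "30")

print(logdir, addresses, timeout)
```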

The tree structure can easily be manipulated with Python's list and dictionary methods, for instance append() and update().
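A sketch of such manipulation on a hypothetical structure in the same assumed layout (a list of dicts per tag, string attributes, text under a "content" key):

```python
# Hypothetical structure in the assumed layout
data = {
    "config": [
        {
            "logdir": "/var/log/foo/",
            "server": [
                {"name": "sahara", "address": [{"content": "10.0.0.101"}]}
            ],
        }
    ]
}

server = data["config"][0]["server"][0]

# append() adds another <address> child to the existing list
server["address"].append({"content": "10.0.2.101"})

# update() adds or overwrites attributes of the <server> element
server.update({"osname": "solaris", "osversion": "2.6"})

# A second <server> element is just another dict in the same list
data["config"][0]["server"].append(
    {"name": "gobi", "address": [{"content": "10.0.0.102"}]}
)

print(len(server["address"]), server["osname"], len(data["config"][0]["server"]))
```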

To write the XML data back to a text file in alphabetical order, I wrote this little routine. Admittedly, it still looks a bit clumsy; I'm working on it.

[Listing not preserved: XML output routine]
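A minimal sketch of a serializer with alphabetical ordering, not the author's routine. It assumes the same layout (a dict mapping each tag to a list of dicts, string values as attributes, text under a "content" key). It sorts attribute names and child tag names, keeps repeated elements in their original order, and escapes text and attribute values with xml.sax.saxutils. The function name to_xml() and the sample data are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr


def to_xml(tree, content_key="content", indent="  "):
    """Serialize {tag: [dict, ...]} to XML text with names sorted alphabetically.

    Assumed layout: string values are attributes (except content_key,
    which holds the element text) and list values hold child elements.
    """
    lines = []

    def emit(tag, node, depth):
        pad = indent * depth
        attrs = sorted(k for k, v in node.items()
                       if isinstance(v, str) and k != content_key)
        children = sorted(k for k, v in node.items() if isinstance(v, list))
        attr_text = "".join(f" {k}={quoteattr(node[k])}" for k in attrs)
        text = node.get(content_key)
        if not children and text is None:
            lines.append(f"{pad}<{tag}{attr_text} />")  # empty element
        elif not children:
            lines.append(f"{pad}<{tag}{attr_text}>{escape(text)}</{tag}>")
        else:
            lines.append(f"{pad}<{tag}{attr_text}>")
            if text is not None:
                lines.append(f"{pad}{indent}{escape(text)}")  # text before children
            for child_tag in children:
                for child in node[child_tag]:  # repeated elements keep their order
                    emit(child_tag, child, depth + 1)
            lines.append(f"{pad}</{tag}>")

    for root_tag in sorted(tree):
        for root in tree[root_tag]:
            emit(root_tag, root, 0)
    return "\n".join(lines)


# Hypothetical data with keys deliberately out of alphabetical order
data = {
    "config": [
        {
            "logdir": "/var/log/foo/",
            "debugfile": "/tmp/foo.debug",
            "user": [{"content": "admin"}],
            "server": [
                {
                    "name": "sahara",
                    "address": [
                        {"content": "10.0.0.101"},
                        {"content": "10.0.1.101"},
                    ],
                }
            ],
        }
    ]
}

xml_text = to_xml(data)
print(xml_text)
```

Because names are sorted, two files written this way can be compared line by line with an ordinary text diff tool. One limitation of this sketch: in mixed content the text is written before all child elements, so its original position is lost.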

It produces:

[Listing not preserved: sorted XML output]


(c) Mato Nagel, Weißwasser 2004-2024