Hello dear readers! Welcome back to another section of our tutorial on Python. In this tutorial post, we are going to be discussing XML processing in Python. Read through this detailed guide carefully and feel free to ask your questions.
XML is a portable, openly specified language that lets developers create data that other applications can read, regardless of the operating system or the programming language.
What is XML?
The Extensible Markup Language (XML) is a markup language much like HTML, XHTML or SGML. It is recommended by the World Wide Web Consortium and is available as an open standard.
XML is extremely useful for keeping track of small to medium amounts of data without requiring a SQL-based backend.
XML Parser Architectures and APIs
The Python standard library provides a small but useful set of interfaces to work with XML.
The two most basic and broadly used APIs to XML data are known as the SAX and DOM interfaces.
- Simple API for XML (SAX) - Here, you register callbacks for the events of interest and let the parser proceed through the document. This is useful when your documents are very large or when memory is limited, because SAX parses the file as it reads it from disk and never stores the full file in memory.
- Document Object Model (DOM) API - This is a World Wide Web Consortium (W3C) recommendation in which the full file is read into memory and stored in a hierarchical (tree-based) form that represents all the features of an XML document.
SAX cannot serve repeated or random lookups as quickly as DOM can, since it has to read through the document again each time. On the other hand, using DOM exclusively can exhaust your resources, because every document is loaded into memory in full.
SAX is read-only, while DOM allows changes to the XML file. Since these two different APIs complement each other, there is no reason why you cannot use both of them in large projects.
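To make the point about DOM allowing changes more concrete, here is a minimal sketch using xml.dom.minidom from the standard library. The SAMPLE string and the new values ("Classics", "3") are made up for this illustration only; the idea is simply that a DOM tree can be edited in memory and then written back out as XML text.

```python
from xml.dom.minidom import parseString

# A tiny, hypothetical document used only for this illustration.
SAMPLE = (
    '<collection shelf="New Arrivals">'
    '<movie title="Ishtar"><stars>2</stars></movie>'
    '</collection>'
)

dom = parseString(SAMPLE)
root = dom.documentElement

# Change an attribute on the root element in memory.
root.setAttribute("shelf", "Classics")

# Change the text inside the first <stars> element.
stars = root.getElementsByTagName("stars")[0]
stars.firstChild.data = "3"

# Serialize the modified tree back to XML text.
updated = root.toxml()
print(updated)
```

A SAX handler, by contrast, only receives events while the parser reads the input, so there is no tree to edit afterwards.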
Example
For all our XML code examples, we will be using a simple XML file movies.xml as an input -
<collection shelf="New Arrivals">
   <movie title="Enemy Behind">
      <type>War, Thriller</type>
      <format>DVD</format>
      <year>2003</year>
      <rating>PG</rating>
      <stars>10</stars>
      <description>Talk about a US-Japan war</description>
   </movie>
   <movie title="Transformers">
      <type>Anime, Science Fiction</type>
      <format>DVD</format>
      <year>1989</year>
      <rating>R</rating>
      <stars>8</stars>
      <description>A schientific fiction</description>
   </movie>
   <movie title="Trigun">
      <type>Anime, Action</type>
      <format>DVD</format>
      <episodes>4</episodes>
      <rating>PG</rating>
      <stars>10</stars>
      <description>Vash the Stampede!</description>
   </movie>
   <movie title="Ishtar">
      <type>Comedy</type>
      <format>VHS</format>
      <rating>PG</rating>
      <stars>2</stars>
      <description>Viewable boredom</description>
   </movie>
</collection>
Parsing XML with SAX APIs
SAX is a standard interface for event-driven XML parsing. Parsing XML with SAX generally requires you to create your own ContentHandler by subclassing xml.sax.ContentHandler.
Your ContentHandler handles the particular tags and attributes of your flavor(s) of XML. A ContentHandler object provides methods to handle various parsing events. Its owning parser calls the ContentHandler methods as it parses the XML file.
The methods startDocument and endDocument are called at the start and the end of the XML file. The method characters(text) is passed the character data of the XML file via the parameter text.
The ContentHandler is called at the start and end of each element. If the parser is not in namespace mode, the methods startElement(tag, attributes) and endElement(tag) are called; otherwise, the corresponding methods startElementNS and endElementNS are called. Here, tag is the element's tag and attributes is an Attributes object.
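The order in which these callbacks fire can be seen with a small sketch. The EventLogger class and the one-line sample document below are hypothetical names and data chosen for illustration; the handler simply records each event in a list instead of doing real work.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records each SAX callback in the order the parser fires it."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument")

    def endDocument(self):
        self.events.append("endDocument")

    def startElement(self, tag, attributes):
        # attributes behaves like a mapping of attribute names to values
        self.events.append(("startElement", tag, dict(attributes.items())))

    def endElement(self, tag):
        self.events.append(("endElement", tag))

    def characters(self, text):
        # Skip chunks that contain only whitespace
        if text.strip():
            self.events.append(("characters", text))


logger = EventLogger()
xml.sax.parseString(b'<movie title="Trigun"><format>DVD</format></movie>', logger)
for event in logger.events:
    print(event)
```

The list begins with startDocument and ends with endDocument, with the element and character events nested in between in document order.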
The following are other important methods to understand before proceeding -
The make_parser Method
This method creates a new parser object and returns it. The parser object created will be of the first parser type that the system finds.
Syntax
Following is the syntax for using the make_parser method -
xml.sax.make_parser( [parser_list] )
Parameter Details
The details of the parameter are as follows -
- parser_list - An optional list of parser module names to try; each named module must provide a create_parser() function.
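As a rough sketch of how make_parser can be called, the snippet below creates one parser with the defaults and one with an explicit parser_list. The module name "xml.sax.expatreader" is the expat-based reader that ships with CPython; the variable names are only illustrative.

```python
import xml.sax
from xml.sax import handler, xmlreader

# Let Python pick the first available parser.
parser = xml.sax.make_parser()
print(isinstance(parser, xmlreader.XMLReader))

# Namespace processing is off unless you switch it on.
print(bool(parser.getFeature(handler.feature_namespaces)))

# Name the parser module explicitly; it is tried before the defaults.
explicit = xml.sax.make_parser(["xml.sax.expatreader"])
print(type(explicit).__name__)
```

Passing an explicit list is mostly useful when a third-party SAX driver is installed and should be preferred over the default one.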
The parse Method
This method creates a SAX parser and uses it to parse a document.
Syntax
Following is the syntax for using the parse method -
xml.sax.parse( xmlfile, contenthandler[, errorhandler])
Parameter Details
The details of the parameters are as follows -
- xmlfile - The name of the XML file to read from; an open file-like object is also accepted.
- contenthandler - This must be a ContentHandler object.
- errorhandler - If specified, it must be a SAX ErrorHandler object.
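Here is a small sketch of xml.sax.parse in use. To keep it self-contained, it reads from an in-memory io.BytesIO stream rather than a file on disk; TagCounter and the sample bytes are hypothetical and exist only for this illustration.

```python
import io
import xml.sax


class TagCounter(xml.sax.ContentHandler):
    """Counts how many times each element name starts."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, tag, attributes):
        self.counts[tag] = self.counts.get(tag, 0) + 1


# A file-like object stands in for a file name such as "movies.xml".
source = io.BytesIO(b"<collection><movie/><movie/></collection>")
counter = TagCounter()
xml.sax.parse(source, counter)
print(counter.counts)  # {'collection': 1, 'movie': 2}
```

Replacing source with a file name string would parse a file on disk with the same handler.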
The parseString Method
There is one more method, which creates a SAX parser and parses the specified XML string.
Syntax
Following is the syntax for using the parseString method -
xml.sax.parseString( xmlstring, contenthandler[, errorhandler])
Parameter Details
The details of the parameters are as follows -
- xmlstring - The XML string to read from.
- contenthandler - This must be a ContentHandler object.
- errorhandler - If specified, it must be a SAX ErrorHandler object.
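The sketch below shows parseString together with basic error handling. It relies on the default error handler raising xml.sax.SAXParseException for malformed input; TitleCollector, collect_titles and both sample strings are hypothetical names and data used only here.

```python
import xml.sax


class TitleCollector(xml.sax.ContentHandler):
    """Collects the title attribute of every <movie> element."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def startElement(self, tag, attributes):
        if tag == "movie":
            self.titles.append(attributes["title"])


def collect_titles(xml_bytes):
    """Return the list of movie titles, or None if the XML is malformed."""
    collector = TitleCollector()
    try:
        xml.sax.parseString(xml_bytes, collector)
    except xml.sax.SAXParseException as exc:
        print("Malformed XML:", exc.getMessage())
        return None
    return collector.titles


good = b'<collection><movie title="Trigun"/><movie title="Ishtar"/></collection>'
bad = b'<collection><movie title="Trigun"></collection>'  # <movie> is never closed

print(collect_titles(good))  # ['Trigun', 'Ishtar']
print(collect_titles(bad))
```

Returning None for broken input is just one possible design; re-raising the exception may suit callers that need the line and column of the error.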
Example
The following is a complete example that parses movies.xml with a custom handler -
#!/usr/bin/python3

import xml.sax
import xml.sax.handler


class MovieHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.CurrentData = ""
        self.type = ""
        self.format = ""
        self.year = ""
        self.rating = ""
        self.stars = ""
        self.description = ""

    # Called when an element starts
    def startElement(self, tag, attributes):
        self.CurrentData = tag
        if tag == "movie":
            print("*****Movie*****")
            title = attributes["title"]
            print("Title:", title)

    # Called when an element ends
    def endElement(self, tag):
        if self.CurrentData == "type":
            print("Type:", self.type)
        elif self.CurrentData == "format":
            print("Format:", self.format)
        elif self.CurrentData == "year":
            print("Year:", self.year)
        elif self.CurrentData == "rating":
            print("Rating:", self.rating)
        elif self.CurrentData == "stars":
            print("Stars:", self.stars)
        elif self.CurrentData == "description":
            print("Description:", self.description)
        self.CurrentData = ""

    # Called when character data is read
    def characters(self, content):
        if self.CurrentData == "type":
            self.type = content
        elif self.CurrentData == "format":
            self.format = content
        elif self.CurrentData == "year":
            self.year = content
        elif self.CurrentData == "rating":
            self.rating = content
        elif self.CurrentData == "stars":
            self.stars = content
        elif self.CurrentData == "description":
            self.description = content


if __name__ == "__main__":
    # Create an XMLReader
    parser = xml.sax.make_parser()
    # Turn off namespaces
    parser.setFeature(xml.sax.handler.feature_namespaces, 0)

    # Override the default ContentHandler
    Handler = MovieHandler()
    parser.setContentHandler(Handler)

    parser.parse("movies.xml")
Output
When the above code is executed, it will produce the following result -
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Year: 2003
Rating: PG
Stars: 10
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Year: 1989
Rating: R
Stars: 8
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Stars: 10
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Stars: 2
Description: Viewable boredom
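One caveat about the handler above: a SAX parser is allowed to deliver the text of a single element in several characters() calls, and the handler keeps only the most recent chunk. A more defensive variant, sketched here under the assumption that you only need the text of leaf elements, buffers the chunks and joins them when the element ends. BufferingHandler and the sample document are hypothetical.

```python
import xml.sax


class BufferingHandler(xml.sax.ContentHandler):
    """Joins all text chunks of an element before using them."""

    def __init__(self):
        super().__init__()
        self._chunks = []
        self.fields = []  # list of (tag, text) pairs

    def startElement(self, tag, attributes):
        # Start a fresh buffer for every element
        self._chunks = []

    def characters(self, content):
        # May be called more than once per element
        self._chunks.append(content)

    def endElement(self, tag):
        text = "".join(self._chunks).strip()
        if text:
            self.fields.append((tag, text))
        self._chunks = []


# Made-up sample; the entity reference is a typical cause of split chunks.
sample = (
    b'<movie title="Sample">'
    b"<type>War, Thriller</type>"
    b"<description>Cat &amp; mouse</description>"
    b"</movie>"
)

buffering = BufferingHandler()
xml.sax.parseString(sample, buffering)
print(buffering.fields)
```

Because the text is joined in endElement, the result is the same whether the parser delivers the description in one chunk or in several.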
Parsing XML with DOM APIs
The Document Object Model (DOM) is a cross-language API from the W3C for accessing and modifying XML documents.
The DOM is extremely useful for random-access applications. SAX only allows you to view one bit of the document at a time. So if you are looking at one SAX element, you have no access to another.
Here is the simplest way to quickly load an XML document and create a minidom object using the xml.dom.minidom module. The minidom module provides a simple parse function that quickly creates a DOM tree from the XML file.
The sample code calls the parse(file [,parser]) function of the minidom module to parse the XML file designated by file into a DOM tree object.
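As a quick illustration of random access with minidom, the sketch below parses a small in-memory document and then jumps between elements in any order. An io.BytesIO stream stands in for a file name so the snippet runs on its own; the sample bytes are a cut-down, hypothetical version of the collection.

```python
import io
from xml.dom import minidom

# parse() accepts either a file name or an open file-like object.
source = io.BytesIO(
    b'<collection shelf="New Arrivals">'
    b'<movie title="Trigun"/><movie title="Ishtar"/>'
    b"</collection>"
)
dom = minidom.parse(source)
root = dom.documentElement
movies = root.getElementsByTagName("movie")

# The whole tree is in memory, so elements can be visited in any order.
last_title = movies[-1].getAttribute("title")
first_title = movies[0].getAttribute("title")
print(last_title)   # Ishtar
print(first_title)  # Trigun
print(root.tagName, root.getAttribute("shelf"))
```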
Example
#!/usr/bin/python3

import xml.dom.minidom

# Open the XML document using the minidom parser
DOMTree = xml.dom.minidom.parse("movies.xml")
collection = DOMTree.documentElement
if collection.hasAttribute("shelf"):
    print("Root element : %s" % collection.getAttribute("shelf"))

# Get all the movies in the collection
movies = collection.getElementsByTagName("movie")

# Print the details of each movie
for movie in movies:
    print("*****Movie*****")
    if movie.hasAttribute("title"):
        print("Title: %s" % movie.getAttribute("title"))

    # Names avoid shadowing the built-ins type() and format()
    movie_type = movie.getElementsByTagName("type")[0]
    print("Type: %s" % movie_type.childNodes[0].data)
    movie_format = movie.getElementsByTagName("format")[0]
    print("Format: %s" % movie_format.childNodes[0].data)
    rating = movie.getElementsByTagName("rating")[0]
    print("Rating: %s" % rating.childNodes[0].data)
    description = movie.getElementsByTagName("description")[0]
    print("Description: %s" % description.childNodes[0].data)
Output
When the above code is executed, it will produce the following result -
Root element : New Arrivals
*****Movie*****
Title: Enemy Behind
Type: War, Thriller
Format: DVD
Rating: PG
Description: Talk about a US-Japan war
*****Movie*****
Title: Transformers
Type: Anime, Science Fiction
Format: DVD
Rating: R
Description: A schientific fiction
*****Movie*****
Title: Trigun
Type: Anime, Action
Format: DVD
Rating: PG
Description: Vash the Stampede!
*****Movie*****
Title: Ishtar
Type: Comedy
Format: VHS
Rating: PG
Description: Viewable boredom
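Note that the DOM example reads only elements that every movie has. Elements such as year or episodes are missing from some movies, and indexing [0] on an empty result would raise an IndexError. One possible way to handle that, sketched with a hypothetical helper named child_text and a made-up default of "n/a", is shown below.

```python
from xml.dom import minidom


def child_text(element, tag, default="n/a"):
    """Return the text of the first <tag> descendant, or default if absent or empty."""
    nodes = element.getElementsByTagName(tag)
    if not nodes or nodes[0].firstChild is None:
        return default
    return nodes[0].firstChild.data


# One movie from the collection, which has <episodes> but no <year>.
dom = minidom.parseString(
    '<movie title="Trigun"><format>DVD</format><episodes>4</episodes></movie>'
)
movie = dom.documentElement

print("Format:", child_text(movie, "format"))      # Format: DVD
print("Episodes:", child_text(movie, "episodes"))  # Episodes: 4
print("Year:", child_text(movie, "year"))          # Year: n/a
```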
Alright guys! This is where we round up this tutorial post. In our next tutorial, we are going to be discussing Python GUI Programming.
Feel free to ask your questions where necessary and I will attend to them as soon as possible. If this tutorial was helpful to you, you can use the share button to share this tutorial.
Follow us on our various social media platforms to stay updated with our latest tutorials. You can also subscribe to our newsletter in order to get our tutorials delivered directly to your emails.
Thanks for reading and bye for now.