Sunday, 12 April 2015

Desktop Live Score Board using Python to parse XML



It's IPL time! Well, to be honest, I somehow wasn't too interested in this year's IPL, at least before it started. A few matches into it, the knocks by the B-Man from Chennai and Gayle made me think again.

Having said that, this is quite a busy time, which means no live streaming... well, at least sometimes. Yes, there is Cricinfo, but switching between tabs isn't trivial either, especially while coding a module in a text editor.

So, I thought of writing a small snippet that fetches the live scores onto the desktop. Luckily, Cricinfo provides an RSS feed in XML format for non-commercial purposes, which can be leveraged for this.

So what's this XML? Let's try to decipher it!

XML is basically a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable. It is a textual data format with strong support, through Unicode, for many human languages. There are three components in XML: Declaration, Markup and Content.

To keep it simple, let's say that an XML document typically begins with a declaration, for example <?xml version="1.0" encoding="UTF-8"?>, where specifics about the document format (the XML version and the character encoding) can be found.




Everything between the special characters '<' and '>' is markup, and the rest is the content. Markup can also appear between '&' and ';' (entity references such as &amp;), but let's not bother much about that now.


Markup and Content make up an XML Element. An XML document can be viewed as an accumulation of such XML elements.
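
To make the three components concrete, here is a minimal sketch using Python's built-in xml.etree.ElementTree module. The document text is made up for illustration, and this is not the parser the script itself uses (that is Beautiful Soup, as mentioned further down).

```python
import xml.etree.ElementTree as ET

# A tiny, made-up document: the declaration comes first, then nested elements.
doc = """<?xml version="1.0" encoding="UTF-8"?>
<item>
  <title>Team A 120/3 &amp; Team B 98/10</title>
</item>"""

# Passing bytes lets the parser honour the encoding named in the declaration.
root = ET.fromstring(doc.encode("utf-8"))
title = root.find("title")

print(root.tag)    # name of the outermost element
print(title.tag)   # markup: the name of the nested element
print(title.text)  # content, with the &amp; entity decoded to a plain &
```

Note that the '&' inside the content has to be written as &amp; in the raw file; the parser turns it back into '&' when the text is read.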

Once we have this basic understanding, the live scores file in XML format can be parsed using Python for our task. I've written a small script which does just that and pops up a desktop notification with the live scores once every minute.
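
For readers who want a feel for the moving parts, here is a rough sketch of such a loop. It is not the actual scoreboard.py: it uses only the standard library (urllib and ElementTree) rather than requests and Beautiful Soup, the feed URL is an assumption that should be checked against Cricinfo's RSS page, the function names are made up, and the notification step assumes a Linux desktop where the notify-send command is available.

```python
import subprocess
import time
import urllib.request
import xml.etree.ElementTree as ET

# Assumed feed address; verify it on Cricinfo's RSS page before relying on it.
FEED_URL = "http://static.cricinfo.com/rss/livescores.xml"


def fetch_feed(url=FEED_URL):
    """Download the feed and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read()


def match_titles(feed_xml):
    """Return the <title> text of every <item> in the feed, in feed order."""
    root = ET.fromstring(feed_xml)
    return [item.findtext("title", default="").strip()
            for item in root.iter("item")]


def notify(message):
    """Show a desktop notification (assumes notify-send is installed)."""
    subprocess.run(["notify-send", "Live score", message], check=False)


def run(match_number, interval_seconds=60):
    """Pop up the score of the chosen match (1-based) once per interval."""
    while True:
        titles = match_titles(fetch_feed())
        notify(titles[match_number - 1])
        time.sleep(interval_seconds)
```

Calling run(1) would keep notifying the score of the first match in the feed until interrupted with Ctrl+C.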

Example Rendering: (screenshot not included)

To run the script, navigate to its folder in a terminal and type:

python scoreboard.py 'Number'

Here Number corresponds to the match you want the score for. 


Example:
Suppose the XML file looks like this:

<item>
<title>
Sri Lanka Air Force Sports Club 215/10 v Galle Cricket Club 31/10 &amp; 278/7 *
</title>
<link>
http://www.cricinfo.com/ci/engine/match/859469.html?CMP=OTC-RSS
</link>
<description>
Sri Lanka Air Force Sports Club 215/10 v Galle Cricket Club 31/10 &amp; 278/7 *
</description>
<guid>
http://www.cricinfo.com/ci/engine/match/859469.html
</guid>
</item>
<item>
<title>Hampshire v Sussex</title>
<link>
http://www.cricinfo.com/ci/engine/match/804159.html?CMP=OTC-RSS
</link>
<description>Hampshire v Sussex</description>
<guid>
http://www.cricinfo.com/ci/engine/match/804159.html
</guid>
</item>

Then

Number = 1 corresponds to the match between Sri Lanka Air Force Sports Club and Galle Cricket Club.
Number = 2 corresponds to the match between Hampshire and Sussex.
Feel free to play around with the script. 
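
The mapping from Number to a match can be sketched as below. This is only an illustration under assumptions: select_match is a made-up helper, the feed text is a trimmed version of the example above wrapped in the usual RSS rss/channel elements, and the standard-library parser stands in for Beautiful Soup.

```python
import xml.etree.ElementTree as ET

# Trimmed version of the example feed; only the titles are kept.
FEED = """<rss version="2.0"><channel>
<item><title>Sri Lanka Air Force Sports Club 215/10 v Galle Cricket Club 31/10 &amp; 278/7 *</title></item>
<item><title>Hampshire v Sussex</title></item>
</channel></rss>"""


def select_match(feed_xml, number):
    """Return the title of the number-th <item>, counting from 1."""
    items = list(ET.fromstring(feed_xml).iter("item"))
    if not 1 <= number <= len(items):
        raise ValueError(f"Number must be between 1 and {len(items)}")
    return items[number - 1].findtext("title").strip()


print(select_match(FEED, 1))  # the Sri Lanka Air Force v Galle match
print(select_match(FEED, 2))  # the Hampshire v Sussex match
```

Since Number counts from 1 while Python lists count from 0, the helper subtracts one before indexing and rejects anything outside the feed.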


Possible Causes for Error:

1) No module named bs4:

The script uses Beautiful Soup to parse the XML; this error means it is not installed.

Beautiful Soup can be installed from the terminal using the command:
pip install beautifulsoup4

2) No module named requests:

Requests can be installed from the terminal using the command:
pip install requests

P.S. Use sudo in case of a permission error:
sudo pip install beautifulsoup4
sudo pip install requests
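
To check for both modules before running the script, something like the following could be used. It is just a convenience sketch and not part of scoreboard.py; missing_modules is a made-up helper name.

```python
import importlib.util


def missing_modules(names):
    """Return the names from `names` that Python cannot find for import."""
    return [name for name in names
            if importlib.util.find_spec(name) is None]


# bs4 is the import name of the beautifulsoup4 package.
for name in missing_modules(["bs4", "requests"]):
    print(f"No module named {name}")
```

If nothing is printed, both modules are available.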