Wednesday, 16 November 2016

Simple Python HTML parsing and list comprehensions

I was just watching a call-in TV show where viewers were asked to find Finnish country names that are five letters long and end with the letter a.

So I decided to test how quickly that could be done with Python.

First, fetch the Finnish Wikipedia list of independent states:

import urllib2  # Python 2 only; Python 3 moved urlopen to urllib.request
response = urllib2.urlopen('https://fi.wikipedia.org/wiki/Luettelo_itsen%C3%A4isist%C3%A4_valtioista')

content = response.read()
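urllib2 only exists in Python 2. A minimal Python 3 sketch of the same fetch, assuming the standard library's urllib.request (which took over urlopen) and a hypothetical helper name fetch_html, could look like this. It also decodes the bytes into text, falling back to UTF-8 when the response declares no charset:

```python
from urllib.request import urlopen

URL = "https://fi.wikipedia.org/wiki/Luettelo_itsen%C3%A4isist%C3%A4_valtioista"


def fetch_html(url):
    """Download a page and return it as text (hypothetical helper, Python 3)."""
    with urlopen(url) as response:
        # Use the charset from the Content-Type header, else assume UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)
```

Calling content = fetch_html(URL) then gives the whole page as a str, ready for parsing.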

Parse the content:

import BeautifulSoup as bs
soup = bs.BeautifulSoup(content)

Get all links on the page:

links = soup.findAll('a')
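If BeautifulSoup isn't installed, the standard library can do a rougher version of findAll('a') plus getText(). This is a swapped-in technique, not what I used above: a Python 3 sketch with html.parser, where LinkCollector and collect_links are hypothetical names, returning (href, text) pairs:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &auml; becomes "ä"
        self.links = []     # (href or None, text) tuples, in document order
        self._in_link = False
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")  # None for anchors without href
            self._text = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> is kept, like getText() does.
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append((self._href, "".join(self._text)))
            self._in_link = False


def collect_links(html):
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links
```

BeautifulSoup offers much more (searching, navigating the tree), but for just grabbing link texts this is enough.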

Get the texts of links whose href starts with "/wiki":

wikisT = [l.getText() for l in links if l.get('href') is not None and l.get('href').startswith('/wiki')]
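The None check is there because anchors without an href make get('href') return None, and calling startswith on None would raise an AttributeError. The same filter in Python 3, assuming the links come as (href, text) pairs (wiki_link_texts and the sample links are hypothetical):

```python
def wiki_link_texts(links):
    """Return the texts of links whose href starts with '/wiki' (hypothetical helper)."""
    # `href and ...` short-circuits on None (and on an empty string),
    # so startswith is only ever called on a real string.
    return [text for href, text in links if href and href.startswith("/wiki")]


sample_links = [
    ("/wiki/Kenia", "Kenia"),
    (None, "anchor without href"),
    ("https://example.org/", "external link"),
    ("/wiki/Suomi", "Suomi"),
]
```

Here wiki_link_texts(sample_links) keeps only "Kenia" and "Suomi".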

Keep the link texts that are five characters long and end with the letter a:

wikisT5A = [w for w in wikisT if len(w) == 5 and w.endswith('a')]
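The length check counts characters, which also works for names with ä or ö as long as the texts are unicode strings (BeautifulSoup gives back unicode text, and every str in Python 3 is unicode). A Python 3 sketch of the filter with the length and last letter as parameters, plus a set to drop duplicate link texts, since a page may link the same country more than once; names_matching and the sample names are illustrative assumptions:

```python
def names_matching(texts, length=5, last_letter="a"):
    """Return the unique texts of the given length ending in last_letter, sorted."""
    # The set comprehension removes duplicates before sorting.
    return sorted({t for t in texts if len(t) == length and t.endswith(last_letter)})


sample_texts = ["Kenia", "Kiina", "Kenia", "Suomi", "Ruotsi", "Malta"]
```

With these sample texts, names_matching(sample_texts) returns ['Kenia', 'Kiina', 'Malta'].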

Then print out wikisT5A in the IPython console:



Writing this blog entry took longer than finding the countries with Python, which took less than five minutes or so (and that included googling how urllib2 is used; I had used it before but didn't remember). Yep, I like Python these days.