Thursday, September 20, 2012

Download all Links from a Webpage with Python

Man, I love Python and BeautifulSoup!

I wanted to download a bunch of files from a webpage.  Fortunately we have Python.  Watch how ridiculously easy this is:

C:\>python
Python 2.6 (r26:66721, Oct  2 2008, 11:35:03) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> from BeautifulSoup import *

>>> import urllib
>>> import urllib2
>>> baseurl = "http://whatever.com/files/here/"
>>> soup = BeautifulSoup(urllib2.urlopen(baseurl))
>>> links=soup.findAll("a")
>>> for link in links[5:]:
...    print link.text
...    urllib.urlretrieve(baseurl+link.text, link.text)



--- Now watch the fun, your downloads have begun. ---

So now a little explanation.  BeautifulSoup is an HTML parser, and a damn good one at that.  It can handle really badly formed HTML with grace, and makes it really easy to do screen-scraping.  Really cool stuff.  

Basically, my variable 'soup' holds the parsed contents of the webpage.  That object has a lot of capabilities, so you will want to check out the BeautifulSoup docs to learn everything it can do.  How about this:

soup.findAll("a")  #Boom.  This will return a Python list of all "a" tags


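Now all I do is loop through them.  I skip the first five (that's the links[5:] slice) because, after inspection, those weren't files and I don't care about them.  Then I just call urllib.urlretrieve(url, filename), which downloads the url and saves it under filename.  link.text is the visible text of the link, which on this page doubled as the file name.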


In retrospect I probably should've built the URL from the link's href attribute instead of its text (urlretrieve(baseurl+link["href"], link.text)), but you can figure that out for yourself.  This is meant to be inspiration, and to apply it you might have to make some changes to these nine lines.

Nine lines!!
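
For anyone who wants the href version spelled out, here is a rough sketch.  To be clear about what is assumed: it uses Python 3 standard-library names (html.parser and urllib.parse) in place of the Python 2 urllib/urllib2 and BeautifulSoup calls above, and AnchorCollector and file_links are names I made up for illustration.  Instead of skipping a fixed number of links, it drops any link whose URL doesn't end in a file name, which may or may not match what you need on your page.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
import posixpath


class AnchorCollector(HTMLParser):
    """Collects the href attribute of every <a> tag it sees (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # anchors without an href are ignored
                self.hrefs.append(href)


def file_links(html, base_url):
    """Return (absolute_url, local_filename) pairs for links that name a file."""
    collector = AnchorCollector()
    collector.feed(html)
    pairs = []
    for href in collector.hrefs:
        # urljoin copes with relative, root-relative and absolute hrefs
        url = urljoin(base_url, href)
        filename = posixpath.basename(urlparse(url).path)
        if filename:  # an empty name means the link points at a directory
            pairs.append((url, filename))
    return pairs
```

To actually download, you would fetch the page (urllib.request.urlopen(baseurl).read().decode()), pass the result to file_links, and call urllib.request.urlretrieve(url, filename) on each pair.  More than nine lines, I admit.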




Wednesday, September 19, 2012

ATV Riding in Indian River, MI, and GPS Tools

I've tried most of the GPS mapping solutions out there for Android, using my $45 Samsung phone (Craigslist find) as the test bed.  I was really hoping I could find an open source solution that would meet my needs.  The closest I found was called Aripuca.  It is almost exactly what I want, and I started playing around with the code.  There is a lot of promise to this app.

The most advertising I saw was for Backcountry Navigator.  This app has good ratings on the Google Play store, but costs $10.  I figured this was not a bad price to pay if it was decent.  I tried the trial version and eventually bought the pro version.  I quickly found a lot of bugs with this app, such as my tracks drawing in the middle of a lake and a lot of flicker on the screen when the maps would redraw.

I'm not sure how many others I tried in between, but my favorite by far is a gem called OruxMaps.  If you're used to a Garmin eTrex, you will feel pretty close to home using this app.  It is EXTREMELY customizable and 'just works'.  I never had a force close (FC), and it almost always behaved as expected.  The only strange thing I found was that at some point I had it navigating to a point, and I could not figure out how to 'cancel' the navigation, so it was always drawing a straight line from my position to the point it was trying to navigate me to.  Other than this annoyance it performed flawlessly.

So, here is our route in the Indian River area.  I used OruxMaps to record the route.  I shared it with myself by uploading it to my Google Drive account as a GPX file.  Once it was in Drive I downloaded it to my laptop, where I tried to re-upload it to Google Maps with very poor success.  I played with this for several hours, trying to modify the GPX file and get Maps to load the whole thing, but it would only display about half of the track.

I finally came across GPSVisualizer.com, a cool site that will let you upload your GPX file and visualize it in a number of different ways.  One of the places you can save the file to is a site called EveryTrail.com, which is kind of like a blog site for your trips.  Here is the link it generated for our trip in Indian River:


EveryTrail - Find hiking trails in California and beyond


Later I'll post about my Ram mount on the four-wheeler and maybe some screenshots of OruxMaps in action.