"Look out honey, 'cause I'm using technology..."


The Musical Gardener's Tools #4: Lazyweb, lazyweb on the wall...

..who is the smartestest wgetter of them all?

I need a little help here. As I described in an earlier post, one of my sources for new music is wget, in combination with an ever-growing list of mp3 blog urls. The ever-growing part is slowly becoming a problem. I ran my update script yesterday evening and it took well over 12 hours to complete. (Mind you, I have fiber optics to the door; speed is not an issue, at least not at my end.) That is unacceptable in terms of energy wasted. The way it works also potentially wastes a lot of bandwidth for the poor blog owners, mostly because files I have deleted are downloaded again, unless they were removed from the blog in the meantime. Note that this hits hardest the sites that put up music I don't like or already have, but that should hardly be the measure of all things. Maybe. ;)

I see two ways to solve this:

  1. drastically clean up the list of urls that I harvest from.

    This is possible, I do it semi-regularly, but new and interesting mp3 blogs keep popping up, so this is only a short term solution.

  2. filter out the stuff I know I don't want

    To some extent, I know what I don't want to download. First of all, long podcasts and extended mixes (let's arbitrarily say anything over 20MB), since I like to listen to music at the individual track level; otherwise all my tagging tools and last.fm don't work. Anyway, we're getting past the whole idea that (web) music radio is consumed in an order predefined by someone else. More suggestion, less force feeding, kthxbye. (On a tangent: can we get this for news radio too? Just the individual news items, instead of a whole, usually extremely repetitive, bulletin as one atomic unit. True podcasting should let me skip items I'm not interested in or have already heard.) Second, for obvious reasons, all the files I've already downloaded but deleted.

Since I am far from a Linux command line deity, I thought I would ask here: does anyone have suggestions on how to start tackling these two problems, given the script:

wget --timeout=5 -U"Mozilla/5.0" -r -l1 -H -t1 -x -nc -np -P ~/mp3blogs/ -A.mp3,.ogg -erobots=off -i ~/mp3blogs/urls.txt

A: How can I limit the length of mp3s and oggs downloaded in this way to for instance 20MB per file? Keep in mind, throwing them away after downloading is not an option, since I want to prevent the download from happening at all. I don't think wget has a switch for this, so it will probably not be possible in a one liner.
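To make A a bit more concrete, here is a minimal sketch in Python of what "check before you download" could look like. It assumes the blog's server answers HEAD requests and reports a Content-Length header, which not every server does. The names `is_small_enough`, `remote_size` and `MAX_BYTES` are made up for illustration, and skipping files of unknown size is my own assumption, not a given.

```python
import urllib.request

# The arbitrary 20MB limit from the post, read here as 20 * 1024 * 1024 bytes.
MAX_BYTES = 20 * 1024 * 1024


def is_small_enough(content_length, max_bytes=MAX_BYTES):
    """Decide from a raw Content-Length header value whether to download."""
    if content_length is None:
        # Assumption: when the server does not report a size, skip the file.
        return False
    try:
        return int(content_length) <= max_bytes
    except ValueError:
        # A malformed header is treated like a missing one.
        return False


def remote_size(url, timeout=5):
    """Send a HEAD request and return the Content-Length header, or None."""
    request = urllib.request.Request(
        url, method="HEAD", headers={"User-Agent": "Mozilla/5.0"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.headers.get("Content-Length")
```

A script would call `is_small_enough(remote_size(url))` for each candidate file and only hand the urls that pass on to the downloader. A HEAD request still costs one round trip per file, but no music data is transferred.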

B: I would like to store all of the urls of the files I do download (probably just in a flat text file for now) and then have my script skip them when downloading. Again, I don't think a one liner is possible.
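For B, the flat text file idea could be sketched roughly as follows in Python. This is only an outline under assumptions: one url per line, a file path chosen by the caller, and hypothetical helper names (`load_seen`, `filter_new`, `record`). How the candidate urls are collected from the blogs in the first place is left open.

```python
from pathlib import Path


def load_seen(path):
    """Read previously downloaded urls (one per line) into a set."""
    seen_file = Path(path)
    if not seen_file.exists():
        return set()
    lines = seen_file.read_text().splitlines()
    return {line.strip() for line in lines if line.strip()}


def filter_new(urls, seen):
    """Keep only urls that were never downloaded before, preserving order."""
    return [url for url in urls if url not in seen]


def record(path, urls):
    """Append freshly downloaded urls to the flat text file."""
    with open(path, "a") as seen_file:
        for url in urls:
            seen_file.write(url + "\n")
```

Because the list is keyed on urls and not on files present on disk, a track I deleted after listening stays in the list and is not fetched again.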

Solutions to either problem are worth a $20 amazon voucher from me (or somewhere else, I don't really care, as long as I'm out only $40 total and it's not too much hassle to get it to you.)

I am, of course, the sole judge of this contest, but I will try to be fair. You don't have to give me a whole script, I'm a fairly competent programmer, just not too deep into bash, but if you'll point me at where to start, and I get it to work, that counts as a solution. Although as I've said, it's going to grow beyond a one liner, I would like to keep it a simple script, and I'm not looking for an application. I could build one in Python myself, but I want to keep it zero maintenance, basically too simple to even put the code into subversion.

UPDATE 2008-01-28: I'm now looking into pavuk, which may or may not have all the features I need. If this works, I just earned myself $40 :)

UPDATE 2008-01-28.1: pavuk, although having rather exotic naming of options and switches, seems to solve A quite nicely, which is a bandwidth (and time, and thus energy) saver. Finding all the right options was made much easier by this guide. I'm still thinking about solving B, there may be options in pavuk to help me with that too.

For completeness' sake, the updated script looks like this (except it should all be one line...):

pavuk -timeout 5000 -identity "Mozilla/5.0" -lmax 1 -retry 1 -dont_leave_dir -cdir ~/mp3blogs/ -asfx .mp3,.ogg -noRobots
 -urls_file ~/mp3blogs/urls.txt -maxsize 30000000 -fnrules F '*' '%h/%d/%n'



After reading this hypernarrative post about calendar mashups using yahoo pipes, I realized I could make my own filtered calendar feed for events that are recommended to me by various sources, chiefly my last.fm recommendation feed. Since those sources tend to contain more noise than signal, at least for now (automatic recommendation is hard, I read that somewhere), and I tend to miss things because they get buried, I decided to take a page out of Wilbert's book and become the editor of my very own event feed, mostly targeting myself, and perhaps one or two friends.

Since I use thunderbird with the lightning and google calendar provider plugins, which tend to visually clutter when too many events show up, I can now show only this feed there. Once a month I copy everything that looks remotely interesting from the other calendar feeds by hand, and Bob's my uncle. The yahoo pipes part is cool, and I might redirect all the feeds I subscribe to into one big source funnel yet, but for now I don't need it. Also I like to see who recommended me what, so I can unsubscribe from feeds that turn out to be of less interest to me than I thought.

So, without further ado, I present you with: teh coolendar! (The actual ical feed is here, for completeness' sake.)


My top 50 artists for 2007

In descending order of listening frequency:

Bishop Allen, Bright Eyes, Belle and Sebastian, Beck, Nina Simone, De La Soul, Rilo Kiley, Joni Mitchell, Ween, Steve Earle, Aimee Mann, Johnny Cash, OutKast, John Prine, Casiotone for the Painfully Alone, Hank Williams, Dusty Springfield, The Knife, Johan, Frank Black, Gillian Welch, Tori Amos, Indigo Girls, of Montreal, Beth Orton, Gorillaz, The Flaming Lips, A Tribe Called Quest, Sultans of Ping F.C., Flip Kowlier, The Young Knives, Duvelduvel, The Thermals, Kaiser Chiefs, Billy Bragg, Jacques Brel, Missy Elliott, Elastica, M. Ward, Van Morrison, Devo, Peaches, Nouvelle Vague, Mates of State, Martha Wainwright, The View, Randy Newman, Tom Waits, Editors, Damien Rice

There you have it, gentlemen, what more evidence do you need? Hardly anything very *now*, except maybe for The Thermals, The Young Knives and The View, all of which I personally didn't discover until 2007. Not terribly hip I'm afraid, but last.fm don't lie. :)


last.fm dream job

Wow, this sounds pretty amazing: it has programming, music metadata obsessiveness, and last.fm. Maybe they'd even let me use Python ;). Shame the timing's a little off; I can't really move to London right now, even if I were to interview successfully. I wish whoever gets the job a lot of fun!