"Look out honey, 'cause I'm using technology..."

2008-11-02

Ubiquity command: lastfm

I found myself tagging all the songs of the soundtrack of the tv series 'Weeds' on last.fm (as you do,) when I realized the search on last.fm is less than optimal for this purpose. It almost always takes three clicks to get to the page I need, even if I have the exact spelling for an artist/track. Enter Ubiquity, the new Firefox extension that lets you add simple commands to your browser and use them. In less than an hour, most of which was spent reading the excellent tutorial, the command was working.

The script can be found here on github.

To use it, open ubiquity with your command key, then type:

lastfm [artistname]

or:

lastfm [artistname] - [title]

to go directly to the artist or track page on last.fm in a new tab. Because of the way ubiquity works, you can also select text on a page, and call the command with the selected text as an argument.

Note: this is pretty rough around the edges for now: If the artist name or the track title contain dashes, it will not end up on the correct page. Also, I'm not sure how I can host this so that others can subscribe to the command, and automatically get changes. I currently don't have any 'real' hosting. Maybe github allows me to do this, but I haven't figured it out yet.
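For illustration, here is a minimal sketch of the parsing step in Python (the real command is a Ubiquity script, so JavaScript; none of this is taken from it). It assumes last.fm's /music/Artist/_/Title URL layout, and the name lastfm_url is made up for the example. Splitting only on the first dash that has spaces around it is one way to survive artists like Jay-Z, though an artist name containing ' - ' would still break it.

```python
from urllib.parse import quote_plus

# Assumed URL layout: /music/<artist> and /music/<artist>/_/<title>
BASE_URL = "http://www.last.fm/music/"


def lastfm_url(query):
    """Turn 'artist' or 'artist - title' into a last.fm page URL."""
    # Split on the first ' - ' only, so dashes without surrounding
    # spaces (as in 'Jay-Z') stay part of the name.
    artist, separator, title = query.strip().partition(" - ")
    url = BASE_URL + quote_plus(artist.strip())
    if separator and title.strip():
        url += "/_/" + quote_plus(title.strip())
    return url


print(lastfm_url("Ween - Ocean Man"))
```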

P.S.: Here's the link to the tag, and the one to the soundtrack for True Blood, which is a work in progress. Both have very good and diverse soundtracks, which make for nice impromptu last.fm radio stations:

Weeds tag radio
True Blood tag radio

2008-11-01

A better anti-spam method

Recently an idea on how to combat spam occurred to me, that I haven't heard before. It's very low tech, which I find attractive, and it would never result in false positives, which is the most important shortcoming of all the systems I've tried up until now.

What if everyone in the world had an additional email address, that they would never ever use or give out, but that would be publicized in places where only email address harvesting bots would encounter it? A bit of handwaving here, but in its simplest form it should be enough to put both your real and your spam email address on your webpage, completely unobfuscated (perhaps even in mailto: links!), while stipulating (in a clear, but not easily machine readable way) that people should not send mail to the fake/spam one.

Then *any* message that arrives in the spam account is necessarily spam. You can now use that account to filter your real account, by removing messages with a body that is identical, or similar enough. Again some handwaving on similar enough, (do this wrong, and voilà: false positives again) but you get the drift.

This kind of filter could be implemented by a provider, where the user would not have to do anything manually, except putting the fake address out there for the harvesters to find, or it could be implemented client side, where the mail client gets all mail from both accounts, and does its thing.
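To make the 'similar enough' part a little less handwavy, here is a rough Python sketch of the comparison step. It assumes the message bodies have already been fetched as plain strings. The function name, the 0.9 threshold and the choice of difflib's SequenceMatcher are all assumptions for the sake of the example, not a tested filter.

```python
from difflib import SequenceMatcher


def is_probably_spam(body, decoy_bodies, threshold=0.9):
    """True if body closely matches any message that reached the decoy address."""
    for known_spam in decoy_bodies:
        # ratio() is 1.0 for identical strings and drops towards 0.0
        # as the two bodies have less in common.
        matcher = SequenceMatcher(None, body, known_spam, autojunk=False)
        if matcher.ratio() >= threshold:
            return True
    return False


decoy_inbox = ["Buy cheap pills now at example.com, best prices!"]
print(is_probably_spam("Buy cheap pills now at example.com, best prices!!", decoy_inbox))
```

Set the threshold too low and the false positives are back, so it would need tuning against real mail. Comparing every incoming message against every decoy message also gets slow quickly; hashing normalized bodies first would be the obvious shortcut for the 'identical' case.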

And another thing: I also think the current bayesian filters could be improved upon, by recognizing more/different patterns than just lexical ones. I have an intuition that character based markov chaining could catch a lot of spam I get: I built a small script in college which could reliably distinguish quite a number of languages. That would get rid of all the mail addressed to me in languages I cannot read, which I would classify as spam. Taken further this could also get rid of intentionally misspelt spam, to the detriment of poor spellers that want to send me legitimate mail, or (specific patterns of) html markup in mails, which would get rid of most all the other tricks one could use to show the word 'viagra', without actually writing it.

I might have a go at this, to see if my intuition that these could be better predictors than word counts is correct. (I would train my dutch email and my english email into separate 'ham languages', and everything showing entirely unpredicted character sequences, would be unsure/spam. Marking things as spam could train a 'spam' language, so there could be both positive and negative indicators.)
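A bare-bones version of the character-based idea might look like the sketch below: one bigram (first-order Markov) model per 'language', scored by average log-probability per character transition. This is not the college script mentioned above. The class name, the add-one smoothing and the toy training sentences are assumptions, and a real filter would need far more training text than this.

```python
import math
from collections import Counter


class CharModel:
    """Character bigram model with add-one smoothing."""

    def __init__(self, text):
        text = text.lower()
        self.pairs = Counter(zip(text, text[1:]))  # counts of (char, next char)
        self.firsts = Counter(text[:-1])           # counts of char as first of a pair
        self.alphabet = len(set(text)) + 1         # +1 bucket for unseen characters

    def score(self, text):
        """Average log-probability per transition; higher means more familiar."""
        text = text.lower()
        transitions = list(zip(text, text[1:]))
        if not transitions:
            return float("-inf")
        total = 0.0
        for first, second in transitions:
            probability = (self.pairs[(first, second)] + 1) / (
                self.firsts[first] + self.alphabet)
            total += math.log(probability)
        return total / len(transitions)


def classify(text, models):
    """Return the name of the model that finds the text least surprising."""
    return max(models, key=lambda name: models[name].score(text))


models = {
    "english": CharModel("the quick brown fox jumps over the lazy dog "
                         "and then the dog sleeps in the sun all day"),
    "dutch": CharModel("de snelle bruine vos springt over de luie hond "
                       "en daarna slaapt de hond de hele dag in de zon"),
}
print(classify("the dog jumps over the fox", models))
```

The 'unsure/spam' bucket would then be anything whose best score stays below some cut-off, and a third model trained on marked spam would supply the negative indicator.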

2008-08-09

Autoqueue goes cross-player!

When my colleague Sylvain expressed an interest in porting my autoqueue plugin for Quod Libet to iTunes, I experimented a little with factoring out all the generic parts, and it turned out the player specific stuff isn't all that much, so I decided to do a little work and see what kind of problems I would run into when porting it to another player. I chose Rhythmbox for my experiment, since it's in my Ubuntu anyway, and it has support for python plugins.

Turns out it was pretty easy. I had a large part of the featureset working in less than a day, with a lot of help from this page:

http://live.gnome.org/RhythmboxPlugins/WritingGuide

and example code in Alexandre Rosenfeld's lastfmqueue plugin:

http://code.google.com/p/airmindprojects/source/browse/#svn/trunk/rbplugins/lastfm_queue

which offers similar functionality, but is a little more lightweight (fewer features/less bloat, depending on how you look at it ;)

I also moved autoqueue into its own repository, since it's now no longer solely a Quod Libet plugin, nor, hopefully, a single developer effort. If you're a rhythmbox (or Quod Libet) user and you're interested in checking an early, but working version out, get the plugin here:

http://code.google.com/p/autoqueue/source/browse/trunk

You'll need autoqueue.py, rhythmbox_autoqueue.py, and rhythmbox_autoqueue.rb-plugin. Drop those in your ~/.gnome2/rhythmbox/plugins directory, start rhythmbox, and activate the autoqueue plugin.

If you have questions, feature requests, or would like to help with porting the plugin to your favorite player, you can contact me directly, or even better, join the autoqueue mailing list here:

http://groups.google.com/group/autoqueue

2008-08-04

mp3spider becomes barbipes

Just a short note: After my colleague Sylvain showed some interest in my mp3spider script, and actually built some really cool new features for it, I decided to split it off into its own little project rather than keep it in my supremely unimaginatively named 'thisfred-python-stuff'.

After a very short google I found this cute little critter:

http://en.wikipedia.org/wiki/Saitis_barbipes

And so, from now on, the mp3spider will be known as barbipes and can be found here:

http://code.google.com/p/barbipes/

Anyone interested in contributing, just drop me a note at my usual username at gmail and I'll give you check-in rights.

2008-06-29

Quod Libet Plugins Released!

After working on them for a long time, and then procrastinating for at least as long on wrapping them up into releasable shape, I'm sort of proud to announce my plugins for Quod Libet, (the best music player I have yet found):

http://code.google.com/p/thisfred-quodlibet-plugins/downloads/list

There are three plugins in there, in order of increasing complexity and interest:

  1. autosearch.py

    Very simple plugin, searches for the title of the current song in your library: Good for getting rid of duplicates, and finding possible covers.

  2. lastfmtagger.py

    Useful only if you have a last.fm account and make use of tags there. This will synchronize last.fm tags both ways, saving them in a custom 'tag' id3 field in your local files. Since Quod Libet has a great id3 editing interface (Ex Falso, also usable as a stand alone application,) this makes adding and editing tags to songs, artists and albums on last.fm much easier.

  3. autoqueue.py

    This gets similar tracks to the ones you play from last.fm and puts them in the queue. It is smart enough not to play the same artists/songs for a configurable time, and has some other options (for instance it can also look up similar songs based on the tags created by the lastfmtagger.py mentioned above.) It works pretty well in creating a consistent yet not wholly predictable listening experience if you have a large and diverse library.
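The 'don't repeat the same artist for a configurable time' rule in autoqueue.py boils down to something like the following sketch. This is not the plugin's actual code: the class and function names are invented for the example, and it assumes the similar tracks arrive as (artist, title) pairs, best match first.

```python
import time


class ArtistBlocker:
    """Remembers when each artist was last played."""

    def __init__(self, block_seconds):
        self.block_seconds = block_seconds
        self.last_played = {}  # artist -> timestamp of last play

    def played(self, artist, now=None):
        self.last_played[artist] = time.time() if now is None else now

    def is_blocked(self, artist, now=None):
        now = time.time() if now is None else now
        last = self.last_played.get(artist)
        return last is not None and now - last < self.block_seconds


def pick_next(similar_tracks, blocker, now=None):
    """Return the first (artist, title) whose artist is not blocked, else None."""
    for artist, title in similar_tracks:
        if not blocker.is_blocked(artist, now):
            return artist, title
    return None


blocker = ArtistBlocker(block_seconds=3600)
blocker.played("Ween", now=0)
print(pick_next([("Ween", "Ocean Man"), ("Beck", "Loser")], blocker, now=60))
```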

As always, feedback is very welcome, I'm sure there are some bugs left in there, or at the very least some rough edges.

(edit: of course within minutes of posting this I find out last.fm's 2.0 API has gone official, so it appears that once again there is some work to do :) Apparently there's no API for getting similar tracks anymore though. That's a shame...)

2008-06-19

last.fm adds event radio

Cool! Now I don't have to tag the artists for the festivals I'm going to by hand, which always felt a little redundant. (I still might though, since I synchronize the tags with my local library, and it's nice to play all the songs I have locally from a particular edition of a particular festival.)

Anyhoo, urls like (for lowlands 2008, which is going to be very good):

http://www.last.fm/listen/event/436106

allow you to listen to a randomized stream of the entire lineup. A feature I have been rooting for for a long time. Last.fm, once again you come through.

2008-02-23

The Musical Gardener's Tools #5: Yet Another Way to Harvest mp3blogs

Update 2008-03-11: There were a number of things wrong with this script making the spidering *waaaay* slower than it needs to be. Fixed that below, and added threading for both the spidering and downloading, thanks to this cool recipe by Wim Schut which lets me run all the sqlite code in a separate thread. (Important because you can only use sqlite connections in the thread in which they were created.) All of this results in a nice speed-up.
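The general shape of that trick, for the curious: one thread owns the sqlite connection, and every other thread hands it SQL through a queue. The sketch below is written from scratch in present-day Python 3 rather than copied from the recipe or from spider.py, so the class and method names are assumptions.

```python
import queue
import sqlite3
import threading


class SqliteWorker(threading.Thread):
    """Owns the only sqlite connection; other threads submit SQL via a queue."""

    def __init__(self, path):
        super().__init__(daemon=True)
        self.path = path
        self.requests = queue.Queue()

    def run(self):
        # The connection is created here, inside the worker thread,
        # and never leaves it.
        connection = sqlite3.connect(self.path)
        while True:
            item = self.requests.get()
            if item is None:  # sentinel: shut down
                break
            sql, params, reply = item
            try:
                rows = connection.execute(sql, params).fetchall()
                connection.commit()
                reply.put(rows)
            except sqlite3.Error as error:
                reply.put(error)
        connection.close()

    def execute(self, sql, params=()):
        """Callable from any thread; blocks until the worker has run the SQL."""
        reply = queue.Queue()
        self.requests.put((sql, params, reply))
        result = reply.get()
        if isinstance(result, Exception):
            raise result
        return result

    def close(self):
        self.requests.put(None)
        self.join()


worker = SqliteWorker(":memory:")
worker.start()
worker.execute("CREATE TABLE links (url TEXT PRIMARY KEY)")
worker.execute("INSERT INTO links VALUES (?)", ("http://example.com/a.mp3",))
print(worker.execute("SELECT url FROM links"))
worker.close()
```

The spidering and downloading threads then only ever call execute(), which keeps all the database access serialized in one place.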

Ok I said I wasn't going to, but I did end up writing a bit of code, although it didn't get too far out of hand. Yet :). It solves *all* of my problems: it does not download files over 30MB in size, and it never downloads the same link twice.

I found this message on the python mailing list, which seemed like a very good start. It almost did what I needed, but not quite, and also the parsing was overcomplicated and didn't catch all links, so I replaced that with a simple regular expression.

I ended up changing most of the code and functionality, (for instance it now stores links in a database.) There's a lot of hard coding in there, which I could factor out if people want to use it, but for now it solves my problems beautifully ;).
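To give an idea of the two core pieces (a simple regular expression for finding links, and a database that makes sure no link is ever handled twice), here is a stripped-down sketch. It is not the code from spider.py: the regular expression, the table layout and the function names are all assumptions made for the example.

```python
import re
import sqlite3

# Matches href="...mp3" / href='...ogg' (case-insensitive); deliberately simple.
LINK_RE = re.compile(r'href=["\']([^"\']+\.(?:mp3|ogg))["\']', re.IGNORECASE)


def find_audio_links(html):
    """Return all mp3/ogg link targets found in a page."""
    return LINK_RE.findall(html)


def remember_new_links(connection, links):
    """Store links not seen before and return only those new ones."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(url TEXT PRIMARY KEY, downloaded INTEGER DEFAULT 0)")
    new_links = []
    for url in links:
        # INSERT OR IGNORE skips urls that are already in the table,
        # in which case rowcount is 0 instead of 1.
        cursor = connection.execute(
            "INSERT OR IGNORE INTO links (url) VALUES (?)", (url,))
        if cursor.rowcount == 1:
            new_links.append(url)
    connection.commit()
    return new_links


page = '<a href="http://example.com/a.mp3">a</a> <a href="/page.html">p</a>'
database = sqlite3.connect(":memory:")
print(remember_new_links(database, find_audio_links(page)))
```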

It's used with the following syntax:

# initial set up
python spider.py createdb
# add a new blog to be harvested
python spider.py add http://url.of.blog/
# (shallowly) spider all blogs for new links to files
python spider.py
# spider a url to a specific depth (5 for example should get 
# most everything, but will take a while)
python spider.py deepspider 5
# download all files
python spider.py download

A minor problem is that curl doesn't do *minimum* file sizes, and with a lot of broken links it does download something small that isn't really an ogg or mp3 file, but an HTTP response. I can probably solve this better, but for now I call the download from an update script as follows:

python spider.py download
find . -iname "*.mp3" -size "-100k"  -print0 | xargs -0 rm
find . -iname "*.ogg" -size "-100k"  -print0 | xargs -0 rm
find . -iname "*.mp3" -print0 | xargs -0 mp3gain -k -r -f
find . -iname "*.ogg" -print0 | xargs -0 vorbisgain -fr

Translation: download files, throw away suspiciously small ones, mp3/vorbisgain what's left.

Here's the code:

Edit 2008-04-18: Moved the code to google code, so I don't have to update it here. Find the latest version here: spider.py

2008-02-12

Reminders via del.icio.us and Yahoo Pipes

Just thought I'd share this, while we still *have* del.icio.us and yahoo pipes... ;)

A while ago I stumbled across tagmindr, which seemed like a cool idea: put a custom tag on some url in your del.icio.us account and it will remind you at a certain date to look at that url again. I frequently see announcements on websites that say something like check back here on [some date] for [some interesting news]. Having automatic reminders for things like that is great, because the chances of me forgetting otherwise are near 100%. The thing is: the personal feed tagmindr promised me *NEVER WORKED*. That's how I completely forgot about a number of things, and tagmindr itself for a while. No biggie, just not very smart if you wanna get all start-uppy and generate buzz ;)

So I decided, how hard can it be to get this right? Turns out not hard at all! I built a yahoo pipe in under an hour that does exactly the same thing. You just give it your del.icio.us username, and it gives you reminders for anything you tag with the tags 'reminders' and 'remind:yyyy-mm-dd' where you replace the ys and ms and ds with the date you want to be reminded on.
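The pipe itself is all boxes and wires, but the filtering logic amounts to roughly the following Python. This is only an approximation: it assumes bookmarks come in as (url, tags) pairs, and that a reminder should show up on or after its date, which may not be exactly what the pipe does. The function names are made up.

```python
import re
from datetime import date

REMIND_RE = re.compile(r"^remind:(\d{4})-(\d{2})-(\d{2})$")


def reminder_date(tags):
    """Return the date from a 'remind:yyyy-mm-dd' tag, or None."""
    if "reminders" not in tags:
        return None
    for tag in tags:
        match = REMIND_RE.match(tag)
        if match:
            year, month, day = (int(part) for part in match.groups())
            try:
                return date(year, month, day)
            except ValueError:  # e.g. remind:2008-02-30
                continue
    return None


def due_reminders(bookmarks, today):
    """Return the urls whose reminder date is today or in the past."""
    due = []
    for url, tags in bookmarks:
        when = reminder_date(tags)
        if when is not None and when <= today:
            due.append(url)
    return due


bookmarks = [
    ("http://example.com/news", ["reminders", "remind:2008-03-01"]),
    ("http://example.com/later", ["reminders", "remind:2008-12-31"]),
]
print(due_reminders(bookmarks, date(2008, 6, 1)))
```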

The yahoo pipe is here. Enjoy! (As always, feedback and bug reports very welcome!)

Yahoo pipes rock, del.icio.us rocks. Let's hope Yahoo can hold out. I have nothing against Microsoft per se, but I don't think they ever got the web, and I fear they will screw up the nice and open things (like YUI, for instance) that Yahoo has been developing in the past few years.

2008-02-03

Exploratory programming, or my 2 ¢ on arc

A lot of people have blogged on Paul Graham's new language, arc, the (perceived lack of) new features it brings, and the intentionally non-PC announcement by Mr. Graham. I don't have much to add to that particular debate. It looks like lisp, with some new syntactic sugar, which is fine by me. I like lisp, but I wouldn't want to use it in my day job. Others no doubt do, and their taste is no worse or better than mine.

I think maybe the development time worked against it, in that some features seem less than revolutionary, because other languages got there first. Now lisp has them too, and maybe even better implemented, I'm not one to judge.

What I take issue with is that Mr. Graham explains the lack of some other features by the fact that arc is for exploratory programming only, and those features are somehow a hindrance for that. I think this is just plain wrong: unicode support will hurt no one in their exploratory programming, it will actually help a lot of people a good deal. Mr. Graham quotes Guido van Rossum stating he spent a year implementing unicode. I very much doubt that that quote is correct, but *even if it were*, so what? That means exactly nothing to the exploratory programmer, and only hurts the exploratory *language designer* which I think may be a little closer to what is going on here.

As an exploratory programmer in any language I've ever used, (I do think Mr. Graham is correct in saying everyone is,) I can safely say that features have never harmed me, as long as they did not get in the way when I wasn't using them. Unicode support in python doesn't. In fact, python (my favorite language, *and* the one I'm most fluent in by now, so yes, I'm biased) is absolutely fantastic for exploratory programming, exactly because of its huge standard library, which helps you get to the meat of the task at hand, without having to build your own support library first.

2008-01-24

The Musical Gardener's Tools #4: Lazyweb, lazyweb on the wall...

..who is the smartestest wgetter of them all?

I need a little help here. As I've described as part of an earlier post, one of my sources for new music is wget, in combination with an ever growing list of mp3 blog urls. The ever growing part is now slowly starting to become a problem. I ran my update script yesterday evening and it took well over 12 hours to complete. (Mind you, I have fiberoptics to the door, speed is not an issue, at least not at my end.) That is unacceptable, in terms of energy wasted. Also the way it works potentially wastes a lot of bandwidth for the poor blog owners, mostly because files I have deleted are downloaded again, unless they were removed from the blog in the meantime. Note that this hits the sites that put up music I don't like or already have the hardest, but that should hardly be the measure of all things. Maybe. ;)

I see two ways to solve this:

  1. drastically clean up the list of urls that I harvest from.

    This is possible, I do it semi-regularly, but new and interesting mp3 blogs keep popping up, so this is only a short term solution.

  2. filter out the stuff I know I don't want

    To some extent, I know what I don't want to download. First of all, long podcasts and extended mixes (let's arbitrarily say, anything over 20MB,) since the way I like to listen to music is at the individual track level, otherwise all my tagging tools and last.fm don't work. Anyway we're getting past the whole idea that (web) music radio is consumed in an order predefined by someone else. More suggestion, less force feeding, kthxbye. (On a tangent: can we get this for news radio: just the news items, not a whole, usually extremely repetitive, bulletin as atomic? True podcasting should let me skip items I'm not interested in/have already heard.) Second of all, for obvious reasons, all the files I've already downloaded but deleted.

Since I am far from a linux command line deity, I thought I would ask here, does anyone have any suggestions on how to start on tackling these two problems, given the script:

wget --timeout=5 -U"Mozilla/5.0" -r -l1 -H -t1 -x -nc -np -P ~/mp3blogs/ -A.mp3,.ogg -erobots=off -i ~/mp3blogs/urls.txt

A: How can I limit the length of mp3s and oggs downloaded in this way to for instance 20MB per file? Keep in mind, throwing them away after downloading is not an option, since I want to prevent the download from happening at all. I don't think wget has a switch for this, so it will probably not be possible in a one liner.

B: I would like to store all of the urls of the files I do download (probably just in a flat text file for now) and then have my script skip them when downloading. Again, I don't think a one liner is possible.

Solutions to either problem are worth a 20$ amazon voucher from me (or somewhere else, I don't really care, as long as I'm out only 40$ total and it's not too much hassle to get it to you.)

I am, of course, the sole judge of this contest, but I will try to be fair. You don't have to give me a whole script, I'm a fairly competent programmer, just not too deep into bash, but if you'll point me at where to start, and I get it to work, that counts as a solution. Although as I've said, it's going to grow beyond a one liner, I would like to keep it a simple script, and I'm not looking for an application. I could build one in Python myself, but I want to keep it zero maintenance, basically too simple to even put the code into subversion.

UPDATE 2008-01-28: I'm now looking into pavuk, which may or may not have all the features I need. If this works, I just earned myself 40$ :)

UPDATE 2008-01-28.1: pavuk, although having rather exotic naming of options and switches, seems to solve A quite nicely, which is a bandwidth (and time, and thus energy) saver. Finding all the right options was made much easier by this guide. I'm still thinking about solving B, there may be options in pavuk to help me with that too.

For completeness' sake, the updated script looks like this (except it should all be one line...):

pavuk -timeout 5000 -identity "Mozilla/5.0" -lmax 1 -retry 1 -dont_leave_dir -cdir ~/mp3blogs/ -asfx .mp3,.ogg -noRobots
 -urls_file ~/mp3blogs/urls.txt -maxsize 30000000 -fnrules F '*' '%h/%d/%n'
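As for problem B, the flat text file approach could be as small as the sketch below. It is in Python rather than bash, so it does not quite meet the 'simple script' brief, and it is not something pavuk or wget does for you. The function names and the seen-file location are assumptions. Note that it records a url as soon as it is handed out, not when the download actually succeeds.

```python
import os


def load_seen(path):
    """Return the set of urls already recorded in the flat file."""
    if not os.path.exists(path):
        return set()
    with open(path, encoding="utf-8") as handle:
        return {line.strip() for line in handle if line.strip()}


def filter_new(urls, path):
    """Return the urls not yet in the file at path, and append them to it."""
    seen = load_seen(path)
    new_urls = []
    with open(path, "a", encoding="utf-8") as handle:
        for url in urls:
            if url not in seen:
                seen.add(url)
                new_urls.append(url)
                handle.write(url + "\n")
    return new_urls
```

A call like filter_new(found_urls, os.path.expanduser("~/mp3blogs/seen.txt")) would then give only the links worth passing on to the downloader, and files deleted locally would stay deleted.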

2008-01-23

Coolendar

After reading this hypernarrative post about calendar mashups using yahoo pipes, I realized I could make my own filtered calendar feed for events that are recommended to me by various sources, chiefly my last.fm recommendation feed. Since those sources tend to contain more noise than signal, at least for now, (automatic recommendation is hard, I read that somewhere,) and I tend to miss things because they get buried, I decided to take a page out of Wilbert's book, and become the editor of my very own event feed, mostly targeting myself, and perhaps one or two friends.

Since I use thunderbird with the lightning and google calendar provider plugins, which tend to visually clutter when too many events show up, I can now show only this feed there. Once a month I copy everything that looks remotely interesting from the other calendar feeds by hand, and Bob's my uncle. The yahoo pipes part is cool, and I might redirect all the feeds I subscribe to into one big source funnel yet, but for now I don't need it. Also I like to see who recommended me what, so I can unsubscribe from feeds that turn out to be of less interest to me than I thought.

So, without further ado, I present you with: teh coolendar! (The actual ical feed is here, for completeness' sake.)

2008-01-17

My top 50 artists for 2007

In descending order of listening frequency:

Bishop Allen, Bright Eyes, Belle and Sebastian, Beck, Nina Simone, De La Soul, Rilo Kiley, Joni Mitchell, Ween, Steve Earle, Aimee Mann, Johnny Cash, OutKast, John Prine, Casiotone for the Painfully Alone, Hank Williams, Dusty Springfield, The Knife, Johan, Frank Black, Gillian Welch, Tori Amos, Indigo Girls, of Montreal, Beth Orton, Gorillaz, The Flaming Lips, A Tribe Called Quest, Sultans of Ping F.C., Flip Kowlier, The Young Knives, Duvelduvel, The Thermals, Kaiser Chiefs, Billy Bragg, Jacques Brel, Missy Elliott, Elastica, M. Ward, Van Morrison, Devo, Peaches, Nouvelle Vague, Mates of State, Martha Wainwright, The View, Randy Newman, Tom Waits, Editors, Damien Rice

There you have it, gentlemen, what more evidence do you need? Hardly anything very *now*, except maybe for The Thermals, The Young Knives and The View, all of which I personally didn't discover until 2007. Not terribly hip I'm afraid, but last.fm don't lie. :)

2008-01-14

last.fm dream job

Wow, this sounds pretty amazing, it has programming, music metadata obsessiveness, and last.fm. Maybe they'd even let me use Python ;). Shame the timing's a little off, I can't really move to London right now, should I even interview successfully. I wish whoever gets the job a lot of fun!