"Look out honey, 'cause I'm using technology..."

2008-11-02

Ubiquity command: lastfm

I found myself tagging all the songs on the soundtrack of the TV series 'Weeds' on last.fm (as you do), when I realized that last.fm's search is less than optimal for this purpose. It almost always takes three clicks to get to the page I need, even when I have the exact spelling of the artist or track. Enter Ubiquity, the new Firefox extension that lets you add simple commands to your browser and use them. In less than an hour, most of which was spent reading the excellent tutorial, the command was working.

The script can be found here on GitHub.

To use it, open Ubiquity with its keyboard shortcut, then type:

lastfm [artistname]

or:

lastfm [artistname] - [title]

to go directly to the artist or track page on last.fm in a new tab. Because of the way Ubiquity works, you can also select text on a page and call the command with the selection as its argument.
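The original command is a Ubiquity (JavaScript) script that isn't reproduced here, so what follows is only a minimal Python sketch of the same parsing step. It assumes last.fm's `/music/<artist>/_/<track>` URL layout with `+` for spaces; that layout, the `BASE_URL` constant and the `lastfm_url` function are assumptions for illustration, not taken from the script.

```python
from urllib.parse import quote_plus

# Assumed URL layout (not taken from the original Ubiquity script):
#   artist page: https://www.last.fm/music/<artist>
#   track page:  https://www.last.fm/music/<artist>/_/<track>
BASE_URL = "https://www.last.fm/music/"


def lastfm_url(query: str) -> str:
    """Turn 'artist' or 'artist - title' into a last.fm page URL (hypothetical helper)."""
    # Split on the first spaced dash only, so hyphenated names stay whole.
    artist, sep, title = query.partition(" - ")
    url = BASE_URL + quote_plus(artist.strip())
    if sep and title.strip():
        # A title was given: build the track page URL.
        url += "/_/" + quote_plus(title.strip())
    return url


if __name__ == "__main__":
    print(lastfm_url("Malvina Reynolds - Little Boxes"))
    print(lastfm_url("Jay-Z"))
```

Splitting only on a spaced ` - ` keeps names like 'Jay-Z' intact, but an artist or title that itself contains ` - ` would still be split in the wrong place.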

Note: this is pretty rough around the edges for now. If the artist name or the track title contains a dash, the command will not end up on the correct page. Also, I'm not sure how to host this so that others can subscribe to the command and automatically get updates. I currently don't have any 'real' hosting. Maybe GitHub allows this, but I haven't figured it out yet.

P.S.: Here's the link to the Weeds tag, and the one for the True Blood soundtrack, which is a work in progress. Both shows have very good and diverse soundtracks, which make for nice impromptu last.fm radio stations:

Weeds tag radio
True Blood tag radio

2008-11-01

A better anti-spam method

Recently an idea for combating spam occurred to me that I haven't heard before. It's very low-tech, which I find attractive, and it would never result in false positives, which is the most important shortcoming of all the systems I've tried so far.

What if everyone in the world had an additional email address that they would never ever use or give out, but that would be published in places where only email-harvesting bots would encounter it? A bit of handwaving here, but in its simplest form it should be enough to put both your real and your spam address on your webpage, completely unobfuscated (perhaps even in mailto: links!), with a note, clear to people but not easily machine-readable, saying that nobody should send mail to the fake one.

Then *any* message that arrives in the spam account is necessarily spam. You can now use that account to filter your real account, by removing messages whose body is identical or similar enough. Again some handwaving on "similar enough" (do this wrong, and voilà: false positives again), but you get the drift.
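One possible reading of "similar enough" can be sketched in Python. It assumes word 3-gram shingles compared with Jaccard similarity, and the 0.8 threshold is a hypothetical choice; the functions `shingles`, `jaccard` and `split_inbox` are illustrative names, not an existing filter.

```python
import re


def shingles(text: str, k: int = 3) -> set:
    """Set of k-word shingles from a lowercased body (punctuation ignored)."""
    words = re.findall(r"\w+", text.lower())
    # Bodies shorter than k words yield no shingles, so they never match.
    return {tuple(words[i:i + k]) for i in range(max(len(words) - k + 1, 0))}


def jaccard(a: set, b: set) -> float:
    """Overlap of two shingle sets; 0.0 if either is empty (never a match)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def split_inbox(inbox, trap_bodies, threshold=0.8):
    """Split real-account bodies into (kept, dropped) by similarity to trap mail.

    threshold is a hypothetical cutoff: higher means fewer false positives.
    """
    trap_sets = [shingles(body) for body in trap_bodies]
    kept, dropped = [], []
    for body in inbox:
        s = shingles(body)
        if any(jaccard(s, t) >= threshold for t in trap_sets):
            dropped.append(body)
        else:
            kept.append(body)
    return kept, dropped
```

With the trap message "Cheap meds online, buy now and save 90 percent today", a copy that differs only in case and punctuation scores 1.0 and is dropped. Changing the single word "90" to "80" already lowers the score to 5/11 (about 0.45), so choosing k and the threshold is exactly where the handwaving lies.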

This kind of filter could be implemented by a provider, so that the user would not have to do anything except put the fake address out there for the harvesters to find. Alternatively, it could be implemented client-side, where the mail client fetches all mail from both accounts and does its thing.

And another thing: I also think the current Bayesian filters could be improved upon by recognizing more (and different) patterns than just lexical ones. I have an intuition that character-based Markov chains could catch a lot of the spam I get: in college I built a small script that could reliably distinguish quite a number of languages. That would get rid of all the mail addressed to me in languages I cannot read, which I would classify as spam. Taken further, this could also catch intentionally misspelt spam (to the detriment of poor spellers who want to send me legitimate mail), or (specific patterns of) HTML markup in mails, which would defeat almost all the other tricks for showing the word 'viagra' without actually writing it.

I might have a go at this, to see whether my intuition that these could be better predictors than word counts is correct. (I would train my Dutch email and my English email into separate 'ham languages', and anything containing whole character sequences that neither model predicts would be marked unsure or spam. Marking things as spam could train a 'spam' language, so there would be both positive and negative indicators.)
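The character-level idea can be sketched in Python. This is not the college script. It assumes an order-2 character Markov chain with add-one smoothing for each 'language'. The names `CharMarkov`, `classify`, `PAD` and the `max_unseen` cutoff are hypothetical. "Entire unpredicted character sequences" is approximated as the fraction of two-character contexts the winning ham model never saw during training.

```python
import math
from collections import Counter, defaultdict

PAD = "\x02"  # start-of-text marker used to pad the first context


class CharMarkov:
    """Order-n character Markov chain with add-one smoothing (a sketch)."""

    def __init__(self, order: int = 2):
        self.order = order
        self.next_counts = defaultdict(Counter)  # context -> Counter of next chars
        self.vocab = set()

    def _transitions(self, text):
        """Yield (context, next_char) pairs over the lowercased, padded text."""
        padded = PAD * self.order + text.lower()
        for i in range(self.order, len(padded)):
            yield padded[i - self.order:i], padded[i]

    def train(self, text: str) -> None:
        for context, ch in self._transitions(text):
            self.next_counts[context][ch] += 1
            self.vocab.add(ch)

    def score(self, text: str) -> tuple[float, float]:
        """Return (mean log-prob per char, fraction of never-seen contexts)."""
        v = len(self.vocab) + 1  # one extra slot for characters never seen
        log_sum, unseen, n = 0.0, 0, 0
        for context, ch in self._transitions(text):
            nxt = self.next_counts.get(context)  # .get does not create entries
            if nxt is None:
                unseen += 1
                nxt = Counter()
            log_sum += math.log((nxt[ch] + 1) / (sum(nxt.values()) + v))
            n += 1
        n = max(n, 1)
        return log_sum / n, unseen / n


def classify(text, models, ham=("dutch", "english"), max_unseen=0.5):
    """Pick the best-scoring model; flag ham that looks mostly 'unpredicted'."""
    scores = {name: m.score(text) for name, m in models.items()}
    best = max(scores, key=lambda name: scores[name][0])
    if best in ham and scores[best][1] > max_unseen:
        return "unsure"
    return best
```

A 'spam' model would be trained the same way from mail marked as spam and added to `models`. Because `classify` picks the highest-scoring model, that model then acts as the negative indicator. Whether any of this beats word counts is exactly the open question above.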