thisblog

"Look out honey, 'cause I'm using technology..."

2010-02-25

Generating Band Names and Song Titles

Yesterday my distinguished colleague Stuart Langridge said he could use some fake band names for functional tests in the Ubuntu One Music Store he's working on. Since I'd done a similar thing once before (for realistic sounding Dutch city/village names), I figured I could hash an algorithm out pretty quickly, and so I did, last night.

I started with a simple Markov chain class that can analyse a body of data and then generate text that is like the source data, but does not actually occur in it. At first I used the same algorithm for artist names and song titles, but I soon decided that generating song titles word by word and artist names character by character made more sense, since proper names (and a lot of band names) don't adhere to spelling rules anyway.

The song titles came out pretty great (as in, in every batch I generated there were at least a few funny ones,) but the artist names remained problematic, so the next thing I tried was splitting the artists into groups and people. This seems to be generating even better results, but splitting the list of artists (which I generated with a throwaway plugin for quodlibet, see below) turns out to be a lot of work. I did a few hundred manually, and the results below are quite cool, but there's a lot of partial duplication. Perhaps I shall finally download the musicbrainz data set, and see if I can generate separate lists of people and groups/bands from that easily. (HINT: if someone has such lists lying about, I would be immensely grateful if you could mail them to thisfred at gmail. That would be me.)

Anyway, here's a single unedited run of the script:

Jack Planter - First true love will die
The Islandry The Mountals - And We Wake Up (live acoustic)
Eef Bunyan - 08-Welcome to Rock Remix
Luna - The Deacon (Duke Dumont Remix)
Jolie Newsom - What A Christmas Duel
Dj Funky Banhardo Villah Priest - King Of The Enemy
The Ian Mouse - Carol Of The Goober Woobers
Robby Brokend - The Same Machine
Princeformeroon) - Long Dark Blues
Joha - What The Fuck Out
Cartripes Young Lips - How Blue You Can Live Without You
Rude Stooges - I Called Out Your Window?
El Perrown - And a She Wolf (Moto Blanco Radio Edit)
Jana Talmann - When My Broken Shield
Flip Kowlie Newsom - Lookin' For A Propulsion Device Based On Heim's Quantum Theory)
Kelle Shock - Used to Hate Us)
Billiams - Forever On The Verge
Iggy Polly - Extraball ft. Amanda Blank
Marton - We Are Decided
...And thers of Leon & Kypski - One Of These Days (Clifton Chenier cover)
Misse Dayton - Home On The Edge
Whisperdrag - Son of Rio Mix (Single Version)
Raftwer - Song A Day Another Day
Stard, Run Run - Begin to See What I Meant to Be Glad
Asobius Pip - Music Sounds Better with You
Ill Girl Seed - Never an Easy Way to the Center of the South
The Weeper Girls - Mind How You Feel It?
Flip Kowlie Newsom - Sondre Lerche - To Plant A Seed.mp3 [Unknown]
Alexand - Far Cry (live in the Jungle
Rocker - Veins To The City
Sean Lionhearthan - put it on the Dancefloor (John B Remix)
Trail Riot - Islands In The Gale/Josephine
Williott - Make It Home
Death - Section 7 (Hanging Around the Christmas Tree On Fire (Holy Ghost! Remix)
C-Monobotix - Never Make a Noise
MC Ricard Coxon - Hell - Part Four
Williams Jebeniana Nastarr - Drop Some Silver In The Dead
Williott - Le Le
Alexandt - Pop song for our City
Beth Lakemann - A Friend (That I've Never Understood
Corns of Happies - The Only Healer - Featuring Caroline Schutz Of Folksongs For The Winter
The Naturday - Awoken By a Horse
The Plaza Cent - Like a Mama
Emma Pop - Mogwai and Summer Walks
Ra Ricord Citi 80 - I.C. Y'All (feat. Busta Rhymes, Raekwon & Lil Wayne)
Wints Sected The Tegenwoording Cooks - To Save You
Madow - I Sold My Hands Are Made
Read - 94b Christmas in July
Franco et and - Can't Turn You Into the Pit
The Do Roots - Remember When I Was a Lover
Killalobos - How We Do Is Wrong
David Krauss - Remedy (A1 Bassline Remix) - L2
Eringfielson - Rock The Beach (Neil Young Cover Live 8/15/2003)
Mitch Harcourtis Pilar - Bathroom Gurgle (Duke Dumont Ode To Todd Mix)
Shape - Lost in the game (pt 1)
Tweakes - If You've Got Hopes
Steve Elliams - Dis policeman keeps on kicking me to the Mardi Gras In New Orleans
Billaloner - Motown Never Sounded So Good They Named it Thrice
Van Lidbo - Take me Down
Elastin Trainfully Bessy Bean Moby Grape. - How's The World Can Stop Me Worryin' Bout That Girl
Jural - Long Live The Fallen Aristocracy.mp3 [Unknown]
The Machiefs - Hunt Like the Real Thing
Case - Back In Your Window
Bennie Hollalobos - Shoes (A Bang Gang Remixxx)
Foung Afteras - It Aint Me Babe.mp3 [Unknown]
Dar Willalobotnicks - You Still Believe In Christmas
Anthony Robinson - The Pink Wig To My World Fell Down (Single Version)
erlights - Fall From Your Bed
Eddie Kennor - Last Kiss (Originally recorded by J. Bryson & 1st draft by Zaki Ibrahim)
Chrissy Elliams - Sunday Kind of Chill
Erlendrick Ense - Get Up I Feel For You
Mic Spareck Plan - Walking With a Mixx
Jay Bird - I'm In Your Area
Neko Catra - More Like It
Сергей Шнургей Шнургей Шнургей Шнургей Шнуров - Back of the Dead
Doctors - What Once Was Will Be Free
Emmy Cliff - We Are Golden (Jokers of the seasons
Sean Lionheart - CrowdedHouse - Something Special
Lesbian Cobra - If I Got 5 On It (Clean Edit)
Del Maar - ...Has A Way
Brooks Stra - The Hazards of Love
Chiness Candy - Brahms: Studies, Anh 1A/1 - Presto - Allegro con spirito
Hermanna Nadle - The Other Version (ft. Kid Cudi)
Tokyo Police - Got It and Grab It
A Silents - Your Ex-Lover Is Dead (Remaster)
Tigerince - Now I'm Here You're There (Mexicans with Guns Remix)
Page Fays - I Won't Be That Way
J Dieneman - Got To Make You Strong
The Walkmena Vistener - Someone to Love You Until My Veins Again
Digable Strung - Standing At the speed of life
Willalobos - Sinatra - It Was
Wayne Staalendricks - Steven McCauley for President (Exclusive)
Bonna - Devil Made Us Do It Again
Palaxy - small town (live)
Territsen - We Got The Money I've Got It Bad and Young Jeezy: National Anthem
Shears - A Lonely Construction Worker
Garvie - Walking On A Cloud Of Smoke and Sassafras
Broobinski - Lords of The World, Jonah
Williams - Lake Shore Drive (Todd Terje Edit)
Two Bassibles - Madmen's Discotheque (Disconet Casey Jones (On The Road

As you can see, there are a lot of names that are too close to real names to be interesting, and quite a few common patterns. There are also broken parentheses, due to (my implementation of) Markov chains being only mildly context sensitive. (See how I used that term in an actual sentence? Totally worth it, that education.) Also, I can't guarantee that none of these titles or artist names are actually real, either because of trivial upper/lower case, white space, or punctuation differences, or because the artist is not in my source data set.

And here's the code that generated it:


import random

class Markov(object):

    def __init__(self, words=False):
        self.db = {}
        self.lines = set([''])
        self.words = words
        if words:
            self.prevs = 2
        else:
            self.prevs = 3

    def process_file(self, filename):
        # Feed every line of the file into the chain.
        with open(filename, 'r') as source:
            for line in source:
                self.process_line(line)

    def process_line(self, line):
        self.lines.add(line.strip())
        prevs = []
        for i in range(self.prevs):
            prevs.append(None)
        if self.words:
            line = line.split()
            line.append('\n')
        for character in line:
            self.db.setdefault(
                tuple(prevs), []).append(character)
            prevs.append(character)
            prevs = prevs[1:]

    def generate_line(self):
        line = ''
        tries = 0
        while line.strip() in self.lines and tries < 100:
            tries += 1
            prevs = []
            for i in range(self.prevs):
                prevs.append(None)
            line = ''
            while True:
                char = random.choice(self.db[tuple(prevs)])
                if char == '\n':
                    break
                prevs.append(char)
                prevs = prevs[1:]
                line += char
                if self.words:
                    line += ' '
        return line.strip()

n = Markov()
n.process_file('names.txt')

g = Markov()
g.process_file('groups.txt')

t = Markov(words=True)
t.process_file('titles.txt')

for i in range(100):
    # Pick either the people chain or the groups chain for the artist.
    x = random.choice([n, g])
    print(x.generate_line() + ' - ' + t.generate_line())

[Edit]: removed a redundancy left by earlier refactoring.
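
About those broken parentheses: one cheap way around them, which the script above does not do, would be to throw away any generated line whose brackets don't match up and simply generate another one. A minimal sketch, where the function name `balanced` is made up for illustration:

```python
def balanced(line, pairs=(("(", ")"), ("[", "]"))):
    """Return True if every bracket in `line` is opened and closed in order.

    Hypothetical helper, not part of the original script.
    """
    closers = dict((close, open_) for open_, close in pairs)
    openers = set(closers.values())
    stack = []
    for char in line:
        if char in openers:
            stack.append(char)
        elif char in closers:
            # A closer must match the most recently opened bracket.
            if not stack or stack.pop() != closers[char]:
                return False
    # Anything still on the stack was opened but never closed.
    return not stack
```

Calling this on each candidate title inside the retry loop of `generate_line` would trade a few extra tries for cleaner output, at the cost of also rejecting the occasional title that legitimately contains a stray bracket.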

And here's the dead simple quodlibet plugin, just to show how cool quodlibet is. Note that quodlibet, unlike for instance the also quite nice Rhythmbox, would allow you to do this (or much more interesting things) with *any* id3 tag, including ones you make up yourself:

import os
import const
from plugins.songsmenu import SongsMenuPlugin

class AddToListPlugin(SongsMenuPlugin):
    PLUGIN_ID = "Export artist list"
    PLUGIN_NAME = _("Export artist list")
    PLUGIN_DESC = _("Add artist name to artists.txt.")
    PLUGIN_ICON = "gtk-find-and-replace"
    PLUGIN_VERSION = "0.1"

    def player_get_userdir(self):
        """get the application user directory to store files"""
        try:
            return const.USERDIR
        except AttributeError:
            return const.DIR

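    def plugin_songs(self, songs):
        """Append each distinct artist of the selected songs to artists.txt."""
        path = os.path.join(self.player_get_userdir(), "artists.txt")
        artists = set()
        # The with block closes the file, so every write gets flushed.
        with open(path, 'a') as f:
            for song in songs:
                artist = song("artist")
                if artist in artists:
                    continue
                artists.add(artist)
                f.write('%s\n' % artist)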

2009-02-05

Subscribing to google groups with a non gmail mail address

Dear lazyweb, I keep running up against this, and it may be documented, but not any place I could find easily:

I want to subscribe to a google group with a non gmail email address, and I can't seem to do it myself. I know as admin of several groups that owners can subscribe or invite people with different email addresses, and I figured that maybe I could just send a mail to [nameofgroup]+subscribe@googlegroups.com, since the +unsubscribe version of that works. I had a brief moment of "doodgemaakt met een blije mus"-erlebnis (a scrambled take on the Dutch idiom about being made happy with a dead sparrow, i.e. getting excited over nothing) when I got a confirmation mail, but alas, the link seemed to be invalid. Anyone have any bright ideas?

[edit:] Actually there is a separate page where you can manage all your group subscriptions, and *there* it is possible to select all your registered email addresses on a per group basis. Yay!

Go to http://groups.google.com/groups/mysubs

[edit:] actually still doesn't work, because only my gmail address shows up in the dropdowns, not my other registered and confirmed address.

[edit:] And it suddenly just works now, I noticed the other day. (as in the other email address shows up in the dropdown now.) Cool!

2008-11-02

Ubiquity command: lastfm

I found myself tagging all the songs of the soundtrack of the tv series 'Weeds' on last.fm (as you do,) when I realized the search on last.fm is less than optimal for this purpose. It almost always takes three clicks to get to the page I need, even if I have the exact spelling for an artist/track. Enter Ubiquity, the new Firefox extension that lets you add simple commands to your browser and use them. In less than an hour, most of which was spent reading the excellent tutorial, the command was working.

The script can be found here on github.

To use it, open ubiquity with your command key, then type:

lastfm [artistname]

or:

lastfm [artistname] - [title]

to go directly to the artist or track page on last.fm in a new tab. Because of the way ubiquity works, you can also select text on a page, and call the command with the selected text as an argument.

Note: this is pretty rough around the edges for now: If the artist name or the track title contains dashes, it will not end up on the correct page. Also, I'm not sure how I can host this so that others can subscribe to the command, and automatically get changes. I currently don't have any 'real' hosting. Maybe github allows me to do this, but I haven't figured it out yet.
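
For the curious, the mapping from command argument to page is simple enough to sketch. The actual command is Ubiquity JavaScript; what follows is a Python 3 approximation of the idea, not the script itself. The function name `lastfm_url` is made up, and the URL layout (`/music/Artist` and `/music/Artist/_/Title`) is my assumption about how last.fm addresses artist and track pages. Splitting on the first ' - ' with spaces around it, rather than on any dash, at least keeps hyphenated names in one piece:

```python
from urllib.parse import quote_plus

# Assumed last.fm URL layout; not taken from the original command.
BASE = "http://www.last.fm/music/"


def lastfm_url(query):
    """Turn 'artist' or 'artist - title' into a last.fm page URL."""
    # Split on the first ' - ' only, so 'Jay-Z' stays in one piece.
    artist, separator, title = query.partition(" - ")
    url = BASE + quote_plus(artist.strip())
    if separator and title.strip():
        url += "/_/" + quote_plus(title.strip())
    return url
```

Titles that themselves contain ' - ' stay attached to the title, but an artist name containing ' - ' would still be split in the wrong place.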

P.S.: Here's the link to the tag, and the one to the soundtrack for True Blood, which is a work in progress. Both have very good and diverse soundtracks, which make for nice impromptu last.fm radio stations:

Weeds tag radio
True Blood tag radio

2008-11-01

A better anti-spam method

Recently an idea on how to combat spam occurred to me that I haven't heard before. It's very low tech, which I find attractive, and it would never result in false positives, which are the most important shortcoming of all the systems I've tried up until now.

What if everyone in the world had an additional email address that they would never ever use or give out, but that would be publicized in places where only email address harvesting bots would encounter it? A bit of handwaving here, but in its simplest form it should be enough to put both your real and your spam email address on your webpage, completely unobfuscated (perhaps even in mailto: links!), while stipulating (in a clear, but not easily machine readable way) that people should not send mail to the fake/spam one.

Then *any* message that arrives in the spam account is necessarily spam. You can now use that account to filter your real account, by removing messages with a body that is identical, or similar enough. Again some handwaving on similar enough (do this wrong, and voilà: false positives again), but you get the drift.

This kind of filter could be implemented by a provider, where the user would not have to do anything manually, except putting the fake address out there for the harvesters to find, or it could be implemented client side, where the mail client gets all mail from both accounts, and does its thing.
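
To make the handwaving slightly more concrete, here is a rough sketch of the client side variant, assuming the bodies of everything that arrived at the decoy address are available as a list of strings. The function name and the 0.9 threshold are arbitrary choices of mine, and `difflib` from the standard library stands in for whatever smarter similarity measure a real filter would want:

```python
from difflib import SequenceMatcher


def looks_like_known_spam(body, spam_bodies, threshold=0.9):
    """Return True if `body` is (nearly) identical to a decoy message.

    `spam_bodies` holds the bodies received on the decoy address.
    The threshold is a guess; too low and false positives creep back in.
    """
    for spam in spam_bodies:
        if SequenceMatcher(None, body, spam).ratio() >= threshold:
            return True
    return False
```

Comparing every incoming mail against every decoy message gets slow quickly, so a provider would probably hash or fingerprint the bodies instead.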

And another thing: I also think the current Bayesian filters could be improved upon, by recognizing more/different patterns than just lexical ones. I have an intuition that character based Markov chaining could catch a lot of the spam I get: I built a small script in college which could reliably distinguish quite a number of languages. That would get rid of all the mail addressed to me in languages I cannot read, which I would classify as spam. Taken further, this could also get rid of intentionally misspelt spam (to the detriment of poor spellers that want to send me legitimate mail), or of (specific patterns of) html markup in mails, which would get rid of most all the other tricks one could use to show the word 'viagra' without actually writing it.

I might have a go at this, to see if my intuition that these could be better predictors than word counts is correct. (I would train my Dutch email and my English email into separate 'ham languages', and everything showing entirely unpredicted character sequences would be unsure/spam. Marking things as spam could train a 'spam' language, so there could be both positive and negative indicators.)
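
A toy version of the 'ham language' idea might look like this. It is not the script from college, and it cheats by only remembering which character sequences were seen at all instead of keeping proper transition probabilities. The class name `CharModel` and the choice of three characters per sequence are assumptions for illustration:

```python
class CharModel(object):
    """Remembers which character sequences occur in its training text."""

    def __init__(self, order=3):
        self.order = order
        self.seen = set()

    def _grams(self, text):
        # All overlapping substrings of length `order`.
        return [text[i:i + self.order]
                for i in range(len(text) - self.order + 1)]

    def train(self, text):
        self.seen.update(self._grams(text))

    def familiarity(self, text):
        """Fraction of the sequences in `text` this model has seen before."""
        grams = self._grams(text)
        if not grams:
            return 0.0
        known = sum(1 for gram in grams if gram in self.seen)
        return known / float(len(grams))
```

Train one model per language you can read, and treat any mail that scores low on all of them as unsure/spam. Where exactly 'low' begins is the part that would need experimenting.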

2008-08-09

Autoqueue goes cross-player!

When m' colleague Sylvain expressed an interest in porting my autoqueue plugin for Quod Libet to itunes, I experimented a little with factoring out all the generic parts, and it turned out the player specific stuff isn't all that much, so I decided to do a little work and see what kind of problems I would run into when porting it to another player. I chose Rhythmbox for my experiment, since it's in my Ubuntu anyway, and it has support for python plugins.

Turns out it was pretty easy. I had a large part of the featureset working in less than a day, with a lot of help from this page:

http://live.gnome.org/RhythmboxPlugins/WritingGuide

and example code in Alexandre Rosenfeld's lastfmqueue plugin:

http://code.google.com/p/airmindprojects/source/browse/#svn/trunk/rbplugins/lastfm_queue

which offers similar functionality, but is a little more lightweight (fewer features/bloat, depending on how you look at it ;)

I also moved autoqueue into its own repository, since it's now no longer solely a Quod Libet plugin, nor, hopefully, a single developer effort. If you're a rhythmbox (or Quod Libet) user and you're interested in checking an early, but working version out, get the plugin here:

http://code.google.com/p/autoqueue/source/browse/trunk

You'll need autoqueue.py, rhythmbox_autoqueue.py, and rhythmbox_autoqueue.rb-plugin. Drop those in your ~/.gnome2/rhythmbox/plugins directory, start rhythmbox, and activate the autoqueue plugin.

If you have questions, feature requests, or would like to help with porting the plugin to your favorite player, you can contact me directly, or even better, join the autoqueue mailing list here:

http://groups.google.com/group/autoqueue

2008-08-04

mp3spider becomes barbipes

Just a short note: After my colleague Sylvain showed some interest in my mp3spider script, and actually built some really cool new features for it, I decided to split it off into its own little project rather than keep it in my supremely unimaginatively named 'thisfred-python-stuff'.

After a very short google I found this cute little critter:

http://en.wikipedia.org/wiki/Saitis_barbipes

And so, from now on, the mp3spider will be known as barbipes and can be found here:

http://code.google.com/p/barbipes/

Anyone interested in contributing, just drop me a note at my usual username at gmail and I'll give you check-in rights.

2008-06-29

Quod Libet Plugins Released!

After working on them for a long time, and then procrastinating for at least as long on wrapping them up into releasable shape, I'm sort of proud to announce my plugins for Quod Libet, (the best music player I have yet found):

http://code.google.com/p/thisfred-quodlibet-plugins/downloads/list

There are three plugins in there, in order of increasing complexity and interest:

  1. autosearch.py

    Very simple plugin, searches for the title of the current song in your library: Good for getting rid of duplicates, and finding possible covers.

  2. lastfmtagger.py

    Useful only if you have a last.fm account and make use of tags there. This will synchronize last.fm tags both ways, saving them in a custom 'tag' id3 field in your local files. Since Quod Libet has a great id3 editing interface (Ex Falso, also usable as a stand alone application,) this makes adding and editing tags to songs, artists and albums on last.fm much easier.

  3. autoqueue.py

    This gets similar tracks to the ones you play from last.fm and puts them in the queue. It is smart enough not to play the same artists/songs for a configurable time, and has some other options (for instance it can also look up similar songs based on the tags created by the lastfmtagger.py mentioned above.) It works pretty well in creating a consistent yet not wholly predictable listening experience if you have a large and diverse library.
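
The 'not the same artist for a configurable time' part of autoqueue is the easiest bit to show in isolation. This is a simplified sketch of the idea rather than the plugin's actual code; the names `ArtistBlocker` and `pick_next` are made up here, and the real thing has to deal with the player and last.fm on top of this:

```python
import time


class ArtistBlocker(object):
    """Keeps track of when each artist was last played."""

    def __init__(self, block_seconds=3600):
        self.block_seconds = block_seconds
        self.last_played = {}

    def played(self, artist, now=None):
        # `now` can be passed in explicitly, which makes testing easier.
        self.last_played[artist.lower()] = time.time() if now is None else now

    def allowed(self, artist, now=None):
        now = time.time() if now is None else now
        last = self.last_played.get(artist.lower())
        return last is None or now - last >= self.block_seconds


def pick_next(candidates, blocker, now=None):
    """Return the first (artist, title) pair whose artist is not blocked."""
    for artist, title in candidates:
        if blocker.allowed(artist, now):
            return artist, title
    return None
```

Here `candidates` would be the similar tracks, best match first, that also exist in the local library.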

As always, feedback is very welcome, I'm sure there are some bugs left in there, or at the very least some rough edges.

(edit: of course within minutes of posting this I find out last.fm's 2.0 API has gone official, so it appears that once again there is some work to do :) Apparently there's no API for getting similar tracks anymore though. That's a shame...)

2008-06-19

last.fm adds event radio

Cool! Now I don't have to tag the artists for the festivals I'm going to by hand, which always felt a little redundant. (I still might though, since I synchronize the tags with my local library, and it's nice to play all the songs I have locally from a particular edition of a particular festival.)

Anyhoo, urls like (for lowlands 2008, which is going to be very good):

http://www.last.fm/listen/event/436106

allow you to listen to a randomized stream of the entire lineup. A feature I have been rooting for for a long time. Last.fm, once again you come through.

2008-02-23

The Musical Gardener's Tools #5: Yet Another Way to Harvest mp3blogs

Update 2008-03-11: There were a number of things wrong with this script making the spidering *waaaay* slower than it needs to be. Fixed that below, and added threading for both the spidering and downloading, thanks to this cool recipe by Wim Schut which lets me run all the sqlite code in a separate thread. (Important because you can only use sqlite connections in the thread in which they were created.) All of this results in a nice speed-up.

Ok I said I wasn't going to, but I did end up writing a bit of code, although it didn't get too far out of hand. Yet :). It solves *all* of my problems: it does not download files over 30MB in size, and it never downloads the same link twice.

I found this message on the python mailing list, which seemed like a very good start. It almost did what I needed, but not quite, and also the parsing was overcomplicated and didn't catch all links, so I replaced that with a simple regular expression.

I ended up changing most of the code and functionality, (for instance it now stores links in a database.) There's a lot of hard coding in there, which I could factor out if people want to use it, but for now it solves my problems beautifully ;).

It's used with the following syntax:

# initial set up
python spider.py createdb
# add a new blog to be harvested
python spider.py add http://url.of.blog/
# (shallowly) spider all blogs for new links to files
python spider.py
# spider a url to a specific depth (5 for example should get 
# most everything, but will take a while)
python spider.py deepspider 5
# download all files
python spider.py download
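
The two rules mentioned earlier (nothing over 30MB, never the same link twice) boil down to very little code once the links live in a database. The sketch below is not lifted from spider.py: the table layout and function names are invented for illustration, and it assumes the file size is already known, for instance from a Content-Length header:

```python
import sqlite3

MAX_BYTES = 30 * 1024 * 1024  # the 30MB limit


def create_db(conn):
    # Hypothetical schema: one row per link, flagged once downloaded.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links "
        "(url TEXT PRIMARY KEY, downloaded INTEGER DEFAULT 0)")


def should_download(conn, url, size_in_bytes):
    """Skip files that are too big or that were fetched before."""
    if size_in_bytes > MAX_BYTES:
        return False
    row = conn.execute(
        "SELECT downloaded FROM links WHERE url = ?", (url,)).fetchone()
    return row is None or not row[0]


def mark_downloaded(conn, url):
    conn.execute(
        "INSERT OR REPLACE INTO links (url, downloaded) VALUES (?, 1)",
        (url,))
    conn.commit()
```

Using the url as primary key means adding the same link twice simply overwrites the old row.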

A minor problem is that curl doesn't do *minimum* file sizes, and with a lot of broken links it does download something small that isn't really an ogg or mp3 file, but an HTTP response. I can probably solve this better, but for now I call the download from an update script as follows:

python spider.py download
find . -iname "*.mp3" -size "-100k"  -print0 | xargs -0 rm
find . -iname "*.ogg" -size "-100k"  -print0 | xargs -0 rm
find . -iname "*.mp3" -print0 | xargs -0 mp3gain -k -r -f
find . -iname "*.ogg" -print0 | xargs -0 vorbisgain -fr

Translation: download files, throw away suspiciously small ones, mp3/vorbisgain what's left.

Here's the code:

Edit 2008-04-18: Moved the code to google code, so I don't have to update it here. Find the latest version here: spider.py

2008-02-12

Reminders via del.icio.us and Yahoo Pipes

Just thought I'd share this, while we still *have* del.icio.us and yahoo pipes... ;)

A while ago I stumbled across tagmindr, which seemed like a cool idea: put a custom tag on some url in your del.icio.us account and it will remind you at a certain date to look at that url again. I frequently see announcements on websites that say something like check back here on [some date] for [some interesting news]. Having automatic reminders for things like that is great, because the chances of me forgetting otherwise are near 100%. The thing is: the personal feed tagmindr promised me *NEVER WORKED*. That's how I completely forgot about a number of things, and tagmindr itself for a while. No biggie, just not very smart if you wanna get all start-uppy and generate buzz ;)

So I decided, how hard can it be to get this right? Turns out not hard at all! I built a yahoo pipe in under an hour that does exactly the same thing. You just give it your del.icio.us username, and it gives you reminders for anything you tag with the tags 'reminders' and 'remind:yyyy-mm-dd' where you replace the ys and ms and ds with the date you want to be reminded on.
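
The pipe itself is all boxes and wires, but the logic is roughly what this Python sketch does. I'm assuming here that bookmarks come in as (url, tags) pairs and that a reminder counts as due on its date and on every day after; the function name is made up:

```python
import re
from datetime import date

REMIND_TAG = re.compile(r"^remind:(\d{4})-(\d{2})-(\d{2})$")


def due_reminders(bookmarks, today):
    """Return the urls tagged 'reminders' whose remind: date has arrived."""
    due = []
    for url, tags in bookmarks:
        if "reminders" not in tags:
            continue
        for tag in tags:
            match = REMIND_TAG.match(tag)
            if not match:
                continue
            try:
                when = date(*[int(part) for part in match.groups()])
            except ValueError:
                # Something like remind:2008-13-40 is just ignored.
                continue
            if when <= today:
                due.append(url)
                break
    return due
```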

The yahoo pipe is here. Enjoy! (As always, feedback and bug reports very welcome!)

Yahoo pipes rock, del.icio.us rocks. Let's hope Yahoo can hold out. I have nothing against Microsoft per se, but I don't think they ever got the web, and I fear they will screw up the nice and open things (like YUI, for instance) that Yahoo has been developing in the past few years.