Very nice. I worked on a similar project a few years back. This is actually very similar to the MusicBox project by Anita Lillie back in 2008 (see demo at: http://thesis.flyingpudding.com/videos/demo/index.html, thesis at http://thesis.flyingpudding.com/), which itself was built on top of the analysis provided by Echonest.
Using the open-source aubio for the analysis and building playlists instead of working out a new player are very good decisions. When we tried to do something similar, this was also the direction we picked, and we then added layers for sending playlists to various players.
Now what’s missing here is a UI to “build” the playlist visually (check the demo in the link earlier). The principle for building such a UI is very simple: instead of just 1 distance between 2 songs, you have a set of N distances (corresponding to similarity along various parameters such as rhythm, loudness, pitch, but also tag metadata, etc.), which is then reduced to 2 dimensions (using PCA), and you get a 2D map of all your music library. Then you can draw a path for building your playlist.
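To make the "reduce N dimensions to 2 with PCA" step concrete, here is a minimal sketch in plain Python. It is not how MusicBox or bliss-rs does it: the function name `pca_2d`, the feature values and the song names are all made up for illustration, and power iteration with deflation is just one simple way to get the top two principal components without pulling in a numerical library.

```python
import math
import random


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pca_2d(rows, iterations=200, seed=0):
    """Project rows (at least 2 equal-length feature vectors) onto their
    top 2 principal components, found by power iteration with deflation."""
    n, d = len(rows), len(rows[0])
    # Center every feature on its mean.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    rng = random.Random(seed)
    components = []
    for _ in range(2):
        v = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        for _ in range(iterations):
            w = [_dot(row, v) for row in cov]
            # Keep the new axis orthogonal to the ones already found.
            for comp in components:
                proj = _dot(w, comp)
                w = [x - proj * c for x, c in zip(w, comp)]
            norm = math.sqrt(_dot(w, w))
            if norm < 1e-12:
                # No variance left in any remaining direction.
                v = [0.0] * d
                break
            v = [x / norm for x in w]
        components.append(v)
        # Deflate: remove the variance explained by this component.
        lam = _dot(v, [_dot(row, v) for row in cov])
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(d)]
               for a in range(d)]
    return [[_dot(c, comp) for comp in components] for c in centered]


# Hypothetical library: (tempo, loudness, brightness) per song, invented values.
library = {
    "calm_piano": [0.2, 0.1, 0.3],
    "ambient_pad": [0.1, 0.2, 0.2],
    "punk_anthem": [0.9, 0.9, 0.8],
    "drum_and_bass": [1.0, 0.8, 0.9],
}
coords = pca_2d(list(library.values()))
for name, (x, y) in zip(library, coords):
    print(f"{name}: ({x:.2f}, {y:.2f})")
```

Each song ends up as an (x, y) point, which is what a "draw a path on the map" UI would consume. The sign of each axis is arbitrary, so only the relative positions matter.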
This is in my opinion the only sane answer to “the music classifying nightmare” (http://blog.pkh.me/p/15-the-music-classifying-nightmare.html – author here).
For a more traditional approach to solving the music classifying problem, I’d recommend looking at https://beets.io/
Thanks a lot for the very interesting references - I’ll take a look at implementing the PCA for blissify, to check how it performs.
I’ve actually checked the landscape of tools like this before starting the project, and saw that there were a lot of music similarity theses, but very few tools actually usable “out of the box”.
So, instead of trying to make something really innovative, I’ve tried to aggregate the existing results to build a (somewhat) usable / maintainable “real-world” tool.
I’m unfamiliar with audio classification, but I’ve made a lot of mix “tapes”, and I pay a lot of attention to track ordering and transitions (usually crossfaded.)
IMO it’s very important to consider the beginnings and endings of tracks, not just an average of the whole track. Many pieces of music have lengthy intros or outros that are distinct from the rest, and many just end in a different place than they began. “Stairway to Heaven” and “Don’t Stand So Close To Me” are good examples, or Sonic Youth’s “Expressway To Yr Skull”. And how do you average out “Bohemian Rhapsody”, or King Crimson’s “Fracture”, which starts extremely softly, has some actual silence in the middle, and ends as frantic art-rock?
I agree that it’s very important, and it all boils down to finding a way to summarize a track’s features.
Right now, we use mean and median for most of the features, because it gives good enough results, but we do store features throughout the whole song and only summarize them at the end, so that we can change that if it proves to be useful (e.g. https://github.com/Polochon-street/bliss-rs/blob/master/src/timbral.rs#L56-L65).
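As a rough illustration of why keeping the per-frame values around is useful, here is a small Python sketch that summarizes a feature with mean and median over the whole track, and also over the opening and closing portions separately. This is not the bliss-rs code linked above: `summarize`, `summarize_edges`, the 10% edge fraction and the loudness numbers are all assumptions made up for the example.

```python
from statistics import mean, median


def summarize(frames):
    """Collapse a per-frame feature series into (mean, median)."""
    return mean(frames), median(frames)


def summarize_edges(frames, edge_fraction=0.1):
    """Summarize the intro, the whole track and the outro separately.

    edge_fraction is an arbitrary choice: the share of frames counted
    as "intro" and as "outro".
    """
    k = max(1, int(len(frames) * edge_fraction))
    return {
        "intro": summarize(frames[:k]),
        "whole": summarize(frames),
        "outro": summarize(frames[-k:]),
    }


# Invented per-frame loudness: quiet intro, steady middle, loud ending.
loudness = [1.0] * 10 + [5.0] * 80 + [9.0] * 10
summary = summarize_edges(loudness)
print(summary)
```

With something like this, a transition could compare the outro summary of one track to the intro summary of the next instead of comparing whole-track averages.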
One thing that is also frustrating is when a song from a gapless album comes up, and it just transitions to something else. I’ve attempted to deal with that (not very successfully) a long time ago, but it’s also definitely on the long-term list of things to implement for bliss-rs.
Love this project, seems like it was made with love. I used to want to do this years ago but never started it. I wanted to make a Markov chain to decide whether to hop from one genre to the other, with the goal that you could transition from say classical music to jazz in a long enough playlist by going through close enough subgenres.
This looks interesting indeed. However, I am not sure whether I believe in the premise that songs that are “close” together make for a good playlist. At least for me. There needs to be some contrast every now and then to keep it interesting. Also, songs can be close together in a way that is completely undetectable for any algorithm. (Think: they both were featured on that mix tape that we had on continuous play on that one trip ten years ago.)
In The Netherlands we had (probably still have) a radio station that had no DJs, only songs and occasional commercials. The idea behind it was not to try to create a playlist with songs a certain target demographic really liked, but to reverse it: to make a playlist that was the least annoying to the whole population. The idea being, I guess, that if people get annoyed, they change stations and don’t hear the commercials. This should make it an ideal station for workplaces etc.
And they did it. It was a very smooth blend of songs and evergreens that would quickly dissipate into the background, and it was a popular choice for a lot of those situations. It also totally drove me up the wall if I had to listen to it for more than 30 minutes.
That’s a tough problem indeed. I’ve chosen that premise for a couple of reasons:
This software is mainly aimed at private people, who own small to medium-size libraries (I have around 7k songs and I consider that “small” if that can give you an idea). So, even if you only choose songs that are close (by whatever metric, here we’ll consider that they’re close if they “sound” the same), the playlist will eventually reach a point where it has to evolve, for the sheer lack of really close songs to the starting song.
Of course, if you’re Spotify, then you need to implement some variety there, otherwise you’ll indeed circle in the same genre too much.
It all boils down to the distance metric you choose to use to make playlists. Even if you take the very simple “I’m taking a starting song and want to make a playlist of the closest songs out of it”, you still have (at least) two ways to deal with that (I’ve written a bit about this here, section 4.2). You can either choose all the songs that are close to your first song (in that case, you probably won’t drift much from that song’s genre, but you might have rough transitions between songs themselves, see the drawings in the link). Or you can just draw a “path” between close songs; in that case you might drift away from the starting genre very fast, depending on what your library looks like.
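The two approaches above can be sketched in a few lines of Python. This is only an illustration and not the bliss-rs implementation: the function names, the one-dimensional "feature" and the song names are invented, and plain Euclidean distance is assumed.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def closest_to_seed(library, seed, length):
    """Way 1: rank every other song by its distance to the seed song."""
    others = [name for name in library if name != seed]
    others.sort(key=lambda name: euclidean(library[seed], library[name]))
    return [seed] + others[:length - 1]


def nearest_neighbor_path(library, seed, length):
    """Way 2: repeatedly hop to the unplayed song closest to the current one."""
    playlist = [seed]
    remaining = [name for name in library if name != seed]
    while remaining and len(playlist) < length:
        current = library[playlist[-1]]
        nxt = min(remaining, key=lambda name: euclidean(current, library[name]))
        playlist.append(nxt)
        remaining.remove(nxt)
    return playlist


# Invented library with a single feature per song, to keep it readable.
library = {"a": [0.0], "b": [1.0], "c": [-1.5], "d": [2.5], "e": [4.5]}
print(closest_to_seed(library, "a", 4))        # stays around "a"
print(nearest_neighbor_path(library, "a", 4))  # walks away from "a"
```

In this toy library the first way keeps every song within 2.5 of the seed but jumps from "b" to "c" (a distance of 2.5 between neighbors), while the second way never jumps more than 2.0 between neighbors but ends up 4.5 away from where it started.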
You’re not limited to these two distance metrics (and I’ve tried to make it fairly easy to customize in bliss-rs, since you can just get the raw features and experiment with them). You could for example use cosine or similar distances, where you’d go “in a direction” - which would roughly translate to, if you choose a very calm song, “give me just calm songs, I don’t care if it’s acoustic guitar or just electronic ambient music”.
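For reference, a cosine distance is short to write by hand. The sketch below is generic Python, not a bliss-rs function, and it assumes the feature vectors are expressed relative to some meaningful origin (for instance centered on the library average), since otherwise "direction" does not mean much.

```python
import math


def cosine_distance(a, b):
    """1 - cos(angle between a and b): 0 for the same direction,
    1 for orthogonal vectors, 2 for opposite directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


# Invented 2-feature vectors: same direction, different magnitude.
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))
# Invented vectors pointing in unrelated directions.
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))
```

Because only the angle counts, two songs that are "calm in the same way" but differ in how far they sit from the origin still come out as close.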
And that’s without touching the weight of the distance metric - if you decide that the chroma features (roughly “songs with the same pitch”) are the most important ones and you attribute more weight to them, you’d probably end up with a playlist of very different songs, with transitions you’d perceive as somehow natural because the pitch class would be roughly the same. (I’m not a musicology expert though, so please correct me if I’m wrong!)
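Weighting can be illustrated with a weighted Euclidean distance. Again this is a hedged sketch rather than what bliss-rs does internally: the two-feature layout (tempo first, chroma second), the weights and the song names are invented for the example.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance where each squared difference is scaled by a weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


# Invented features: [tempo, chroma], both already scaled to 0..1.
seed = [0.0, 0.0]
same_tempo = [0.1, 0.9]   # close in tempo, far in chroma
same_chroma = [0.9, 0.1]  # far in tempo, close in chroma

even = [1.0, 1.0]
chroma_heavy = [1.0, 10.0]

for name, song in (("same_tempo", same_tempo), ("same_chroma", same_chroma)):
    print(name,
          weighted_euclidean(seed, song, even),
          weighted_euclidean(seed, song, chroma_heavy))
```

With even weights the two candidates are equally far from the seed; once chroma counts ten times more, the song with matching chroma becomes the clearly closer one even though its tempo is very different.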
TL;DR: You can basically tweak the meaning of “close” so that it fits your definition of a good playlist by either changing the distance metric or the weights of the distance metric :D
Ah, I see it now. I was a bit focused on the word “close”, but your answer made me realize that one can get very creative with these ingredients. I hope I have some time this weekend to play with this. Thanks for linking your thesis!