Archive for the 'rant(ish)' Category


IJCAI 2011: Part 1

Wednesday, July 27th, 2011

I was at IJCAI last week; this was my first major academic conference and I very much enjoyed it.

IJCAI is a well-established biennial conference (running since 1969), and the acronym stands for “International Joint Conference(s) on Artificial Intelligence”. The term “Artificial Intelligence” is an interesting one, and it came up in several conversations throughout, though I won’t go into it here (we can pub-talk this if we meet!). In short, the discussion spawned from the fact that it is a very large and broad conference with ~1500 participants, and I think a few of us were struggling to figure out where we fit.

Tangential Modularity Rant

Personally, I was playing the community-finding card, specifically finding communities of roles in a network (at least that’s what I’ve been convincing myself!), so I was browsing the clustering and web mining topics, and bits of the search landscape. My impression is that a lot of the work focussed on recommendation/prediction (I suppose this is a kind of ‘intelligence’ after all, so I shouldn’t be surprised), but I also noticed that Newman’s modularity heuristic was very popular. It was often used as the basis of a sub-part of a solution (to another problem), or reformulated in order to scale it up, with little attention paid to interpreting the results, which I think is a pity.

In other words, results sections often consisted of graphs showing how fast so-and-so performed in comparison to other methods, but the found clusters themselves were usually omitted. This is fair enough if they were generated from synthetic data, but for any real-world data, the only argument as to why the clusters are good is that the larger the Q value, the better, which is too weak IMHO. If I agree with your assumptions, then perhaps this makes sense, but it does require a leap of faith, and Fortunato & Barthélemy do a good job of making this leap look bigger than a simple hop.
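Since Q ends up doing so much heavy lifting in these papers, it’s worth remembering how little it actually says. Here’s a minimal from-scratch sketch of Newman’s Q for an undirected graph (a toy illustration of the measure itself, not any particular paper’s implementation):

```python
# Newman's modularity Q for an undirected graph, given as an edge list
# and a node -> community assignment. Toy illustration only.
def modularity(edges, community):
    m = len(edges)  # number of edges
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Fraction of edges that fall inside communities...
    intra = sum(1 for u, v in edges if community[u] == community[v]) / m
    # ...minus the fraction expected at random with the same degrees.
    expected = sum(
        degree[u] * degree[v]
        for u in degree for v in degree
        if community[u] == community[v]
    ) / (4 * m * m)
    return intra - expected

# Two obvious triangles joined by a single bridge edge:
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
Q = modularity(edges, {0: 'a', 1: 'a', 2: 'a', 3: 'b', 4: 'b', 5: 'b'})
print(Q)  # 6/7 - 1/2, i.e. about 0.357
```

The single number tells you this split scores reasonably well, but nothing about what cluster ‘a’ actually *is* — which is precisely the interpretation step I found missing.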

Furthermore, although it was pointed out to me that my approach may be too specific to social networks (a semi-supervised approach to finding roles), and that for very large networks it may not be possible to interpret everything, I still think there should be a place for thoughts on the trees as well as the wood. In particular, the very first workshop talk I attended, Detecting Communities from Social Tagging Networks Based on Tripartite Modularity (along with a few others in the IJCAI proceedings itself), at least tried to show some of the clustered results and acknowledged that methods beyond modularity may be worth considering. OK, less rant, more conference details.

Workshops/Tutorials

The conference officially started on Monday evening, but workshops and tutorials ran over the preceding weekend (and on the Monday itself). I took part in the Link Analysis in Heterogeneous Information Networks workshop, which I unfortunately have to say wasn’t very well organised. The talks were interesting enough, but it became clear that not a lot of thought had been put into scheduling them; there weren’t really any themes within or connecting the sessions.

The timing was also a bit off. Workshops were held in parallel with common break times, and we didn’t align to them in the second half of the day, which resulted in some confusion and the workshop finishing early. There was also the promise of a wiki where the slides and information on participants were to be put up, which I thought was a good idea at the time, but I haven’t heard anything about it since… However, all workshop proceedings are available in one place, which is handy. As for recommendations, I would look at the Web Mining one among the tutorials.

Invited Talks

I didn’t manage to make it to all of them (9am Barcelona time meant 8am Irish body-clock time, which, combined with a 20-30 minute journey, made for a near-impossible feat for me). However, of the ones I did attend, two particularly stood out: Daphne Koller, who spoke about “Rich probabilistic models for image understanding”, and Jonathan Schaeffer on “The Games People Play Revisited”. This is not to say the other talks weren’t well presented, but these two (I felt) struck the right balance between detail and, what I consider often difficult to do without sounding like a blind fan, genuine enthusiasm & passion.

Remember, although they were addressing a very broad audience, a common trait of that audience is a fairly technical and critical mind, which can be hard to please. Both talks were similar in format in that they stated the problem in plain English and then showed the incremental (but important) developments in the field: how aspects of one approach worked, why it didn’t work overall, which approach became popular, what inspired a new one, and what they think the “future” is. I especially liked Jonathan’s slide below (quoting his wife), to which he very proudly gave his response line by line, and in particular the final one, where he said he didn’t know what it meant — this (I assume he meant research into games) is his life 🙂

Note I’ve ended up splitting this entry into two parts, as I had more to say than I’d expected (here’s Part 2).


Comparing clusterings

Monday, December 13th, 2010

I have been looking at how to compare (at least) two algorithms’ clustering results, and had Wagner & Wagner’s Comparing Clusterings: An Overview as a starting point, which appeared to be a longer and less useful version of Meila’s 2002 paper. Anyway, in short, I decided to go with the latter’s suggestion of using Variation of Information (VoI) as a measure. My actual problem is that I have a bunch of data, I run an algorithm on it, and the results are essentially clusters. Thus, I need a systematic way of evaluating how ‘good’ these clusters are. VoI will hopefully be useful, as it can give me an indicator of which sets are the best ones for me to humanly look at (and make some sort of interpretation of).
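For the curious, the measure itself is small enough to sketch from scratch. Given two clusterings A and B of the same items, VoI(A, B) = H(A) + H(B) − 2I(A; B), so identical clusterings score 0 and more disagreement means a larger score. A minimal illustration (just the formula from Meila’s paper, not my actual script):

```python
# Variation of Information between two clusterings, each given as a
# list of cluster labels, one per item (same item order in both).
from collections import Counter
from math import log

def variation_of_information(labels_a, labels_b):
    n = len(labels_a)
    assert n == len(labels_b)
    pa = Counter(labels_a)                    # cluster sizes in A
    pb = Counter(labels_b)                    # cluster sizes in B
    joint = Counter(zip(labels_a, labels_b))  # co-occurrence counts
    # VoI = H(A) + H(B) - 2 I(A; B)
    h_a = -sum(c / n * log(c / n) for c in pa.values())
    h_b = -sum(c / n * log(c / n) for c in pb.values())
    mi = sum(c / n * log((c / n) / (pa[a] / n * pb[b] / n))
             for (a, b), c in joint.items())
    return h_a + h_b - 2 * mi

print(variation_of_information([0, 0, 1, 1], [0, 0, 1, 1]))  # 0 (identical)
print(variation_of_information([0, 0, 1, 1], [0, 1, 0, 1]))  # 2 ln 2, ≈ 1.386
```

Handy property for my use case: VoI is a true metric on the space of clusterings, so I can rank runs by their distance from a reference clustering and only hand-inspect the closest ones.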

I wrote a little script (which took far too long, mostly ‘cos I had to re-learn how to program after not having done much in about two years) in Python so if anyone wants to borrow it, feel free to contact me. I’ll post a sample calculation of VoI at some point too.

On a slightly (un)related note, I am getting tired of writing damn little Python scripts… little things require little scripts, which require a little more time… little + little = big, like few + few = lots. Grr…