I've just stumbled upon this service, Magpie Brandwatch. As an engineer by training, nothing appeals to me quite like a good-looking chart, but I'd like to ask a question of anyone who knows more about these things than I do... How do they disambiguate brand names?
In other words, when looking for commentary about Apple, how do they differentiate between Apple the company and the kind you eat? What if you were trying to track where people are talking about Creative? Great brand name; very, very difficult to disambiguate. Virgin? Next? Palm? Shell? Gap?
And if anyone has tried Magpie Brandwatch, it'd be good to hear what it's like.
Giles Palmer says:
hi philip
i'm giles palmer - MD of Magpie.
your question is a good one as the challenge of differentiating between apple computer and apples isn't easy - it's not like comparing apples with apples :) - or maybe it is?!?
the short answer to your question is systems that have been trained by people.
we train classifiers (http://en.wikipedia.org/wiki/Machine_learning) that we have developed to recognise when a page is talking about apple computers rather than the ones you eat.
Step 1: a team here at Magpie find a couple of thousand pages that do talk about apple computer (ipods, macs, iphones etc).
Step 2: we feed these pages into a natural language classifier and it 'learns' to identify pages that are talking about apple computer.
(oh step 1.5 - develop natural language classification system - that's rather a big step)
Step 3: pass the system pages that have been keyword matched to the term 'apple'. It compares these pages to what it has learned from its training data set and makes a call on whether the new page is indeed talking about apple computer rather than granny smiths. it will also attribute a rating to each page based on how confident it is.
Step 4: Brandwatch discards all mentions that don't meet a pre-defined confidence threshold and passes the remaining pages on to the other analysis systems such as sentiment analysis. it's not perfect, but if trained properly, it is very good. There is also a feedback mechanism within the system that allows users to say when the system has got the matching wrong. This is used to further train the system, so over time it will get better and better.
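For readers who want something concrete, steps 1 to 4 can be sketched roughly as below. This is not Magpie's actual system, just a minimal illustration assuming a tiny naive Bayes text classifier. The class and function names, the training sentences, the labels "brand" and "other", and the 0.7 threshold are all made up for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayes:
    """Tiny multinomial naive Bayes classifier (illustrative only)."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training pages

    def train(self, text, label):
        """Steps 1 and 2: learn from a page a person has already labelled."""
        self.word_counts.setdefault(label, Counter()).update(tokenize(text))
        self.doc_counts[label] += 1

    def confidence(self, text, label):
        """Step 3: estimate P(label | text) as a confidence rating."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for lab, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[lab] / total_docs)
            for word in tokenize(text):
                # Add-one smoothing so unseen words do not zero the score.
                score += math.log((counts[word] + 1) / (total_words + len(vocab)))
            log_scores[lab] = score
        # Turn the log scores into probabilities that sum to 1.
        top = max(log_scores.values())
        exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
        return exp[label] / sum(exp.values())


def filter_mentions(pages, clf, keyword="apple", threshold=0.7):
    """Step 4: keep keyword-matched pages that clear the confidence threshold."""
    kept = []
    for page in pages:
        if keyword not in tokenize(page):
            continue
        score = clf.confidence(page, "brand")
        if score >= threshold:
            kept.append((page, score))
    return kept


def apply_feedback(clf, page, correct_label):
    """Feedback loop: a user's correction becomes one more training page."""
    clf.train(page, correct_label)


# Made-up training pages standing in for the couple of thousand real ones.
clf = NaiveBayes()
clf.train("new ipod and mac announced at the apple keynote", "brand")
clf.train("apple iphone review battery screen mac software", "brand")
clf.train("granny smith apple pie recipe with cinnamon", "other")
clf.train("pick a ripe apple from the orchard tree", "other")

pages = [
    "apple unveils a thinner mac and a new ipod",
    "bake the apple pie with cinnamon and sugar",
]
kept = filter_mentions(pages, clf)
for page, score in kept:
    print(f"{score:.2f}  {page}")
```

With these toy training pages only the first page clears the threshold. A real system would need far more training data, and the threshold would be tuned to trade missed mentions against false matches.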
I hope that helps...
giles
24 September 2007 — 11:44 am
Philip Sheldrake says:
Thanks Giles... looks like you have been very busy!
25 September 2007 — 3:56 pm