I am an Applied Researcher working in the Bing Core Relevance team. I collaborate closely with a number of researchers at MSR and Bing.
I have several abiding interests in Information Retrieval, including Evaluation and Measurement, Personalization and User Modeling, and Ranking.
I have been involved in TREC over the years as a co-coordinator of the Web track (1999-2000), where I worked with David Hawking and Nick Craswell on developing and releasing a number of widely-used test collections, including WT10g. More recently, I was a co-coordinator of the Enterprise track (2007-2008), where we developed the CERC test collection. I also review papers for several conferences, including SIGIR, WWW, and WSDM. I wrote a short series of tips on paper writing for SIGIR (see the SIGIR Tips tab), and occasionally try to publish papers and posters with colleagues.
While at Microsoft I have worked on a number of measurement, ranking and recommendation problems at the heart of Bing, including measurement of whole page relevance, and a slew of issues related to personalization of ranking. For about 3 years, I was the dev manager for the Bing Contextual Relevance team, which introduced personalized and contextual Web ranking to Bing’s results for the first time. We collaborated closely with the MSR CLUES team, helping to carry out tech transfer of various user modeling techniques.
My current research interests centre on semantic comprehension of text, with particular application to information interaction technologies like search.
SIGIR Paper Writing Tips
SIGIR Paper Writing Tips by Peter Bailey is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License.
I wrote this series of short tips and published them individually to my Twitter feed while reviewing for ACM SIGIR 2013. Lots of people wrote nice things about them and suggested I collate them, since Twitter is not good for long-term access or archiving. The 140-character limit imposes a certain terse style, so I’ve made additional comments to explain the occasionally opaque tips.
I don’t claim to be an expert, nor have I published at SIGIR as often as I have reviewed for it, so get other advice as well please! If you are submitting to forums other than SIGIR, note that their acceptance requirements will differ, and you should address these directly.
If you are submitting to SIGIR and you ignore all (or even a significant subset) of these tips, your acceptance probability is likely to be even lower than the historical rate of under 20%. Being published at SIGIR is hard anyway, so even if you follow all these tips, there’s no guarantee your work will be accepted.
Tip 1. Titles describe the paper; don’t mislead, overpromise, or be unduly punny. Less is more. A tweet’s length is too long.
Tip 2. Abstracts summarize the paper. In 1 short paragraph. Subject. Motivation. Goal. Method. Achievement. Make me want to read.
Tip 3. Intro: What are you doing? Why is it interesting and novel? Will I learn something in the next hour? 1/2 to 3/4 page max.
Tip 4. Related Work. 30-40 citations is the sweet spot; fewer suggests you don’t know the area. Don’t omit key works from other people.
Tip 5. Report experimental results with appropriate statistical info like error bars, confidence intervals, effect size. Please.
An increase in scores is not necessarily an improvement; it’s just some probability of improvement, which may be near zero.
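As a concrete illustration of what Tip 5 asks for, here is a minimal sketch (in Python, with made-up per-topic scores) of reporting a paired comparison with an effect size and a confidence interval, rather than a bare score delta:

```python
# Sketch only: the per-topic scores below are invented for illustration.
import statistics

baseline = [0.31, 0.42, 0.28, 0.55, 0.47, 0.39, 0.61, 0.33, 0.50, 0.44]
system   = [0.35, 0.41, 0.30, 0.60, 0.52, 0.38, 0.66, 0.36, 0.49, 0.48]

# Per-topic score differences for the paired comparison.
diffs = [s - b for s, b in zip(system, baseline)]
n = len(diffs)
mean_diff = statistics.mean(diffs)
sd_diff = statistics.stdev(diffs)

# Effect size (Cohen's d for paired data): mean difference over the
# standard deviation of the differences.
cohens_d = mean_diff / sd_diff

# Approximate 95% confidence interval using the normal z value; for a
# small number of topics, prefer the t distribution (e.g. via
# scipy.stats.ttest_rel) for the significance test itself.
half_width = 1.96 * sd_diff / n ** 0.5
ci = (mean_diff - half_width, mean_diff + half_width)

print(f"mean diff = {mean_diff:.3f}, d = {cohens_d:.2f}, "
      f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

If the confidence interval straddles zero, the “improvement” in the headline number may well be noise, which is exactly the point of the tip.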
Tip 6. ML is a means to an end, not the main meal. Feature engineering is only interesting if it gives insights on user behavior.
SIGIR is not ICML. ML helps us accomplish things, but the goal of IR is not better ML.
Tip 7. Use Greek letters and maths/stats for precision and brevity. Explain intuitions in English. Don’t assume familiarity.
Maths and stats are complex at the best of times, so make it easy to follow both what you’re doing and why.
Tip 8. Unless space-constrained, put lead author’s name before citation cross-ref, so I don’t have to flip back and forth always.
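One way to do this in LaTeX is with the natbib package, whose \citet command puts the lead author’s name into the running text while \citep gives the bare bracketed reference. A minimal sketch (the bibliography key is hypothetical):

```latex
% Illustrative sketch using the natbib package; `robertson09' is a
% hypothetical bibliography key.
\usepackage[numbers]{natbib}
% ...
\citet{robertson09} showed ...  % renders as: Robertson et al. [12] showed ...
\citep{robertson09}             % renders as: [12]
```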
Tip 9. Citations. Run cross-ref update in Latex or Word before final submission. Check author names’ spelling; they might review!
Tip 10. If evaluating with a test collection, use more than 1. Don’t use pre 2000 exclusively. There’s more to life than ad hoc.
Ad hoc IR is great! It’s really foundational for many things. It’s not typical web search behavior however, nor is it micro-blog use behavior, nor …. A really interesting result is something that will generalize, like BM25 (which came from ad hoc IR research). Almost any test collection will have some peculiarities; make sure your findings are not overly limited.
Tip 11. Working with big log data doesn’t make it interesting. Insights make it interesting. Especially ones which generalize.
This tip is a cousin of Tip 6; understanding matters more than just being able to run some experiments with lots of data.
Tip 12. Building real IR systems involves much compromise. Don’t overclaim for your new algorithm unless you’ve already tried it.
As an industrial researcher myself, it’s frustrating to read lines like “it should be easy to incorporate new algorithm and improve current systems”. Unless the algorithm is making a 10% improvement in some (existing, industrial state-of-the-art) metric, it’s hard to justify the investment.
Tip 13. What’s product interesting is not always research interesting, and vice versa. What’s both is not always publishable.
Industrial researchers in particular can fall prey to this trap – Nick Craswell articulated this crisply for me a few years ago, and understanding the line between them is a regular point of discussion internally for us within Microsoft. Sometimes it’s challenging to publish or otherwise share work about what’s interesting and commercially sensitive. Similarly, an algorithm that obviously won’t scale can still be useful if it provides insights into thinking differently about a problem.
Tip 14. Graphics. Make legends readable when printed on paper. If color essential to interpret output, make note in fig caption.
Tip 15. When selecting data to exclude, describe what you did and why. Otherwise it’s not clear what the bias in dataset will be.
From a reproducibility standpoint, which is increasingly a part of the SIGIR reviewing guidelines, it’s essential to describe what you did (and did not) do.
Tip 16. Negative results can be interesting. What didn’t work, and why not? Surely not all expts led directly to success? #dreamon
Tip 17. Your first audience is overworked, on a deadline, tired, had a glass of wine and put the kids to bed. That is your prior.
This tip is just “know your audience”, and the environment under which they’re likely to read your paper. Make them happy to be reviewing, and make them work as little as possible. Reviewers want to accept papers, and are more likely to give useful feedback if it is clear you put in the maximum amount of work you could to get it accepted, not the minimum.
Tip 18. Two words: Strunk and White. Read then follow: Rules of usage. Principles of composition. Matters of form. Misused words.
Tip 19. Don’t pad it out with lower quality work. More than 4 in 5 papers are rejected. Many great 8-page papers in the past. Make it all shine.
SIGIR papers are often rejected due to their weakest part, not accepted due to their strongest. You can’t afford weak parts.
Tip 20. Make your contributions clear. Both at the start and end of the paper. Also, Conclusions are not a summary. What changes?
Restating the paper’s contributions does not make a conclusion – I already read it. I want to know how the future may be different due to your work.
Tip 21. Acknowledge those who helped you out. Thanks cost little, while forgetting may offend. If big help, make them a co-author.
End of Tips: Thanks for all the feedback from the Twitter audience; I hope you enjoyed them. What did I miss? What do you most love/hate in reading, writing, or reviewing SIGIR papers?
Tips – Thanks: A big shout out to many colleagues over the years, especially Dave Hawking, Nick Craswell, Ryen White & Susan Dumais.