Ranking has always been at the core of Information Retrieval, since it makes it possible to sift out non-relevant information and to select a list of items ordered by their estimated relevance to a given query. Documents, information needs, search tasks, and the interaction mechanisms between users and information systems are becoming increasingly complex and diversified, and this calls for ever more sophisticated techniques able to cope with this emerging complexity and with the high expectations of users.
Learning to Rank (LtR), and machine learning in general, have proven to be very effective methodologies for addressing these issues, significantly improving over traditional state-of-the-art algorithms. Popular areas of investigation in LtR include efficiency, feature selection, and supervised learning, but many new angles are still overlooked. The goal of this workshop is to investigate how to improve ranking, and LtR in particular, by bringing in new perspectives that have not been explored or fully addressed by our community since the 2011 Yahoo Learning to Rank Challenge.
In particular, we wish to encourage researchers to discuss the opportunities, challenges, and results obtained in the development and evaluation of novel approaches to LtR. New perspectives on LtR may concern innovative models, the study of their formal properties, as well as the experimental validation of their efficiency and effectiveness. We are particularly interested in proposals dealing with novel LtR algorithms, the evaluation of LtR algorithms, LtR dataset creation and curation, and domain-specific applications of LtR.
We invite researchers and practitioners working in Information Retrieval, Machine Learning, and related application areas to submit their original papers to this workshop.
The workshop proceedings are available online as a volume of the CEUR-WS proceedings.
Submission deadline: August 14, 2017
Notification of acceptance: September 4, 2017
Camera ready: October 16, 2017
Workshop day: October 1, 2017
General areas of interest include, but are not limited to, the following topics:
Papers should be formatted according to the ACM SIG Proceedings Template.
Papers should be between four and six pages in length.
Papers will be peer-reviewed by members of the program committee through single-blind peer review, i.e. authors do *not* need to anonymize their submissions. Selection will be based on originality, clarity, and technical quality. Papers should be submitted in PDF format to the following address:
https://easychair.org/conferences/?conf=learner2017
Accepted papers are published online in the CEUR-WS proceedings volume.
Nicola Ferro, University of Padua, Italy
ferro@dei.unipd.it
Claudio Lucchese, ISTI-CNR, Italy
c.lucchese@isti.cnr.it
Maria Maistro, University of Padua, Italy
maistro@dei.unipd.it
Raffaele Perego, ISTI-CNR, Italy
r.perego@isti.cnr.it
Roi Blanco, Amazon, Spain
Jiafeng Guo, Chinese Academy of Sciences, China
Craig Macdonald, University of Glasgow, UK
Fabrizio Silvestri, Facebook, UK
Arjen de Vries, Radboud Universiteit, The Netherlands
Hamed Zamani, University of Massachusetts, Amherst, USA
Arpita Das, Saurabh Shrivastava and Manoj Chinnakotla. Discovery and Promotion of Subtopic Level High Quality Domains for Programming Queries in Web Search.
Nicola Ferro, Paolo Picello and Gianmaria Silvello. A Software Library for Conducting Large Scale Experiments on Learning to Rank Algorithms.
Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Query-Level Ranker Specialization.
Or Levi. Online Learning of a Ranking Formula for Revenue and Advertiser ROI Optimization.
Brian Brost. Multileaving for Online Evaluation of Rankers.
Hui Fang and Chengxiang Zhai. When Learning to Rank Meets Axiomatic Thinking.
Claudio Lucchese, Franco Maria Nardini, Raffaele Perego, and Salvatore Trani. The Impact of Negative Samples on Learning to Rank.
Darío Garigliotti and Krisztian Balog. Learning to Rank Target Types for Entity-Bearing Queries.
School of Computing Science, University of Glasgow, UK
Bio
Craig Macdonald is a Lecturer at the University of Glasgow, UK. Currently, his main research topics concern Information Retrieval (IR) in general, for instance in settings such as Web, enterprise, social media, and smart-city search. He regularly participates in TREC, and jointly co-ordinated the TREC Blog track from 2006 to 2010, the Microblog track from 2011 to 2012, and the Web track since 2014. He is a lead developer of the Terrier IR platform, and also uses Terrier in his research publications.
When a user is unsatisfied with the quality of the results of a web search engine, they may switch to another, leading to a loss of ad revenue for the engine. The use of a robust retrieval approach is therefore essential, so that the experience of the users of the search engine is not damaged by poorly performing queries. For this reason, there has been growing interest in measuring robustness using a new class of risk-sensitive evaluation measures, which assess the extent to which a system exhibits risk, i.e. performs worse than a given baseline system on a set of queries.
In this talk, we describe our recent advances in two families of risk-sensitive evaluation measures, both based upon hypothesis testing, and their integration into a state-of-the-art learning to rank algorithm, to create effective yet robust retrieval models.
Firstly, we argue that risk-sensitive evaluation is akin to the underlying methodology of the Student's t-test for matched pairs. Hence, we introduce a risk-reward tradeoff measure TRisk that generalises the existing URisk measure, and which is theoretically grounded in statistical hypothesis testing.
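For readers unfamiliar with these measures, the sketch below recalls the definitions commonly used in the risk-sensitive evaluation literature; the notation (the per-query difference delta_q, the loss weight alpha, M_sys and M_base) is illustrative and may differ from that used in the talk.

```latex
% Risk-reward tradeoff over per-query effectiveness differences
% \delta_q = M_{sys}(q) - M_{base}(q) (e.g. differences in NDCG@k),
% where Q^+ / Q^- are the queries on which the system wins / loses
% and \alpha \ge 0 controls how strongly losses are penalised.
\[
  \mathrm{URisk}_{\alpha} \;=\; \frac{1}{|Q|}\Bigl[\;\sum_{q \in Q^{+}} \delta_q \;-\; (1+\alpha)\sum_{q \in Q^{-}} |\delta_q|\;\Bigr]
\]
% TRisk re-expresses URisk as a paired t statistic: URisk divided by the
% standard error of the alpha-adjusted per-query differences, so large
% negative values indicate statistically significant risk w.r.t. the baseline.
\[
  \mathrm{TRisk}_{\alpha} \;=\; \frac{\mathrm{URisk}_{\alpha}}{\mathrm{SE}\bigl(\mathrm{URisk}_{\alpha}\bigr)}
\]
```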
Secondly, we argue that using a single system as the baseline suffers from the fact that retrieval performance varies highly among IR systems across topics. Thus, a single system would in general fail to provide enough information about the real baseline performance for every topic under consideration, and hence cannot in general measure the real risk associated with any given system. Based upon the Chi-squared statistic, we describe a second family of risk-reward tradeoff measures that take multiple baselines into account when measuring risk.
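As a rough illustration of the multiple-baseline idea, the following sketch computes a chi-squared-style standardized residual of a system's per-topic scores against the expectation derived from a systems-by-topics effectiveness matrix. This is an assumption-laden sketch of the general approach, not the exact measure presented in the talk; the function name and the weighting of wins and losses are hypothetical.

```python
import numpy as np

def multi_baseline_risk(scores, sys_idx, alpha=1.0):
    """Illustrative only. scores: (n_systems, n_topics) matrix of per-topic
    effectiveness (e.g. NDCG@10), with the baselines and the evaluated
    system as rows; sys_idx: row index of the system under evaluation."""
    total = scores.sum()
    # Expected score of each system on each topic under independence of
    # system and topic effects, as in a chi-squared contingency table.
    expected = np.outer(scores.sum(axis=1), scores.sum(axis=0)) / total
    # Standardized residuals of the evaluated system, topic by topic.
    residual = (scores[sys_idx] - expected[sys_idx]) / np.sqrt(expected[sys_idx])
    # Penalise topics where the system falls below expectation more heavily,
    # analogously to the alpha parameter of URisk/TRisk.
    wins = residual[residual > 0].sum()
    losses = residual[residual < 0].sum()
    return wins + (1.0 + alpha) * losses
```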
Experiments using 10,000 topics from the MSLR learning to rank dataset from the Bing search engine demonstrate that our proposed t-test and Chi-square based objective functions reduce the number of poorly performing queries exhibited by a state-of-the-art learning to rank algorithm.
Department of Computer Science, University of Twente, The Netherlands
Bio
Djoerd Hiemstra is a part-time associate professor at the University of Twente. He also heads Searsia, a University of Twente spin-off that develops an open-source federated search engine. Djoerd has contributed to over 200 research papers in the field of information retrieval, covering topics such as language models, structured information retrieval, multimedia retrieval, and federated web search. He has published papers with research labs of Microsoft (where he did an internship in 2000), Yahoo (where he was a visiting researcher in 2008), and Yandex (which he visited in 2011).
Like most information retrieval methods, learning-to-rank methods are evaluated on benchmark datasets, such as the many datasets provided by Microsoft and those provided by Yahoo and Yandex. Many of the learning-to-rank datasets offer feature representations of the to-be-ranked documents instead of the documents themselves. Therefore, any difference in ranking performance is due to the ranking algorithm and not to the features used. This opens up a unique opportunity for a cross-benchmark comparison of learning-to-rank methods.
In this talk, I propose a way to compare learning-to-rank methods based on a sparse set of evaluation results on many benchmark datasets. Our comparison methodology consists of two components: (1) the Normalized Winning Number, a measure that gives insight into the ranking accuracy of a learning-to-rank method, and (2) the Ideal Winning Number, which gives insight into the degree of certainty concerning that ranking accuracy.
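As a minimal sketch of these two components, the code below counts, for one method, the pairwise comparisons it wins (Winning Number) and the comparisons it takes part in (Ideal Winning Number) over a sparse set of per-dataset scores, and reports their ratio as the Normalized Winning Number. The data layout, method names, and scores are hypothetical and only meant to illustrate how sparsity enters the comparison.

```python
# Minimal sketch, assuming evaluation results are stored sparsely as
# {dataset: {method: score}} with a single effectiveness score per entry.

def winning_numbers(results, method):
    wn = 0   # pairwise comparisons the method actually wins
    iwn = 0  # pairwise comparisons the method takes part in (could have won)
    for dataset, scores in results.items():
        if method not in scores:
            continue  # sparse results: the method was never run on this dataset
        for other, other_score in scores.items():
            if other == method:
                continue
            iwn += 1
            if scores[method] > other_score:
                wn += 1
    # Normalized Winning Number = WN / IWN; the Ideal Winning Number itself
    # indicates how much evidence the accuracy estimate rests on.
    return (wn / iwn if iwn else 0.0), iwn

# Toy example with hypothetical methods and scores on two datasets.
results = {
    "MQ2007": {"LambdaMART": 0.51, "RankSVM": 0.46, "AdaRank": 0.44},
    "OHSUMED": {"LambdaMART": 0.45, "RankSVM": 0.44},
}
nwn, iwn = winning_numbers(results, "LambdaMART")  # -> (1.0, 3)
```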
Evaluation results of 87 learning-to-rank methods on 20 well-known benchmark datasets are collected. I report on the best performing methods by Normalized Winning Number and Ideal Winning Number, and suggest which methods need more research to make our analysis more robust.
Finally, we test the robustness of our results by comparing them to situations in which one of the datasets is excluded from the analysis.