Ranking has always been at the core of Information Retrieval, since it allows us to sift out non-relevant information and to select a list of items ordered by their estimated relevance to a given query. Documents, information needs, search tasks and the interaction mechanisms between users and information systems are becoming increasingly complex and diversified, and this calls for ever more sophisticated techniques able to cope with this emerging complexity and with the high expectations of users.
Learning to Rank (LtR), and machine learning in general, have proven to be very effective methodologies for addressing these issues, improving significantly over traditional state-of-the-art algorithms. Popular areas of investigation in LtR include efficiency, feature selection and supervised learning, but many new angles are still overlooked. The goal of this workshop is to investigate how to improve ranking, and LtR in particular, by bringing in new perspectives which have not been explored or fully addressed by our community since the 2011 Yahoo Learning to Rank Challenge.
In particular, we wish to encourage researchers to discuss the opportunities, challenges and results obtained in the development and evaluation of novel approaches to LtR. New perspectives on LtR may concern innovative models, the study of their formal properties, as well as experimental validation of their efficiency and effectiveness. We are particularly interested in proposals dealing with novel LtR algorithms, the evaluation of LtR algorithms, LtR dataset creation and curation, and domain-specific applications of LtR.
We invite researchers and practitioners working in Information Retrieval, Machine Learning and related application areas to submit their original papers to this workshop.
Submission deadline: August 14, 2017
Notification of acceptance: September 4, 2017
Camera ready: October 16, 2017
Workshop day: October 1, 2017
General areas of interest include, but are not limited to, the following topics:
Papers should be formatted according to the ACM SIG Proceedings Template.
Papers should be four to six pages in length.
Papers will be peer-reviewed by members of the program committee through single-blind peer review, i.e. submissions do *not* need to be anonymized. Selection will be based on originality, clarity, and technical quality. Papers should be submitted in PDF format to the following address:
Accepted papers will be published online as a volume of the CEUR-WS proceeding series.
Roi Blanco, Amazon, Spain
Jiafeng Guo, Chinese Academy of Sciences, China
Craig Macdonald, University of Glasgow, UK
Fabrizio Silvestri, Facebook, UK
Arjen de Vries, Radboud Universiteit, The Netherlands
Hamed Zamani, University of Massachusetts, Amherst, USA
Arpita Das, Saurabh Shrivastava and Manoj Chinnakotla. Discovery and Promotion of Subtopic Level High Quality Domains for Programming Queries in Web Search.
Nicola Ferro, Paolo Picello and Gianmaria Silvello. A Software Library for Conducting Large Scale Experiments on Learning To Rank Algorithms.
Rolf Jagerman, Harrie Oosterhuis and Maarten de Rijke. Query-Level Ranker Specialization.
Or Levi. Online Learning of a Ranking Formula for Revenue and Advertiser ROI Optimization.
Brian Brost. Multileaving for Online Evaluation of Rankers.
In online learning to rank we are faced with a tradeoff between exploring new, potentially superior rankers, and exploiting our pre-existing knowledge of what rankers have performed well in the past. Multileaving methods offer an attractive approach to this problem since they can efficiently use online feedback to simultaneously evaluate a potentially arbitrary number of rankers. In this talk we discuss some of the main challenges in multileaving, and discuss promising areas for future research.
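To make the idea concrete, multileaving can be illustrated with team-draft multileaving, one well-known scheme in this line of work. The sketch below is a simplified illustration under assumed details, not the speaker's exact method:

```python
import random

def team_draft_multileave(rankings, length):
    """Team-draft multileaving (sketch): rankers take turns in a random
    order each round, each contributing its highest-ranked document not
    yet in the combined list. `rankings` is a list of ranked document
    lists, one per ranker."""
    interleaved, seen = [], set()
    teams = [set() for _ in rankings]  # documents credited to each ranker
    while len(interleaved) < length:
        order = list(range(len(rankings)))
        random.shuffle(order)  # fresh random turn order every round
        added = False
        for r in order:
            if len(interleaved) >= length:
                break
            # the ranker's best document not yet in the combined list
            doc = next((d for d in rankings[r] if d not in seen), None)
            if doc is not None:
                seen.add(doc)
                interleaved.append(doc)
                teams[r].add(doc)
                added = True
        if not added:  # all rankers exhausted before reaching `length`
            break
    return interleaved, teams

def credit(teams, clicked_docs):
    """Score one point per clicked document on each ranker's team."""
    return [len(team & set(clicked_docs)) for team in teams]
```

Because clicks on the multileaved list are credited back to the contributing rankers, a single result page yields relative feedback on all rankers at once, which is what makes the approach attractive for online evaluation.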
Hui Fang and Chengxiang Zhai. When Learning to Rank Meets Axiomatic Thinking.
Axiomatic thinking has been successfully applied to analyze and improve retrieval models and evaluation metrics. The basic idea of axiomatic thinking is to leverage formalized constraints to guide the search for optimal solutions to a given problem. In this talk, we will present our vision for applying axiomatic thinking to the problem of learning to rank.
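As a concrete example of such a formalized constraint, the classic TFC1 axiom from the axiomatic retrieval literature requires that, all else being equal, a document with more occurrences of a query term must not score lower. A minimal sketch, assuming the standard BM25 term-scoring formula, checks the constraint empirically:

```python
import math

def bm25_term_score(tf, doc_len, avg_len, df, n_docs, k1=1.2, b=0.75):
    """Per-term BM25 score (standard formulation)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * ((1 - b) + b * doc_len / avg_len)
    return idf * tf * (k1 + 1) / (tf + norm)

def satisfies_tfc1(score_fn, max_tf=50, **kw):
    """TFC1 (sketch): with everything else held fixed, increasing the
    term frequency must never decrease the score."""
    scores = [score_fn(tf, **kw) for tf in range(1, max_tf + 1)]
    return all(s2 >= s1 for s1, s2 in zip(scores, scores[1:]))

# BM25 is monotonically increasing in tf at fixed document length
assert satisfies_tfc1(bm25_term_score, doc_len=100, avg_len=100, df=10, n_docs=1000)
```

In the axiomatic view, a learned ranking function could be diagnosed (or regularised) against such constraints in exactly the same spirit.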
Darío Garigliotti and Krisztian Balog. Learning to Rank Target Types for Entity-Bearing Queries.
Detecting the target types of entity-bearing queries can help improve retrieval performance as well as the overall search experience. We propose a Learning-to-Rank approach, with a rich variety of features, for automatically identifying the target types of a query with respect to a type taxonomy. Using a purpose-built test collection, we show that our method outperforms existing ones by a remarkable margin. In this talk, we present the current approach, and draw some insights and challenges for its extension.
School of Computing Science, University of Glasgow, UK
Craig Macdonald is a Lecturer at the University of Glasgow, UK. His main research topics deal with Information Retrieval (IR) in general, for instance in settings such as Web, Enterprise, social media and smart cities. He regularly participates in TREC, and jointly coordinated the TREC Blog track (2006-2010), the Microblog track (2011-2012), and the Web track (2014-). He is a lead developer of the Terrier IR platform, and also uses Terrier in his research publications.
When users are unsatisfied with the quality of the results of a web search engine, they may switch to another, leading to a loss of advertising revenue for the engine. A robust retrieval approach is therefore essential, so that the experience of the search engine's users is not damaged by poorly performing queries. For this reason, there has been growing interest in measuring robustness using a new class of risk-sensitive evaluation measures, which assess the extent to which a system exhibits risk, i.e. performs worse than a given baseline system on a set of queries.
In this talk, we describe our recent advances in two families of risk-sensitive evaluation measures both based upon hypothesis testing, and their integration into a state-of-the-art learning to rank algorithm, to create effective yet robust retrieval models.
Firstly, we argue that risk-sensitive evaluation is akin to the underlying methodology of the Student's t-test for matched pairs. Hence, we introduce a risk-reward tradeoff measure TRisk that generalises the existing URisk measure, and which is theoretically grounded in statistical hypothesis testing.
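Assuming the URisk formulation common in the risk-sensitive evaluation literature, where losses against the baseline are weighted by an extra factor α, TRisk can be sketched as a one-sample t-statistic over the per-query risk-adjusted deltas (a sketch of the idea, not the authors' exact implementation):

```python
import statistics as stats

def trisk(run_scores, base_scores, alpha=1.0):
    """Risk-reward tradeoff sketch: URisk as the mean of risk-adjusted
    per-query deltas, TRisk as the corresponding t-statistic (the
    paired-t view of risk-sensitive evaluation)."""
    deltas = [r - b for r, b in zip(run_scores, base_scores)]
    # losses against the baseline are penalised by a factor (1 + alpha)
    x = [d + alpha * min(0.0, d) for d in deltas]
    urisk = stats.mean(x)
    se = stats.stdev(x) / len(x) ** 0.5  # standard error of the mean
    return urisk, urisk / se

# hypothetical per-query effectiveness scores for a run and a baseline
u, t = trisk([0.4, 0.6, 0.3, 0.7], [0.5, 0.5, 0.5, 0.5], alpha=1.0)
```

Because TRisk divides by the standard error, it expresses risk on the same scale as a t-test statistic, which is what grounds the measure in statistical hypothesis testing.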
Secondly, we argue that using a single system as the baseline suffers from the fact that retrieval performance varies highly among IR systems across topics. A single system will therefore generally fail to provide enough information about the true baseline performance for every topic under consideration, and hence cannot in general measure the true risk associated with any given system. Based upon the Chi-squared statistic, we describe a second family of risk-reward tradeoff measures that take into account multiple baselines when measuring risk.
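A rough sketch of this chi-squared idea (an assumed formulation for illustration, not necessarily the exact measure presented in the talk): treat the systems-by-topics score matrix as a contingency table, derive a per-topic expected score from the row and column marginals so that all systems jointly act as the baseline, and penalise standardised downside deviations more heavily:

```python
def zrisk_sketch(scores, i, alpha=1.0):
    """Chi-squared-style multi-baseline risk sketch. `scores[s][q]` is
    the effectiveness of system s on topic q; system i is evaluated
    against expected scores derived from the marginals, as in a
    chi-squared contingency table."""
    n_sys, n_top = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores)
    row = sum(scores[i])
    cols = [sum(scores[s][q] for s in range(n_sys)) for q in range(n_top)]
    risk = 0.0
    for q in range(n_top):
        e = row * cols[q] / grand          # expected score for (system i, topic q)
        z = (scores[i][q] - e) / e ** 0.5  # standardised deviation
        risk += z if z >= 0 else (1 + alpha) * z  # weight downside deviations
    return risk
```

A system that matches its marginal expectation on every topic scores zero; a system that underperforms its expectation on some topics is penalised more than it is rewarded for overperforming on others.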
Experiments using 10,000 topics from the MSLR learning to rank dataset, derived from the Bing search engine, demonstrate that our proposed t-test and Chi-squared based objective functions reduce the number of poorly performing queries exhibited by a state-of-the-art learning to rank algorithm.
Department of Computer Science, University of Twente, The Netherlands
Djoerd Hiemstra is part-time associate professor at the University of Twente. He also heads Searsia, a University of Twente spin-off that develops an open source federated search engine. Djoerd contributed to over 200 research papers in the field of information retrieval, covering topics such as language models, structured information retrieval, multimedia retrieval, and federated web search. Djoerd published papers with research labs of Microsoft (where he did an internship in 2000), Yahoo (where he was a visiting researcher in 2008), and Yandex (which he visited in 2011).
Like most information retrieval methods, learning-to-rank methods are evaluated on benchmark datasets, such as the many datasets provided by Microsoft and the datasets provided by Yahoo and Yandex. Many of the learning-to-rank datasets offer feature-set representations of the to-be-ranked documents instead of the documents themselves. Therefore, any difference in ranking performance is attributable to the ranking algorithm and not to the features used. This opens up a unique opportunity for cross-benchmark comparison of learning-to-rank methods.
In this talk, I propose a way to compare learning-to-rank methods based on a sparse set of evaluation results over many benchmark datasets. Our comparison methodology consists of two components: (1) the Normalized Winning Number, a measure that gives insight into the ranking accuracy of a learning-to-rank method, and (2) the Ideal Winning Number, which gives insight into the degree of certainty concerning that ranking accuracy.
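Under one plausible reading of these definitions (an assumption for illustration, not the speaker's exact formulation), the Winning Number counts the pairwise comparisons a method wins across datasets, the Ideal Winning Number counts the comparisons available to it given the sparse results, and their ratio is the Normalized Winning Number:

```python
def winning_numbers(results):
    """Winning Number sketch over sparse cross-benchmark results.
    `results[m][d]` is method m's score on dataset d; a method missing
    from a dataset is simply absent, so the comparison matrix is sparse.
    Returns (wins, ideal, normalized) per method."""
    out = {}
    for m, scores in results.items():
        wins = comparisons = 0
        for d, s in scores.items():
            for m2, scores2 in results.items():
                if m2 == m or d not in scores2:
                    continue
                comparisons += 1        # this pair could be compared ...
                wins += s > scores2[d]  # ... and method m won it
        # Ideal Winning Number = all available comparisons won
        out[m] = (wins, comparisons, wins / comparisons if comparisons else 0.0)
    return out
```

A high Normalized Winning Number with a low Ideal Winning Number thus signals a method that wins its comparisons but has been evaluated on too few datasets to be certain.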
Evaluation results of 87 learning-to-rank methods on 20 well-known benchmark datasets are collected. I report on the best performing methods by Normalized Winning Number and Ideal Winning Number, and suggest which methods need more research to make our analysis more robust. Finally, we test the robustness of our results by comparing them to situations where one of the datasets is left out of the analysis.