FedWeb Greatest Hits

FedWeb Greatest Hits: Presenting the New Test Collection
For Federated Web Search

by Thomas Demeester, Dolf Trieschnigg, Ke Zhou, Dong Nguyen, Djoerd Hiemstra

accepted as a poster at WWW’15

This paper outlines research challenges and opportunities for federated web search, based on experiments by the participating research teams over two consecutive years at the TREC Federated Web Search Track. We also present a combined and enriched research collection, based on the two datasets used for the track, which provides new research possibilities on federated web search techniques, and on other aspects such as the dynamics over time of various verticals, or the relation between snippet and page relevance.

Here’s the PDF version of the paper, and here’s the official site where you can download the collection.


UGent-IBCN at FIRE 2014

Entity linking: Test collections revisited.

by Laurent Mertens, Thomas Demeester, Johannes Deleu, Matthias Feys, and Chris Develder.

This paper analyzes two important conditions that are usually taken for granted in the evaluation of information retrieval systems: the test queries should be representative of the intended application scenario, and a sufficient number of queries is needed to robustly assess system performance, as well as to discern performance differences between systems. Both issues have important consequences, as studied in this paper for the specific case of Entity Linking systems. We investigate two methods for automatic query generation, and show them to have a vast impact on evaluated system performance. We further demonstrate the effect a query set’s size has on its ability to faithfully distinguish systems, and propose a method for assessing the possible impact on system performance that adding a specific number of queries to the set might have.
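The link between query set size and the ability to distinguish systems can be illustrated with a generic resampling check. The sketch below is not the method proposed in the paper: it simply draws random query subsets of a given size and counts how often two systems are ordered the same way as on the full set. The function name `ranking_agreement`, its parameters, and the per-query scores are all hypothetical.

```python
import random


def ranking_agreement(scores_a, scores_b, subset_size, trials=1000, seed=0):
    """Fraction of resampled query subsets that order systems A and B
    the same way as the full query set does (hypothetical helper)."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both systems need one score per query")
    rng = random.Random(seed)
    n_queries = len(scores_a)
    # Reference ordering, computed on the full query set.
    a_better_on_full_set = sum(scores_a) > sum(scores_b)
    agreements = 0
    for _ in range(trials):
        # Sample query indices with replacement (bootstrap style).
        sample = [rng.randrange(n_queries) for _ in range(subset_size)]
        a_better = (sum(scores_a[i] for i in sample)
                    > sum(scores_b[i] for i in sample))
        if a_better == a_better_on_full_set:
            agreements += 1
    return agreements / trials


# Made-up per-query scores for two systems.
system_a = [0.6, 0.7, 0.8]
system_b = [0.5, 0.6, 0.7]
rate = ranking_agreement(system_a, system_b, subset_size=2, trials=50)
```

When one system wins on every query, as in this toy input, every subset agrees with the full set. With noisier scores, the rate typically drops for small subsets, which is the effect the abstract refers to.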

On the robustness of event detection evaluation: A case study.

by Matthias Feys, Thomas Demeester, Blaz Fortuna, Johannes Deleu, and Chris Develder.

Research on evaluation of IR systems has led to the insight that a robust evaluation strategy requires tests on a large number of events/queries. However, especially for event detection, the number of manually labeled events may be limited. In this paper we investigate how to optimize the evaluation strategy in those cases to maximize robustness. We also introduce two new vector space models for event detection that aim to incorporate bursty information of terms and compare these with existing models. Experiments show that exploiting graded relevance levels reduces the impact of subjectivity and ambiguity of event detection evaluation. We also show that although user disagreement is significant, it has no real impact on result ranking.



TREC FedWeb 2014

Overview of the TREC 2014 Federated Web Search Track

by Thomas Demeester, Dolf Trieschnigg, Dong Nguyen, Ke Zhou, Djoerd Hiemstra.

The TREC Federated Web Search track facilitates research in topics related to federated web search, by providing a large realistic data collection sampled from a multitude of online search engines. The FedWeb 2013 challenges of Resource Selection and Results Merging are again included in FedWeb 2014, and we additionally introduced the task of vertical selection. Other new aspects are the required link between Resource Selection and Results Merging, and the importance of diversity in the merged results. After an overview of the new data collection and relevance judgments, the individual participants’ results for the tasks are introduced, analyzed, and compared.

Vertical intent

Aligning Vertical Collection Relevance with User Intent

by Ke Zhou, Thomas Demeester, Dong Nguyen, Djoerd Hiemstra, Dolf Trieschnigg. To be presented at the ACM International Conference on Information and Knowledge Management (CIKM 2014), Shanghai, China, 2014.

Selecting and aggregating different types of content from multiple vertical search engines is becoming popular in web search. The user vertical intent, the verticals the user expects to be relevant for a particular information need, might not correspond to the vertical collection relevance, the verticals containing the most relevant content. In this work we propose different approaches to define the set of relevant verticals based on document judgments. We correlate the collection-based relevant verticals obtained from these approaches to the real user vertical intent, and show that they can be aligned relatively well. The set of relevant verticals defined by those approaches could therefore serve as an approximate but reliable ground-truth for evaluating vertical selection, avoiding the need for collecting explicit user vertical intent, and vice versa.

Event Detection

Towards large-scale event detection and extraction from news

by Blaz Fortuna, Thomas Demeester, and Chris Develder. Large-scale Online Learning and Decision Making Workshop (LSOLDM 2014), Windsor, UK, 10-12 Sep. 2014.

Understanding and reasoning about textual data is one of the important topics in artificial intelligence and is being addressed by various research communities, ranging from knowledge representation and natural language processing to text mining. Each community provides a different set of often overlapping intuitions, tools and methodologies for working with text.
Event processing from news and social media can be seen as a subtopic of text understanding. It comprises different tasks, including New and Retrospective Event Discovery, Event Type Classification and Event Template Extraction.
Research on event processing requires access to annotated data covering different tasks in the event processing pipeline. Over the last decades, several datasets have been created covering event discovery and event extraction. These datasets are rather limited in scope. For example, they contain articles from only a few selected sources, or they contain a limited number of annotated events with a high selection bias (e.g., towards larger or well-defined events like natural disasters or terrorism). Using such limited datasets to evaluate solutions for the event processing tasks may lead to favoring approaches that do not work well on real-world datasets. The main reasons for these limitations are (1) limited access to data resources and (2) the required and expensive manual annotations.
The main contributions we are working towards are (1) a systematic methodology for efficiently creating a large gold standard of manually annotated events over a large corpus of news articles with a realistic distribution over the covered topics and events, and (2) a resulting annotated corpus of 10,000 English general news articles embedded in 31 million news articles.


News from Twitter

Detecting newsworthy topics in Twitter

by Steven Van Canneyt, Matthias Feys, Steven Schockaert, Thomas Demeester, Chris Develder, and Bart Dhoedt. Presented at 2nd Workshop on Social News on the Web (SNOW2014), WWW 2014, Seoul, Korea, April 8, 2014.

The task of the SNOW 2014 Data Challenge is to mine Twitter streams to provide journalists with a set of headlines and complementary information that summarize the most newsworthy topics for a number of given time intervals. We propose a 4-step approach to solve this. First, a classifier is trained to determine whether a Twitter user is likely to post tweets about newsworthy stories. Second, tweets posted by these users during the time interval of interest are clustered into topics. For this clustering, the cosine similarity between boosted tf-idf representations of the tweets is used. Third, we use a classifier to estimate the confidence that the obtained topics are newsworthy. Finally, for each obtained newsworthy topic, a descriptive headline is generated together with relevant keywords, tweets and pictures. Experimental results show the effectiveness of the proposed methodology.
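The clustering step (the second of the four) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses plain tf-idf (the "boosting" mentioned in the abstract is not specified here and is left out), whitespace tokenisation, and a simple greedy single-pass clustering. The function names, the threshold value and the example tweets are all hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(token_lists):
    """Build one sparse tf-idf vector (term -> weight) per tokenised text."""
    n_docs = len(token_lists)
    # Document frequency: in how many texts each term occurs.
    doc_freq = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        term_freq = Counter(tokens)
        vectors.append({
            term: count * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * v[term] for term, weight in u.items() if term in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def greedy_clusters(vectors, threshold=0.4):
    """Assign each vector to the first cluster whose seed (first member)
    is similar enough; otherwise start a new cluster.
    The threshold is an assumed value, not taken from the paper."""
    clusters = []
    for index, vector in enumerate(vectors):
        for cluster in clusters:
            if cosine(vector, vectors[cluster[0]]) >= threshold:
                cluster.append(index)
                break
        else:
            clusters.append([index])
    return clusters


# Made-up example tweets.
tweets = [
    "earthquake hits city centre",
    "strong earthquake hits city",
    "team wins cup final",
    "cup final won by team",
]
vectors = tfidf_vectors([tweet.lower().split() for tweet in tweets])
clusters = greedy_clusters(vectors)
```

Each cluster is a list of tweet indices. A production system would likely use a more robust clustering method and tokeniser; the greedy pass is chosen here only to keep the sketch short.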

Topics in Lyrics

Assessing Quality of Unsupervised Topics in Song Lyrics

by Lucas Sterckx, Thomas Demeester, Johannes Deleu, Laurent Mertens, and Chris Develder
to be presented at ECIR 2014.

How useful are topic models based on song lyrics for applications in music information retrieval? Unsupervised topic models on text corpora are often difficult to interpret. Based on a large collection of lyrics, we investigate how well automatically generated topics are related to manual topic annotations. We propose to use the kurtosis metric to align unsupervised topics with a reference model of supervised topics. This metric is well-suited for topic assessments, as it turns out to be more strongly correlated with manual topic quality scores than existing measures for semantic coherence. We also show how it can be used for a detailed graphical topic quality assessment.
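As a rough illustration of the kurtosis idea, the sketch below computes the excess kurtosis of a vector of similarity scores. It assumes, as an interpretation that the abstract does not spell out, that each unsupervised topic is compared to every reference topic, and that a sharply peaked similarity profile (high kurtosis) indicates a clear match with one reference topic. The function name and the example scores are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    if n == 0:
        raise ValueError("need at least one value")
    mean = sum(values) / n
    # Second and fourth central moments.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    if m2 == 0.0:
        raise ValueError("kurtosis is undefined for constant values")
    return m4 / (m2 ** 2) - 3.0


# Made-up similarity scores between one unsupervised topic
# and five reference topics.
peaked_profile = [0.9, 0.05, 0.05, 0.05, 0.05]  # one clear match
flat_profile = [0.3, 0.1, 0.25, 0.15, 0.2]      # no clear match

peaked_score = excess_kurtosis(peaked_profile)
flat_score = excess_kurtosis(flat_profile)
```

Under this reading, the peaked profile scores higher than the flat one, so it would be ranked as the better-aligned topic.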

User Disagreement Model

Exploiting User Disagreement for Web Search Evaluation: an Experimental Approach

by Thomas Demeester, Robin Aly, Djoerd Hiemstra, Dong Nguyen, Dolf Trieschnigg, Chris Develder
presented on February 25, at WSDM 2014.

To express a more nuanced notion of relevance as compared to binary judgments, graded relevance levels can be used for the evaluation of search results. Especially in Web search, users strongly prefer top results over less relevant results, and yet they often disagree on which are the top results for a given information need. Whereas previous works have generally considered disagreement as a negative effect, this paper proposes a method to exploit this user disagreement by integrating it into the evaluation procedure. First, we present experiments that investigate the user disagreement. We argue that, with a high disagreement, lower relevance levels might need to be promoted more than in the case where there is global consensus on the top results. This is formalized by introducing the User Disagreement Model, resulting in a weighting of the relevance levels with a probabilistic interpretation. A validity analysis is given, and we explain how to integrate the model with well-established evaluation metrics. Finally, we discuss a specific application of the model, in the estimation of suitable weights for the combined relevance of Web search snippets and pages.
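To make the integration with standard metrics concrete, the sketch below shows how per-level weights can be plugged into discounted cumulative gain (DCG). It is only an illustration of the general mechanism: the weights are invented for the example and are not the values estimated by the User Disagreement Model, and the function name `weighted_dcg` is hypothetical.

```python
import math


def weighted_dcg(levels, level_weights, k=10):
    """DCG over the top-k results, where the gain of a result is the
    weight assigned to its graded relevance level."""
    return sum(
        level_weights[level] / math.log2(rank + 1)
        for rank, level in enumerate(levels[:k], start=1)
    )


# Hypothetical weights per relevance level (0 = non-relevant, 3 = top).
# In the paper such weights carry a probabilistic interpretation;
# these numbers are made up.
weights = {0: 0.0, 1: 0.25, 2: 0.5, 3: 1.0}

good_ranking = [3, 1, 0]  # top result first
poor_ranking = [0, 1, 3]  # top result last

good_score = weighted_dcg(good_ranking, weights)
poor_score = weighted_dcg(poor_ranking, weights)
```

Raising the weights of the lower levels narrows the gap between the two rankings, which matches the intuition in the abstract that lower relevance levels may need to be promoted when users disagree strongly.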

TREC FedWeb 2013

Overview of the TREC 2013 Federated Web Search Track

by T. Demeester, D. Trieschnigg, D. Nguyen, D. Hiemstra
TREC 2013

The TREC Federated Web Search track is intended to promote research related to federated search in a realistic web setting, and to this end provides a large data collection gathered from a series of online search engines. This overview paper discusses the results of the first edition of the track, FedWeb 2013. The focus was on basic challenges in federated search: (1) resource selection, and (2) results merging. After an overview of the provided data collection and the relevance judgments for the test topics, the participants’ individual approaches and results on both tasks are discussed. Promising research directions and an outlook on the 2014 edition of the track are provided as well.

The lowlands at TREC 2013

Mirex and Taily at TREC 2013

by R. Aly, D. Hiemstra, D. Trieschnigg, and T. Demeester
TREC 2013

We describe the participation of the Lowlands at the Web Track and the FedWeb track of TREC 2013. For the Web Track we used the Mirex Map-Reduce library with out-of-the-box approaches and for the FedWeb Track we adapted our shard selection method Taily for resource selection. Here, our results were above median and close to the maximum performance achieved.
