
The Glasgow Information Retrieval Group within the School of Computing Science at the University of Glasgow was founded in 1986 by Professor C. J. ‘Keith’ van Rijsbergen, often considered one of the founders of modern Information Retrieval (IR). From its outset, the Glasgow IR group has focused on improving the effectiveness of IR systems, inventing new logic and probabilistic retrieval models in the 1990s and early 2000s, followed by the development of adaptive query expansion techniques, interactive multimedia models and the Divergence From Randomness framework, as well as leading research into quantum, expertise search and search result diversification models in the late 2000s. Since then, the group has embraced emerging machine learning and deep learning technologies for very large corpora and data streams, and has been at the forefront of the research, development and application of those technologies for search and recommendation use cases in a manner that ensures both effectiveness and efficiency.

The Glasgow IR Group has a strong research track record. Indeed, the ACM Digital Library shows that the group is ranked first by number of papers (429) at the SIGIR conference (the top CORE A* conference in the IR field). Meanwhile, a recent study by Microsoft Research of the 40 years of SIGIR showed the University of Glasgow as the 5th most cited university at the conference and the 1st in Europe. The group is also renowned for developing the popular open source IR platform, Terrier.org, which has been downloaded over 60,000 times since its first release in 2004 and is cited by over 3500 research papers. Furthermore, the group has a long history of engagement with the public and industry sectors, from SMEs to multinational corporations.

The Informer magazine of BCS's Information Retrieval Specialist Group carried a recent profile on the Glasgow Information Retrieval Group.

Topics

As the most active Information Retrieval group by publications in Europe and one of the longest running, our research covers the full spectrum of topics that are relevant to the development of IR systems:

IR & Recommender Systems Models

Theoretical modelling of IR systems
Machine learning and deep learning for information retrieval and recommender systems
Interactive information retrieval (personalised IR, emotion based search, user modelling for IR, gestural IR)
User modelling and personal information access
Topic modeling; Entity search; Natural language processing for IR
Recommender systems; Context-aware venue suggestion

Large-scale IR & Efficient IR

Web information retrieval; Big data and information retrieval
Efficient architecture for large-scale IR systems; Data stream processing architectures

Data Streams & IR

Real-time information retrieval
Search in social and sensor networks

Artificial Intelligence & IR

Conversational information seeking and dialogue systems
Information credibility, transparency, explainability and verification in IR systems
Fairness in information retrieval & recommender systems

Natural Language Processing & IR

Information extraction including entity and relation extraction
Automatic knowledge graph construction
Multi-task models, joint models and summarization

Applications

Multimedia information retrieval
Domain-specific information retrieval: smart cities; health; news; eDiscovery; sensitivity review
Emergency management and crisis informatics
Politics and Media

Evaluation

Test collections and evaluation metrics
Evaluation of IR systems and crowdsourcing for IR
Online and Offline Evaluation of IR and Recommender Systems
Eye-tracking and physiological approaches, such as fMRI

Projects

Active Projects:

Recent Projects:

Current staff and students

Academic Staff:

Current Research Assistants and Research Students:

Javier Sanz-Cruzado Puig
Yashon Wu
Xin Xin
Carlos Gemmel
Federico Rossetto
Alexander Hepburn
Iain Mackie
Jun Choi Hyun
Hitarth Narvala
Jijun Long
Maria Vlachou
Sasha Petrov
Thomas Janich
Zixuan Yi
Erland Frayling
Edward Richards
Hajra Klair
Andreas Chari
Jack McKechnie
Andrew Parry
Gan Wang
Xuejun Chang
Jinyuan Fang
Xinhao Yi
Zeyuan Meng
Fangzheng Tian
Lubingzhi Guo
Zhaohan Meng

Recent Graduates

Jarana Manotumruksa, University College London, Researcher
Anjie Fang, Amazon, Applied Scientist
Jorge David Gonzalez Paule, Jobandtalent Espana, Data Scientist
Colin Wilkie, Siemens, Data Engineer
David Maxwell, University of Delft, Data Engineer
Graham McDonald, University of Glasgow, Senior Lecturer
James McMinn, ScoopAnalytics, Co-Founder
Stuart Mackie, BiP Solutions/Strathclyde Uni, Data Scientist
Horatiu Bota, Prodsight, Data Scientist
Jesus Alberto Rodriguez Perez, University of Glasgow, Postdoctoral Researcher
Fajie Yuan, Tencent, Senior Researcher

Notable Alumni

Ryen White (Research Manager, Microsoft Research AI)
Mark Sanderson (Professor, Royal Melbourne Institute of Technology)
Mounia Lalmas (Head of Tech Research, Spotify)
Ian Ruthven (Professor, Strathclyde University)
Fabio Crestani (Professor, University of Lugano)
Vassilis Plachouras (Software Engineering, Facebook)
Leif Azzopardi (Chancellor's Fellow, Strathclyde University)
Rodrygo Santos (Assistant Professor, Federal University of Minas Gerais)
Eugene Kharitonov (Research Engineer, Facebook)
Saul Vargas (Senior Machine Learning Scientist, ASOS)
Dyaa Albakour (Lead Data Scientist, Signal Media)
Nut Limsopatham (Senior Researcher, Microsoft AI)
Amir Jadidinejad (AI Engineer, GlaxoSmithKline)
Zaiqiao Meng (Researcher, Cambridge University)

Terrier IR platform

Terrier is a highly flexible, efficient, and effective open source search engine, developed by the IR group and readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications. Indeed, Terrier is used internationally, with over 60,000 downloads since its first release in 2004. Terrier is used widely by the research community, with over 3700 citations in research papers according to Google Scholar.

Visit the website at http://terrier.org to learn more and download Terrier for free.

Popular resources

For those new to the Information Retrieval field, the group maintains a useful set of common resources for researchers and practitioners:

Information Retrieval Test Collections: This page lists publicly available IR test collections. Some are held locally and some are pointers to remote sites.
Collections of text and corpora: What's the difference between a test collection and a text collection? Well, a test collection has to have associated queries and relevance judgements. The things in here are simply document collections.
Language reference works: This page contains links to online language reference works, such as dictionaries, thesauri etc.
IR systems: A list of links to some sites that have information about IR systems.
Linguistic utilities: IR language-related utilities such as stemmers, stop word lists, morphological taggers, etc.
IR Journals: Tables of contents and abstracts of the papers in a number of well-known IR journals.
IR Organisations: Various IR groups and more formal organisations.
Books: Supplements of books or whole books online.

Upcoming events

Annotative Indexing

Group: Information Retrieval (IR)
Speaker: Charles Clarke, University of Waterloo
Date: 07 October, 2024
Time: 15:00 - 16:00
Location: Sir Alwyn Williams Building, 422 Seminar Room

Title
Annotative Indexing

Abstract
This talk presents and explores annotative indexing, a novel framework that unifies and generalizes traditional inverted indices, column stores, object stores, and graph databases. As a result, annotative indexing can provide the underlying indexing framework for retrieval systems that integrate sparse retrieval, dense retrieval, entity retrieval, knowledge graphs, and semi-structured data. While our reference implementation primarily supports human language data in the form of text, annotative indexing is sufficiently general to support a wide range of other data types. The talk will include examples of SQL-like queries over a JSON store built on our reference implementation that include numbers and dates. Taking advantage of the flexibility of annotative indexing, the talk will also demonstrate a fully dynamic inverted index incorporating support for ACID properties of transactions with hundreds of concurrent readers and writers.

Bio
Charles Clarke is a Professor in the School of Computer Science and an Associate Dean for Innovation and Entrepreneurship at the University of Waterloo, Canada. His research focuses on data intensive tasks involving human language data, including search, ranking, and question answering. Clarke is an ACM Distinguished Scientist and leading member of the search and information retrieval community. From 2013 to 2016 he served as the Chair of the Executive Committee for the ACM Special Interest Group on Information Retrieval (SIGIR). From 2010-2018 he was Co-Editor-in-Chief of the Information Retrieval Journal. He was Program Co-Chair for the SIGIR main conference in 2007 and 2014, and he was elected to the SIGIR Academy in 2022. His research has been funded by Google, Microsoft, Meta, Spotify, and other companies and granting agencies. Along with Mark Smucker, he received the SIGIR 2012 Best Paper Award. Along with colleagues, he received the SIGIR 2019 Test of Time Award for their SIGIR 2008 paper on novelty and diversity in search. In 2006 he spent a sabbatical at Microsoft, where he was involved in the development of what is now the Bing search engine. From August 2016 to August 2018, while on leave from Waterloo, he was a Software Engineer at what is now Meta, where he worked on metrics and ranking for Facebook Search. He is a co-author of the textbook Information Retrieval: Implementing and Evaluating Search Engines, MIT Press, 2010, which he has had the pleasure of seeing almost entirely deprecated in recent years. Almost.

Jinyuan Fang IR Seminar

Group: Information Retrieval (IR)
Speaker: Jinyuan Fang, University of Glasgow
Date: 14 October, 2024
Time: 15:00 - 16:00
Location: Sir Alwyn Williams Building, 422 Seminar Room

Title

TBC

Abstract

TBC

Bio

TBC

Chuan Meng IR Seminar

Group: Information Retrieval (IR)
Speaker: Chuan Meng, University of Amsterdam
Date: 21 October, 2024
Time: 15:00 - 16:00
Location: Sir Alwyn Williams Building, 422 Seminar Room