Every year, researchers publish an average* of over 7000 academic papers that acknowledge the Wellcome Trust. Our guidelines state that a Wellcome-funded researcher should explicitly mention a grant number in all research outputs. However, in reality, at least one quarter of the publications that acknowledge Wellcome are not linked to a grant number. This means that when reviewing our funding portfolio’s academic outputs, our analysts and managers have the caveat that one quarter of the publications are unaccounted for. Below is a typical acknowledgement statement that doesn’t mention a grant number:

Funding statement that acknowledges some funders but doesn’t contain a grant number

To solve this problem, the data science team at…


A large portion of the data produced by the grant-making activities of the Wellcome Trust is unstructured and consists mostly of text (e.g. grant synopses, academic publications, and policy documents). A big part of the job of the data science team at Wellcome Data Labs is to employ natural language processing (NLP) techniques to make sense of this data.

In this blog post, we will describe different ways in which we applied NLP this year to help grant managers at Wellcome better understand our research portfolio. Here is a non-exhaustive list of applications:

  1. Employed text clustering…

With a science portfolio of over £3.5bn (as of March 2020), the Wellcome Trust has funded research responsible for at least tens of thousands of academic publications in the last 5 years alone. This volume of publications presents a challenge for grant advisors and data analysts when tracking research outcomes. In particular, Wellcome Data Labs has often been asked the recurring question:

How can we visualise the research fields/topics that have emerged from our funded grants?

This complex question has been investigated previously, for example in the field of neuroscience by tracking journeys of researchers in conferences. …


Disclaimer: this is not an official document, and should in no way replace the official guidance at https://www.gov.uk/skilled-worker-visa/.

Update 21/02/2021: After many requests, I made the list of licensed sponsors available here in a friendly format, derived from the official PDF document.

Update 30/12/2020: The previous work visa route (Tier 2) has been renamed to “Skilled Worker Visa” (https://www.gov.uk/skilled-worker-visa/). Much of the information in this blog still holds, with the exception of the labour market test (which has been scrapped altogether).

Update 14/06/2020: The shortage occupation jobs have been updated. All jobs under the Standard Occupational Code 2135 IT business…


My impressions of the British Computer Vision Summer School

Last month, the British Machine Vision Association organised the Computer Vision Summer School at the University of East Anglia, Norwich. This is an annual 4-day workshop where young computer vision practitioners can listen to leading UK academic experts on the various research aspects of the field. The talks ranged from introductory courses (such as colour and low-level vision) to the latest research trends, this year including active vision, probabilistic generative models and, as it always has to be, deep learning. Here are my top picks.

Trend #1: An attempt to know what we don’t know

There are known knowns. These are things we know that we know. There are known unknowns…


Days two and three

During days two and three of the LAWCI, the talks showed the versatility of coding and information theory, ranging from advanced applications in cryptography to quantum bits. We learned how to apply finite fields to protocols such as the Advanced Encryption Standard (AES), which is used in WhatsApp’s end-to-end encryption, how to interpret mutations in DNA, and how to manipulate data with quantum bits.
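
To give a flavour of where finite fields show up in AES, here is a minimal sketch (not taken from the workshop talks) of multiplication in GF(2^8), the field AES uses for its byte arithmetic. The function name gf256_mul is just an illustrative choice; the reduction polynomial x^8 + x^4 + x^3 + x + 1 is the one fixed by the AES standard.

```python
# Multiplication in GF(2^8) as used by AES.
# Bytes are read as polynomials over GF(2); products are reduced
# modulo x^8 + x^4 + x^3 + x + 1 (0x11B), the polynomial fixed by AES.
# The function name is illustrative, not part of any library.

AES_MODULUS = 0x11B


def gf256_mul(a: int, b: int) -> int:
    """Multiply two field elements given as integers in 0..255."""
    result = 0
    while b:
        if b & 1:
            # Addition in GF(2^8) is a bitwise XOR.
            result ^= a
        # Multiply a by x, then reduce if the degree reaches 8.
        a <<= 1
        if a & 0x100:
            a ^= AES_MODULUS
        b >>= 1
    return result


if __name__ == "__main__":
    # Worked example from the AES specification: {57} * {83} = {c1}
    print(hex(gf256_mul(0x57, 0x83)))
```

The same shift-and-XOR routine is what sits behind the MixColumns step of AES, where every column of the state is multiplied by fixed field elements.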

A word cloud based on the students’ presentations

Student posters

Students presented their ongoing research work during two poster sessions. Besides the sessions, we had a 3-minute elevator pitch presentation for every student. In massive conferences such as…


Day one

Today is the first day of the Latin American Workshop on Coding and Information, a satellite event of the celebrated International Congress of Mathematicians, which will be held in Rio de Janeiro in August (you know, Fields medals and stuff). The workshop is devoted to discussing the mathematical foundations of secure and reliable data storage and transmission, including applications to the quantum world, cryptography, and biology.

How to solve a sudoku effectively?

The first lecture was given by Prof Daniel Panario (Carleton U) on finite fields and applications. Finite fields are arguably one of the finest fields of mathematics (see what I did…

Antonio Campello

Data science. All things data governance, machine learning and open data.
