Work Package 1: Collecting Data

"Data Access, Data Wrangling, Data Engineering"

AI MULTIPLY researchers will make use of multiple, large patient datasets, both national and regional, to address key research questions. 

To achieve this, we will make use of a new generation of algorithms that can identify trends in large collections of patient records. Examples of these records include Electronic Health Records (e.g. diagnoses, clinical test results) and associated Prescription Records (from GP practices and primary care records). We will use data that represent ethnic minorities and migrants.

Artificial Intelligence

The algorithms, known as “Machine Learning” or “Artificial Intelligence”, specialise in recognising patterns in large amounts of data, including patients’ medical histories. For the algorithms to function effectively, it is important that the data used are available in large quantities (thousands of patients) and are of good quality.

Whilst in principle these records “tell a patient’s story”, they mostly reflect doctors’ notes, taken either in a GP clinic or in a hospital, and as such they can be incomplete, ambiguous, and possibly vague (for example, the rationale behind a specific drug prescription may not be clear). When similar data are collected across different practices and at different times, they also need to be integrated, as the format and clinical meaning of the data may differ.

Data Engineering

“Fixing” the data that we have access to entails multiple technical challenges. In Work Package 1 we will access the data required and address these challenges, so that the data can be meaningfully analysed in Work Package 2. We will use techniques known as ‘data engineering’ to make it easier to use the data in the next part of the project.

We are also planning to share our research with teams working on similar projects throughout the UK. This will help us to understand how our research can be applied across different communities. We will work with local healthcare providers and inequality groups to ensure our findings have impact within local populations.