Reinventing Test and Trace: A Bayesian Approach For Modelling SARS-CoV-2 Setting-Specific Transmission

## Introduction

It is no secret that the response of multiple countries following the outbreak of the COVID-19 pandemic has been abismal - and the UK is no exception. With £355 billion amounting in economic losses (OBR, 2021), a 4.8% rate of unemployment (ONS, 2021) and, perhaps most shockingly, over 128,000 deaths to this date (Gov.uk, 2021), the UK government has undoubtedly handled this crisis poorly, with two key government-led programmes standing out amongst the list of policy failures.

The Test and Trace programme, established to curb Coronavirus reproduction, aims to provide accessible testing as well as contact tracing to notify exposed individuals and instruct them to self-isolate. One major failure of the T&T programme has been the lack of a coherent data collection strategy for analysing setting-specific transmission; that is, the relative likelihood of contracting COVID in different locations, such as a hair dresser, a restaurant, etc. Given the dependence of the UK Tier System on our understanding of setting-specific transmission, this alone serves as an explanation of the latter’s failure, as settings grouped together often bear no resemblance in terms of their underlying transmission rates, resulting in unnecessary business closures and contributing to the economic hemorrhage.

Thus, we set out to formulate a data collection and analysis methodology that is both compatible with the current T&T programme and enables us to model and estimate setting-specific COVID-19 transmission rates, in the hope of guiding lockdown policies using a reliable, data-driven approach.

## Our approach and assumptions

### Data collection methodology

Starting out, our goal was to model a random vector $\boldsymbol{\theta} \in [0,1]^P$ of transmission rates for $P$ different settings. To estimate this model, we would require data that could be feasibly collected through the T&T programme. Therefore, we devised the following data collection strategy to accompany the probabilistic model developed below, making sure to prioritize its feasibility and scalability:

1. Data is to be collected from individuals in the T&T programme through a short survey of binary responses on whether or not they visited each one of $P$ locations in the last few days.

2. For each individual surveyed through T&T, the result of their COVID-19 antigen test is also observed.

3. Finally, a random survey is also sent out. This survey is identical to the one in (1.) but, since it is not distributed by T&T, no antigen test is taken and therefore no outcome is observed.

### Model assumptions

In order for us to combine this data collection methodology with a statistical model that allows for inference on the estimated setting-specific transmission rates, we had to lay down the following assumptions:

• Multiple visits to a location in one week are rare enough to be ignored or in cases (such as supermarkets) where multiple visits are expected, the distribution of number of visits is tight.

## Building a first-principles model

Given the data collection strategy detailed above, for each individual, we observe a binary vector $\boldsymbol{x}_i \in \{0,1\}^{P}$, which indicates their responses on the location visits survey, and random variable $y_i \in \{0,1\}$ which indicates if they tested positive for COVID-19.

### The base model

We formulate our model generatively through the latent transmission vector $\boldsymbol{t}_i \in \{0,1\}^{P}$, which follows a multivariate Bernoulli distribution with transmission probabilities dependent on the individual’s attendance to each setting and the respective setting-specific transmission rates. Furthermore, we capture the case of COVID transmission in a location not specified by the survey through the latent variable $b_i \in \{0,1\}^{P}$, which is analogous to $\boldsymbol{t}_i$, but parameterized by the underlying base transmission risk:

$\boldsymbol{t}_i \sim \text{Ber}(\boldsymbol{x}_i \circ \boldsymbol{\theta})$

$b_i \sim \text{Ber}(\rho)$

We denote whether or not the individual contracted COVID-19 with the binary variable $w_i$ and define it using the natural relationship with the aforementioned latent variables. Since transmission in any of the settings surveyed or in some other unaccounted location results in contraction, we define $w_i$ as the indicator of this scenario:

$w_i=\mathbb{1}\{\textbf{1}^\text{T}\boldsymbol{t}_i + b_i>0\}$

### Curbing selection bias

The next challenge for our model was to account for selection bias in T&T survey observations. More specifically, infected individuals (who feel unwell and present symptoms) are more likely to get in contact with T&T and hence select into the survey, thus introducing bias into any transmission rate estimates. To mitigate this, we collect observations of $\boldsymbol{x}_i$ from a random survey and define $s_i$ to indicate whether or not the observation came from T&T:

$s_i=\begin{cases} 1,\; \text{T\&T sample} \\ 0,\; \text{otherwise} \end{cases}$

From there, we utilize $s_i$ to define the testing rates as the probability of getting tested conditional on being COVID-19 infected, and use these to weight down the transmission likelihood for our observations.

$\mathbb{P}(s_i=1|w_i=1) = \gamma_+$

$\mathbb{P}(s_i=1|w_i=0) = \gamma_-$

Similarly, we also account for false positive and negative antigen test results by defining the test sensitivity and specificity parameters through the conditional probability of testing positive given infected and testing negative given non-infected, respectively:

$\mathbb{P}(y_i=1|w_i=1) = \lambda_+$

$\mathbb{P}(y_i=0|w_i=0) = \lambda_-$

### Hierarchical extension

We modeled the hierarchical nature of setting-specific transmission by grouping each setting into one of $K$ encompassing classes of similar settings: $c[p]:[P] \to [K]$. For each class, we model transmission rates of member settings as draws from a logit-normal distribution, parameterized by the mean class transmission rate $\boldsymbol{\mu} \in [0,1]^K,$ and the class transmission rate variance $\, \boldsymbol{\sigma}^2 \in \mathbb{R}_{>0}^K$. It should be noted that :

$\theta_p \sim \text{Logitnormal}\left(\text{logit}\left(\mu_{c[p]}\right) \;,\; \sigma_{c[p]}^{2} \right)$

### Modelling policy interventions

Finally, we also wanted to model the effect of policy interventions on different setting classes to answer questions such as: “Does social distancing have different effects in cinemas than in restaurants?” To do this, we can introduce policy intervention parameters and model out their interactions with different setting transmission rates, provided our data collection methodology can be feasibly extended to collect the data necessary to pin down these additional parameters.

We exemplify this with the mask-wearing intervention, with the objective of estimating the different effects mask-wearing can have in different settings. To do this, we could extend our data collection survey to ask individuals about their mask-wearing habits, allowing us to define $m_i$ as an indicator for habitual mask-wearers:

$m_i=\begin{cases} 1,\; \text{habitual mask-wearing} \\ 0,\; \text{otherwise} \end{cases}$

This additional data then allows us to incorporate and estimate $\iota_{c[p]}$ as the class-specific mas wearing impact on transmission rates:

$\theta_{ip} \sim \text{Logitnormal}\left(\, \text{logit}\left(\mu_{c[p]}\right) + m_i \log\left(\iota_{c[p]}\right) \;,\; \sigma_{c[p]}^{2} \,\right)$

Modeling policy interventions in this way allows for a clear-cut interpretation of the intervention effect parameter. More specifically, with a little algebra, it becomes evident that $\iota_{c[p]}$ essentially acts as a multiplier effect on the average class transmission odds:

$\log\left(\frac{\mu_{c[p]}}{1-\mu_{c[p]}} \right) \rightarrow \log\left(\iota_{c[p]} \frac{\mu_{c[p]}}{1-\mu_{c[p]}} \right)$

## Simulating our data

In order to be test out this model, the following metadata must be defined in relation to the application context:

• $P:$ A vector containing the number of settings per class
• $K:$ The number of setting classes considered in the model
• $N:$ The total population
• $S:$ The number of random surveys sent out
• $T:$ The total number of T&T samples

In particular, it is worth highlighting that $N,S,$ and $T$ are constrained by the following:

$N \geq S+T$

It is assumed that, in the typical case, both $N$ and $T$ are predetermined, and the authority applying the model gets to choose $S$ subject to the above. In our simulation, $N$ and $S$ are pre-set and $T$ is randomly determined given the simulated testing rates.

### Population ground truth

Since we our data collection strategy is (an ideal) hypothetical, to test our model we must first specify a ground truth for the population parameters and then simulate our data accordingly. To this end we sampled our population parameters as follows, making sure to try a range of values for these to ensure model robustness:

ParameterModel NotationSimulationModel Priors
Setting transmission rates$\theta_{ip}$Logit-normal distribution parameterized by class transmission rate mean and variance$U(0,1)$
Mean class transmission rates$\mu_{c[p]}$Beta distribution$U(0,1)$
Class transmission rate variance$\sigma^2_{c[p]}$Inverse gamma distribution$\text{Inv-Gamma}(10, 1)$
Base transmission rate$\rho$Beta distribution$U(0,1)$
Class-specific mask impact$\iota_{c[p]}$Normal with negative mean$\mathcal{N}(-1, 1)$
Testing rates$\gamma_{+,\,-}$Beta distribution$U(0,1)$
Test precision and recall rates$\lambda_{+,,-}$Beta distribution$\text{Beta}(\alpha*, \beta*)$, with shape hyperparameters calibrated to match the antigen test sensitivity and specificity results from the Joint PHE Porton Down & University of Oxford SARS-CoV-2 test development and validation cell

## Fitting our Model

### Our model in Stan

We implemented the code for our first-principles model using Stan, a probabilistic inference language compiled in C++. Stan allows users to carry out full Bayesian inference on statistical models via Markov Chain Monte Carlo (MCMC) sampling, and enables model specification in a block-like fashion. The main building blocks for our Stan model are:

1. The data block, which specifies the type and dimensions of data used to train the model.
2. The parameter block, which specifies all underlying statistical parameters of the model.
3. The model block, which assigns parameter priors and constructs the model likelihood describing the joint probability of the observed data as a function of the parameters.

With computational feasibility in mind, we also coded a version of our model using the TensorFlow Probability framework, as this version supports the use of GPU’s and distributed computing capabilities to deliver greater computational power. For a more extensive review of the code produced, please refer to the project GitHub repository.

### Markov Chain Monte Carlo Posterior Sampling

In general, the aim of Bayesian inference is to derive the posterior distribution of our parameters by defining it mathematically using Bayes’ theorem as below, where $p(\theta)$ is the parameter prior and $p(\boldsymbol{X}\,|\,\theta )$ is the model likelihood:

$p(\theta \,|\,\boldsymbol{X}) \;=\; \frac{p(\theta) \, p(\boldsymbol{X}\,|\,\theta )}{p( \boldsymbol{X})}$

Unfortunately, this distribution in our model (as is the case for most Bayesian models) cannot be evaluated analytically given the highly-dimensional parameter space. To overcome this, Stan utilizes the NUTS algorithm (part of the general class of MCMC methods) to infer the posterior distribution by repeatedly drawing samples from it, even though its full closed-form characterization is unknown. Unlike regular Monte Carlo sampling, which draws independent samples, MCMC allows for ‘smarter’ sampling as it draws correlated samples from the stationary distribution of a Markov chain that’s proportional to the desired distribution. This allows for the sampling process to enter regions of high density much faster. For more details regarding how Stan achieves this using the NUTS algorithm, the interested reader should refer to this link.

At its core, there are three hyperparamters that need to be specified for MCMC sampling with the NUTS algorithm.
Firstly, the num_warmup parameter is used to specify the number of samples that are discarded as burn-in. This is done in order to allow convergence onto the stationary distribution of the Markov chain. Secondly, the n_samples parameter specifies the number of samples to be drawn from the distribution. Finally, n_chains specifies the number of chains that should be constructed, as using more well-mixed chains increases robustness.

## Results and conclusion

### Visualizing posterior samples

After running our model, we computed the summary statistics of our posterior samples in the table below. Additionally, we visualize these in the subsequent traceplots, which show the distributions for all $P$ setting transmission rates, as well as the other model parameters, in the left, and the sampled values on the right. Overall, we can see that there was proper chain mixing given both the shape of our sampled value traces and the $\hat{R}$ statistics (equal to one for all model parameters), which suggests that improper model parametrization is unlikely to be an issue.

meansdhdi_3%hdi_97%mcse_meanmcse_sdess_meaness_sdess_bulkess_tailr_hat
theta[0]0.2500.1110.0420.4440.0020.0014090.04090.03711.03200.01.0
theta[1]0.2240.1010.0420.4100.0020.0014116.04116.03688.03103.01.0
theta[2]0.2120.0980.0400.3900.0010.0014352.04352.03834.03492.01.0
theta[3]0.2280.0980.0530.4040.0020.0012733.02733.02720.03567.01.0
theta[4]0.2310.0990.0660.4200.0020.0012788.02787.02807.03854.01.0
theta[5]0.2230.0970.0620.4100.0020.0012890.02877.02818.03744.01.0
theta[6]0.2670.1080.0800.4710.0020.0012956.02956.02817.03703.01.0
theta[7]0.2620.1100.0760.4700.0020.0022323.02323.02254.03422.01.0
theta[8]0.1890.1270.0000.4080.0020.0015756.05756.04251.02878.01.0
theta[9]0.3140.0880.1530.4760.0010.0014434.04431.04447.05614.01.0
theta[10]0.3220.0920.1540.4950.0010.0014290.04290.04273.05728.01.0
theta[11]0.2820.0850.1310.4370.0010.0014411.04323.04461.05124.01.0
theta[12]0.3040.0870.1500.4700.0010.0014219.04200.04279.05716.01.0
theta[13]0.2230.1460.0000.4760.0020.0016577.06577.05525.04074.01.0
theta[14]0.2100.1110.0210.4140.0020.0014139.04139.03687.03482.01.0
theta[15]0.2360.1220.0280.4580.0020.0014540.04540.04006.03202.01.0
theta[16]0.1330.1030.0000.3180.0010.0016605.06605.04609.02862.01.0
theta[17]0.2070.1280.0010.4300.0020.0014882.04882.03842.02402.01.0
theta[18]0.2220.1050.0430.4150.0020.0013161.03161.03078.04504.01.0
theta[19]0.2300.1090.0460.4280.0020.0013269.03269.03178.04160.01.0
theta[20]0.1890.0900.0370.3530.0010.0013694.03694.03518.04593.01.0
theta[21]0.2150.1020.0450.4080.0020.0013363.03363.03232.04370.01.0
rho0.2110.0640.0870.3320.0010.0015180.05180.05262.03748.01.0
mu[0]0.2300.1000.0530.4190.0020.0013899.03899.03493.02856.01.0
mu[1]0.2410.0930.0800.4180.0020.0012338.02338.02262.03169.01.0
mu[2]0.2020.1380.0000.4470.0020.0015961.05961.04364.02970.01.0
mu[3]0.3070.0800.1640.4550.0010.0013525.03512.03531.04945.01.0
mu[4]0.2340.1540.0000.5040.0020.0016848.06848.05735.03834.01.0
mu[5]0.2280.1170.0270.4440.0020.0014374.04374.03914.03227.01.0
mu[6]0.1450.1140.0000.3510.0010.0017078.07078.04699.02933.01.0
mu[7]0.2190.1390.0000.4600.0020.0015242.05242.04052.02295.01.0
mu[8]0.2140.0960.0510.3910.0020.0013137.03137.03025.03789.01.0
sigma2[0]0.1310.0510.0560.2200.0010.0007513.05708.010852.05458.01.0
sigma2[1]0.1500.0620.0610.2600.0010.0016404.05250.08836.05385.01.0
sigma2[2]0.1180.0430.0530.1930.0000.0008280.06082.012257.05441.01.0
sigma2[3]0.1350.0530.0590.2260.0010.0017057.05298.010888.05228.01.0
sigma2[4]0.1170.0430.0540.1960.0000.0009024.06569.013371.05450.01.0
sigma2[5]0.1250.0480.0540.2110.0010.0008224.06024.012240.05518.01.0
sigma2[6]0.1180.0430.0560.2000.0000.0008184.05615.013440.05532.01.0
sigma2[7]0.1180.0430.0550.1980.0000.0007992.05761.012288.04824.01.0
sigma2[8]0.1380.0520.0580.2320.0010.0007786.06455.09620.05729.01.0
iota[0]-0.9540.619-2.1310.2080.0080.0065775.05775.05852.04816.01.0
iota[1]-1.2560.566-2.343-0.1970.0090.0064263.04263.04302.04532.01.0
iota[2]-1.3900.864-2.9930.2620.0090.0079762.08061.09793.06472.01.0
iota[3]-1.5710.429-2.392-0.8120.0060.0045454.05227.05576.05157.01.0
iota[4]-1.1630.866-2.8290.4370.0080.00710823.07689.010823.06792.01.0
iota[5]-0.6200.734-2.0140.7340.0090.0075981.04847.06037.05498.01.0
iota[6]-1.2950.875-2.9470.3760.0090.00710158.07307.010212.06283.01.0
iota[7]-1.0390.850-2.6440.5500.0090.0079000.06438.09066.06105.01.0
iota[8]-0.5050.599-1.6280.6330.0090.0064381.04381.04388.04901.01.0
gamma[0]0.4610.0180.4290.4970.0000.0002755.02747.02819.03925.01.0
gamma[1]0.1910.0150.1620.2190.0000.00010384.010364.010370.05707.01.0
lambda[0]0.7470.0200.7120.7850.0000.0003566.03554.03625.05333.01.0
lambda[1]0.9970.0010.9960.9980.0000.00015214.015214.015246.05420.01.0

### Comparison with ground truth

After fitting, one key question is the extent to which our model is successful in estimating the underlying transmission rates. To test this, we take the posterior mean as the minimum mean squared error estimate and compare it to the ground-truth parameters used to simulate the data-generating process. As shown in the table below, most parameter estimates fall reasonably close from the actual ground truth values, and all of them fall within the 94% high density interval (HDI), serving as a good indicator of our model’s accuracy.

The key condition for this comparison as proof-of-concept is the fact that our model only observes the parameter priors and the training data, but never the simulating distributions themselves. Moreover, excluding the case of antigen test accuracy rates, the priors as specifically chosen to be largely uninformative relative to the simulating distribution in order to limit the unreasonable influence of these on model performance.

Finally, a question of interest is how the accuracy of the model is affected by the number of training samples or the dimension of the parameter space. The results for both of these questions are excluded for brevity, but they tend to fall in line with what would be expected (model accuracy increases with training data and decreases with dimensionality) and the interested reader should consult the project repository for more details.

Ground TruthPosterior MeanHDI 3%HDI 97%
Parameter
theta[0]0.1253880.2500.0420.444
theta[1]0.1295710.2240.0420.410
theta[2]0.1277060.2120.0400.390
theta[3]0.1218980.2280.0530.404
theta[4]0.1417310.2310.0660.420
theta[5]0.1413760.2230.0620.410
theta[6]0.1163300.2670.0800.471
theta[7]0.1404230.2620.0760.470
theta[8]0.1591810.1890.0000.408
theta[9]0.1267100.3140.1530.476
theta[10]0.1338790.3220.1540.495
theta[11]0.1290200.2820.1310.437
theta[12]0.1448060.3040.1500.470
theta[13]0.0325780.2230.0000.476
theta[14]0.1170050.2100.0210.414
theta[15]0.1330920.2360.0280.458
theta[16]0.1081550.1330.0000.318
theta[17]0.0657460.2070.0010.430
theta[18]0.1057660.2220.0430.415
theta[19]0.1246220.2300.0460.428
theta[20]0.1216150.1890.0370.353
theta[21]0.1205050.2150.0450.408
rho0.1838740.2110.0870.332
mu[0]0.1197640.2300.0530.419
mu[1]0.1335960.2410.0800.418
mu[2]0.1406940.2020.0000.447
mu[3]0.1368860.3070.1640.455
mu[4]0.0351240.2340.0000.504
mu[5]0.1376120.2280.0270.444
mu[6]0.1013370.1450.0000.351
mu[7]0.0750240.2190.0000.460
mu[8]0.1188770.2140.0510.391
sigma2[0]0.1016240.1310.0560.220
sigma2[1]0.0972990.1500.0610.260
sigma2[2]0.0940040.1180.0530.193
sigma2[3]0.1031040.1350.0590.226
sigma2[4]0.1056770.1170.0540.196
sigma2[5]0.0915710.1250.0540.211
sigma2[6]0.1074850.1180.0560.200
sigma2[7]0.1062120.1180.0550.198
sigma2[8]0.1053880.1380.0580.232
iota[0]-0.980943-0.954-2.1310.208
iota[1]-1.108789-1.256-2.343-0.197
iota[2]-0.864302-1.390-2.9930.262
iota[3]-1.395580-1.571-2.392-0.812
iota[4]-1.423259-1.163-2.8290.437
iota[5]-0.037854-0.620-2.0140.734
iota[6]-0.919838-1.295-2.9470.376
iota[7]-1.245577-1.039-2.6440.550
iota[8]-0.524771-0.505-1.6280.633
gamma[0]0.5639920.4610.4290.497
gamma[1]0.1803360.1910.1620.219
lambda[0]0.8089960.7470.7120.785
lambda[1]0.9792140.9970.9960.998

### Conclusion and future work

This project aimed to understand & model the nature of setting-specific COVID transmission with the purpose of informing and improving related policy interventions. The first-principles model presented in this article allows us to do this in a way that properly captures the data-generating process behind setting-specific transmission, all with considerations of a simple and feasible data collection methodology. Furthermore, the Bayesian nature of our model allows flexibility when incorporating expert knowledge through priors, as exemplified with the calibrated antigen test accuracy parameters, and provides accurate estimates even with few training samples thanks to its generative nature. Finally, from an implementation perspective, our model is built with scalability in mind, as the TensorFlow Probability version makes it feasible for large-scale applications with the aid of distributed computing architecture.

There is still much left to be done and many possible extensions could be added to make our model more robust and powerful. One possible angle for future work would be extending the model to encapsulate the effect of other policy interventions (such as social distancing regulations, vaccination, etc.) in order to derive insights about the effectiveness of these in different settings. Another possible extension could be to add temporal dynamics into the model by combining with an SIR-type model.

Overall, we believe this model serves as a great case for statistical modeling of setting-specific epidemic transmission, with great potential for future extensions and applications to similar settings. Additionally, it serves as a great example of the power of Bayesian modelling and the many benefits it can bring in applied settings.

Authors:
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 4.0 unless otherwise specified.

Comment