Testing infections with fewer tests than people

Daniel Tashjian
Tapad Engineering
Published in
4 min readMar 24, 2020

--

With COVID-19 dominating the attention of the world, and me being a computer programmer, I felt like I did not have anything to contribute to the situation. After all, I do not know anything about biology or pandemic response strategies. I am not a biologist, epidemiologist, or a statistician.

However, there are some things I do know about:

  • binary search trees
  • quicksort
  • divide and conquer algorithms

As it happens, we have a lot of people and very few tests. Could some of these computer science techniques be used to help test more people?

Here, I will outline a testing strategy that manages to test N people while performing less than N tests. The goal is to assign positive and negative results for every person.

It makes a few assumptions that, given my lack of credentials, I do not know if they are true or not. I would love it if any experts out there could weigh in on these assumptions:

  1. samples are more plentiful than tests
  2. samples can be combined such that if any sample is positive, the combination tests as positive

From here on out, we will take these assumptions as granted.

This testing strategy has multiple phases:

Phase 1: Sampling

When taking samples from people, you should take many samples instead of just 1. Let’s assume we take 11 samples from each person for now.

Phase 2: Segmentation

A segment is a subset of your population. This testing scheme will be done for each segment of the population. Since we have 11 samples per person, we have a maximum segment size of (11 — 1)² = 1024. Randomize the population into segments of this size.

Phase 3: Testing

For each segment, follow this recursive algorithm:

1 : combine one sample from each person in the segment

2 : test the combined sample for the virus

3a : if it is negative, stop: everyone in the segment is negative

3b : if it is positive, divide the segment of size N into 2 sub-segments of size N / 2, go back to step 1 for both sub-segments

Repeat this process of branching out in halves until you test segments of size 1, or stop at step 3a for all branches. The depth of the tree never exceeds log2(segment-size) + 1, so 11 samples per person is sufficient.

In order to determine the effectiveness of this approach, I wrote up a quick Scala script to try it out:

With a ratio of 0.167 (or 16.7% of your population being infected), it appears to break even with using about 1024 tests to test 1024 people. Surprising to me, at least, is that the 16.7% break-even point appears to be a magic number that is the break-even point for any segment size based on empirical testing.

If the percentage of people infected is higher, then this testing scheme is actually worse than simply testing one person at a time. However, if the percentage is lower, then you end up requiring fewer tests than the segment size.

Here are some examples of tests needed for a given infection rate in a segment size of 1024:

16.7% : ~1024 tests

15% : ~959 tests

5% : ~466 tests

1% : ~137 tests

exactly 1 person infected : exactly 21 tests

nobody infected : exactly 1 test

Obviously, this scheme saves more on tests if the percentage of infections in your segment size is lower. So, what that tells me is that we could use this scheme on populations we know have a low infection rate, and only for people who are asymptomatic and unaware of coming into contact with people who are symptomatic or who tested positive.

Another interesting thing to note is that you could start off testing randomly one by one, and when you have high confidence in the infection rate, switch over to this scheme if the rate is well below 16.7%. So, imagine you have samples from millions of people. If you randomly test, say, 2000 of them individually, you will have a decent degree of confidence regarding whether the infection rate is well below 16.7%.

It could be this testing strategy is completely useless. Maybe the assumptions don’t hold, maybe it’s unfeasible to collect and process so many samples, maybe the results would take too long to come in. There are many reasons why this may not work. However, we don’t move forward by doubting ourselves. We move forward by thinking outside of the box until we land on something valuable. I encourage everyone to consider what they can do to address the problems you and others face, both big and small.

Daniel Tashjian is a software engineer at Tapad. The views and opinions expressed in this article are those of the author and do not necessarily reflect the views, opinions, or policies of Tapad, Inc. nor any of its affiliates.

--

--