Measurements, interpreted with models, are how we make sense of the world. I'm excited to share @niket_h_thakkar's new approach for building surprisingly rich situational awareness models with surprisingly weak assumptions from disease surveillance data! arxiv.org/abs/2205.02150
The pandemic has us thinking about three types of models: forecasting, complex interventions, and situational awareness. Most models are good at (at most) one of these things, and @niket_h_thakkar's new approach is really good at the third--better than anything else I know.
But first, what's situational awareness modeling? Situational awareness modeling works backward from what we can see (measure) to figure out what's happening that we can't see (system state and dynamics). It helps us answer "what's going on?" and "how sure are we about it?"
Situational awareness modeling differs from complex interventions modeling, which focuses on weighing future choices. Intervention modeling often benefits from good SA, but it doesn't always require it, because the best interventions are beneficial no matter what exactly is going on. Like vaccines.
Situational awareness modeling also differs from forecasting. IMO, 9 times out of 10 in epi, situational awareness is what people actually need when they're given forecasts. "Don't take away my agency and tell me what's gonna happen! Tell me what's happening so I can figure out how to change it!"
Back to Niket's paper. How did he, with a little help from his friends, figure out a new way to do situational awareness modeling for infectious diseases?
First, we got up close with the daily practice of public health. This made clear that we needed fast, flexible methods to assimilate knowledge from the literature and constantly changing surveillance data into an understanding of the recent past and present state of the epidemic.
Second, decisions often needed to be made faster than even our fastest colleagues could model them. So it was often more useful to help policymakers understand what's happening, how past choices worked out, and how to think thru uncertainty. They do the rest better than we could.
Third, Niket is great at being unsatisfied by things people take for granted (like splines) and also at being entertained by things our field tends not to think deeply about (like combinatorics and entropy). (And fourth, I help with "I think I get it, but what about this?")
So what are the accomplishments? I think there are at least three.
First, from surveillance data, like COVID cases and hospitalizations over time, we show how to filter out all the observation noise from the signal and isolate the mean and variance of the transmission process.
We isolate the transmission process signal with a simple optimization procedure and linear algebra. We define an operator determined only by the latent and infectious periods, identify a basis compatible with the biology, and project onto that basis to remove sampling noise.
This signal processing approach gives us a flexible smoothing of the epi curve AND a characterization of the variance in the transmission process separated from the variance in the observation process. Because all the steps are linear or convex, it's also very fast.
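To make the "project to denoise" idea concrete, here's a toy sketch. The paper's basis comes from an operator built from the latent and infectious periods; the low-frequency cosine basis below is a stand-in I chose purely for illustration, and the epi curve is simulated, not real data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy epi curve: a smooth latent signal plus Poisson observation noise.
t = np.arange(120)
signal = 200 * np.exp(-0.5 * ((t - 60) / 20) ** 2)  # bell-shaped outbreak
cases = rng.poisson(signal)                          # noisy daily case counts

# Stand-in basis: in the paper this would be determined by the latent
# and infectious periods, not a generic cosine family.
n_basis = 8
B = np.stack([np.cos(np.pi * j * t / len(t)) for j in range(n_basis)], axis=1)

# Least-squares projection onto the basis -- a linear, hence fast, step.
coef, *_ = np.linalg.lstsq(B, cases, rcond=None)
smooth = B @ coef

# The residual variance estimates the observation noise that was removed.
print(round(float(np.var(cases - smooth)), 1))
```

Because the heavy lifting is one linear solve, this kind of projection runs in milliseconds even on long surveillance time series, which is part of why the approach is so fast.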
It's then very straightforward using basically standard SEIR compartmental model ideas to turn this into a transmission model that can tell you about hidden states of the system like the effective reproduction number, prevalence, and the fraction of infections reported as cases.
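For intuition on how an epi curve maps to the effective reproduction number, here's a textbook renewal-equation estimator on simulated data. To be clear, this is not the paper's SEIR-based calculation, and the generation-interval pmf below is made up for the example.

```python
import numpy as np

def rt_renewal(incidence, gen_time_pmf):
    """Crude effective-reproduction-number estimate via the renewal
    equation, R_t = I_t / sum_s w_s * I_{t-s}.  Illustrative only --
    the paper instead maps its denoised signal through an SEIR model."""
    w = np.asarray(gen_time_pmf, dtype=float)
    w = w / w.sum()
    incidence = np.asarray(incidence, dtype=float)
    rt = np.full(len(incidence), np.nan)
    for t in range(len(w), len(incidence)):
        # Weight recent incidence by the generation-interval pmf,
        # most recent day first.
        denom = w @ incidence[t - len(w):t][::-1]
        if denom > 0:
            rt[t] = incidence[t] / denom
    return rt

# Sanity check: steady exponential growth gives a constant R_t.
gen = [0.2, 0.5, 0.3]        # hypothetical generation-interval pmf
inc = 1.1 ** np.arange(30)   # 10% daily growth
print(round(float(np.nanmean(rt_renewal(inc, gen))), 2))  # → 1.22
```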
The second major innovation is that because the signal extraction is specified by the biology, the signal variance is also meaningful. We can model the variance and infer things like the daily variation in the overdispersion/superspreading parameter "k". theatlantic.com/health/archive/2020/09/k-overlooked-variable-driving-pandemic/616548/
And when we check them against other studies, we find similar results (e.g., roughly 85% of infected people infect no one; most transmission happens in superspreading events), but ours have daily temporal resolution. And again, they are computed very quickly on a laptop.
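To see how a dispersion parameter k implies numbers like these, here's a quick negative-binomial check. The values of R and k below are illustrative choices of mine, not the paper's daily estimates.

```python
import numpy as np

# Superspreading models often use a negative-binomial offspring
# distribution with mean R and dispersion k; small k concentrates
# transmission in a few individuals.  Illustrative values:
R, k = 2.5, 0.04

# P(an infected person infects no one) = (k / (k + R))^k for NB(0).
p_zero = (k / (k + R)) ** k
print(round(p_zero, 2))  # → 0.85

# Simulate offspring counts and ask how much transmission the most
# infectious 10% of people account for.
rng = np.random.default_rng(1)
offspring = rng.negative_binomial(n=k, p=k / (k + R), size=100_000)
top10 = np.sort(offspring)[::-1][:10_000].sum() / offspring.sum()
print(round(float(top10), 2))
```

With dispersion this small, the simulation shows the classic superspreading picture: most people infect no one, while a small minority drives nearly all transmission.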
The next thing blew my mind. We can predict the outbreak investigation data in Washington state from surveillance time series data and case investigation follow-up rates, with only one global parameter distinguishing household from community spread. This is a 1-parameter fit!
Third, @niket_h_thakkar realized that with a model of the daily prevalence and mean and variation in transmission rates, he can reconceive it as a branching process. (My role was saying, "there are no individual superspreaders in your model", and he said "you're wrong, see?")
With a fully-specified branching process, Niket can construct transmission trees that are consistent with everything we know about the epidemic, within the assumptions of the model. This set of trees, built from the ground up, is the forest of the paper's title.
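As a flavor of what "building a tree from a branching process" means, here's a minimal sketch that grows one transmission tree from a negative-binomial offspring distribution. The real construction additionally conditions on the inferred daily prevalence and transmission rates; the R, k, and cap below are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_tree(R, k, max_nodes=500):
    """Sample one transmission tree from a branching process with
    negative-binomial offspring (mean R, dispersion k).  A sketch:
    the paper's forest is further constrained by the inferred
    epidemic state, which this toy version ignores."""
    edges, queue, n = [], [0], 1
    while queue and n < max_nodes:
        parent = queue.pop()
        # Draw how many people this case infects.
        for _ in range(rng.negative_binomial(k, k / (k + R))):
            edges.append((parent, n))  # parent infected node n
            queue.append(n)
            n += 1
    return edges

tree = sample_tree(R=0.8, k=0.1)  # subcritical, so the tree stays finite
print(len(tree), "transmission events")
```

Repeating this sampler many times, under the model's constraints, is what yields an ensemble of plausible trees rather than a single guess.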
I love this and I'm blown away by it, even as I'm excited for all the development it still needs to carefully compare with, and hopefully someday integrate into, existing phylogenetic workflows.
All this to say, I think what @niket_h_thakkar pulled off here is super, super cool. We're sure it's the start of a research journey that can make some of epi modeling a lot easier. It also really highlights how much you can learn when situational awareness is your goal.
This first paper is unusually personal in style, reflecting the intellectual environment that created it, and I don't know what audiences it'll find. So here, I'm gonna tag people, grouped by what I think they may find interesting. And if it got you, retweet and spread the word!