Building Better Systems

#8: Eric Davis – Building Better Data Models

Episode Notes

Dr. Eric Davis walks us through what it means for a data model to be trustworthy, what common pitfalls predictive models run into, reproducibility issues, and what can be done. We chat about how subject area experts are expected to be many things: statisticians, computer scientists, and mathematicians, and how that can sometimes lead to mistakes. We also look at the COVID-19 pandemic and how data models affect decision-making.

https://www.imagwiki.nibib.nih.gov/
https://www.imagwiki.nibib.nih.gov/content/committee-credible-practice-modeling-simulation-healthcare-description
https://www.biorxiv.org/content/10.1101/2020.08.07.239855v1
https://www.imagwiki.nibib.nih.gov/content/10-simple-rules-conformance-rubric

You can watch this episode on our YouTube channel: https://youtube.com/c/BuildingBetterSystemsPodcast

Joey Dodds: https://galois.com/team/joey-dodds/ 

Shpat Morina: https://galois.com/team/shpat-morina/  

Eric Davis: https://galois.com/team/eric-davis/ 

Galois, Inc.: https://galois.com/ 

Contact us: podcast@galois.com

 

Episode Transcription

Intro (00:02):

Designing, manufacturing, installing, and maintaining the high-speed electronic computer, the largest and most complex computers ever built.

Joey Dodds (00:22):

Hello, and welcome to Building Better Systems, the podcast where we explore tools and technologies that help us become better software engineers and build better programs and systems. I'm Joey Dodds.

Shpat Morina (00:33):

I'm Shpat Morina. Joey and I work at Galois, an R&D lab focused on hard problems in computer science. Today's episode is a little different. Joining us is Dr. Eric Davis, who also works at Galois, and whose R&D work is focused on the application of formal methods and formal modeling techniques to machine learning and data science in general, specifically as it applies to more human-centric, social applications like epidemiology, food insecurity, and things like that. Eric, it's nice to have you here. Thanks for joining us.

Eric Davis (01:08):

Great to be here.

Shpat Morina (01:09):

So, part of what you've looked at this year has been the global pandemic we're going through, and the data models that inform us about different facets of the pandemic and how it's going, specifically their trustworthiness: what makes a good, usable data model, and what makes a data model that might not be as reliable. Can you tell us a little bit about that?

Eric Davis (01:34):

Yeah. So what we've been focusing on here is really a subset of modeling for these social applications, which is modeling during crisis. A big part of what we're looking at is not just the trustworthiness of the system in the end, but how we build an assessment of the credibility and the utility of the model for things like decisions made by a government planning office. You can imagine, when it comes time to distribute things like vaccines, one of the questions we have is: what's the likely impact? Or when state governors are looking at the impact of something like a masking mandate, they want to know: if I go out and try to implement this policy, what are the likely outcomes? Modeling is really something we need in these cases, because we can only run the experiment once.

Eric Davis (02:19):

Hopefully we never end up in a pandemic like this again, at least not within our lifetimes, and trustworthiness and credibility become a much more intrinsic problem there. We don't have months or even years to assess these models; we have to know tomorrow whether or not we can rely on them during the crisis.

Joey Dodds (02:36):

So during COVID we've seen policymaking that's guided by models, right? That's something that's been happening. And you're working on what part of this problem in particular?

Eric Davis (02:50):

So we've been working on a couple of different aspects of the problem. One is the actual model development: how does a domain scientist get what they want into a model? Now, for you and I, we often think of a model explicitly as a computer-interpretable model, because we're not running the models on paper and, you know, rolling dice like maybe they did back in the fifties.

Eric Davis (03:11):

It means getting the scientific and domain-centric views into something that is, you know, a high-level language or otherwise computer interpretable, and that's problem number one: how do I get an epidemiologist to code, and to code well? As you've discussed on the show a lot, there are a lot of problems with software reliability and credibility. We'd like to push as many of those as far away from the domain modeler's problem as we can. But then even once we have a version of the model, there are other questions we have to answer. One of them is: does the model actually represent what I intended it to? So I write up my paper, I have a system of equations that tells me what that model is, and I have some real-world data that I intend the model to recapitulate.

Eric Davis (03:56):

Is that what the model solves or does it solve something completely unrelated or only partially related? Is it accidentally correct?
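
For readers who want to picture what such a computer-interpretable model looks like, here is a minimal sketch, not taken from the episode and with purely illustrative parameter values, of a compartmental SIR model: the kind of system of equations an epidemiologist might publish, written as Python code.

```python
# A minimal SIR compartmental model: a paper's system of ODEs expressed
# as computer-interpretable code (illustrative values only).
import numpy as np
from scipy.integrate import solve_ivp

def sir_rhs(t, y, beta, gamma):
    """dS/dt, dI/dt, dR/dt for a basic SIR model."""
    S, I, R = y
    N = S + I + R
    dS = -beta * S * I / N
    dI = beta * S * I / N - gamma * I
    dR = gamma * I
    return [dS, dI, dR]

# Illustrative parameters: transmission rate beta, recovery rate gamma.
beta, gamma = 0.3, 0.1
y0 = [9990, 10, 0]  # S, I, R at t = 0
sol = solve_ivp(sir_rhs, (0, 160), y0, args=(beta, gamma), dense_output=True)

# The question raised here: does this implementation recapitulate the
# results (e.g., peak infections) reported in the paper it came from?
print("peak infected:", sol.y[1].max())
```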

Shpat Morina (04:03):

How do you evaluate that?

Eric Davis (04:05):

Well, there are a lot of different ways to evaluate that kind of question. One of them is simply to take a look at data that you've not yet predicted with the model, and see if the model predicts this new set of data that it should, data that's adjacent to that problem domain. But we're actually finding that there are deeper problems with a lot of these models that go beyond asking, can I predict something new? It's even, can I predict what the paper said it predicted? We've worked with several models where you'll have this very clear paper that says: we ran this model with the following parameters, and here's what we got. When we actually build that model, or use the source code when it's available,

Eric Davis (04:39):

oftentimes we don't get those same results. There's actually a paper published by some colleagues of ours in 2020, in the Molecular Systems Biology journal; the title is "Reproducibility in systems biology modelling." They did one of the most extensive studies to date on model reproducibility. They surveyed 455 models across 152 different journals and found that 49% of those models could not be reproduced with the information provided. 12% of the models could be reproduced with empirical correction and support from the authors, but another 37% were not reproducible in any form. And when they reached out to the authors of those models, less than a third of them actually responded. So if I'm somebody relying on these models in practice, this is a very hard problem.

Shpat Morina (05:26):

When you say "with the information available," you mean the raw data? So these folks took the raw data from these papers, and they couldn't reproduce the models that the papers supposedly ended up producing?

Eric Davis (05:41):

It's a multifaceted problem. Sometimes it's about the raw data: did they provide the data they used to benchmark their model? Usually the answer is no, or they don't provide it in a reproducible form. Sometimes it's worse than that. We've found examples of models in the literature that have gone uncorrected for over a decade, where they gave us a system of equations, they gave a parameterization of those equations, and then they gave the data that it reproduced. But when we rebuilt that model, we were unable to reproduce the results, and what it came down to was errors in the translation in the paper, or errors in the translation of their code. Because fundamentally, when these models are being published now, there is a disconnect between what the author writes in the paper and what the software engineer implemented in the code, and matching those together is a very hard problem right now. Unlike a lot of system specifications for, say, security, up until recently there was no way to get a specification from, say, a system of differential equations and compare that to, say, a C++ implementation.
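
One hedged sketch of what comparing a paper's equations against an implementation can look like in practice, using off-the-shelf symbolic math rather than the tooling described in the episode; the equations and the deliberately introduced discrepancy below are illustrative, not taken from any of the models discussed.

```python
# Hypothetical sketch: compare a paper's specification (a system of ODEs
# written symbolically) against a coded implementation's right-hand side.
import sympy as sp

S, I, R, beta, gamma, N = sp.symbols("S I R beta gamma N", positive=True)

# Specification, as transcribed from the paper's equations.
spec = {
    "dS": -beta * S * I / N,
    "dI": beta * S * I / N - gamma * I,
    "dR": gamma * I,
}

# What the (hypothetical) implementation actually computes; note the
# missing 1/N factor, a typical paper-to-code translation error.
impl = {
    "dS": -beta * S * I,
    "dI": beta * S * I - gamma * I,
    "dR": gamma * I,
}

for name in spec:
    if sp.simplify(spec[name] - impl[name]) != 0:
        print(f"{name}: implementation disagrees with the paper's equation")
```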

Shpat Morina (06:40):

Well, I have so many questions. First of all: how come? I like to think, and I bet, that people aren't purposefully malicious when it comes to these things. So there must be something going wrong with the reproducibility stuff. What's going on?

Eric Davis (07:00):

Well, I think one of the fundamental problems is, if you're an epidemiologist today, without new techniques, what society is in essence asking you to do is to be an epidemiologist, to be a data scientist, to be a statistician, to be a software engineer, and to be someone who's able to do their own quality assurance on their code. And that's a lot to ask any one person to do. When we're putting all of this load on these domain scientists, it really makes the problem intractable, even when they have a large team working with them. Now we have the disconnect of the teams and the disagreement in languages. The scientist might speak ordinary differential equations, the software engineer might end up speaking something like Haskell or C or Python, and the statistician who's checking the results is used to working in R. How do I get these three people to interoperate? The tools didn't exist until recently to do that, and even today those tools are in their infancy. That has been a big piece of what we've been working on here: trying to build tools that allow us to reason across these different expressions of domain knowledge.

Shpat Morina (07:59):

Makes a lot of sense. I know you've been looking at COVID and the pandemic in general. Are there striking examples of when something like this actually affected our collective decision-making when it comes to COVID?

Eric Davis (08:13):

I would say that with COVID, the real way the reproducibility crisis has affected us hasn't been in mistakes made by the government, but rather just a slowing down of our ability to respond, and a need to rely on intuition as opposed to formal modeling. This is something that has been looked at in other cases too. NASA actually talked about the credibility and reproducibility problem: after the Challenger disaster, they wrote a set of standards and guidelines to try to limit the risk introduced by models and model credibility within the aerospace domain. The FDA has released documents that try to do a similar thing, but that collaboration wasn't really motivated by an impending crisis and could have a slower cadence associated with it. So I'd say the biggest way it's impacted us is that people have gotten a bit suspicious about models and their credibility, and within the COVID crisis it's probably slowed our response some compared to if we had been better prepared. They noticed early on that models weren't terribly great at predicting everything they wanted them to, or you'd have a model that was really good at, say, predicting the size of the effect of a social intervention, but it couldn't tell you how many people are going to be infected tomorrow.

Eric Davis (09:27):

So the question is: is it credible at all? Do I discard the model entirely because it's only partially credible? This has been, I guess, the primary impact: we haven't known how to use models even as we've had a proliferation in the number of them, many of which could have actually been a lot more useful than they were.

Joey Dodds (09:44):

So to sort of restate the problem: there are data scientists working for the government, more or less, and they're presented with this massive body of work that's not necessarily great at representing how trustworthy it is on the whole. And furthermore, the history of that field has maybe called that trustworthiness into question. So there are extra necessary steps when we're in this crisis situation, where you first have to do an evaluation that hopefully would have been done in the first place, in order to provide more information about the models. Is that a reasonable summary of the challenge that we're facing?

Eric Davis (10:19):

That's at least part of it. Another problem has been on the model integration side. You can imagine the three of us could all be working on the COVID response. Maybe I'm looking at the direct epidemiological effect, Shpat is maybe designing a molecular systems model of the interactions of biochemical processes within the cell, and maybe you're looking at drug discovery. At some point, I need to fuse these pieces of knowledge together, but we're all working independently at different research labs and different institutions. At the end of the day, when we try to integrate and compose these models, those different assumptions, which are oftentimes not stated explicitly but rather held implicitly in each of our heads, pose some problems at that integration stage. There come a lot of questions of: can I compose these models to build something to reason about the actual crisis? We've seen that even outside of COVID; it's one of the primary challenges we've been looking at with food insecurity. If I model food insecurity for a country, for instance, I need hydrologists, I need agronomists, I need population statisticians and conflict experts, and I need to somehow fuse their knowledge in a programmatic and automatic way that checks for problems and inconsistencies. And that's just been something where the technology has been lagging.

Joey Dodds (11:34):

I mean, this one thing is lagging, but you just described two dimensions for integration, and with both of these going on, I think it's remarkable that we've been able to make any use of models and produce anything. Because on one hand you mentioned this pipeline for a single model, which involves going all the way from data scientists to programmers

Joey Dodds (11:54):

through statistical analyses. And then at each of those points, it sounds like those people are interfacing with other people who have parallel pipelines going on. So really, the modeling world is obviously doing something remarkable to get anything accomplished. It sounds like the bar needs to be raised a bit, but something really remarkable has gone on to inform our response here.

Eric Davis (12:16):

Yeah, there have been a lot of successes from this, and there's been a lot of work by the government to try to standardize this rapidly in response to the crisis. I think one of the big wins we've had in this area has been leadership from the NIH and other governmental organizations around the Interagency Modeling and Analysis Group. IMAG set out very early in the crisis to tackle this issue before it got out of hand and say: we have a credibility and reproducibility crisis in the field, let's get people together in working groups and talk about what we can do to improve things.

Eric Davis (12:48):

Where that has started so far is that they have a set of what they call 10 simple rules for model credibility, and they try to assess models on: does it state its intended purpose? Does it place itself properly in context? Does it list its preconditions and requirements? They basically lay out a roadmap to reproducibility and credibility for the field. But then the next stage that we've been working on is trying to make as much of that automatic as possible, because it's one thing to have a sheet of guidelines for system development and integration; it's another thing to have the tools to help make sure you've done that. Because, as you pointed out, it's a very deep process even within one model, let alone composing models across different scales and domains.

Shpat Morina (13:31):

So how do we do better? As Joey said, it's remarkable, but it's your job to think about how we do better, right? And I feel like you spend a lot of time thinking about that. You posited the problem perfectly: we ask too much of these folks, and you're working to maybe ease that burden a little bit, and also to make sure the result is more trustworthy. So, softball question: how do we fix it all?

Eric Davis (14:04):

Well, I think one of the big ways is human-machine teaming. It's taking the currently extremely manual process and automating as many parts of it as we can, while also understanding the burden we'd like to place on different people in the loop, and how to check the work and communicate that work between them. One of the big things we've been working on is building tools that internally we think of a lot as compilers and interpreters: they translate from one language to another. But what we're really doing is making the equivalent of a compiler for knowledge. It's separating out the concerns into different layers. When we interface with the domain scientists, for instance, we let them express their model in a way that is machine readable but also human understandable. So if they like to write down systems of differential equations, we actually ingest those equations, along with groundings for what they represent, to build an abstract mathematical representation that captures those systems' semantics.
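
As a rough illustration, and not Galois's actual tooling, of what an abstract representation with groundings might contain, here is a minimal sketch; the field names and the grounding labels are assumptions made for the example.

```python
# A minimal sketch of the kind of intermediate representation described
# here: equations plus "groundings" that say what each symbol means.
from dataclasses import dataclass, field

@dataclass
class Variable:
    name: str        # symbol used in the equations, e.g. "S"
    grounding: str   # what it represents, e.g. an ontology term
    units: str       # e.g. "persons"

@dataclass
class ModelIR:
    variables: list[Variable]
    parameters: dict[str, str]   # parameter -> description
    equations: dict[str, str]    # d<var>/dt -> expression (as text)
    assumptions: list[str] = field(default_factory=list)

sir_ir = ModelIR(
    variables=[
        Variable("S", "population: susceptible", "persons"),
        Variable("I", "population: infected", "persons"),
        Variable("R", "population: recovered", "persons"),
    ],
    parameters={"beta": "transmission rate", "gamma": "recovery rate"},
    equations={
        "dS/dt": "-beta*S*I/N",
        "dI/dt": "beta*S*I/N - gamma*I",
        "dR/dt": "gamma*I",
    },
    assumptions=["continuous time", "well-mixed population"],
)
```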

Eric Davis (15:03):

So, you know, in every differential equation there are a number of different assumptions that are hidden, but it tells us about the state of the system and how that state evolves. We can translate that into a general set of abstract rules that we can then structure around questions about how it could be solved. If all of the rules about how the system evolves meet certain requirements, we can actually just solve it using estimation methods for differential equations. But if, for instance, I'm interested in a deeper statistical analysis, where I want more than just the mean, maybe the variance of the system, so I can know not just the average case but the worst case and the best case, then I'm actually going to want to do discrete event simulation, which is an entirely different implementation of the system with different solution techniques, one that would require a human weeks or months to implement.

Eric Davis (15:52):

But we can automatically compile that to a high-level-language implementation with a solution framework using this stack. So we're separating out the domain knowledge from the mathematical representations and requirements, and from the executable knowledge, and then using human-machine teaming to automatically translate between those layers. Now the scientist doesn't have to touch code, and the person who's working with code doesn't have to learn differential equations. They can work in the frameworks that they're interested in, and the machine handles a lot of those translations. We're trying to hand over as many of the pieces that we tend to be bad at: humans are bad at translating without errors, while machines, as long as we have a good grammar for the language, are perfect at that and don't make the mistakes that we do. So it's about finding the right places for the scientist to work and the right places for the machine to work.
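
To make the idea of two backends from one abstract model concrete, here is a hedged sketch of the second kind of backend mentioned here: a stochastic, discrete-event (Gillespie-style) simulation of the same SIR dynamics, which yields a distribution of outcomes (variance, worst case) rather than a single mean trajectory. All values are illustrative.

```python
# Hypothetical discrete-event backend: a Gillespie-style stochastic
# simulation of SIR dynamics, run many times to estimate spread of outcomes.
import random

def gillespie_sir(beta, gamma, S, I, R, t_end=160.0):
    t = 0.0
    while t < t_end and I > 0:
        N = S + I + R
        infect, recover = beta * S * I / N, gamma * I
        total = infect + recover
        t += random.expovariate(total)        # time to next event
        if random.random() < infect / total:
            S, I = S - 1, I + 1               # infection event
        else:
            I, R = I - 1, R + 1               # recovery event
    return R                                  # final epidemic size

runs = [gillespie_sir(0.3, 0.1, 990, 10, 0) for _ in range(200)]
mean = sum(runs) / len(runs)
print("mean final size:", mean, "worst case:", max(runs))
```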

Joey Dodds (16:40):

This feels a lot like, I mean, obviously this is software, so we're going to see the same problems, but this feels like almost a subclass of the general problem of how we specify software in a convenient way. And it's not uncommon: we hear about, for example, people building airplanes, where physicists and more traditional engineers try to pass requirements down to software engineers, and those requirements end up ambiguous. So you get a huge win if you can convince them that writing things down in a less ambiguous, machine-readable format is useful. One of the hurdles there, of course, ends up being that sometimes that's a big step to take, to change the way that you work. I'm curious whether you find that there's a willingness to change the way people are working in order to make this integration happen, or if that's going to end up being a pretty heavy lift in this field.

Eric Davis (17:30):

I think it is a very heavy lift, and our goal isn't to change how they're working with these systems; it's to make minor adjustments, or ask for a little bit more of a specification where they already give one to us. For instance, one of the ways that we actually build a model from a domain scientist's specification is, instead of asking them to write a specification in a language that will make our job easier, we're learning to read what they're already writing in their papers. We've already done some work for the government, for instance, where we take a paper that has a system of ordinary differential equations, and we actually have something that will pull the raw LaTeX out of the paper, the way the mathematical language is represented in the source code for the paper they're submitting to the journal, and we can synthesize executable code directly from that.
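
A small sketch of the LaTeX-to-code idea, using off-the-shelf tools rather than the system described in the episode. SymPy's LaTeX parser requires the optional antlr4 runtime to be installed, and the single equation below is illustrative.

```python
# Hypothetical sketch of turning a paper's raw LaTeX into executable code.
from sympy import symbols, lambdify
from sympy.parsing.latex import parse_latex  # needs antlr4-python3-runtime

# One equation as it might appear in the paper's LaTeX source (dI/dt).
latex_rhs = r"\frac{\beta S I}{N} - \gamma I"

expr = parse_latex(latex_rhs)
S, I, N, beta, gamma = symbols("S I N beta gamma")
dI_dt = lambdify((S, I, N, beta, gamma), expr)  # synthesized Python function

print(dI_dt(9990, 10, 10000, 0.3, 0.1))
```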

Eric Davis (18:14):

Now, in some cases the system is underspecified; maybe they've got an assumption caught in their head that we don't catch. So we're also trying to build tools that ask good questions of the user to help identify those assumptions when they're missing. A good example of this: we looked at an underspecified system of equations that was missing some volumetric terms that were necessary to capture the interaction between susceptible and infected people. Think about it: if I have one person who's susceptible and one who's infected in a small room versus a large room, it probably changes those social dynamics some, and you can't model social distancing without those volumetric notions. How much space is there for them to take up? What's the probability they come into contact? So when we found those, we had the ability to learn from prior models that we've ingested and to look at differences in that mathematical representation.

Eric Davis (19:07):

When we find there's a difference in structure between the two, we can call that out to the scientist, or sometimes even correct those mistakes ourselves, by saying: I can't actually reproduce the results you said I should have, but I took a guess and added this volumetric term, because I noticed a structural difference between your model and another model I've seen before, and when I made that change, I got the results that you did. So we're looking both at automatic ways to correct these and at just generating useful compiler-style error messages. The compiler can tell me when I've used the wrong type for a variable and oftentimes suggests a fix: you had an integer here, it was in a float operation, I expected a float. That can help me go through and fix some of those type errors. Why not do the same thing with domain knowledge?

Eric Davis (19:50):

I look at a piece of a model and I say: the shape of your model is very particular, and it looks like a bacterial infection versus a viral infection, and I know there are different dynamics there. I have an annotated version of, say, the SNOMED ontology of medical knowledge, and I can tell you there is a shape I expect for a virus, but the shape you gave me was bacterial. Was this a mistake, or was it your intention? It's about being able to surface some of these errors in a user-friendly way. Now, it's a huge challenge; we'll probably be battling with this for the next several decades, but we're making amazing amounts of progress right now, so I think there's a very promising outlook.

Joey Dodds (20:27):

One thing I'm hearing here, and this certainly goes against some instincts that I have as a software engineer and a PL researcher, is that you're getting a lot of gains from a laser focus on the domain itself and from not being afraid to specialize tools to a specific domain. People who work on developing, say, C compilers or Haskell compilers want their thing to be able to work for everything. But even then, a lot of times we look at languages like R in sort of a mystified way, right? This is not generally useful; this doesn't make sense as a programming language to somebody who's spent a lot of time with programming languages. But R is an incredibly successful language because it builds in domain knowledge for statisticians, and that makes it more usable for them. And you're doing the same for your tools. I think that in general, we shouldn't be too afraid of building very specific tools for the people who need them. It obviously makes them more usable, but it's also hard, right? I can't go build a tool for data modeling, because I don't know that domain. So this is a big opportunity, I guess, for interdisciplinary work.

Eric Davis (21:42):

And it's also a way, I think, to fix some of those impedance mismatches. I think what you just said is very insightful, right? You would not prefer to work in R. So how would you work with a domain scientist who only understands R, and maybe not even that well, because R is just the tool they learned in order to run their software? Well, one of the things we've been enabling with these techniques is that we're not really imposing a language on the domain modeler; we're working with their language, but then we're trying to translate that to an abstract representation that can be implemented in many languages. So what we can do right now is take that system of ordinary differential equations, or even something like a pictorial representation of a compartmental model, which is another common way to represent epidemiological models,

Eric Davis (22:26):

and we can then compile that into Haskell or into Python or into C. Now I have different low-level representations from the domain-knowledge perspective, but for a programming-language expert, I'm saying: I'm going to take this and auto-generate some code in a language that you may be more familiar with. Now you're turning pictures into code? In some cases, yes. One of the early prototypes that we released for our epidemiological tools was allowing people to draw pictures of the compartmental models that show up in their papers anyway, to ensure that the figures they use will actually result in code that faithfully implements those pictures, and vice versa. This is like a checker for drawings, and it's actually something that's really important for biological papers. When you look at something like a cellular diagram or a compartmental model of epidemiology, you can almost imagine there's a compiler in the brain of the domain scientist.

Eric Davis (23:22):

They look at the model and they have expectations from it. They say: I've seen these sorts of drawings before, I learned them when I was a grad student, I use them when I write my own papers. They look at it and they generate an expectation. If that expectation isn't met by the code that was associated with it, how do they check where the problem is? Is it a code problem, or did they draw the wrong picture? By tying those two things together and ensuring that the domain knowledge results in code and the code can result in domain knowledge, so we're going both ways in the process, we're helping to bridge some of these reproducibility issues by saying that all of the artifacts that exist about this will now result in transferable knowledge between these different representations.
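
A toy sketch of the pictures-into-code direction: if a compartmental diagram is written down as labelled transitions between compartments, the corresponding ODE right-hand sides can be generated mechanically. This is only an illustration of the idea, not the prototype described here.

```python
# Hypothetical sketch: a compartmental diagram, recorded as labelled
# transitions, compiled into the right-hand side of an ODE system.
compartments = ["S", "I", "R"]
transitions = [
    # (source, target, rate expression in terms of compartments/parameters)
    ("S", "I", "beta*S*I/N"),
    ("I", "R", "gamma*I"),
]

def diagram_to_odes(compartments, transitions):
    rhs = {c: [] for c in compartments}
    for src, dst, rate in transitions:
        rhs[src].append(f"-({rate})")   # flow out of the source compartment
        rhs[dst].append(f"+({rate})")   # flow into the target compartment
    return {c: " ".join(terms) or "0" for c, terms in rhs.items()}

for comp, eq in diagram_to_odes(compartments, transitions).items():
    print(f"d{comp}/dt = {eq}")
```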

Joey Dodds (24:06):

To follow on, one more thing I guess to say about this topic. There's a misconception that people doing engineering might have, and I think what you just said speaks to the fact that it isn't the reality. You feel like when you fit something to a particular domain, you're losing something in a sense, right? You're losing some degree of generality. And in the short term, that might be the case; you might have a less generally useful tool. But it sounds like what your work is showing is that you can regain the generality pretty easily. You've gone from these hyper-specialized, domain-knowledge-filled tools back to something like C or Haskell, where it can be interacted with by a wide community. In that case, all you've done is win, because you've helped somebody encode their knowledge in an efficient way and then transformed it into something that's more widely usable, which was, I guess, the goal to start with.

Eric Davis (25:00):

There is a counterpoint to that too, which I think adds another dimension: sometimes you'll have two domain representations that are fundamentally incompatible. Maybe the language I'm using and the language you're using have different levels of expressivity, and they make different assumptions. When those assumptions collide, I end up with two problems that simply aren't composable, just because maybe I'm assuming something about timescales: I could be working on a continuous timescale, you could have a discrete timescale, and that's a fundamental part of your assumptions. When we try to combine these with our tool, those incompatibilities are surfaced. So there are cases where the domain knowledge isn't fully expressive, and something represented in it won't be compatible with another system in another domain. But the nice thing is, in those cases, we can tell people where the errors are and where those problems are.

Eric Davis (25:50):

We've actually looked at some of these with agent-based models. Agent-based models are where every individual in the system has its own representation, and sometimes people treat these as: every time the clock ticks, agents all make a choice. Whereas a compartmental model makes the assumption that all agents are acting simultaneously and actions from agents can happen at any point on a continuous time interval. Those are fundamentally incompatible assumptions. I can go from the continuous model to the discrete model, but I can't go from the discrete model back to the continuous, because certain assumptions are lost in that representation. But if I try to compose those models, that's something my intermediate layer learns and can tell you: this model is compatible this way, but going back the other way, the other model is not compatible with the first. So if that was your intention, we've got to make a decision on whose assumptions to work with.

Eric Davis (26:41):

Either that means we can't compose the models, or we can, but only with the limiting assumptions of that second model, which at least helps us on the credibility side. It helps us answer those questions of: what is the model intended for? How can we interpret the data, and what uses are appropriate?
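
Here is a hedged sketch of the kind of composability check being described: compare the declared assumptions of two models before coupling them, and report which direction of conversion, if any, is valid. The metadata fields and model names are assumptions made for the example.

```python
# Hypothetical composability check over two models' declared assumptions.
from dataclasses import dataclass

@dataclass
class ModelMeta:
    name: str
    timescale: str    # "continuous" or "discrete"
    population: str   # "compartmental" (aggregate) or "agent-based"

def check_composable(a: ModelMeta, b: ModelMeta) -> list[str]:
    issues = []
    if a.timescale != b.timescale:
        # Continuous can be discretized; the reverse loses information.
        if a.timescale == "continuous" and b.timescale == "discrete":
            issues.append(f"{a.name} must adopt {b.name}'s discrete timescale "
                          "(composition only valid under that limiting assumption)")
        else:
            issues.append(f"cannot lift {a.name}'s discrete timescale back to "
                          f"{b.name}'s continuous one")
    if a.population != b.population:
        issues.append(f"population representations differ: "
                      f"{a.population} vs {b.population}")
    return issues

epi = ModelMeta("compartmental SIR", "continuous", "compartmental")
abm = ModelMeta("agent-based mobility", "discrete", "agent-based")
for issue in check_composable(epi, abm):
    print("composition warning:", issue)
```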

Joey Dodds (26:58):

Yeah. So if incompatibilities exist, you want to know it, everybody should know it, and you need to say it out loud and make it clear. That's also a win; you don't always just get composability for free between these. Absolutely. All right, Shpat, I think that wraps up that topic.

Shpat Morina (27:15):

I was wondering two things. First, to bring it back to COVID: we're hopefully at the tail end of the pandemic, with vaccinations ramping up and all that, although who knows. What do you see as potentially problematic at this stage that might come about as a result of models that aren't as accurate?

Eric Davis (27:41):

One of the hardest things we run into right now is models of vaccine distribution, and this has been a problem, I think, at all levels of planning. We've had really good distribution of the vaccine, all things considered, and really good participation from individuals, but we just don't have the data or the expectations for how this looks on a population-wide rollout. We already know that there are people who are resistant to getting vaccines. A multi-dose vaccine has problems with dropout, which is a common problem in any medical procedure, right, when people just don't follow up on the treatment. Antibiotic resistance is a big example; it develops as people stop their antibiotics too early. What happens if someone only gets one dose of the vaccine and then neglects to go back for their second? What happens as the adoption rate of the vaccine starts to tail off? There are good reasons to believe,

Eric Davis (28:32):

for instance, that it will have a logistic curve rather than a linear curve of adoption. Just because we're doing well now doesn't mean that a month from now we can expect the same rate of vaccination compliance, and we just don't have good models of this. We had some really good models of quarantine coming into this from the SARS-CoV-1 incident in Asia, but one of the things we learned very quickly is that models of quarantine compliance in Southeast Asia simply aren't applicable to North America. In fact, if I look at compliance in North America, I get very different levels of compliance for the West Coast versus the Midwest versus the mid-Atlantic or the South. Different populations respond differently to government orders, and different populations respond differently to certain types of interventions. So I think what we're running into right now is this problem of not knowing what to expect.

Eric Davis (29:21):

So the models that we have are very speculative, and that increases the need for the kinds of things you've been mentioning around understanding where models are useful and where they aren't.

Joey Dodds (29:32):

So the new situation we're finding ourselves in is exacerbating this problem, if anything, it sounds like.

Eric Davis (29:39):

Yeah, we're entering this area of unknowns and of difficulty with assessing the uncertainty around models. This is probably going to spur brand new research in statistical methods for uncertainty, applied to situations where there really wasn't the same motivation ten years ago.

Shpat Morina (29:59):

To change the topic a little bit: for listeners who may be working on things that involve data models, or maybe creating data models from some raw data, without necessarily having access to some of the tools and approaches you're working on developing, do you have any advice or pointers on what people can do right now to make sure that their models are a little bit more trustworthy?

Eric Davis (30:24):

Well, I think the best advice I can offer right now would be to read a lot of the publications of the Interagency Modeling and Analysis Group. So the CPMS, which you're going to have to give me a second to make sure I reproduce credibly myself: the Committee on Credible Practice of Modeling and Simulation in Healthcare. If you search for "Interagency Modeling and Analysis Group" and "credible practice of modeling and simulation in healthcare," they have a lot of great guidelines there. The other advice I'd offer is to think beyond a lot of the best practices that have been established in things like machine learning and data science. A lot of the current techniques we're relying on are oftentimes correct only by accident, because they were designed for areas where trustworthiness wasn't as critical.

Eric Davis (31:12):

You can imagine, if you are a major software corporation working on ad tech, you're concerned about: I want to build a model that's good enough that I capture a large portion of the people who might be responsive to the content I'm providing. A lot of our technology and our current software stack is built on that assumption, and that's a little dangerous when you start moving into these more human-critical systems; we need a higher level there. It's the same problem that plagued the aerospace industry in earlier decades, where they said software needs to be more reliable if we're going to use software for life-critical systems. We need a higher level of trust, we need a higher level of credibility, and that's the situation we're finding ourselves in now with a lot of machine learning and data science.

Eric Davis (31:56):

So I would say read up on the latest literature and question the assumptions underlying the methods that you're using, and whether or not you are meeting those assumptions and requirements for credibility with the techniques you have. Almost everything we have in these spaces, in statistical models, has underlying assumptions. Something as simple as a hypothesis test assumes the normality of the underlying data; if you submit data to a hypothesis test that isn't normally distributed, the results don't have meaning. So understand those requirements before you apply them. Hopefully in five or ten years we'll have tools in front of you that check some of these assumptions for you, like the compiler checks for type errors, but these technologies are still in their infancy.

Eric Davis (32:40):

So I would just say: be very informed on that and the risks, and try to follow at least the current guidelines that are being put out.
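
As one small, concrete example of checking a test's assumptions before trusting its result, here is a sketch with illustrative data only: it runs a normality check before a two-sample t-test and falls back to a rank-based test that does not assume normality.

```python
# Sketch: check a test's assumptions before trusting its result.
# A two-sample t-test assumes (roughly) normally distributed samples,
# so verify that first and fall back to a rank-based test if it fails.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample_a = rng.lognormal(size=40)            # illustrative, clearly non-normal data
sample_b = rng.lognormal(mean=0.2, size=40)

normal = all(stats.shapiro(s).pvalue >= 0.05 for s in (sample_a, sample_b))
if normal:
    result = stats.ttest_ind(sample_a, sample_b)
    print("t-test p-value:", round(result.pvalue, 3))
else:
    # The t-test's result would not be meaningful here; use a test
    # that does not assume normality instead.
    result = stats.mannwhitneyu(sample_a, sample_b)
    print("Mann-Whitney U p-value:", round(result.pvalue, 3))
```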

Joey Dodds (32:47):

And it sounds like there's a feeling that people are in situations where this doesn't matter as much. Maybe two years ago somebody would have thought, well, maybe it's not an emergency that I nail accuracy on my pandemic modeling. But we learn over and over in the software world that maybe you build something and it feels like it's not impactful at the moment you build it, and then all of a sudden it's having a really outsized and unexpected effect on people's lives. So this is across-the-board good advice: take this sort of thing seriously, because you never know when the world could be trying to make decisions based on what you built. In 2019, somebody building a model would maybe not have understood how much the world would need their model, and how quickly. So it's really critical for people to take this challenge seriously, I think.

Eric Davis (33:35):

Absolutely.

Shpat Morina (33:37):

Yeah. Eric, I have a feeling that you're going to show up again in this podcast.

Eric Davis (33:40):

It'd be great to talk more about what we're working on. All the things we're working on right now are open source too, so hopefully in very short order we'll be able to get some of these tools out to folks in the field.

Shpat Morina (33:52):

Yeah. We'll link to some of that stuff, especially the information that you mentioned earlier. Eric, thanks so much for joining us today. I know you stayed up a little later than usual, so I appreciate that. This was another episode of Building Better Systems.