Building Better Systems

Episode #20: Ankush Desai — P: The Modeling Language That Could

Episode Summary

Joey and Shpat talk with Ankush Desai, a Senior Applied Scientist at AWS and one of the primary developers behind the P language. They dig into uses for P, bug finding, and what it takes for formal methods researchers to build useful tools for applied engineers.

Episode Notes

Joey and Shpat talk with Ankush Desai, a Senior Applied Scientist at AWS and one of the primary developers behind the P language. They dig into uses for P, bug finding, and what it takes for formal methods researchers to build useful tools for applied engineers. 

Watch all our episodes on the Building Better Systems YouTube channel.

Ankush Desai: https://www.linkedin.com/in/ankush-desai/ 

Joey Dodds: https://galois.com/team/joey-dodds/

Shpat Morina: https://galois.com/team/shpat-morina/ 

Galois, Inc.: https://galois.com/ 

Contact us: podcast@galois.com
Episode Transcription

Shpat (00:00):

Hello, everybody. Welcome to another episode of Building Better Systems. My name is Shpat Morina.

Joey (00:06):

And I'm Joey Dodds.

Shpat (00:08):

Joey and I work at Galois, a computer science research lab focused on trust in critical systems. Today we get to talk with Ankush Desai. Ankush is one of the main developers behind a language called P, a modeling language for concurrency that has been seeing some success in industry. He's also a Senior Applied Scientist at Amazon Web Services. In this interview, we get into what P is, its uses, and what has made it successful. We also talk a little bit about bug finding in general and what it takes for formal methods researchers to build tools that are actually useful to applied engineers. Let's get into it.

intro (00:59):

Designing, manufacturing, installing, and maintaining the high-speed electronic computer: the largest and most complex computers ever built.

Shpat (01:19):

Thanks for joining us Ankush.

Ankush Desai (01:21):

I'm excited to be here. Thanks.

Shpat (01:25):

Maybe we can just jump in. I'd love to hear about this modeling language for distributed systems called P that is sort of your pet project, something you're proud of. Can you tell us what it is?

Ankush Desai (01:39):

Yeah, so P is basically a high-level, state-machine-based programming language for creating formal models and writing specifications of concurrent and distributed systems. You can think of it as a high-level language that developers can use to create formal models of their complicated systems as communicating state machines, which is how developers normally think about their complex system designs anyway: as state machines that transition between states and communicate with each other. P provides this high-level language to capture that protocol logic, and then it has several backend analysis engines to show that whatever you modeled in this language, your formal design, satisfies the correctness specifications you want your system to satisfy. It also has code generation, so you can generate code and execute it. But the main thing is a high-level language, with analysis engines, for creating formal designs.
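To make the communicating-state-machine idea concrete, here is a minimal Python sketch (our own illustration, not actual P syntax) of two machines, a client and a server, that exchange PING/PONG events through input queues and advance under a simple round-robin scheduler:

```python
from collections import deque

class Machine:
    """A toy state machine with a named state and an event inbox."""
    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.state = "Init"

class Client(Machine):
    def step(self, server):
        if self.state == "Init":
            server.inbox.append("PING")      # send an event to the server
            self.state = "WaitPong"
        elif self.state == "WaitPong" and self.inbox:
            assert self.inbox.popleft() == "PONG"
            self.state = "Done"

class Server(Machine):
    def step(self, client):
        if self.state == "Init" and self.inbox:
            if self.inbox.popleft() == "PING":
                client.inbox.append("PONG")  # reply with an event

def run():
    c, s = Client("client"), Server("server")
    for _ in range(4):                       # naive round-robin scheduler
        c.step(s)
        s.step(c)
    return c.state
```

In a real P model the scheduler is not fixed like this; the checker explores many orderings of these event deliveries, which is where the bug finding comes from.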

Shpat (02:49):

It sounds like it's somewhat different from other modeling languages that are out there. What makes it different, aside from the state machine part of it?

Ankush Desai (02:59):

Yeah, actually, at the core of it there is no real difference. There are so many different formal modeling languages that allow you to capture your formal design in a high-level language and have backend analysis engines. In that respect, P is nothing different. The main thing that has led to P being used by developers in industry and in academia is that it's a programming language. It provides this textual syntax for writing state machines, but the functions inside the state machines are imperative programs, which developers are very familiar with writing. The main reason P worked, I think, is that it's a modeling language that feels like a programming language. Developers are not writing mathematical formulas when creating these models; they're writing programs, just at a higher level of abstraction, as state machines. I think that's the real difference.

Joey (04:11):

The idea then is: I'm an engineer, and I'm about to build, or I've built, a concurrent system with all these processes that are changing state. P is a nice way to write that down while eliminating a lot of the detail I'd normally have to write to get to a full implementation. And then it also lets you think about the properties of that system.

Ankush Desai (04:33):

Yes.

Joey (04:34):

But one of the challenges, right, is that thinking about properties of systems that evolve is a really challenging problem. There are things like linear temporal logic, complicated logics, modal ways of reasoning about things. The model in P is quite a bit simpler, right?

Ankush Desai (04:54):

Yes. You touched on a very good point. P is a high-level programming language, and when programmers are thinking about their system, P allows them to think about it at the higher level of state machines. They can use P as a thinking tool: if I strip away all the complications from my system and want to reason about it at a higher level as state machines, I can use P, and that process itself, thinking about things at a higher level, eliminates a lot of gaps in your thinking. And about writing specifications: you're right that you create formal models as state machines, but the other really nice thing in P is that even the specifications are written as state machines. Specifications are written as runtime monitors that observe the state of the system, and you assert global invariants about your system based on the observations these monitors make. That is much easier for developers than writing mathematical formulas with invariants. So creating these formal models and specifications is much easier in P.
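As a rough illustration of specifications-as-monitors (a Python sketch with invented names, not P's monitor syntax), here is a runtime monitor that observes enter/exit events from a system and asserts a global mutual-exclusion invariant on every observation:

```python
class MutexMonitor:
    """A spec written as a monitor: watches events, asserts an invariant."""
    def __init__(self):
        self.inside = set()

    def observe(self, event, pid):
        if event == "enter":
            self.inside.add(pid)
            # the global invariant, checked on every observation
            assert len(self.inside) <= 1, "mutual exclusion violated"
        elif event == "exit":
            self.inside.discard(pid)

def trace_ok(trace):
    """Replay an event trace through the monitor; False if it violates."""
    m = MutexMonitor()
    try:
        for event, pid in trace:
            m.observe(event, pid)
        return True
    except AssertionError:
        return False
```

For example, `trace_ok([("enter", 1), ("exit", 1), ("enter", 2), ("exit", 2)])` passes, while `trace_ok([("enter", 1), ("enter", 2)])` fails: the monitor flags the interleaving, not any single process.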

Shpat (06:19):

So it sounds like P has had some success in being adopted in industry, maybe compared to things like TLA+. Part of it being a modeling language that feels like a programming language people are familiar with surely plays a role. But by successful I also mean it helps find issues with the models you're working with. In your experience, what has made it more amenable to industry? I'm curious about your thoughts.

Ankush Desai (06:58):

First of all, you mentioned it being used more than TLA+, but I would not make that comparison, in the sense that TLA+ has its own place. In fact, P takes a lot of inspiration from TLA+. One of the things I mentioned is that TLA+, as Leslie Lamport says, is a way of thinking about your system mathematically. And that has its own space, because when you're reasoning about a system at a very high level of abstraction, it's easier to think about it mathematically. But as you start adding details into your system, getting to the low-level details of how components interact with each other, thinking about all of that mathematically is super hard.

Ankush Desai (07:54):

So there is still a place where TLA+ and other mathematical modeling languages, like Alloy and so on, have their space, because they provide you guarantees, but they work at a high level of abstraction. What P does is give you a programming language for modeling, so you can model even a detailed system in it. You can capture everything that you have in your implementation in P, because the original vision of P was as both a modeling and an implementation language. But obviously, when you add details into your model, you start to give up on proofs and guarantees, because the system modeled in P becomes so complicated that the tools cannot provide you guarantees anymore.

Ankush Desai (08:47):

But the reason it worked out in industry, to your question, is I think two things together: having this way to create detailed models of your system, and having a backend explorer that uses various heuristics to find concurrency bugs in your system. Those two things helped the success of P in industry: the ability to create detailed models, and then using scalable search techniques to find corner-case concurrency bugs.

Joey (09:30):

When you're contrasting P with tools like TLA+ or Alloy, it sounds like one of the differences you mentioned is that you have that high level of abstraction available to you, but you can describe quite a bit more in a straightforward way in P. And part of that sounds like it comes from the idea, whether P did this on purpose or not (maybe you can tell us), that we have all these things we want to say about concurrent systems, and we ask: how would a programmer want to say them? That's really the guiding principle in the design of P. Is that accurate?

Ankush Desai (10:08):

Yeah. You would like the ability to capture implementation-level details in your formal model as well, if that implementation-level detail is important to the correctness of the system. And like you said, P makes it a bit easier to capture those details compared to other languages.

Joey (10:33):

And was this an intentional design decision? Did someone say, "Here's the shortcoming"? How did this come to be in P? What made it clear that this was an ideal way to do things?

Ankush Desai (10:52):

I think you're right. When we started on it, we wanted P to be not just a modeling language but also an implementation language. The idea was to use P to describe your design, but also to keep evolving the P models toward an implementation, and then do code generation to deploy the generated code. We were never thinking of P as just a modeling language, but as both a modeling and an implementation language. And if you want it to be an implementation language, you have to allow the ability to capture these low-level details as well.

Joey (11:43):

That makes sense. And you mentioned scalable backend analysis, so you give up on something, but presumably that's still okay. How does that all work out?

Ankush Desai (11:55):

Even there it's interesting. What I meant by scalable backends is that we have multiple backends for P; we don't have just one. We use different backends based on what the goal is. If your goal is bug finding, you want to find concurrency bugs in a design. There is a lot of work on systematic concurrency testing, where you use different bounding techniques and various search prioritization techniques to prioritize your exploration, and these techniques are very efficient at finding concurrency bugs in your system. So P has a backend that is a variant of an explicit-state model checker, which explicitly enumerates different behaviors.

Ankush Desai (12:53):

In some cases, after a certain depth, it does prioritized sampling of the search space based on different heuristics. And this is way more efficient at finding concurrency bugs than any other technique, like symbolic execution. But obviously, when you're doing this kind of explicit-state exploration, you don't get guarantees, because you're not going to finish exploring the whole state space. So when the goal is to find as many bugs as possible, we use this backend. Now, if your goal is to get some soundness guarantees, then we have a symbolic execution engine backend that gives you the guarantee that, for a finite program, we will explore all the states of your system. But symbolic execution has its own scalability problems.
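To show what explicit-state enumeration of behaviors looks like in miniature (our own toy, not P's checker), here is a Python sketch that enumerates every interleaving of two threads that each increment a shared counter non-atomically (read, then write back), and collects the schedules where an update is lost:

```python
from itertools import permutations

def run_schedule(schedule):
    """Execute one interleaving; each thread has a read step and a write step."""
    shared = 0
    local = {0: 0, 1: 0}
    pc = {0: 0, 1: 0}                    # per-thread program counter
    for tid in schedule:
        if pc[tid] == 0:                 # step 1: read the shared counter
            local[tid] = shared
        elif pc[tid] == 1:               # step 2: write back local + 1
            shared = local[tid] + 1
        pc[tid] += 1
    return shared

def find_buggy_schedules():
    # each thread contributes two steps; distinct permutations of the
    # step tokens give every interleaving
    buggy = set()
    for sched in set(permutations([0, 0, 1, 1])):
        if run_schedule(sched) != 2:     # both increments should land
            buggy.add(sched)
    return buggy
```

Of the six distinct interleavings, only the two fully sequential ones produce 2; the other four lose an update, which is exactly the kind of corner-case concurrency bug an explicit-state explorer surfaces.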

Ankush Desai (13:41):

If you throw a much more detailed model at the symbolic execution engine, it's not going to finish. And we are currently working with Sanjit Seshia and others at Berkeley to build a deductive verification engine for P, where the idea would be to do proofs. But again, like I said, doing proofs for detailed P programs would be hard, so people would use the P verifier for programs written at a very high level of abstraction. So what I meant by scalable is scalable relative to the goal. If the goal is bug finding, we have a scalable engine that runs on a distributed cluster. If the goal is guarantees, we have the symbolic execution engine, which is not as efficient as fuzzing or sophisticated model checking at finding bugs, but gives you more.

Joey (14:44):

Yeah. And in practice, from what you're seeing in industry, is there a most common degree of complexity and scalability that's needed, something indicating the tool should focus in a specific direction?

Ankush Desai (14:59):

I think there as well, my thinking has evolved. I used to think very differently in grad school, and now I think differently again; I can tell you an example of why. But to answer your question, I feel all three of them have their own place. I don't think any one of them alone is enough to solve the problem, and I really strongly believe in a portfolio of backends. Say there's a principal engineer or a senior engineer on a team who is, for the first time, trying to design a distributed protocol at an abstract level. Let's say they're trying to come up with a variant of Paxos or something like that.

Ankush Desai (15:40):

And they want to show that their variant is correct and convince the entire group that the protocol is correct. They would like to give a deductive proof, and they can do it in the same language that everybody else understands. So there's value in that backend. Then another developer says, "Let me add details into your protocol, and I would like to show that for finite instances of this, it works correctly." So maybe they use the symbolic execution engine. And as they add more and more implementation details into the Paxos variant, it reaches a point where it's more than, say, 1,500 lines of code. Then doing a proof for it is super hard, and we know the symbolic execution engine currently doesn't scale to that. So then there is a place for the third backend. I really feel that, based on the developer community doing the work, all three of them have their own place.

Joey (16:46):

Is it possible to have the models exist together? Can you do sort of a refinement kind of thing, where you have your high-level model coexisting with your low-level model, and you're able to show all that stuff at the same time, if you commit to this approach over time?

Ankush Desai (17:03):

Yeah, I think so. This has not been done, and I don't know how to do it right now, but like you said, it would be really great if we had a front end with the capability to use any methodology you'd like to analyze the program. It may not even be P; it can be any language that allows you to write inductive invariants and do a proof, then do a refinement to a more detailed model, where maybe you're not doing proofs but something else. Then you go down again, and eventually you reach an implementation. It becomes very detailed, but it's still in the same programming framework, and there is a connection between the layers. Maybe the connection between the very top layers is still done using proofs, but at the bottom layers it's so detailed that you don't do proofs, you do testing. I think that could be very nice.

Joey (18:09):

So for all the concurrent systems researchers out there, so hard up for topics to study: there you go. You can try to build this unified system.

Ankush Desai (18:22):

Yeah.

Joey (18:23):

I want to pause for a minute, because we live, I guess, in the world of sequential verification, and we're talking here about concurrent verification. For people who haven't spent a lot of time in that world, can you explain why it's so hard to reason about these systems?

Ankush Desai (18:43):

Yeah. The way I explain it to people is that in a sequential program, the only form of non-determinism is the input. Given a sequential function and some non-deterministic inputs, you have to make sure that after the function executes, the outputs are as expected. So the only non-determinism is the different inputs you can receive. In a concurrent system, you still have input non-determinism, because your system receives values and you have to make sure it is correct in the presence of that. But in addition, you also have to make sure it is correct with respect to control non-determinism, the non-determinism of concurrency: functions can execute in parallel, so they can mutate state in parallel, and so on.

Ankush Desai (19:39):

So beyond input non-determinism, you have to make sure the system is correct even in the presence of concurrency, or control non-determinism, in your program. And that's why it's super hard. In the original sequential setting, the invariants you write to prove programs correct are about inputs: a precondition on the input and a postcondition on the output. Here, your invariants also have to capture the interleavings of messages, and of functions, that can happen in the system. One of the hardest parts of doing proofs for these systems is that when I'm reasoning about the correctness of one process, I also have to reason about what the state of the other processes could be at that moment. That's where the hard part is in doing verification for concurrent programs.

Joey (20:40):

Yeah. I have a friend who spends all their time doing manual proofs about concurrent programs. I call them sometimes and they say, "Oh, you're just doing sequential verification, huh?" And I'm like, yeah, it's still hard. I promise it's hard.

Ankush Desai (21:02):

Actually, I would say one more thing here. There is a lot of work in academia on converting parallel programs into sequential representations and then doing proofs: this idea of sequentialization of concurrent programs. You convert your whole concurrent program into a sequential program with a scheduler, where after each step of a thread you reach a conditional inside the scheduler that non-deterministically chooses which process or thread to execute next. It can be an if-choice or something like that: you non-deterministically pick any thread.

Ankush Desai (21:53):

Then you take that thread, make a step, and repeat, in a loop. So it's a sequential program where at each conditional you're picking which thread goes next, and then you prove the sequential program correct. There has been a lot of work on doing sequentialization of concurrent programs and then applying the same techniques you use to prove sequential programs correct to concurrent programs. But again, the complexity grows, because you have represented the non-determinism corresponding to concurrency as control non-determinism in your loop; you have this conditional, and now your invariants become much harder.
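The sequentialization idea can be sketched in a few lines of Python (our own toy, not any published tool): the concurrent program becomes a sequential procedure that, at each step, hits a scheduler choice point picking which thread runs next, and exploring every choice by recursion covers every interleaving. Here the two "threads" each perform a non-atomic counter increment (a read step, then a write-back step):

```python
def explore(pc, shared, local, remaining, results):
    """Recursively resolve the scheduler's choice point at every step."""
    if not remaining:
        results.add(shared)
        return
    # the scheduler's nondeterministic conditional: try each runnable thread
    for tid in set(remaining):
        rest = list(remaining)
        rest.remove(tid)                  # consume one step of this thread
        if pc[tid] == 0:                  # the thread's read step
            explore({**pc, tid: 1}, shared,
                    {**local, tid: shared}, rest, results)
        else:                             # the thread's write-back step
            explore({**pc, tid: 2}, local[tid] + 1,
                    local, rest, results)

def all_final_values():
    """All final counter values over every interleaving of two increments."""
    results = set()
    explore({0: 0, 1: 0}, 0, {0: 0, 1: 0}, [0, 0, 1, 1], results)
    return results
```

The exploration reaches both 2 (the correct result) and 1 (a lost update), which is the extra control non-determinism the invariants of a sequentialized program must account for.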

Joey (22:36):

Yeah, I can imagine pointing an engineer at this and saying, look how many nested statements with symbolic cases you have there. The tool would try, but it wouldn't get very far.

Ankush Desai (22:52):

Yes.

Shpat (22:52):

Let's say there are listeners who are working on developing a concurrent system, maybe with more traditional methods. At which point would you suggest they reach for P, if they haven't at the beginning?

Ankush Desai (23:10):

I would say at the start, when you are starting to think about your system. Because what we have found, even when we were using P in the past with Microsoft, and even right now, is that P has its own value beyond the checker. It's not the P checker that finds most of the bugs; it's actually the process of creating formal models and writing specifications for the system itself. That will find maybe a large fraction of the bugs in your system. I call it P being used as a thinking tool, because whenever you tell a particular engineer to go create a P model, the first thing they have to do is go and think about what level of abstraction they want to create these models at.

Ankush Desai (24:14):

Right. And what are the components my system interacts with? How does it interact? What assumptions am I making about those other components? Even before they write a single line of P code, they are thinking about all of this, and then once they start writing, the process of writing and capturing this itself finds a lot of bugs. This has nothing to do with P as such; any language they used would do this. The process itself eliminates a lot of problems. So it's always better to start with P early on to remove these bugs, and then, as your system evolves, you can keep modifying the P models and specifications.

Shpat (25:02):

Makes a lot of sense.

Joey (25:04):

Yeah, I guess I'm going to pitch my own thing here. I have a one-page article called "You Already Know Formal Methods," and it says almost exactly that. I don't care how you write it down: just sit down and try to say what your system is doing, and you're going to realize you weren't thinking it through quite as well as you could have, even if you write it on paper. But then, if you can manage to add a degree of rigor and structure like what P provides, it will really help. The less you can get away with sloppy thinking, the better you're likely to do in this kind of process. So having a language with a bit of a type checker and a bit of structure helps even more. And it's always amazing that even without running the tool, which finds some of the really hard-to-find stuff, you'll shake out so much of the easy stuff right off the bat. That's always surprising to people.

Ankush Desai (25:55):

Yeah. In fact, when I was in grad school, I read several of these articles which said, over and over, that writing specifications is super important. And most of the research I did was on scaling P, trying to make the analyses scale better, so my focus was: how can I use the checker to find as many bugs as possible? Then when I came to industry and people started using P, it really hit home that what the great researchers in the community were saying about the importance of writing specifications is actually true. So now I understand why the formal methods community really emphasizes writing specifications, because just writing them is good enough by itself.

Shpat (26:44):

Actually,

Ankush Desai (26:46):

It's a step. It's not good enough, but it's a step toward getting there. Yes.

Shpat (26:50):

Yeah. That reminds me of a question I didn't get to ask earlier. P has some history now, right? At least ten years, or almost ten years. How did you get started with it, and how did it get started? What was your journey with it, yours personally?

Ankush Desai (27:08):

Yeah. I share this with most people who ask, "Why did you come up with P?" And I very proudly say, and I strongly believe, that the reason it works out is that P was not designed by formal methods researchers. It was designed by USB device driver developers. Before starting grad school, I was part of Microsoft Research, working with Sriram Rajamani and others. And at that time, the device driver team at Microsoft wanted to go from USB 2.0 to USB 3.0.

Ankush Desai (27:59):

They wanted to build a completely new set of device drivers for it, which were completely concurrent. USB 2.0 did not have as much concurrency, and they wanted to exploit concurrency for performance, but they were scared that in coming up with this concurrent design they would introduce concurrency bugs, which could lead to blue screen errors and so on. So they wanted to use concurrency analysis. At that point, Shaz Qadeer and others had built a model checker called Zing, an explicit-state model checker for concurrent systems. And when the USB device driver team was trying to create the design for the device drivers, they used Visio diagrams.

Ankush Desai (28:50):

Visio is this state machine diagramming tool from Microsoft. They created these state machines in Visio, which are basically what P state machines are right now, describing the whole USB protocol stack, and they wrote a script to go from those state machines to the Zing language, so the model checker could analyze their design representation and find bugs in it. And that process found a lot of bugs, more than a hundred, in their design itself. So the language of it, the whole idea of concurrently executing state machines, was designed by the USB device driver team. The whole idea of thinking about things as state machines was their idea. What Shaz and Sriram and I did was give what they were doing a formal language semantics, write a compiler for it, and take it from there. But the whole idea of having state machines as a language for writing things, with a backend analyzer, was driven by the developers. They came up with that idea.

Shpat (30:12):

That's fascinating to me, especially since it is finding use. I'm going to maybe ask a question and see whether anything generalizes for, let's say, formal methods researchers, or just people working on the next generation of tools, whatever their research is about. Yeah,

Ankush Desai (30:31):

Yeah,

Shpat (30:32):

Yeah. When you think about people doing that with a tool, whether it's a modeling language or anything else, what can they learn from this, aside from just knocking on doors at big companies and saying, are you working on anything that we can take and formalize a little bit more? Yeah. I'm curious whether you've thought about that at all.

Ankush Desai (30:50):

Yeah. I mean, Shaz and I had a lot of conversations about this: hey, what is the real thing behind this? It's definitely not the language; P is not magical. The P language is very similar to Statecharts, which have been around for a while, right? The idea of writing your protocol as a state machine is not our idea; it is very well known. So what is it that is working here, right? What is it that we should give to developers that will help them in reasoning about their systems? And it came down to three very basic ideas, which are again derived from existing ideas in our community. The first basic idea was that we need to give them a language.

Ankush Desai (31:39):

It could be any language, a language that allows developers to write their system at a higher level of abstraction. It could be state machines, it could be actors, it could be communicating processes, whatever is suited for that team. There has to be this high-level language. Okay. Once you have that language, which is good enough for creating formal models and writing specifications, what you need is a very simple mechanism for controlling non-determinism in that language, so that you can explore different behaviors of the system. Okay. And when I say controlling non-determinism, you can control non-determinism using fuzzing, or using symbolic execution and exploration, and so on. What I mean is that, based on the flavor you want, you can introduce a controller that allows you to do more than what would happen if you did testing in production, right?

Ankush Desai (32:37):

You need a way in which you can control non-determinism, explore different things, and so on. So that's the second thing: control. But even when you have control, the third thing is that if you hit a bug, you need the ability to reproduce it. Okay. That's it. Those were the three basic things we found to be important. If you build any framework, it could be analysis of real implementations, anything, if you can do these three things: you have control over what you explore in your system, so that you can systematically explore different things, including things which have a low probability of appearing in a production system. So you build a controller that explores behaviors of your system which have a low probability of occurring in a production system.

Ankush Desai (33:25):

And the ability that, whenever there is a failure or an assertion violation, you can reproduce it. And that's what model checking basically allows you to do, right? It produces counterexamples, which are reproducible. So those are the three basic things. If you provide them in any framework, it will work for developers. And I think that's the key thing: if you do testing, you can see things failing, but debugging those failures for a concurrent system is super hard because of the inability to reproduce the bugs. You run the concurrent program again and again; in some cases it fails, in some it doesn't. So I think those are the three key things.
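The second and third principles, controlled non-determinism and reproducibility, can be sketched together in a few lines of Python. This is an illustrative toy, not how P's checker is implemented: a seeded random scheduler chooses the interleaving of two "processes" doing a non-atomic increment, so every schedule is replayable from its seed, and a buggy schedule found once can be reproduced on demand. All names here (`run_schedule`, `find_bug`) are hypothetical:

```python
import random

def run_schedule(seed):
    """Run one interleaving of two concurrent non-atomic increments.
    The scheduler's choices come from a seeded PRNG, so the same seed
    always replays the same schedule: failures are reproducible."""
    rng = random.Random(seed)
    shared = {"x": 0}

    def proc():
        tmp = shared["x"]        # load
        yield                    # scheduler may context-switch here
        shared["x"] = tmp + 1    # store (lost update if interleaved)

    live = [proc(), proc()]
    while live:
        p = rng.choice(live)     # controlled non-determinism
        try:
            next(p)
        except StopIteration:
            live.remove(p)
    return shared["x"]

def find_bug(max_seeds=1000):
    """Search schedules until the lost-update bug (x != 2) appears;
    return the seed that reproduces it, or None."""
    for seed in range(max_seeds):
        if run_schedule(seed) != 2:
            return seed
    return None
```

Running the buggy seed twice yields the identical failing trace, which is exactly the reproducibility that plain stress-testing of a concurrent program cannot give you.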

Joey (34:09):

I think I'm gonna disagree with you a little bit there: I think the language does matter. Right. We had this idea a minute ago: you write specifications, right? That's a good thing, do that, you're gonna be better off than you were before. So the very first question we should ask is, how do we help people write specifications? If you give 'em a crazy language that looks like something they've never seen before, they're less likely to write specifications than if you give them something that looks like what they work on every day. Right. Yeah. And then the next question is, how do I make use of this thing? And if you say, well, learn this interactive prover, and learn temporal logic, and then write these specifications, and, you know, you're smart,

Joey (34:56):

you can do it, they'll be like, well, yeah, I probably could, but I've got some other stuff to do. And if you're like, well, push the button and you can try to fuzz it today and it'll find some bugs for you, they're like, right, I can push a button. But then they ask the question, right? They're like, well, I pushed the button and I found some stuff, but can I do more? And then all of a sudden you've got people coming to you asking, how do I take this a little farther? And then you've got 'em on the hook, and next year they're hacking on concurrent proofs with temporal logic, and they don't even know what happened. But those first steps are super critical. Right. And it sounds like the way P came to be was that engineers wrote down, this is how I really think about writing a system, and you modified that into something that other engineers could use in the same way. So I think that matters quite a bit.

Ankush Desai (35:46):

Yes. And from what you said, right, I feel like if you have this single framework that programmers like, and it allows you to do both fuzzing and proof, it then opens up a very nice area, which people have been pursuing in isolation: you can use fuzzing to learn invariants to do proofs. Right? But if you don't have a single unified framework, how are you gonna combine these two ideas in a practical way? I think that is the other motivation: it would be great to have everything in a single framework. Yeah.

Shpat (36:21):

We've brought this up over and over again in this podcast, but if you're working on a tool like this, find a way to work with people who are working on applied stuff, maybe not with that level of assurance. Bring 'em in somehow and see what happens. You might hate what you see, but it will make for a better tool.

Ankush Desai (36:40):

Yeah. And I think it's also very important to not go to engineers with your tool, but to go and ask them what their problem is, and then see if your tool is the right fit, or what modified version of the tool would be.

Shpat (36:56):

Listen to people and see where they're coming from. Yeah. Wow.

Ankush Desai (36:59):

I think that

Shpat (37:01):

Makes a lot of sense.

Ankush Desai (37:03):

So, yeah, there's an interesting thing here, which is that not everybody needs to use formal methods, right? It has its own place. For example, not every part of the software needs to be verified to be correct.

Joey (37:28):

Yeah. Well, the software is correct whether you verified it to be correct or not, I guess.

Shpat (37:33):

Right, yeah. Just write correct software.

Joey (37:35):

I,

Ankush Desai (37:40):

Uh,

Joey (37:40):

Everybody's doing cost-benefit analysis all the time, right? And if the cost of a bug is super low, then why would I pay a huge cost to eliminate that bug? Right. You've just gotta prioritize. We all have a million things we wanna do, and we all want our software to be correct. And at the end of the day, sometimes we have to make choices that we know result in possibly slightly less correct software.

Ankush Desai (38:05):

Yes. And I think that is what is very evident in an industrial setup, which I found very interesting. And not just an industrial setup. For example, when I was doing my undergrad and masters, I worked on a project where we were trying to build a nanosatellite, and we never used formal verification there. The thing was, we initially did a lot of reasoning about the system and tried to make sure our system was correct, but soon the head of our project came in and said, hey, when we started this project to build this student satellite, we were given a budget of two crore rupees.

Ankush Desai (38:51):

We have eaten up a lot of this money; we need to figure out a way to deliver this on time. So what is the best way we can deliver on time while still having some assurance that it will work? So you cannot spend all your energy on making the system correct. Obviously, for things like security and privacy that are super critical, you have to do something mathematically rigorous, but not for the whole system. So I think the next interesting thing is: how can we use formal methods to increase the velocity of developers in delivering their services? Not always proving them correct. For critical components you have to prove them correct, but for the things where you have time budgets, how can you increase the velocity of the developer to deliver on time? Yeah. That could be interesting.

Joey (39:52):

Yeah. And I think that's where a lot of that idea comes in: let's make it easy, let's make system modeling part of your day-to-day workflow. Even if you never get to the proofs, you're in better shape than you were. Yeah. And you've left something for the next engineer as well, because maintenance is a problem for everybody. Right. And that's a place where those short deliveries mean you take on tech debt in a way that might have been unacceptable; you might have just made a mistake there. Whereas just that little bit of modeling can leave something for the next person to get a toehold and understand what's going on in the future.

Joey (40:31):

So, yeah, the trick I play is: everybody needs formal methods. What you're thinking of as formal methods, it's everything, right? You wrote that thing down on paper? That was formal methods. You like formal methods; I don't make the rules. So that's the other way to take it: this is all formal methods, right? Just be a little formal; try to say what you meant to do. And that's a step. But yeah, we definitely scare people. You know, I write articles like, hey, we did this amazing proof with AWS, or with this blockchain company, and it was huge, and it took engineers full time for half a year, and it's the most rigorous thing anybody's ever done. I love that stuff, but yeah, there's no question that most people shouldn't be doing that with their time.

Ankush Desai (41:16):

Yeah.

Shpat (41:17):

I wanna ask an adjacent question and put P aside for one second. Well, maybe not; frankly, it's part of it. But I know that you have very strong opinions about the following: in terms of bug finding, there are some techniques that work maybe unreasonably better than other techniques. Specifically, I'm thinking about random fuzzing. I'm hoping you could talk a little bit about that.

Ankush Desai (41:45):

Yeah. I think this is also from experience. For example, when we started working on P, the backend engine that we had was the Zing model checker, and it was trying to do explicit enumeration, to be exhaustive, and to show that a given finite program is correct. Right? So the goal was always to do proofs for small systems. Then there was a time when Shaz and a lot of other people came up with these bounding techniques, called context bounding, delay bounding, and so on, because the problem of doing complete exploration of a program was becoming hard. Why not do bounded exploration instead, so that we can give some bounded exploration guarantees to the developers? For example, in context bounding, when you say that you have explored everything up to a context bound of two, it means that if there were a bug that could manifest with two context switches in your program, you would have found it; if I could not find any bug with a context bound of two, then there is an absence of bugs in your system that can be manifested with just two context switches.

Ankush Desai (43:09):

There can still be a bug with three context switches, but there would not be one with two, and so on. So they came up with these ideas, and this was great. But soon these techniques were being pitched not as bounded verification techniques but as bug-finding techniques, in the sense that if you use context bounding, you can find bugs faster than tools that do exhaustive search. Right. But that's not the right way of thinking about it, because now your goalpost has shifted. Your goalpost initially was to do proofs; then you have full proofs and bounded proofs. Now the goalpost is not proof, it's bug finding. So if your goalpost has shifted, the baseline should also have shifted: the baseline should have been, what other techniques can find bugs faster than context bounding?
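The context-bounding guarantee described here can be sketched concretely. The toy Python below (an illustration of the idea, not any real tool's implementation; function names are hypothetical) exhaustively enumerates every interleaving of two non-atomic increments with at most `bound` context switches. A bound of 1 only admits the sequential schedules and proves absence of bugs within that bound; raising the bound to 2 admits the interleaving that exposes the lost-update bug:

```python
from itertools import permutations

def context_switches(schedule):
    """Count the points where the running process changes."""
    return sum(1 for a, b in zip(schedule, schedule[1:]) if a != b)

def run(schedule):
    """Execute a fixed schedule over two processes, each doing a
    non-atomic increment in two steps: load, then store."""
    shared = 0
    tmp = {0: 0, 1: 0}
    step = {0: 0, 1: 0}
    for pid in schedule:
        if step[pid] == 0:
            tmp[pid] = shared          # load
        else:
            shared = tmp[pid] + 1      # store
        step[pid] += 1
    return shared

def bounded_check(bound):
    """Exhaustively explore every interleaving with at most `bound`
    context switches; return a buggy schedule, or None if the bounded
    proof goes through."""
    for schedule in set(permutations([0, 0, 1, 1])):
        if context_switches(schedule) <= bound and run(schedule) != 2:
            return schedule
    return None
```

`bounded_check(1)` returning `None` is a real (bounded) guarantee: no bug manifests within one context switch. It says nothing about schedules with more switches, which is exactly the distinction between a bounded proof and a full proof.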

Ankush Desai (44:05):

So there were a series of papers. I also wrote a couple, one of them on how delay bounding is better at finding bugs, faster than traditional depth-first-search exploration techniques, and how it beats context bounding, and so on. But I think the problem was we were not thinking right, and there were a lot of papers like this. Then Alastair Donaldson and his group at Imperial wrote a paper where they took all the old benchmarks for context bounding, delay bounding, and so on, and they just ran random fuzzing on those benchmarks and showed that the same bugs could be found at random. Obviously, in some cases context bounding found them faster than random search, but the difference was very small, right?

Ankush Desai (44:57):

And in some cases, random search was able to find bugs faster than context bounding and the others. So this was a big wake-up call, at least for Shaz and myself and others: hey, we were not thinking; the baseline should always have been random search. Right. So now I feel that if you are trying to say that a technique finds bugs faster than any other, does it really do that in comparison to running a fuzzer on just a big cluster? Because fuzzing is the simplest idea, right? I don't have to give any bounds, I don't have to think about anything. If your technique cannot beat fuzzing, then just don't introduce it as a new idea. That's what my feeling is.

Shpat (45:46):

When you're doing paper reviews, or on program committees of conferences, are you gonna return papers and say, please compare this to random fuzzing and then resubmit? Yeah, yeah.

Ankush Desai (45:57):

Yeah. I feel bad when I still see papers which say, we are bug-finding tools, but their baseline is not fuzzing. So, yeah.

Shpat (46:07):

Makes a lot of sense.

Ankush Desai (46:10):

But what I'm saying is that the baseline was not correct for those papers; the ideas are still relevant when you look at them in the right way. If you look at context bounding as a bounded proof technique, it's a super cool idea. Yeah. Right.

Joey (46:30):

So the point is, you might be able to bound something and say something meaningful. Maybe your program is guaranteed to have only a certain number of context switches, for example, and with this bounding, that's great. Definitely do that. Yeah.

Ankush Desai (46:42):

Yeah,

Joey (46:42):

Yeah. And you will get exhaustive guarantees, but exhaustive guarantees about a limited slice of what the program can do, and fuzzing will give you no such guarantees.

Ankush Desai (46:54):

Yes, yes.

Joey (46:54):

Yeah. I think many of us have submitted a paper that calls a bounded proof a proof, and there's a set of reviewers that'll just say, that's not a proof; if it's bounded, it's not a proof.

Ankush Desai (47:05):

Yeah, exactly.

Joey (47:06):

Those are fun. Cool. Well, I think we've had a nice interview here. Is there anything else you'd like to talk about before we wrap up?

Ankush Desai (47:13):

No, no. I'm super thankful; it was great. We had a nice conversation, and I'm happy that I got a chance to talk to you all.

Joey (47:25):

All right, then. Thank you for joining us. It's been a wonderful conversation.

Shpat (47:29):

It was really nice to chat with you, Ankush. Thanks for joining us.

Joey (47:32):

Yeah. Thank you. This has been another episode of Building Better Systems. Thanks for joining us.