Building Better Systems

#7: Aditya Thakur – “If it goes too slow, they'll turn it off”: Analysis Tools That Work

Episode Summary

Dr. Aditya Thakur, a computer science professor at U.C. Davis, walks us through his work on developing the analysis tools he wished he had while working in industry at places like Google. Aside from program analysis, we talk about making a research group successful by exposing its members to industry. Towards the end, he shares his work on techniques and tools for repairing a trained deep neural network once a mistake has been discovered. Along the way, we learn about abstract interpretation, non-determinism, the trickiness of parallelism, and other concepts pertinent to analysis, in an approachable way.

Episode Notes

You can watch this episode on our YouTube channel: https://youtube.com/c/BuildingBetterSystemsPodcast

Joey Dodds: https://galois.com/team/joey-dodds/ 

Shpat Morina: https://galois.com/team/shpat-morina/  

Aditya Thakur: http://thakur.cs.ucdavis.edu/

Galois, Inc.: https://galois.com/ 

Contact us: podcast@galois.com

Episode Transcription

Intro (00:00:02):

Designing, manufacturing, rolling out, and maintaining the high-speed electronic computer, the largest and most complex computers that are built.

Joey (00:00:22):

Hello everyone. Welcome to the Building Better Systems podcast, where we explore tools and approaches that make us more effective engineers and make our software safe and reliable. My name is Joey Dodds.

Shpat (00:00:31):

And I'm Shpat Morina.

Joey (00:00:33):

And today we're joined by Aditya Thakur, who I've only briefly met. We were introduced by a mutual colleague and we've heard great things, exciting research. And my favorite thing that I've learned so far is that he cares very deeply for his grad students and wants to feature them, and that made me like him straight off the bat, so I'm sure we're going to get along great. Is there anything you'd like to share about yourself?

Aditya (00:01:03):

Yeah, sure. So I'm currently an assistant professor in the computer science department at the University of California, Davis. I joined about three years ago, before which I was at Google, and I got my PhD from the University of Wisconsin-Madison.

Joey (00:01:20):

That's excellent. So we'll start with our first question, which is: what is your approach to building better systems?

Aditya (00:01:26):

So my research interests are in program analysis and verification. I believe that we need tools and techniques to help software developers write better code, write secure code. Software is getting more prevalent, it's getting more complex, so I think anything we can do to make the job of software developers better would have a huge impact. The specific research interests I study in program analysis and verification are in abstract interpretation, which is a technique for static program analysis. Apart from that, we're also developing bug-finding tools. And more recently, we're looking at approaches for dealing with deep neural networks since, you know, they are pretty hard to reason about and they might lead to different ways of writing software.

Joey (00:02:20):

So we heard maybe a lot of phrases that people haven't heard before. I heard abstract interpretation, as well as bug finding, which, to someone that's not as familiar with formal methods, is maybe what everything in formal methods sounds like. Could you highlight a bit of what both of those are about?

Aditya (00:02:40):

Right. So abstract interpretation is just a very general framework in which you can write a lot of static analysis tools, and the purpose of the static analysis can be very different. If you're a compiler writer, you would want to analyze your program and find properties at particular program points that are always true. If a static analysis used in a compiler says that variable X is five, it better always be five, because you might use this to do some optimizations, you know, constant folding and so on. But another useful static analysis is to find bugs, and in some sense, when you're writing a bug finder, what you care about is: does it find bugs, right? So you care about things like false positive rates, false negative rates, and so on. So even though the underlying approach can be similar, since the overarching metrics are different, the approaches tend to be slightly different and have a slightly different focus.
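A tiny C illustration of that soundness requirement (my example, not from the episode):

    int f(int flag) {
        int x = 5;
        if (flag)
            x = 6;
        /* A sound analysis must report "x is 5 or 6" (or just "unknown")
           here. Claiming "x is 5" would let constant folding rewrite the
           return to 6, miscompiling the flag != 0 case. */
        return x + 1;
    }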

Joey (00:03:45):

And how do you think about when to use which of these two approaches? Or when do you use the sort of fully formal approach versus the bug finding?

Aditya (00:03:53):

Right. I think in some ways they're complementary. A lot of the research we do, we like to have a mix of: let's try and create a new formalism, let's come up with a theorem, right, let's try and come up with a framework that is agnostic to the specific applications. When thinking about bug finding, you're focusing on a particular class of bugs, and there you are more worried about, you know, what's the particular application. You're also tuned in to specific patterns that developers use, right? And of course you might use some static analysis techniques, but in the end, if you have a specific class of bugs, then you are focused on that. So for instance, one of the classes of bugs we've been looking at is this problem of error handling in systems code.

Aditya (00:04:51):

So in C code, like the Linux kernel or OpenSSL, you don't have exception handling; instead, when functions have to indicate that there's an error, they return a specific value. So for instance, malloc will return NULL when there's an error, but in most other cases there'll be an integer value, like say minus one or minus two, which tells the caller of the function that there is an error. Now, all of these conventions are sparsely documented, or they're in the minds of the developers, and the type of bug that can occur because of this style of error handling is that you might call a function but not check for the error after the call. So you might not have something like "if the return is zero"; you don't handle the error. And this can get tricky, cause these errors are propagated along long call chains.
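As a concrete sketch of the pattern he describes (my code, kernel-style C; the function and error value are invented for illustration):

    #include <stdlib.h>
    #include <string.h>

    #define ERR_NOMEM 12    /* error code, returned negated, kernel-style */

    /* Convention: returns 0 on success, a negative error code on failure. */
    int read_config(char **out) {
        char *buf = malloc(16);
        if (buf == NULL)
            return -ERR_NOMEM;   /* the error exists only as a return value */
        strcpy(buf, "defaults");
        *out = buf;
        return 0;
    }

    void buggy_caller(void) {
        char *cfg;
        read_config(&cfg);       /* BUG: return value never checked, so... */
        cfg[0] = 'x';            /* ...cfg may be uninitialized here. */
    }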

Aditya (00:05:42):

And even finding out what the error-handling specification is for a function can be tricky. So here, we're trying to get a sense of what this specification is, computing it automatically, and then writing a bug checker. Now, in this work, we don't have a theorem which says that the specification is sound, right? We're not saying that the set of values that can be returned by a function, whatever that set is, that we're over-approximating it. We're just saying: we think this is reasonable. And then we're finding bugs, and those bugs are true bugs, and that means that when we report them to the developer, they find it useful, right? If you are more on the formal side, then you want to have some guarantees saying: whatever invariant you're finding is a proved invariant, whatever properties you're finding over-approximate the underlying program, and so on.

Joey (00:06:42):

And so you really get a lot more freedom in one of these approaches, or different kinds of freedom, I guess. In bug finding, you kind of don't have to worry that much about catching every single problem, right? But you do often have to worry a lot about reporting things that maybe didn't turn out to be problems.

Aditya (00:07:01):

Exactly, exactly. So you have to be more cognizant of false positives and so on. Developers have a finite amount of time, so whenever you report something, you might want to make sure that it's a true bug. So there are also techniques for ranking the bug reports generated by static analysis tools: you might actually generate hundreds of bug reports, but you might want to rank them, and then developers only look at the top 10, and so on.

Joey (00:07:31):

And that's a whole research field in its own right: how you actually present those results in a way that gets people to respond and gets people to understand what the tool is saying.

Aditya (00:07:40):

Absolutely. And I think there's been a big push, even in industry right now, where people are focusing on how to integrate these tools with developer workflows. So, integrating them with IDEs, integrating them with review tools, so that whenever you send out your code for review, there's some automated tool that will present the results to you and the reviewer. And I know that Google and Facebook and so on are also looking at how you fix these reported bugs, because that's another advantage you have: if you're focusing on a particular class of bug, there might also be common fixes that you could try to automate. So I think there's a large spectrum of problems and approaches that you could apply. Having said that, the side where you're trying to get those guarantees has its unique challenges, and there are an equal number of use cases for that, right? When you're trying to ship code, when you're trying to write a compiler optimizer, it has to be correct. You can't sort of hand-wave like, oh, it works most of the time, that's good enough. That doesn't fly there.

Joey (00:08:51):

And I think kind of the formal methods dream these days, one that we think about and discuss

Joey (00:08:56):

quite often, is the idea of moving smoothly from bug finding into verification. With bug finding, it's immediately obvious how it's useful to you if you're a developer, because you find a bug in a minute or two that maybe would have taken you a few days to shake out, or maybe you never would have shaken it out. But once you do that enough, you say, well, it's a tool that can tell me when there are some bugs; how can the tools tell me that there's never going to be a bug again, or that there are no bugs at all in the code? It's sort of a natural extension, right?

Aditya (00:09:25):

On to the next, right? Absolutely. Yeah. And I think the place where both kinds of tools intersect, in some ways: if you have a verification tool and all it returns is, yes, there's a bug, or

Aditya (00:09:37):

no, there isn't a bug, that's not really useful for a developer, right? In the same way, a bug-finding tool usually doesn't just say there's a bug; it'll give you an input that causes the bug, or an abstract trace through the program, or a line number which says, hey, here's the null dereference, and so on. So I think a lot of verification tools and techniques also support giving a witness, either a witness that there is a problem, or some notion of, here's why I think the program is correct. And I think that's also an important and active area of research.

Shpat (00:10:12):

I'm curious, at the outset, what made you go into this research, going after these things? It sounds like you were working in industry, and then something was appealing about leaving that behind, going into academia, and doing research on these topics. I'm just curious how that came to be.

Aditya (00:10:29):

So I guess I'll answer that in two ways. One is why this field of computer science, and then I'll answer your second question, which is why move back to academia. So I think my love or passion for formal methods and program analysis started way back in an undergraduate compiler class, right? That's probably the first course where you are thinking about a program that is manipulating other programs. And that was quite nice. You were exposed to this in a theory of computation class, but compilers are a sort of practical manifestation of that idea. And also, compilers and program analysis and formal methods give you a nice spectrum where you can have a lot of theory. You can come up with new formalisms, come up with a theorem, prove some complexity theory result.

Aditya (00:11:29):

And then the next day you can get up and say, okay, let me write some really efficient C++ or OCaml or whatever code which actually implements these ideas. So there's this nice spectrum from theory to practice that you can shift between when you are in this field, and I think that was really appealing. And in terms of your second question, you know, why did I come back to academia: I think the way to answer that question is, why did I move to Google in the first place? The reason for that was to apply some of the ideas from my PhD and my early research in an industrial setting. So we were trying to work with security engineers and develop program analysis tools to help them.

Aditya (00:12:21):

And that was really interesting. I also spent some time in a sort of more traditional software engineering job, and I got exposed to, you know, shipping code and dealing with quality assurance engineers and deploying and modifying code and looking at logs and debugging them. And I would say that was even more enlightening in terms of a software engineer's point of view: what they have to deal with, and how their view is. And it was more like, you're working there, and then ideas start building up, saying, oh, I wish I could spend six months building this tool that I feel I need now. But of course, when you're dealing with quarterly expectations and all the chaos, and, you know, you have to ship something, you don't really have time. So those ideas started building up, and I was like, okay, maybe I should try and go back to academia and try and solve some of these foundational problems, but still not forget what I learned in industry, and make sure that, if at all possible, I try and come up with some new theory, but also make sure that a software engineer might want to use it.

Aditya (00:13:35):

Right. So there were some problems which I saw while I was a software engineer, and, you know, perhaps we solved them with an engineering solution back there, which might involve something like, let's just throw more memory and more CPUs at it in the cloud. And then some of the work that I've been doing at UC Davis recently on abstract interpretation has been on scaling up these foundational tools, and that has led to new algorithms, formalisms, and, I think, pretty impressive experimental results as well. So in some ways industry gives you a lot of freedom to solve a lot of problems and you get exposed to that, but I felt I could have a nice impact in academia and take that experience with me. Does that sort of make sense?

Shpat (00:14:27):

Yeah. So I want to get back to the impressive experimental results later, cause that sounds interesting. But I just wanted to say, it sounds like you essentially took with you the concerns and the problems of, maybe, developers, conventional engineers who, you know, didn't start in academia and aren't exposed to formal methods, taking those concerns

Shpat (00:14:52):

and essentially looking through that lens in your research and work. Is that common?

Aditya (00:14:59):

I think there are a lot of researchers who do interact with industry, you know, they have sabbaticals in industry and so on. I don't know whether it's common, but I think there are people out there who do want to make formal methods usable. And there's also a big push in academia to make research reproducible. So they're encouraging not just new research itself; you can also publish a tools paper, which talks about usability and scalability and applications to a particular use case. So I think the formal methods community is aware that there needs to be a bridge between industry and academia. I don't know, off the top of my head, whether 50% of researchers think about problems this way, but I know that people value this in general.

Joey (00:16:04):

You found the experiences you gained at Google to be valuable. Like, if I were a researcher and I was sort of wanting to speculate about what things are like when you're doing software engineering at a big company, but I only had secondhand accounts, would it be helpful to take a sabbatical over the summer and try to work somewhere like that? Or is it just as good to hear about it?

Aditya (00:16:25):

I think it's definitely useful to experience that. I actually don't know how it would be if you were just doing a sabbatical, cause I feel you need to go through the whole cycle of designing something, you know, getting all the approvals from everyone, developing it, getting your code reviewed, talking with other groups, because you might be touching someone's code somewhere else and you need to get that permission, and then you sort of dogfood it, and that whole cycle. And then three months down the line, everything is fine, and then you get a weird bug report and you have to go and debug that, or someone changes some other library, which means you have to go and update your code. So I think it would be difficult to get that experience if you're just in a three or six month sabbatical, and perhaps you're in your sort of research cycle where you're not exposed to this whole life cycle. But at the same time, I don't think everyone needs to do this. You know, there's a lot of interdisciplinary research where physicists work with computer scientists, and it's not that the computer scientist has to go and

Aditya (00:17:50):

do a physics degree. You can talk to people, and so on. But I think it definitely helps, and it has helped me. And I feel that perhaps even my knowledge of industry is outdated now; it's been three years since I left, so things might have changed drastically. And in that case, I guess you hope that your students go there and you can interact with them and get some sort of feedback from them. I can definitely say that I've imported some of the software engineering practices: having better code reviews, having sort of a research monorepo, having style guides, making sure that builds are reproducible, making sure continuous integration is set up, and so on. And that's sort of a first-order concern of the research, as opposed to something you do after the fact. And I think that emphasis is probably because of some industrial experience.

Shpat (00:18:56):

That also sounds like immense value for your students, to get that experience with you.

Aditya (00:19:03):

Yes, yes, that's true. So, you know, I work with a lot of undergraduates and even early graduate students who haven't written test cases consistently, right, or they haven't reviewed anyone else's code. So even teaching them things like, you know, don't be a jerk, be nice when you are asking for a review or giving a review, what to look for, and how to read code. People are used to writing a lot of code and reading their own code, but being able to read someone else's code and modify it is especially valuable. And also it builds community: we share a physical lab space, but in my first year, everyone had their own GitHub repository and no one really knew what anyone else was doing. We would have group meetings and there would be presentations, but you don't really know, cause there's no formal framework to go and look at each other's code and then play around with it. But now that there is a common GitHub repo, it's sort of like a virtual lab that people are working in. People help each other, review the code, revise the code. There's sort of this default communication that happens because of that. Can I join your research group? Yes, please. Yes.

Joey (00:20:26):

That sounds wonderful. It sounds like the best of both worlds, in some sense: you get to learn research, but you see those industrial practices filter down. When we do hiring in formal methods, it can be hard to find people that have a lot of both of those, and a lot of times we find people that are much heavier on one end of that spectrum than the other. And of course, the whole range of people are valuable and have different things to contribute, but being aware of both of these worlds definitely is a huge help. So: we've seen some really big results in the past, some specific papers that have made huge claims about the size of code that they've been able to verify using abstract interpretation techniques. But you mentioned that one of your goals is to make abstract interpretation more scalable. Can you explain maybe what's going on there, what we've seen in the past versus how you're pushing things forward?

Aditya (00:21:24):

Sure. So I think abstract interpretation has definitely been one of the more scalable approaches to static program analysis, and one of its benefits is that you can control, you know, the abstraction, right? You can have something like intervals, and then you can slowly use more complex analyses. And there have been quite a few industrial tools that do abstract interpretation, one being Astrée, the tool by the Cousots, that's been used, you know, to verify Airbus code and so on, and there have been a couple of other tools that use abstract interpretation. So I think one thing is, these tools are focused on specific types of programs. They are focused on C code, which is structured, and the specific fixed-point algorithms

Aditya (00:22:30):

they use rely on that structured C code, right? So what they're doing is they're computing their fixed point by walking over the AST in a specific manner, and those same techniques you cannot use for general code, which has gotos and unstructured control-flow graphs. So that's actually one of our more recent focuses: these more general fixed-point algorithms for arbitrary code. Having said that, there's been a lot of very impressive work on both the theory and practice of abstract domains. These are the data structures that hold on to the abstract values as you're computing your fixed point, right? So there have been a lot of impressive data structures and so on. One of our recent papers, which was in POPL last year, was focused on parallelizing abstract interpreters.

Aditya (00:23:30):

So you have 64 cores in your machine; when you run your abstract interpreter, it's going to use one core, right? That's a lot of wasted CPU sitting there doing nothing. And again, there were specific approaches, specific types of abstract interpretations, which were parallelizable, but there wasn't a general framework for parallelization. And one of the reasons was that if you parallelize and you're not careful, you'll get an abstract interpreter that is non-deterministic. What that means is, depending on your scheduler, each time you run your abstract interpreter you might get a different answer, right? And if it's used in a compiler, what that means is you get differently optimized code each time. If you're using this to do verification or bug finding, you'll get a different report each time, right? So you run it, it outputs something, you call your manager over saying, hey, I found a bug, and of course, something else will pop out.

Aditya (00:24:32):

So what we designed was a sort of general abstract interpretation engine which was parallel and deterministic. And we also had some nice theoretical results which said that, in some technical sense, we were optimal in our parallelism: the sort of scheduling constraints we had were minimal, right? So we implemented that, and we found, you know, you can get a nice speedup if you have 16 threads. Of course it depends on the application and so on, but I think there's a lot to be done. Another one of our recent papers, which got into SAS, the Static Analysis Symposium, is on memory efficiency. So if you're trying to do interprocedural abstract interpretation, you can take, you know, over a hundred gigs of memory when you're trying to do something as simple as using intervals to find buffer overflows.

Aditya (00:25:27):

So you just want to make sure that all your accesses are bounded, you know, that you're not overflowing the array, and you're just using intervals, nothing too fancy. If you run it on real-world open-source benchmark programs, it can take, you know, 40, 50, 60 gigs, or 100 gigs. And what we devised was, again, a generic approach where you would cleverly deallocate memory during the fixed-point computation and still be sound, still be precise, and so forth. So we got up to, I think, a 25x reduction in memory. Sometimes, you know, you'd be taking 30 gigs of RAM and now you have less than a gig, and that can be the difference between turning on and turning off some of these interprocedural analyses in practice. So I think there's definitely been a lot of work.

Aditya (00:26:17):

And a lot of techniques fall under abstract interpretation; it's such a nice, general, beautiful theory. Having said that, there have also been industrial tools; they make certain assumptions in order to get some of their scalability results, and they've also done a lot of neat theoretical and engineering tricks with their abstract domains. Our most recent work focused on the fixed-point computation: making it more efficient via parallelization, as well as more memory-efficient by being more careful with how you're allocating and deallocating memory during the fixed point.

Joey (00:26:59):

So why do I need a fixed-point computation? What's that about?

Aditya (00:27:04):

Right. So maybe I should step back and explain, poorly, but hopefully well enough, what abstract interpretation actually does. Before talking about abstract interpretation, let's look at what concrete interpretation is. Let's say you have a statement in your program that increments X by one: you have X equals X plus one. The concrete semantics of your language says that if you have a concrete state, a state which says, for example, X has the value 10 and Y has the value 20, then after you execute X equals X plus one, X will have the value 11 and Y will continue having the value 20, right? That's your concrete semantics. It takes a single concrete state, looks at the concrete semantics of your program, and gives you a single output concrete state, assuming your semantics is deterministic, yada yada.

Aditya (00:28:04):

Now, what is abstract interpretation? In abstract interpretation, you have an abstract state, and an abstract state is a finite representation of a large number of concrete states. For example, an abstract state could mean X lies within the interval 10 to 20 and Y lies in the interval 20 to 30, right? So this is representing a large number of concrete states. And now what you want to do is take a single step, but in your abstract semantics, right? So your abstract semantics has to know about intervals here. And again, intuitively, what you would want the abstract semantics to say is: after you execute X equals X plus one, the value of X is in the interval 11 to 21, right? And Y is still in the interval 20 to 30, whatever I had before. So, the same way you can concretely execute your program using a single concrete state at the input,
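A tiny runnable C sketch of that interval transfer (my example, not from the episode): the abstract semantics of X = X + 1 just shifts both interval bounds.

    #include <stdio.h>

    typedef struct { int lo, hi; } Interval;   /* abstract value [lo, hi] */

    /* Abstract semantics of "x = x + 1" over intervals. */
    Interval abs_inc(Interval x) {
        Interval r = { x.lo + 1, x.hi + 1 };
        return r;
    }

    int main(void) {
        Interval x = { 10, 20 };        /* abstract state: x in [10, 20] */
        x = abs_inc(x);                 /* after the statement: [11, 21] */
        printf("x in [%d, %d]\n", x.lo, x.hi);
        return 0;
    }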

Aditya (00:29:10):

you can abstractly execute your program using an abstract state at the input. And ideally, what you want your input abstract state to capture is all possible inputs to your program. So usually you would say something like, if X is your input, X can be anything; it could be minus infinity to infinity. And then what you're trying to do is abstractly execute your program. And it turns out, well, if you try to do this concretely, you will never terminate, right? There will always be a new concrete state that you have to try, and so forth. So what abstract interpretation gives you is certain conditions on these abstract states such that you can guarantee termination, and guaranteeing termination in this context means computing a fixed point. So now suppose you have a loop in your program. What you'd want is an abstract state such that, if you go around the loop again and you execute the program abstractly, the abstract state you get is contained in the one you already have.

Aditya (00:30:11):

So let's say you have a loop which says: for I equals zero, I less than 10, I plus plus, do something, right? And your abstract state at the loop header is: I is between zero and 10. Then as you go around the loop, you will continue to be between zero and 10. And so what you've done is you've reached a fixed point. You have some abstract state where, when you abstractly execute your program, you get the same abstract state. So it's a fixed point, and a fixed point corresponds to, in this case, a loop invariant, and more generally a program invariant, right? It's a state where, once you execute, you get the same invariant or a tighter invariant, so it can be used as an invariant. And again, if you're assuming that any possible input can occur at the entry, then it is a proved invariant, cause you're accounting for all possible inputs.
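A sketch of computing that loop's fixed point in the interval domain (my code, heavily simplified; a real engine iterates over a whole control-flow graph):

    #include <stdio.h>

    typedef struct { int lo, hi; } Interval;

    /* Join: least upper bound of two intervals (their convex hull). */
    Interval join(Interval a, Interval b) {
        Interval r = { a.lo < b.lo ? a.lo : b.lo,
                       a.hi > b.hi ? a.hi : b.hi };
        return r;
    }

    int main(void) {
        /* Analyze: for (i = 0; i < 10; i++) { ... } */
        Interval header = { 0, 0 };      /* i at the loop head, first visit */
        for (;;) {
            /* One trip: the guard i < 10 clamps to [lo, 9], then i++. */
            Interval body = { header.lo, header.hi > 9 ? 9 : header.hi };
            body.lo += 1;
            body.hi += 1;
            /* Merge the back edge with what we already had. */
            Interval next = join(header, body);
            if (next.lo == header.lo && next.hi == header.hi)
                break;                   /* same state again: fixed point */
            header = next;
        }
        /* Prints the loop invariant at the head: i in [0, 10]. */
        printf("i in [%d, %d]\n", header.lo, header.hi);
        return 0;
    }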

Shpat (00:31:13):

I guess, what's the value of knowing that? So concretely, I, in that example, is changing because you have I plus plus, but abstractly it's between zero and 10. And that sounds like it has some value when it comes to analyzing it that I can't quite connect.

Aditya (00:31:33):

Perfect. So there's a language of invariants: there's sort of a language of abstract values, which determines how expressive your invariants are. In this case, perhaps you have a statement which is A[i] equals zero inside the loop, and you know that the length of your array is 10. That, coupled with the fact that the value of I there is always between zero and nine, will tell you that your array access never overflows. And more generally, you can have more expressive invariants, you know, X is less than Y, or X is not equal to zero; that would be useful if you're trying to make sure that you're not dividing by zero, and so forth. So when you're writing an assert in your program, you can come up with an appropriate abstract domain which will help you prove that assert.
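Concretely, this is the kind of C loop that reasoning proves safe (my sketch):

    int main(void) {
        int a[10];
        for (int i = 0; i < 10; i++) {
            /* Under the loop guard, i is in [0, 9]; the array has length
               10, so this access is always in bounds. */
            a[i] = 0;
        }
        return a[0];
    }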

Aditya (00:32:32):

And of course, people have come up with all sorts of interesting abstract domains, and not just numerical properties like X less than Y. You can also have an abstraction which talks about the heap: that this variable points to a linked list, right? So you can define an abstract interpretation which shows that if you give me a linked list as an input, and you have some C code which is reversing a linked list, then what you get out is also a linked list, and so forth. So you can not just talk about the values; you can also talk about the shape of the heap. This is sort of what shape analysis does, and that can be used for verifying a whole bunch of other properties.

Joey (00:33:15):

So when you say you're improving the fixed-point algorithm, that's not a small detail of the implementation. You're improving the core of exactly what is driving the abstract interpretation.

Aditya (00:33:27):

Right. So there are roughly three components to an abstract interpreter. One is designing the actual abstract domain. There are certain operations you need, like join and meet and so on, and representing those abstract domains using clever data structures is extremely important. The other part is the abstract semantics, which I sort of glossed over; that can be extremely tricky as well, cause you need this formal guarantee that what you're doing is an over-approximation of the underlying concrete semantics. For something like X equals X plus one, it's not too bad, right? But you can have pointers, you can have bitwise operators, and so on, in which case you have to make sure that, given this input abstract value, the output abstract value is in some sense sound. So people used to come up with sort of hand proofs showing that it's sound.

Aditya (00:34:27):

And in fact, during my PhD dissertation, my advisor Tom Reps and I worked on some techniques for automating the construction of these abstract transformers, and the specific context there was actually machine code verification. So you have x86 code, and you're trying to design an abstract interpreter for this low-level x86 code. So that's another component: designing these abstract transformers. And in practice, again, there's a trade-off: you can have precise abstract transformers, or you can have less precise ones, which might be more efficient to execute. And then both of these components are, in some sense, fed into the fixed-point engine, where you're computing this fixed point, and the order in which you visit program nodes when you're computing the fixed point, in our case, the parallelism we were trying to do, that is all in the fixed-point engine. So computationally, abstract interpretation boils down to this fixed-point computation.
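As a taste of why transformers for richer operations get tricky, here is a sound interval transformer for multiplication (my sketch, not from the dissertation; integer overflow is ignored, which a real analyzer would also have to model):

    #include <stdio.h>

    typedef struct { int lo, hi; } Interval;

    static int min2(int a, int b) { return a < b ? a : b; }
    static int max2(int a, int b) { return a > b ? a : b; }

    /* Sound abstract transformer for "z = x * y" over intervals: with
       mixed signs the extreme products can come from any corner, so all
       four must be considered. */
    Interval abs_mul(Interval x, Interval y) {
        int p1 = x.lo * y.lo, p2 = x.lo * y.hi;
        int p3 = x.hi * y.lo, p4 = x.hi * y.hi;
        Interval r = { min2(min2(p1, p2), min2(p3, p4)),
                       max2(max2(p1, p2), max2(p3, p4)) };
        return r;
    }

    int main(void) {
        Interval x = { -2, 3 }, y = { -1, 4 };
        Interval z = abs_mul(x, y);           /* z in [-8, 12] */
        printf("z in [%d, %d]\n", z.lo, z.hi);
        return 0;
    }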

Joey (00:35:30):

And you mentioned something earlier that was kind of intriguing. You said something along the lines of: if it goes too slow, they'll turn it off. When we think about verification projects, like the big verification projects you mentioned, they're thinking, we looked over the whole code, we saw that it was good, and then we wrote the paper and we shipped it. But obviously, by saying "we're going to turn it off," you're not just thinking of abstract interpretation as a one-and-done sort of approach. Can you tell me a little more about that?

Aditya (00:36:03):

Right, especially if you're trying to use an abstract interpretation based tool, say in a compiler or any other sort of developer tool, you would expect that it will be run each time you build your code, right? So if your compiler, say, you know, you use something like -O3 and it takes 10 minutes to run, you'll be like, hmm, maybe I should just use -O1.

Aditya (00:36:29):

It's sort of something like that, right, where you might want to do a more precise analysis, but it takes too long, and you'd rather just turn it off because it's not worth it. And maybe you might use it just before deployment, or, you know, it might have some other trade-offs. And also, if you're trying to do this as part of your review, then you would want to run this incrementally. And I think people have also started looking at that: how do you make some of these ideas incremental? The Facebook Infer group has done some nice work on designing compositional static analysis tools, which fit well into these sorts of review tools as well.

Shpat (00:37:18):

Since we're in this kind of teaching mode, and we're already in the weeds: some of these things, you know, might land well with some of the listeners or watchers, and some won't. But I've been around, I've heard a little bit about Infer and other analyzers like that, and this idea of deterministic versus non-deterministic is something that I don't quite grasp. You gave an example earlier: you run an analyzer based on these techniques, then when you run it again, you might get slightly different results, and you might not get the same bugs. To the untrained ear, that sounds really worrying. I wonder, what's your general high-level explanation of why that is, and what does that mean? And then, what can be done about it? Cause it sounds like you've worked on that.

Aditya (00:38:16):

So this issue of non-determinism is specific to when you're trying to parallelize certain types of abstract interpretations. I think most tools are sequential, and especially if you're trying to deploy some of these tools, you make sure they're deterministic. The reason that parallel abstract interpretation can become non-deterministic is a slightly technical one: you have to do this thing called widening, which I'll get to, and these widening operators are non-monotonic. And when you have such operators, you can have non-determinism, right? So let me try to unwind some of that.
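For concreteness, a sketch of interval widening, the non-monotonic operator he mentions (my example, not from the episode): widening jumps any still-growing bound to infinity so that increasing chains terminate.

    #include <stdio.h>
    #include <math.h>

    typedef struct { double lo, hi; } Interval;

    /* Standard interval widening: a bound that is still growing jumps
       straight to infinity, which guarantees termination. */
    Interval widen(Interval prev, Interval cur) {
        Interval r;
        r.lo = (cur.lo < prev.lo) ? -INFINITY : prev.lo;
        r.hi = (cur.hi > prev.hi) ?  INFINITY : prev.hi;
        return r;
    }

    int main(void) {
        Interval a = { 0, 2 }, b = { 0, 3 };
        Interval w1 = widen(a, b);  /* [0,2] widen [0,3] = [0, +inf) */
        Interval w2 = widen(b, b);  /* [0,3] widen [0,3] = [0, 3]    */
        /* Non-monotonicity: b contains a, yet w2 is tighter than w1.
           So the order in which parallel workers reach a widening point
           can change the final (still sound) answer. */
        printf("w1 = [%g, %g], w2 = [%g, %g]\n", w1.lo, w1.hi, w2.lo, w2.hi);
        return 0;
    }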

Shpat (00:39:03):

But before we do, just for my clarification: it doesn't mean that the results are wrong. It just means that sometimes you don't get some of them?

Aditya (00:39:11):

Right, so they will still be sound. So if our program analyzer is generating, say, an invariant that X is greater than zero, then it is proved that X is greater than zero. And if you run it again, maybe it might come back and say something like X is greater than ten, right? And X greater than zero doesn't invalidate the fact that X is greater than ten, and so on. So they'll still be correct in that sense, but it'll definitely be annoying as a user. If that sort of answers your question.

Shpat (00:39:52):

Yeah. I think it does, to an extent, definitely

Joey (00:39:54):

In a lot of cases, there's more than one right answer, basically.

Aditya (00:39:57):

Oh, definitely.

Joey (00:39:59):

Well, in every case there's more than one right answer, or every interesting case. And so you just might get a different right answer. And the user experience sounds like it's way better if you always get the same one, or at least if you have some control over what's going to come out.

Aditya (00:40:14):

Right. So this particular work was done with my student Sung Kook Kim and Arnaud Venet, a collaborator at Facebook. That's been integrated into this open-source abstract interpreter that Facebook has developed called SPARTA, and SPARTA is used in the ReDex Android optimizer. So in that context, it was really important that the abstract interpreter, which forms part of the static analysis the optimizer is using, was deterministic, because your compiler, as much as possible, should be deterministic.

Joey (00:40:51):

And in those use cases, you actually saw some really concrete speedups from the parallelization.

Aditya (00:40:57):

Yes, definitely. So we at least got more than a 2x speedup when you had four threads, and this was, you know, using quite a number of open-source benchmarks that we had. So these were not sort of academic benchmarks, but open-source code that we analyzed. So I think we at least got a 2x speedup when we were using four threads, but depending on the application, you would get up to a 10x speedup. And again, it was nice to have that theoretical result,

Aditya (00:41:34):

cause we had some confidence that in some ways we were theoretically optimal. So when we didn't get a speedup, it wasn't that we were just unlucky or our algorithm wasn't quite good enough. We would go look at the program and say, oh yes, there is actually not a lot of parallelism that the abstract interpreter can exploit when it's analyzing this code.

Shpat (00:41:55):

So essentially this work was getting that same determinism by using a parallel kind of computation of all these things, the same as you would get when it was sequential.

Aditya (00:42:09):

Right. So in fact, we had a theorem. There was an existing sequential algorithm, and what we showed is that our parallelism was optimal and the result was the same as the sequential one. In practice, that was also important because, in some sense, it was backward compatible. Whenever you switch out the sequential code for the parallel code, not only is it deterministic, but it's the same as the sequential one which you were used to yesterday. So I think that was quite nice. I mean, it was nice from an academic, theoretical point of view, but, you know, these open-source tools were using the sequential algorithm and we were able to just replace it. So it's in this SPARTA tool from Facebook, and it's also been integrated into an open-source abstract interpreter from NASA, which is called IKOS. So that was also kind of nice to have, and their use cases are more on verification rather than code optimization.

Shpat (00:43:14):

Well, to me, at least, that sounds like an ideal outcome of what would be academic research: it gets almost immediately applied to real things in the real world and shows real results.

Aditya (00:43:29):

Yeah, it was very exciting. I think the way the project started off was actually by me not understanding this particular sequential algorithm, which I had sort of heard about in my PhD. And I was like, okay, I kind of understand it: I look at the paper, it makes sense, I can work it out, but I never truly grokked it. I was talking to Arnaud, and he said something similar: yes, yes, I understand it, but I don't think I completely understand it. And it sort of reminded me of another algorithm which I had read many years ago in the compiler literature. And then, just trying to understand the sequential algorithm, we realized that it need not be sequential: it could be parallel. And then we sort of built off of that.

Aditya (00:44:23):

So the original motivation was not actually to build a parallel, deterministic abstract interpreter. The original motivation was just: I don't quite understand this very commonly used algorithm, by Bourdoncle, developed in the nineties; all major abstract interpreters use it in some way. So Infer has it, SPARTA, IKOS, and so on. So it started off as an academic curiosity. But then slowly it was like, oh, the sequential algorithm is in some ways computing a total order over the program points. So it's sequential, right? Everything is ordered, and that need not be the case. You can actually have a partial order, and once you have a partial order, you can sort of parallelize things. And that's how it came about. And I'm quite happy with the fact that we were able to push it to both the Facebook and the NASA tool, and the POPL paper definitely doesn't hurt either.

Shpat (00:45:29):

Yeah, I can believe that. Those are great results.

Joey (00:45:33):

It's also a really nice story for people who are wondering how you go do research. A lot of times, when you watch successful researchers, it feels like, well, they just had this great idea, and they knew what they needed to do, and then they went and did it. When in reality, when you talk to people that have had success, a lot of times it was like, well, we kind of started poking at this thing, and then before we knew it, the natural result was that we kept diving deeper and deeper, and all of a sudden we found this cool application. Very few people say: here's my huge problem, I'm going to eat it all in one go, and we're going to get a POPL paper.

Aditya (00:46:09):

Yeah, absolutely. And I think that's the most exciting part about research: you don't know exactly where you're going to land. I mean, as an advisor, I try to make sure that my students see the light at the end of the tunnel; it's just that the light might change as you're doing your research. I think they feel a lot more confident knowing that it will eventually go somewhere, but of course, every week or every month, where we're trying to go might change a little bit. And I think the collaboration with Arnaud and Facebook has been excellent, and the fact that we were able to push it to that open-source tool was also quite nice. So having that collaboration is definitely very helpful.

Joey (00:46:59):

So I think we've covered, from what I understand, about half of the work that you're doing: the part around improving scalability. And then there's another chunk of your work that's centered around new and unique applications for the techniques that you've been working on. Can you tell us a bit about what you've been doing there?

Aditya (00:47:20):

The way I've viewed it is: if you want static analysis to be applicable to real developers, there are a couple of problems. One is scalability, being able to get the results you want soon-ish, and depending on the use case, soon-ish might mean a few seconds, or a few minutes, or, you know, a few hours. That's one problem. In some ways, I'd worked on some of these techniques to get more precise analyses, you know, with these abstract transformers and so on, and people have developed some very nice abstractions, so you can reason very deeply about programs. So I felt, you know, focusing on scalability would be more helpful in that case. The other problem is this whole problem of false positives. You run your static analysis tool, you want to find bugs, and it spits out like a thousand reports.

Aditya (00:48:17):

And maybe one of them is real. I'm exaggerating a bit, but, you know, high false positive rates are a well-known problem in static analysis. And there are a couple of reasons for this. One is, I feel, you can't turn on certain precise analyses because of scalability, so it sort of ties back to what we were talking about earlier. And then the other problem is being able to have a reasonably precise model of the environment. All software works in a context, right? It calls libraries, it makes operating system calls. So it's not a closed system; it's always interacting with the outside world, and what the outside world is, again, depends on where you decide to have that boundary. So a lot of the false positives are because you're not modeling the pre- and post-conditions of these external functions that you're calling.

Aditya (00:49:11):

And this is, again, a well-known problem that certain people are looking at. A lot of academic research sort of hand-waves around it: people have this notion of soundness, or soundness under some assumptions, and making sure that you're making these assumptions explicit, and so forth. But in my experience, coming up with these external models is quite important, and time-consuming. So while I was at Microsoft Research, this was before my PhD, we were working on this Static Driver Verifier. This was meant to take C driver code and make sure that it satisfies certain properties, so you don't see the blue screen of death in Windows. And there, the external world was the operating system, and if you don't have the right model of what those functions are doing, you can get a whole bunch of false positives.
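A sketch of what an environment model can look like in practice (my example, not SDV's actual format): a stub that tells the analyzer only the contract of an external call, instead of its real implementation. The analyzer_nondet_int hook is hypothetical.

    #include <stddef.h>

    /* Hypothetical analyzer hook (not a real API): the analyzer treats
       the return value as an arbitrary, unconstrained int. */
    int analyzer_nondet_int(void);

    /* Model of an external OS call, used by the analyzer in place of the
       real implementation. The contract it encodes: returns 0 on success
       with *nread set to at most len, or a negative code on failure with
       *nread untouched. */
    int model_os_read(char *buf, size_t len, size_t *nread) {
        (void)buf;
        int r = analyzer_nondet_int();
        if (r >= 0) {
            *nread = len;   /* post-condition on the success path */
            return 0;
        }
        return r;           /* failure path: r < 0 */
    }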

Aditya (00:50:10):

And interestingly, what sort of model you should have also depends on the particular static analysis you're doing. So based on that, we've been doing some work related to the error handling that I mentioned earlier. You can think of this error-handling specification as sort of trying to model the outside world, right? And we've been looking at techniques which combine traditional static analysis with machine learning approaches, where you're trying to import notions of embeddings, which you see in natural language processing. So you have this notion of word embeddings, right, where king and queen are close together in an embedding space, and sort of far away from orange, which is a different semantic notion. So you're trying to use something like that, but for functions: map program functions into an embedding space, and import machine learning algorithms to actually compute these embeddings so that they have some semantic meaning, and then use them.

Aditya (00:51:15):

And the intuition is that if, in your embedding, your functions are close together, then they might be semantically similar as well, and then you can sort of say that they behave in the same way. So we've been using this to essentially expand our error specifications. You know the specification of one function, and then you have a bunch of other functions that are close to it in your embedding, and the hypothesis was that, well, maybe they also behave the same way. And at least the early results seem to indicate that that is the case. So that's sort of trying to supercharge our static analysis tools using some machine learning technology.

Joey (00:52:00):

And you kind of casually talked about this project with Microsoft, and you mentioned that it might not result in as many blue screens. If it's the project I'm thinking of, talking about it that casually is kind of underselling it: my understanding is that project had a huge impact, right? Is this the same project I'm thinking about?

Aditya (00:52:18):

So that was a fairly long-running project. I think it started in the early two thousands; Tom Ball and Sriram Rajamani started this Static Driver Verifier along with others, and over the years they improved on their verification engines. While I was at Microsoft Research in India, they had a new way of doing things: they had some academic results, and I was sort of charged with integrating it with the core tool and focusing on scalability there. And again, that sort of helped me develop this appreciation for scalability, where, you know, on the first day when we sort of landed, it timed out after four or five hours. And I was there for about a year, and I think on the last day, for the same benchmark, it terminated in like 0.1 seconds. So over the course of roughly a year, we had a lot of engineering ways of scaling things up, as well as new research techniques, that resulted

Aditya (00:53:26):

in better scalability. And I think this was a while back, and I'm sure there's been a lot more work done on that project.

Joey (00:53:35):

You've also done work on neural nets; we've seen some of your papers there. Can you tell us a little about what that's about?

Aditya (00:53:44):

Right. So as we know, deep neural networks have been very successful at a whole range of problems, image recognition, natural language processing, and they're increasingly being used in safety-critical software: autonomous vehicles, medical analysis, healthcare, and so on. So I think it seemed like a natural move to try and see: well, the nature of software is changing, so should the nature of program analysis for such software, right? More recently, we've been looking at the following problem. You're given a deep neural network, let's say it's an image recognition network. You've trained it, you spent a week training it, hundreds of GPU hours, you deploy it. And now someone comes and says, hey, here's an image, it should be classified as a cat, but you're saying it's an orange. And you're like, okay, now what? So the question is: you have a deep neural network, you found a bug in the deep neural network, and the task is, what do you do, right?

Aditya (00:54:52):

As software engineers, we are quite used to this workflow of writing code, making sure it's correct as much as we can, deploying it, getting a bug report, thinking really hard, writing a few lines of code to patch the code, getting rid of the bug, making sure that that bug is gone, updating your test cases, and so forth, and then deploying it. And as much as possible, we try to make sure that our code is monotonically increasing. It's not always the case, but at least there are lots of processes in place to make sure that it does occur in practice, if you have regression tests, unit tests, and so forth.

Joey (00:55:31):

Monotonically increasing, meaning it's always getting better.

Aditya (00:55:35):

Yes, yes, it's always getting better in some metric. And the sort of high-level goal is: well, deep neural networks are part of our software engineering stack now, and if you get a bug report, what do you do? Currently, what you could do is add this to your training set, retrain it, and hope for the best. There are a couple of problems with this. One is you may not actually fix this bug, right? You might still just be unlucky, and maybe then you have to go and tune some hyper-parameters and retrain. You might also get worse in other ways, right? You might fix this particular bug, but then another unforeseen issue comes up. And of course, it's time-consuming, where you have to go and retrain it all over again.

Aditya (00:56:27):

So what we've been doing is trying to come up with what we call provable patching of deep neural networks, where you're given a deep neural network, you're given some specification, let's say you're given 10 images and you want them to be classified in a particular way. Can you go and patch the network? Can you surgically go and change the weights and parameters of the network so that you're guaranteed that these 10 images are classified correctly? But that's not quite enough. What we also guarantee is that the changes to the weights are as small as possible. So they're minimal changes, and the reason you want that is you want the patch to be local in some sense, right? You don't want to completely change how the neural network behaves for other, unrelated images. So that's sort of what we've been working on, and we've had some pretty impressive preliminary results there, where we've taken ImageNet networks, we've taken, you know, 800 images which were classified incorrectly, and been able to modify the network so that these 800 are correctly classified, and the accuracy on the usual validation set does not decrease. So it still behaves well on the set of images that it performed well on before, except it's also doing well on these new 800 images. So that's sort of what we are looking at right now.
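In rough math (my formulation; the actual paper may set it up differently), the provable-patching problem he describes looks like:

    \[
      \min_{\theta'} \; \lVert \theta' - \theta \rVert
      \quad \text{subject to} \quad
      N_{\theta'}(x_i) \text{ is classified as } y_i,
      \qquad i = 1, \dots, k,
    \]

where \(\theta\) are the trained weights, the \((x_i, y_i)\) are the \(k\) buggy inputs with their intended labels, and minimizing the norm is what keeps the patch small and local. The decoupling trick described a bit later is what makes a problem of this shape tractable.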

Shpat (00:58:00):

That sounds amazing. How did this work come to be?

Aditya (00:58:03):

So this is work I've been doing with my student Matthew Sotoudeh. The project started, I guess, a couple of years ago in a different direction: it started off as trying to do abstract interpretation of deep neural networks. I had one hammer and I saw a different nail, and I was like, let's see what happens. It turned out that didn't go quite well. Neural networks have these huge dimensions, and things don't scale and work as well as we'd hoped. Also, other people started doing similar things in that space, so it didn't seem as fresh. I think at one point we were a little bit stuck, and we said, well, ideally we would want to take some fairly high-dimensional polytope and push it through a neural network. That's what abstract interpretation would do, right? You have some abstract value, you push it through your network, and you get the resulting abstract value, and that's useful when you're trying to do analysis. That didn't quite work. So then we said, can we push a line through? Just a line in a high-dimensional space: can we push it through the network and figure out what

Aditya (00:59:22):

the corresponding line is at the output? So the line basically represents an infinite set of inputs, and that, it turned out, we could do, and it scales up quite well. We had some nice applications of this primitive, and we published this at NeurIPS earlier this year. What that led to was a general insight: if you can symbolically capture a system, then you can synthesize it. That's sort of what a lot of program synthesis approaches do; if you can verify something, then you can synthesize it. That was the high-level, hand-wavy idea we wanted to pursue, because at least for this line we could exactly capture how the network behaves. But we weren't quite there yet.
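As a rough illustration of "pushing a line through" a ReLU network: along the segment from one input to another, the network is piecewise affine, and the pieces change exactly where some neuron's pre-activation crosses zero. The sketch below only approximates those breakpoints by sampling activation patterns (the exact method computes the crossings layer by layer), and the network format, a list of (W, b) pairs with ReLUs between them, is an assumption.

```python
# Approximate the affine pieces of a ReLU network along a line segment.
import numpy as np

def activation_pattern(layers, x):
    """The on/off pattern of every hidden ReLU at input x."""
    bits = []
    for W, b in layers[:-1]:
        z = W @ x + b
        bits.append(tuple(z > 0))
        x = np.maximum(z, 0.0)
    return tuple(bits)

def approx_line_pieces(layers, x0, x1, samples=2001):
    """Approximate the endpoints t in [0, 1] of the affine pieces of
    t -> net((1 - t) * x0 + t * x1) by dense sampling."""
    ts = np.linspace(0.0, 1.0, samples)
    pieces = [0.0]
    prev = activation_pattern(layers, x0)
    for t in ts[1:]:
        cur = activation_pattern(layers, (1 - t) * x0 + t * x1)
        if cur != prev:          # crossed into a new affine region
            pieces.append(float(t))
            prev = cur
    if pieces[-1] != 1.0:
        pieces.append(1.0)
    return pieces
```

Between consecutive breakpoints the network agrees with a single affine map, which is what makes an exact characterization of the whole (infinite) segment possible.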

Aditya (01:00:18):

And that got us thinking about this problem of synthesizing networks, or fixing them, and so forth. But the other insight that enables a lot of this work is what we call decoupled deep neural networks. Think about a ReLU operator in a deep neural network. All it does is: if the input value is less than zero, it outputs zero, and if the input value is greater than zero, it just copies the input to the output. That's how the rectified linear unit works. And if you look closely at what the ReLU is doing, it's doing two things: it's controlling whether or not that node is active, whether it has an output or not, and it's also controlling what the output is. So it's doing the activation, and it's computing the value.
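That two-roles observation is easy to state in code: a ReLU is literally a 0/1 gate multiplied by the raw value it would pass through. A tiny sanity check:

```python
# The two roles of a ReLU made explicit: a gate (is the neuron active?)
# times a value (what flows through if it is).
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def relu_decomposed(z):
    gate = (z > 0).astype(float)   # role 1: controls activation
    value = z                      # role 2: the value passed through
    return gate * value            # identical to relu(z)

z = np.array([-2.0, 0.5, 3.0])
assert np.allclose(relu(z), relu_decomposed(z))
```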

Aditya (01:01:17):

And the insight we had is: what if you separate those out? What if you decouple the activation from the value? In some sense, in a decoupled neural network, you take your original DNN and duplicate it, and then you add edges where the top network controls the activation of the bottom network, and all the bottom network does is compute the value. This decoupling turns out to be the key to being able to symbolically manipulate these neural networks in the way we want. When you decouple them, you keep the activation as is and change only the value, and controlling the output by changing only the value turns out to be reducible to a linear program. Because the fact that a ReLU, or any of these nonlinear operators, plays these two roles turns out to be exactly why this is tricky: any time you move anything, you're not only changing the value that is output, but also changing whether or not that node is activated.
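Here is a minimal sketch of that decoupled forward pass, under an assumed list-of-(W, b) representation: the activation channel alone decides the on/off masks, and the value channel computes outputs under those masks. With the masks fixed, the output is affine in the weights of any single value-channel layer, which is what makes the repair constraints linear.

```python
# Sketch of a decoupled ReLU network: a frozen activation channel and an
# editable value channel that obeys the activation channel's on/off masks.
import numpy as np

def decoupled_forward(act_layers, val_layers, x):
    a, v = x.copy(), x.copy()
    for (Wa, ba), (Wv, bv) in zip(act_layers[:-1], val_layers[:-1]):
        za = Wa @ a + ba           # activation channel pre-activations
        zv = Wv @ v + bv           # value channel pre-activations
        mask = za > 0              # which neurons fire: decided entirely
                                   # by the (frozen) activation channel
        a = np.where(mask, za, 0.0)
        v = np.where(mask, zv, 0.0)   # value channel obeys the same mask
    Wv, bv = val_layers[-1]
    return Wv @ v + bv

# If act_layers == val_layers, this is just the ordinary network. Repair
# then edits val_layers only, so every ReLU mask stays put and the output
# stays affine in the edited layer's weights.
```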

Aditya (01:02:33):

If you think about what a deep neural network is doing, it's partitioning your input space, and the partitioning depends on which nodes are active and which are not. Within each of these partitions you actually have an affine function, because once you've decided whether or not each node is active, the ReLU is either zero or, you know, y equals x. So within each partition the network is affine. And the interesting part, the main insight, is that if you fix the partitions, then you can manipulate the value network to find the particular affine function you want in a given partition, and that turns out to be easy to do. So it started off as, again, trying to do some analysis, getting frustrated, and saying, can you at least do a line?
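The "affine within each partition" claim can be checked numerically: freeze the on/off pattern at a reference point and fold the masked layers into one affine map. A sketch, again assuming a list-of-(W, b) ReLU network:

```python
# Collapse a ReLU network to the single affine map it computes on the
# region sharing a reference point's activation pattern.
import numpy as np

def affine_on_region(layers, x_ref):
    """Return (A, c) with net(x) == A @ x + c for every x whose
    activation pattern matches x_ref's."""
    n = len(x_ref)
    A, c = np.eye(n), np.zeros(n)
    x = x_ref.copy()
    for W, b in layers[:-1]:
        z = W @ x + b
        m = (z > 0).astype(float)      # frozen on/off pattern
        A = (W * m[:, None]) @ A       # masked rows compose affinely
        c = m * (W @ c + b)
        x = np.maximum(z, 0.0)
    W, b = layers[-1]
    return W @ A, W @ c + b

# Sanity check idea: for any x inside the region, A @ x + c matches the
# network's output exactly; the network is a patchwork of such maps.
```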

Aditya (01:03:30):

Then came understanding that the ReLU is doing these two things, and then this notion of decoupling. And then Matthew had this very nice generalization where you can go beyond ReLU networks and deal with other nonlinear operators like sigmoid and tanh. So the example I gave was about a finite set of images that you want classified, but we also handle something like: here is a line, and this line should be classified in a particular way. The reason this is useful is that you can have images that are, say, corrupted by fog. There's this dataset called MNIST-C, MNIST with corruptions: basically MNIST images, but corrupted by fog and so forth. So you could say, hey, I want to patch my network so that it correctly classifies these corrupted images, but you might also want to say that the images along the line from the uncorrupted image to the corrupted one should also be classified as a three, or whatever it is. Those interpolated images you can think of as images with different levels of fog, so you're guaranteeing that any image along the line is classified correctly. And this comes back to what our NeurIPS paper was trying to do, which was to get an exact characterization of a neural network along these low-dimensional polytopes, lines and planes and so forth. So it cycled back to the work we were doing before.
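The line specification itself is easy to picture: points along the segment from the clean image to its fogged counterpart are just images with intermediate fog levels. A small sketch (the network, the two images, and the label are assumed inputs; the finite sampling here is only a sanity check, since the provable version certifies the entire segment by enumerating its affine pieces, as sketched earlier):

```python
# Interpolating between a clean digit and its fog-corrupted counterpart
# gives images with intermediate fog; the repair spec asks that every
# point on the segment, not just samples, gets the right label.
import numpy as np

def interpolated_images(clean, foggy, n=11):
    """Points along the line from clean to foggy (t is the fog level)."""
    return [(1 - t) * clean + t * foggy
            for t in np.linspace(0.0, 1.0, n)]

def check_line_by_sampling(net, clean, foggy, label, n=101):
    """Finite-sample check only; the provable guarantee covers the
    whole segment."""
    return all(int(np.argmax(net(x))) == label
               for x in interpolated_images(clean, foggy, n))
```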

Joey (01:05:08):

So this all sounds like a complete departure from the rest of your work. Do you feel like it is, or do you think there's a purpose that brings these things together?

Aditya (01:05:22):

So, in general, you let the problem guide you. I don't believe in siloing myself in a particular subfield. As I said, this started off as, hey, let's do abstract interpretation of deep neural networks, but over time you just follow the problem, follow the solution, and see where it takes you. And it's not just me working on this: my students obviously have a huge influence on where the problem and the solution go. It's that sort of collaboration where you're like, hey, what if we could do this, or would that be possible, and so forth, and we just let it flow that way. And, you know, this is something I can credit my advisors for.

Aditya (01:06:19):

They always encouraged me to just follow the problem, follow the solution, and see where it goes. My PhD advisor, Tom Reps, had very broad interests. I remember in the first couple of years of my PhD I was working just on machine code verification, like model checking, and towards the end we had results about SAT solvers and this connection between abstract interpretation and SMT solvers, and we were working on separation logic and so forth. If you had told me on the first day of my PhD that I would be working on those things, I'd have said, no, that's not what my advisor is asking me to do right now, and that's not what I've done previously. But that's something I've learned from my advisors. Even my master's advisor, Professor [inaudible], was very encouraging, saying: just follow the problem, don't restrict yourself in any way, make sure you're making progress, and something interesting and useful will come up.

Joey (01:07:30):

And it sounds like, in passing this on to your students, you've been able to empower them to help control the direction of these projects. They've really been a valuable part of finding these new directions and exciting applications, even though maybe somebody who showed up expecting to be doing abstract interpretation is now publishing in, uh,

Aditya (01:07:52):

Machine learning conferences, yeah, absolutely. And that's something I make explicit to them. I say, hey, I'm interested in some of these ideas, and I'm probably better at solving problems in a few areas, but that doesn't mean we'll be stuck on those problems or those solutions. So even this problem where we're trying to fix these sorts of functions was a collaboration with Professor Cindy Rubio-González at UC Davis, who was working on handling bugs and so forth. We looked at the problem together; I had the hint of an idea about what might work for functions, but I didn't quite have an application in mind, and then we ended up collaborating on it with a student we co-advise, and that also came about fairly naturally. So it's usually not the case that you have a set problem and a solution in mind and you stick with them for years to come. If you're lucky that can happen, but rather, you might start off with that mindset: it's always nice to have something concrete in mind, some concrete insight you can build off of, but then you just go with the flow and see where it takes you.

Joey (01:09:17):

Yeah, it's wonderful that you encourage them up front, because if you didn't, they might not feel as free to mention some of the ideas that come to them. I'm sure that by being really up front about it you set that expectation, so they say: I had an idea and it sounds kind of crazy, but now I know I can bring it to my advisor, and they won't tell me I'm crazy, or that that's not what we do here.

Aditya (01:09:43):

Yeah, absolutely. And it also prepares them for the kind of crazy things I tell them in our meetings.

Shpat (01:09:53):

If people want to read more about any of the projects you talked about, or play with some of the work that's further along, what can they do? Where can they find it?

Aditya (01:10:09):

So my UC Davis website has all the publications, and it also links to my research group website, where there are links to the source code. And of course they can always email me, and I'll be happy to connect with them. The website is thakur.cs.ucdavis.edu.

Shpat (01:10:35):

We'll put that link in the episode notes.

Joey (01:10:37):

We're nearing the end. Is there anything else you'd like to add before we close out?

Aditya (01:10:42):

So I guess one thing, and it connects with the question you had: I definitely want to encourage people in industry, or people who want to explore formal methods, to reach out to me and to other researchers. I think there's probably a lot more collaboration that could happen, in terms of usability, scalability, and specific problems that we could work on.

Aditya (01:11:14):

I understand that, as an academic, there are different metrics and different goals and different constraints, and those obviously differ from the ones you have if you're working at a company or working towards a product. But I think there's enough common ground where something can happen, and there are lots of researchers out there who are more than willing to collaborate, in some capacity, with folks outside of academia. So I would encourage people to reach out: file issues, send requests, and so forth. Yeah, and I love that you said that. I think in some people's minds, faculty are these busy professors doing their research: they don't want to hear from me.

Aditya (01:12:10):

They don't want my questions; this person's so deep into abstract interpretation and program analysis, there's no way my question is going to be smart enough for them. But everybody dreams of the day they get that email from a student, even an undergrad, or from somebody working in industry, that says: hey, I've heard about abstract interpretation and I just want to learn more, what can I do, can you help me? The vast majority of people become faculty because they want to teach, or they want to be involved and spread the word in some sense. So that sort of thing is just so welcome and so exciting, and encouraging people to do it is wonderful. Absolutely, absolutely.

Shpat (01:12:49):

Well, I really enjoyed that conversation. Thank you so much for joining us and sharing what you're doing.

Aditya (01:12:56):

Yeah, and hopefully some of that was useful. I know I do ramble on a bit, or get distracted by certain things, but hopefully it was informative enough to show that I love what I'm doing.

Shpat (01:13:12):

That's right, and you're a good storyteller, for sure. In the end result, people won't know what got cut out, but I'm sure it will make sense. It was a pleasure. Thanks so much for joining us.