Building Better Systems

Episode #21: Nikhil Swamy — Fully In Bed With Dependent Types

Episode Summary

Today we're joined by Nikhil Swamy, Senior Principal Researcher in the RiSE group at Microsoft Research. We are very excited to hear about what he's been working on. In particular, we're going to discuss a language that he co-created and continues to develop called F* (pronounced F star). F* is a dependently typed language in which you can both program and prove things about the programs that you write. We'll talk about what makes that language special and unique compared to other similar languages, as well as some of the applications of F*.

Episode Notes

Today we're joined by Nikhil Swamy, Senior Principal Researcher in the RiSE group at Microsoft Research. We are very excited to hear about what he's been working on. In particular, we're going to discuss a language that he co-created and continues to develop called F* (pronounced F star). F* is a dependently typed language in which you can both program and prove things about the programs that you write. We'll talk about what makes that language special and unique compared to other similar languages, as well as some of the applications of F*.

 

Watch all our episodes on the Building Better Systems YouTube channel

Nikhil Swamy: https://www.microsoft.com/en-us/research/people/nswamy/

F*: https://www.fstar-lang.org/

Joey Dodds: https://galois.com/team/joey-dodds/

Shpat Morina: https://galois.com/team/shpat-morina/ 

Galois, Inc.: https://galois.com/ 

Contact us: podcast@galois.com

 

 

Episode Transcription

[Shpat] Hello, and welcome to another episode of Building Better Systems. My name is Shpat Morina. [Joey] And I'm Joey Dodds. [Shpat] Joey and I work at Galois. Galois is a computer science

0:11

R&D lab focused on hard problems, trust,  cyber security and a lot of other fun stuff.  

0:19

[Joey] Today we're joined by Nikhil Swamy who is  Senior Principal Researcher in the RiSE group   at Microsoft Research. This is kind of an exciting  one for me because back when I was in grad school,  

0:29

Nik came in and visited and it was really  inspiring to see somebody doing industrial  

0:34

research like Nik was. So very excited to be  talking with him today and to hear a bit about  

0:40

what he's been working on. In particular we're going to talk a bit about his background

0:46

and then we're going to discuss a language that  he's co-created and continually developed called  

0:52

F* (pronounced F star). F* is a dependently typed language that you can use to both program and

0:58

prove things about the programs that you write.  We'll talk about what makes that language special  

1:04

and unique from other similar languages as well  as some of the applications of F*. Let's dive in.

1:11

Designing, manufacturing, installing and   maintaining the high-speed electronic computers — the largest and most complex computers ever built.

1:30

[Joey] Thanks for joining us Nik. [Nikhil] Yeah, it's great to be here, thanks for having me. [Joey] So before we dive into what you're working on now, I'd love to hear a

1:39

little bit about your background and how you got  interested in the field of programming languages.  

1:44

[Nikhil] That's a good question. So my background  broadly is that I grew up in India and I 

1:52

came to the US as an undergrad and I went to a school called Hampshire College  

2:01

in Massachusetts, which is kind of a liberal  arts school — no grades, make your own major  

2:06

kind of place. It was a lot of fun, and this school is very much oriented towards,

2:16

you know, self-directed study in some sense and I  found a couple of great professors there who are  

2:23

interested in — who got me interested in computing and physics and math and things like  

2:32

this. I worked with them on this — it was like in  the late 90s — I was working with them on quantum  

2:40

computing and applying sort of search algorithms  to find better than classical quantum programs.

2:50

And then trying to prove that the programs that  we actually found were correct and solved whatever  

2:56

problem we were trying to solve, like, you know, if it's a circuit to solve something, that it was actually finding the right answer with some correct

3:05

probability this kind of stuff. So  that's when I kind of really got into,

3:10

or understood, that "hey, it's actually  interesting to try to prove stuff about programs"  

3:16

and I learned a bunch at that time about, you know, automated theorem proving techniques and

3:22

well actually I thought I learned a bunch  but actually I learned surprisingly little  

3:28

and then learned that "wow, I actually hadn't learned that much, there's so much more." And, you know, I went to grad school at Maryland, initially

3:40

thinking about like "hey, I might not try to  do some things like in the AI space but I then

3:50

kind of had, you know, some sort of moment, I guess, in grad school, as you do sometimes, where

3:56

I realize that like "hey, you know being  able to prove properties about programs is  

4:02

a really was a really like powerful idea for  me" and I worked with Mike Hicks my advisor  

4:09

at the University of Maryland on a programming  language called Cyclone. Cyclone is like a  

4:18

a C-like programming language with a type system  geared towards proving memory safety and with  

4:29

ideas in Cyclone that you know were maybe somewhat  a bit ahead of its time in some sense things like  

4:35

you know linear types and you know being able  to run being able to use sort of custom memory  

4:44

management strategies everything from unique  pointers to reference counting to you know even  

4:50

in some cases conservative GC. Lots of ideas from Cyclone, region-based memory management and so on, have now, I think, found a home in languages like Rust, and I learned a lot at that

5:04

time from Mike and the other people working  on Cyclone about you know about programming  

5:11

languages and type systems and being able to  prove things about programs before you run them 

5:18

and yeah that's how I got into this stuff. And  then for my thesis I was more you know building on  

5:26

what I'd learned working on Cyclone. I started to explore some ideas in dependent types and see

5:34

how you can add just a little bit of dependent  types to a programming language and get it to  

5:40

prove properties that you want to prove. I was at the time interested in security properties, as in

5:47

how can you get away with adding  just a little bit of dependent types   and be able to prove I don't know things like  the correctness of access control enforcement or  

5:57

information flow control policies, things like this. [Joey] For anyone who isn't familiar, could you give us a super quick glimpse, because

6:04

I think it's going to come up again in this  conversation: what is a dependent type? Right.

6:14

[Nikhil] All right it's kind of a profound  question in some ways but  

6:20

in a simple sense it's a way to kind of encode a logic in the type system of a programming language,

6:31

and the word "dependent", maybe it's a little bit scary, but it's actually a really simple thing. Usually, if you give a function a type

6:42

signature, it's something like a function that takes an integer and returns an integer, a common, simple type for a function. But you can have functions

6:52

whose return type can depend on the argument  and it can do things like if you call it with  

7:00

the integer 17 then it returns, I don't know a vector whose length is 17.  

7:09

Or you could even be things like, "hey,  you know you call it with a boolean  

7:17

and if you call it with true, it returns a  string and if you call it with false it returns  

7:23

I don't know, an integer, let's say." So that's where the word "dependent" comes from.
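[Editor's note: a minimal F* sketch of the kind of dependent function type described here; the module and function names are illustrative, not from the episode.]

```fstar
module Replicate
open FStar.List.Tot

// The result type mentions the value of the argument: calling
// `replicate 17 x` produces a list whose type says its length is 17.
let rec replicate (n:nat) (x:int)
  : l:list int{length l = n}
  = if n = 0 then [] else x :: replicate (n - 1) x
```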

7:31

But this mechanism, which, you know, the examples I gave you are very kind of

7:38

programming centric you know but  this basic mechanism allows you in 

7:45

a very kind of foundational way  to encode within the type system  

7:51

very powerful logics that let you build all of  mathematics in some sense inside this logic which  

8:01

doubles as a programming language. Now, because dependent types are a very powerful thing,

8:07

you know you don't see them that often in  mainstream programming languages — you're   beginning to — but they you know the kind  of languages that me and my colleagues work  

8:17

on you know there's I'm sure we'll get into more  of them but you know they're fairly I don't know

8:26

they're pretty fancy, I mean to put it bluntly.  So I was trying in my thesis work to sort of see  

8:32

well you know I don't want full dependent types  what can I you know can I just inject a tiny bit   of dependent types into my language and you know  how much mileage can I extract from that? Like  

8:43

you know, what kinds of things can I prove  if I get you know dependent types light  

8:48

and then you know then I guess  it's 10 - 15 years later.

8:57

I'm no longer doing dependent types  light it's kind of you know the

9:05

heavy dependent types, yeah. The entry,  the gateway drug, or something you know.   You start doing it a little bit and then you  know after a while you're like you're fully in  

9:13

bed with dependent types and it's all you know  [Shpat] Fully in bed with dependent types. I think we  

9:20

got our title for the podcast. [Joey] Cool, and so now  you're at Microsoft Research and have been for  

9:27

a while and you've been working on increasingly  more dependent types in a language called F*.  

9:35

[Nikhil] That's right. Yeah, yeah. I've been at Microsoft Research since about 2008, and I work there in a group called RiSE, Research in Software Engineering, and

9:47

I work on F* but broadly on you know programming language techniques,  

9:56

programming language and program  verification techniques to prove

10:02

programs correct, to prove them  secure, to prove you know things like  

10:07

resource bounds about them, things like this; proving things about programs through program

10:14

language design yeah through programming language  design where you know in these new programming  

10:19

languages to sort of the languages facilitate  writing programs and their proofs together  

10:30

to you know make it possible and maybe easier  to actually write programs that are correct.  

10:36

A lot of this work from where you started to the  working Microsoft sounds like part of it has maybe  

10:42

culminated, as the wrong word, but it has led to  this language that you're working on which is F*.  

10:50

I wonder if that would be a good place to start, for you to tell us a little bit about it. [Nikhil] Yeah, sure. So F* is this programming language; superficially it looks like, you know,

11:02

many other functional programming languages  things like languages in the in the ML family,  

11:07

things like OCaml or F# or even Haskell to a certain extent,

11:14

which is to say that it's a mostly functional, statically typed language

11:26

where you know what's maybe  unique about it is that it  

11:31

has, I'd say, maybe three unique features. The first is that it's dependently typed,

11:40

meaning that the type system is more powerful than these other languages that I've mentioned so far, things like OCaml or F# and so on. It has a type system with full

11:53

dependent types, similar to other languages that you may have heard of, things like

11:58

Lean or Coq or Agda or other dependently typed languages, but

12:05

it is different from the dependent types that you see in these other languages in that

12:10

F* is based on an extensional type theory which  is slightly technical, we can talk about that  

12:16

at some point if you want, different from  the type theories that are implemented by  

12:22

these other languages that I mentioned. I think  this extensionality thing is kind of important   because it enables you know making F* sort of a  programmer first kind of language rather than a  

12:34

language in which, you know, some of the other languages, they're kind of set up as proof assistants in which you can do mathematics

12:46

at a very deep abstract level F* is more kind of  programmer-centric where you're not you wouldn't  

12:55

necessarily fire up an F* session to I don't know  formalize category theory or something. I mean you  

13:01

can if you want but that's not what people use it  for it's more like you know you use it to write,  

13:11

for building programs that you want to execute  and this in doing that like if as you're writing  

13:17

programs and you're writing sort of software it  it turns out you know as probably you're both  

13:23

aware, when you're writing a piece of software and you want to prove

13:28

it correct, there are typically maybe a handful of interesting theorems that you want to prove, but the bulk

13:35

of the things that you're trying to prove are  really, really tedious low level you know tons of,  

13:41

you're mired in details, hundreds thousands of  small proofs that help towards maybe you have some  

13:48

interesting insight about a big invariant and like  you know there's there's a handful of big proofs.  

13:54

So one thing that F* really aims to do is to try to reduce the human overhead of

14:01

having to write these thousands of small proofs all over the place, and the way that it does this, and this is one of the unique things about F*, is by using an SMT solver.

14:11

An SMT solver is an automated theorem prover: you feed it a problem in some logic, usually something like first-order logic plus some

14:20

reasoning capabilities about arithmetic, things like this, and the SMT solver tries to decide whether a particular formula is valid or not.

14:30

So what F* does is it analyzes your program in its dependently typed logic and produces

14:37

a large number of proof obligations, hundreds sometimes, that pop up when you're trying to

14:44

verify even, you know, 10 lines of code, and feeds those to this SMT solver, which tries to do the proof for you.
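[Editor's note: a tiny illustration of the kind of proof obligations being described, using hypothetical functions rather than anything from the episode. The result types are refinements, so F* must prove that every branch satisfies them, and it hands those obligations to the SMT solver, which discharges them with no hand-written proof.]

```fstar
module Auto

// `nat` abbreviates x:int{x >= 0}; showing that both branches return a
// non-negative integer is a proof obligation sent to the SMT solver.
let abs (x:int) : nat =
  if x >= 0 then x else -x

// A slightly richer specification, still proved automatically.
let max (x y:int) : z:int{z >= x /\ z >= y} =
  if x >= y then x else y
```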

14:50

Now, I mentioned at the beginning that extensionality was really important here, and

14:55

one of the reasons is that the particular flavor  of type theory that F* uses is designed in a way  

15:01

to enable this kind of dispatching of proofs to an external theorem prover that can, you know,

15:11

implicitly prove many properties about your program while you're writing it. So dependent types and SMT solving, that's kind of two

15:19

important things about F*. The third thing  is that you know when I want, me and others,  

15:26

want to write software in F* that is efficient and  you know can be low-level software that's doing  

15:35

I don't know. Things like efficient network  communication or cryptography or you know  

15:40

things like this. So this kind of software is in  many ways inherently effectful or imperative in  

15:48

a sense. You know, although I said at the beginning that F* is sort of a mostly functional language, the

15:54

way in which we use F* is by encoding within the  functional language other sub languages that allow  

16:02

you to reason about computational effects, you know, things like mutable state or

16:09

raising exceptions or doing IO, things like this. You can reason about that,

16:17

you can write programs that do these things  in F* and reason about them within F* and  

16:23

so F* has this effect system, and I think this is the other unique thing about F*: dependent types, SMT solving, and effects, and that package together is what we use

16:40

to build a variety of things in the  kind of domains that I've mentioned.  
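[Editor's note: a sketch of the effect system in the style of the classic F* tutorial examples; the exact library modules and signatures have evolved over time, so treat this as illustrative rather than the current API. The type of a stateful computation carries a precondition on the initial heap and a postcondition relating the initial heap, the result, and the final heap.]

```fstar
module Incr
open FStar.Heap
open FStar.ST
open FStar.Ref

// A heap-mutating computation in the ST effect: the specification says the
// reference holds one more than it did before the call.
val incr : r:ref int -> ST unit
  (requires (fun _ -> True))
  (ensures  (fun h0 _ h1 -> sel h1 r = sel h0 r + 1))
let incr r = r := !r + 1
```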

16:47

You know like a focus for us  recently has been low-level  

16:53

secure communication software that's ranging  from things like cryptographic protocols to

17:04

say a network packet parsing things that we  

17:12

implement in F*, prove correct, and then execute. You know, out of F* you can get C

17:20

code or assembly code or whatever depending on  which of these fragments or style you program in  

17:26

you get out this imperative low-level  software that you can then deploy and run.  

17:32

So it's in somewhat active use  for real things, it sounds like.

17:40

That's my question, if that's somewhat correct. [Nikhil] Well, you know, I'd say there's a small community of active users who are, you know,

17:56

I mean my colleagues at MSR, we use — several  of us use F* to do things with it like you know,  

18:03

something that we've done recently: there's in F* a sub-language specifically

18:13

for writing low-level parsers correctly  parsers of things like network message formats  

18:20

so we have a verified parser generator that  we've implemented inside F* and we use this,  

18:27

Is this EverParse? [Nikhil] This is called EverParse, that's right, yeah. So, you know, we use this to build

18:34

the network packet formats used in various parts of the Windows kernel, and we extract, you know, high

18:44

performance C code for parsing those packets  and for the last maybe 18 months or so we have  

18:55

verified parsers produced by F* running inside Hyper-V, inside the Windows kernel

19:00

in several releases of Windows so far. So there are pockets of such people;

19:08

even outside MSR there's some people who  use F* to deploy verified cryptography in  

19:14

the Linux kernel or in Firefox you know.

19:22

[Shpat] I have a couple questions about  that which is that's awesome. You,  

19:27

at a higher level than F*, have so first  of all you know when we're talking about   dependent types you said they don't end  up in programming languages that are  

19:38

more generic or more used because  they're a little complicated, it's my   simplified version of what you said. But it  sounds like you've made this language suitable for  

19:49

programming you know in the large and yet you know  designed sort of dependent type theory into it.

20:01

How did you make that happen and is it  the case that this is more approachable?  

20:08

[Nikhil] I mean I think there's still a long ways to  go I'd like to think that at least some of   what we do in F* makes dependent types more  approachable to certain kinds of people.  

20:21

You know it's still like I said a very small  community of people who are using this stuff but I   think I'm optimistic. I think we're you know we're  on a trajectory where as we see more successes  

20:36

across the several dependently typed languages that we have out there today, and as you see, you

20:42

know students coming out of universities around  the world like the gaining more experience with  

20:49

this kind of stuff where you know you look at if  I even just from a personal experience like if I 

20:58

think about the interns that I had when I  joined MSR in 2008 you know super smart students  

21:09

but you know, at the time, programming like this in a dependently typed language was like the bleeding edge, not everybody had experience with it, but today the interns that you get,

21:19

like, many of them have done, you know, some part of Software Foundations, or they've

21:26

programmed in Coq or something, and it doesn't feel alien to many people

21:34

anymore it's just like okay I just I can just  hack my way through this proof no problem and so  

21:39

I think the the community of people who have a you  know a certain kind of baseline skill to approach

21:50

these things that were previously quite esoteric  

21:57

is you know that community is growing and growing  I think quite fast. So I hope that you know in a  

22:05

few years like languages like F* or Lean or you  know so many others will no longer be as esoteric.

22:18

[Shpat] That's a great word for it. [Joey] Well it's a really  important perspective because sometimes when we  

22:24

look at these things today it's still a small  community right. By most metrics of programming  

22:31

language communities there is kind of hope because  yeah if you look 10 years ago dependent type was  

22:38

like yeah it kind of felt new and now it's  yeah now you have a talent pool to draw from  

22:44

and there's no reason that growth that growth  should slow down so that is a really important   perspective to say like, "yeah, it's not it's not  huge yet, but we've made progress." Things like  

22:54

F* and Coq, and I think Lean, are bringing a whole new group of people sort of into the dependent

23:04

type sort of world and yeah it's really exciting  to see that growth and definitely important to  

23:09

acknowledge all the progress that's been made in  the last 10 - 15 years right. [Nikhil] Yeah, yeah I think  

23:14

maybe the answer to me one way in which I see sort  of more growth happening is you know and maybe

23:24

as a functional programmer this is not that  surprising but I really see like the you know  

23:32

domain-specific languages is a big part of the  answer that you know you have these frameworks   that are super powerful you can formalize all of  mathematics in them if you want but most people  

23:40

don't want to formalize all of mathematics  they want to do some very specific thing  

23:46

you know, like, I don't know, taking EverParse again for instance:

23:52

I'm not going to unleash the full power of dependent types on someone who just wants to write a packet parser. You know, I'll design some framework

24:03

within my very powerful framework, some language tailored to this very specific usage, which sort of reins in the complexity and

24:19

allows a person to get value from it without  having to understand all of it. [Joey] I'm laughing   a little because you sort of mentioned that and I  think I think Leo had this experience where maybe  

24:27

he thought the same but then the entire math  world sort of showed up at his door and said   we want to formalize all the mathematics with  dependent types. [Nikhil] That's true, that's true but  

24:35

yeah fair enough. [Joey] Yes but yeah, certainly  not everybody wants to do that and the uses  

24:42

seem to be fairly different so yeah I mean I  guess on an unrelated question you know since  

24:48

we're mentioning Lean, and I think maybe this gets back to the extensional/intensional

24:53

kind of thing but you know we did talk to Leo  a while back and one of the things I've sort of  

24:59

been wondering about since we talked with him is  like Leo obviously understands SMT solvers very  

25:05

well and knows what they can do from the sounds  of it he started out trying to write something  

25:11

that could make use of SMT solvers and they could  let you get more work done before the SMT solvers   do their thing but he ended up in a world where  like lean doesn't speak to SMT solvers very well  

25:22

or at all at the moment. Is that sort of the extensional/intensional distinction there, or did

25:30

some did something else happen in your opinion to  sort of cause them to to move away from that? [Nikhil] I  

25:35

think you could make this SMT thing work inside an intensional type theory in certain ways.

25:45

I think within lean too it could  I think it could well happen I  

25:52

talked to Leo a lot you know he's talking a  ton about both SMT solvers and type theories  

26:01

he was saying the same he  was saying the same about you

26:07

he's being kind it's definitely the other way but

26:14

yeah I think I mean if I was to guess I  mean I'd say you know some we may still  

26:21

someday have a SMT solver inside lean I think it's just that lean has really taken  

26:28

off in this math community where the priorities  have been to you know to serve the needs of you  

26:36

know, to meet and serve the needs of that community, and maybe SMT solvers have not

26:43

made it to the top of the priority list yet, but they may, I expect someday they will,

26:50

but, you know, other people have built SMT-

26:56

assisted proof automation in other frameworks; there are some that,

27:04

notably, are not dependent type theories, but SMT solvers work very

27:10

well there, and also there's some work in Coq doing things like this. But the thing in F*

27:19

that makes it different than, say, Lean or Coq or other dependently typed languages is that

27:27

you can use, you know, proofs done by an SMT solver, or proofs done

27:33

by any other means too, by a tactic or whatever; you can use them to implicitly convert types everywhere.

27:41

that's the main thing like you know you can  you know if I have a you know a vector of  

27:49

whose length is n plus m, I can implicitly treat it as a vector of length m plus n, because of course m

27:56

plus n is equal to n plus m, and an SMT solver will tell me that instantly,

28:03

but in an intensional type theory, if you want to do this simple manipulation of converting these two types from one to the other,

28:12

there's a lot of well Joey you're smiling because  you've probably had to confront this kind of thing  

28:19

before because it's kind of a mess I mean to do  these these what seem like obvious conversions  

28:25

there's a bunch of bureaucracy in an intensional type theory: you have to explicitly have these equality-

28:31

witnessing conversions and things like this and  and these things just go away in an extensional  

28:37

type theory, and maybe it makes it more amenable for me to just call an SMT solver at this point

28:42

to just decide if n plus m equals m plus n and just be done with it, without having to mangle my program.
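[Editor's note: a small F* sketch of the implicit conversion being described, written with a length-refined sequence rather than an indexed vector type so that it stays self-contained; the names are illustrative. The SMT solver proves n + m = m + n, so the value is accepted at the other type without any rewriting of the program.]

```fstar
module Comm
module S = FStar.Seq

// A sequence known to have length n + m is silently accepted where a
// sequence of length m + n is expected; Z3 discharges the arithmetic fact.
let comm (#a:Type) (n m:nat) (s:S.seq a{S.length s = n + m})
  : r:S.seq a{S.length r = m + n}
  = s
```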

28:50

[Joey] Gotcha, so it's not necessarily impossible in this

28:55

intensional setting, but it's quite a bit easier, I guess, in the extensional setting to just

29:01

take what the take what the solver says and  kind of go with it and yeah not have to do   quite so much bookkeeping around you know why  is the why is what the solver says correct how do  

29:11

I propagate it through my terms and things  like that yeah that makes a lot of sense  

29:17

one of the [Shpat] One of the great things  about being a podcast host is that  

29:23

when I am out of my depth or don't understand  something I can move us on so here it is  

29:28

me moving us on turned on topic. [Nikhil] Thank you. [Shpat] But jokes and jokes aside pseudo jokes aside the  

29:40

I want to see if so there's probably a lot  of people listening who do work on their own   languages or they're doing research and trying a  few new things with maybe toy languages that one  

29:51

might they one day might become I get a larger  user base I'm wondering if we can talk about  

29:59

how you think especially since if F* is  getting some use about how you think about  

30:04

how do you evaluate language design from the  lens of is it useful and will people use it  

30:12

I wonder if you have thoughts there to share. [Nikhil] Yeah  it's a good question I mean I have some thoughts I  

30:19

don't my thoughts aren't necessarily you know I  don't have any sort of deep worlds of wisdom here  

30:26

it's but I think one way one thing  I can say is that you know somehow  

30:38

co-evolving your language design with  a problem that you really want to solve  

30:45

is really important so that you're not  just you know yeah you're not just building  

30:52

something that you can't keep evaluating you  need to be able to evaluate what you're doing  

30:58

like every day you know is this actually helping  somebody solve some problem you know that often  

31:04

means pairing up with some domain expert because  sometimes you know as a as a language designer  

31:09

you're you know you can sort of for many of us  I think we're just the aesthetics of some of of  

31:16

you know, lambda calculi and all this can be quite alluring, and you just kind of go off and

31:22

design something because like it's beautiful  or something and that's fine and it's fun  

31:28

but I think finding a way to sort of pair  up with a domain expert from from some other  

31:33

domain who's really trying to solve some problem  whether it's you know whether it's doing math  

31:40

or whether you know that for instance in the  lean community that's been that's been a a big  

31:46

driver. I think for F* that driver has been a lot about, you know, cryptography,

31:54

and you know people who are interested in  all sort of implementers of cryptography not  

32:00

you know not necessarily cryptographers  as such but you people who want to build   efficient implementations of crypto protocols or  or crypto primitives and so I think finding a way  

32:13

to you know and that that sort of cycle has been  I think really helpful for F* you know there's  

32:19

we proposed some you know some new way of doing  proofs and immediately you know our friends at  

32:26

Inria or CMU will try it and tell us you know  if it makes sense or not and we'll iterate  

32:37

so I think you know and this is not a by no means  is it like a huge user communities maybe we're  

32:42

talking you know this kind of the the close circle  of people who repeatedly give us feedback on  

32:50

on new language features is something like you  know maybe 50 people nothing nothing crazy right   but but that even that is enough I think to  to to keep you honest in your language. [Joey] When  

33:04

we interviewed Alastair Reid he had this well I  guess he had a paper called it but I like the idea   as well as kind of the idea of meeting developers  where they are and that's really I think that's  

33:15

kind of what DSLs are all about to some extent  is like we have this language it can express   anything in the world you can write any program  in the world with it and say anything you can  

33:24

express in a logic about it right yeah and that's  great and that you know that'll get you a puff of   paper maybe an ICF paper but that's not enough  to of course get somebody get somebody using it  

33:36

and and the idea they were all about is like we  need to minimize the time between when someone  

33:41

starts using this thing and when they see some  some sort of good results basically and so yeah  

33:47

DSLs are a way of bringing that time way down and  minimizing the things they need to learn I guess  

33:54

have you done areas other than other than  cryptography where you've seen success is   it possible to layer up different sort of DSLs  on top of F*? [Nikhil] Yeah so you know one of the things  

34:03

that we're looking at or the sort of systems  programming but systems programming that you   with with concurrency and that you can sort  of prove correct in in a separation logic  

34:16

which is a way to write imperative programs and concurrent programs in F* and reason about

34:21

them in this in this separate in this thing called  the separation logic separation logic is it's an   active area of research in the PL community  in the program verification community and  

34:32

it's a way to sort of structure proofs that   that are that's more modular if I were  to just put it in a in a nutshell and our  

34:44

things that we you know are from a purely sort  of language design perspective we we started  

34:50

to look into this seriously maybe you know  after our experience trying to do proofs in  

35:00

for crypto protocols and so on in a  framework that didn't have separation logic   and we were able to get things done and we  have a large code base that's done without  

35:09

separation logic and all these these this code  that's deployed in various places doesn't use   separation logic but we notice that our proofs  are way harder and hard to maintain because  

35:20

they lack this kind of modularity so we've been  looking at you know so motivated by that sort of  

35:26

you know empirical observation that we have this  code base it's kind of many hard-fought victories  

35:33

how can we make it a little simpler maybe by  enhancing the logic in which we work and so so  

35:42

that's something we've been looking at  recently and and using it to do things like  

35:49

you know it's it's building on on what we've done  with crypto and also parsers and so on but it's  

35:56

trying to apply some of these ideas in what we  have a project where we're trying to use this to  

36:01

do things like high assurance monitoring of cloud  services like can can you write a can you write a  

36:10

monitor for a service that's going to observe the  interactions that a client has with that service  

36:15

and you know in and provide some guarantees about  that interaction cryptographic guarantees about  

36:21

that interaction where for instance if the  if we notice that the server is responding  

36:28

in a way that's incompatible with the trace of  messages that have that the server has seen so far  

36:34

then we except for some small probability of  you breaking a cryptographic hash or something  

36:41

our monitor will will notice this and and and  signal an error you know signal that you know  

36:47

the service has deviated from expected behavior  so things like this so it's so still you know  

36:54

there's still some cryptographic components to  it but moving beyond just crypto primitives or   protocols [Joey] Yeah that's I guess that's one of  my that's sort of one of my favorite uses of  

37:03

of formal methods maybe almost ever I guess  is like the state machine work you all did   with TLS where you did about the same thing you  made a verified implementation and maybe it wasn't  

37:14

quite high performance enough or there were some  other demands that it wasn't quite meeting to   to run in production but what  you all were able to do is  

37:22

know exactly what a perfect TLS conversation looks  like because you made a verified implementation   and then just have a bunch of TLS conversations  and whenever the whenever the other side messed up  

37:31

you said ah they messed up and you you all found  like a lot of serious TLS implementation bugs  

37:38

in systems that are running in production which  was I think a really cool result and a really   way to really great way to make a lot of formal  verification sort of directly useful to production  

37:49

systems basically even if people weren't kind  of ready to to use the the verified thing yet .  [Nikhil] Yeah that that's really cool work it's I wish I  could take credit for it it's not actually mine  

37:58

it's work by my close colleagues and friends at MSR Cambridge and Inria in Paris; it's Antoine,

38:10

Karthik Bhargavan, and Cédric Fournet, and others, but yeah, that's really cool stuff. [Joey] They did do the

38:18

verification in F* at least to some extent, right, so you can get second-degree credit. [Nikhil] I guess, sure.

38:26

[Joey] But this sounds like this a similar sort of  application where yeah you know but it is as  

38:33

nice as it is to deploy verified code directly  the existence of verified code allows you to do  

38:39

a lot of different things even even if people  aren't ready to ready to run it right away   [Nikhil] Right yeah totally yeah I although you know so yes  I second I totally I buy what you're saying Joey  

38:51

but I'm kind of also you know I think you know we  can get to a point where you know verified code is  

39:00

you know it is sort of has no excuses like it's  correct it's as fast it's you know maybe you know  

39:11

there should be no reason for you to to not  deploy this in a production setting I think   we're getting there in several sort of I mean I  think the main the main barrier is it you know  

39:25

is the component that you're trying to  verify is it high value enough like in   terms of is a flaw in that component you  know severe enough that that it warrants  

39:38

the investment of so much human expertise into  producing this verified component producing and  

39:44

then maintaining it and so there are some places  where this makes sense like you know if there's a   I don't know a bug in your deep in your in  the virtualization stack that's isolating  

39:57

cloud tenants from each other then you  know that's that's that's a bug that could  

40:04

cost billions and you know maybe in  those situations it's worth you know  

40:11

strategically replacing some key component of that  virtualized virtualization stack with a verified  

40:17

component and the same goes for cryptography and  you know so there are these these I think these  

40:24

sort of select few areas where where they  where the economics is beginning to make sense  

40:31

[Joey] The other the other side of economics I guess  that the people take into account here is  

40:37

if their team isn't the one that developed the   code but they need to maintain it they  need to they need a decent story there  

40:45

and one of the arguments, I guess, is you're less likely to need to maintain it because we're more likely to have gotten it right, but sometimes that's not sufficient. I'm curious, you know,

40:54

you all got those parsers deployed apparently  teams you know like likely teams other than your  

41:00

teams and Microsoft is now sort of responsible  for those in some sense and I'm curious how that's  

41:07

I'm curious how that's gone and and what you know  what was convincing in terms of you should use  

41:12

this code you're gonna be able to maintain it or  will be able to help you maintain it yeah I don't   know how those conversations go. [Nikhil] Right yeah I think  the maintenance thing is a is a big one like and I  

41:27

mean I think this this argument that it's  verified so the maintenance cost is lower  

41:33

it's a bit facetious I don't really buy it like  this you know things change sure the code is   verified but you're going to find that you know  the your spec will change because the needs of  

41:43

this component are going to change and and then  yeah so but so maintenance is is a is a big issue  

41:51

I think. In the specific case of the one that you mentioned, deploying these parsers, you know,

41:56

the answer again for us was like I mean it's  the DSL thing again you know we're not going to  

42:03

if it's just an arbitrary piece of code that's  going to be you know someone has to maintain   then this is if you're not an F* expert you know  then that's that or you don't have an F* expert  

42:15

on your team that's really that's a really  hard sell but if what you have is instead  

42:24

you write a format description in some C-like syntax and the

42:32

proofs happen automatically behind the scenes then  the maintenance story is is much easier the the  

42:39

verification tool is just a tool in as as part  of your your your compiler workflow you're not  

42:45

interacting with it directly and so so that's  a bit of an easier sell people who don't have  

42:52

to be, you know, F* experts to maintain and evolve this code. We still work closely with

43:02

the teams that use our code and you know if the  code has to be evolved we're we're usually part  

43:09

of the code review to make sure that it's being  done in a reasonable way things like this the  

43:15

other context I think is is also you know in cases  where where the component you're delivering has  

43:22

a very clear spec, like, you want to deliver an implementation of SHA or something, and

43:30

there my colleagues who who do this who have done  this in many in many situations often deliver  

43:37

c code that comes out of the tool chain and the c  code is designed to be you know we take a lot of  

43:44

pains to make it look pretty and human-readable and so on. My colleague Jonathan Protzenko is

43:55

and he's responsible for a lot of this, like building a tool that emits clean-looking

44:01

C code from F*. Yeah, that's maybe another way you deliver the artifact, and maybe, if things are stable enough, like you don't expect the spec to change for this kind of primitive, then

44:13

the maintenance question becomes easier as well  [Joey] Right but if yeah if you deliver reasonable  

44:19

looking c code the worst case for that team  is that you all disappear no one knows how to   maintain it they've still got a good piece of  code that was verified at one point and that's  

44:28

that's it's not the result we sort of dream  of but it's it's a good result still right  

44:36

[Shpat] If people want to give F* a shot, where should they start? [Nikhil] fstar-lang.org;

44:43

we have an online tutorial actually just pushed a  bunch of new chapters there last week so there's

44:53

a I think what's up there right now like if you  wanted to learn how to write purely functional  

44:59

programs in F* and prove them correct, I think there's a bunch of material up there that can help you get started. We also have tutorials out there that sort of help

45:09

you write say c like programs in F* and prove them  correct and get c code out of it in this subset  

45:20

of F* that we call Low*. But these things, I have to say, it gets substantially

45:25

harder like if you're writing pure F* programs  things are things can be kind of nicely forward  

45:32

relatively yeah they can be yeah straightforward  or maybe even if not straightforward it can be  

45:38

kind of pleasant and elegant you know that you  set it up right but as you try to do fancier  

45:43

things with you know efficient low-level code  and proofs about them things things are still   hard I would say sure [Shpat] That seems to be the  case with with most with most languages that  

45:53

are a little bit on the cutting edge like that  [Joey] So relating to having it be a little harder to  

46:00

prove the sort of low-level imperative programs  one of the things we find in our tools a lot is  

46:06

one of the hardest parts of tool  development is is what you do when   the person doing the proof is is stuck so you know  when we talk about tutorials it's kind of nice  

46:16

because you go through and there's the answer and  you know it's going to work where we spend most   of the time in our proofs is when it doesn't work  and we have to figure out what to do next is that  

46:26

something you put some work into in F* and making  that story reasonable for your users [Nikhil] Yeah that's  

46:33

a great question I wish I mean put some thought  into it but there's so much more to do and I mean  

46:45

I think that's kind of the also the sort of the  double-edged sword of having you know lots of  

46:50

SMT-based automation: you're trying to get this automated theorem prover to solve undecidable problems, and

46:55

improvement to solve undecidable problems and  it's not sometimes it's just not going to do it  

47:01

and and then you know how do you ex if a proof  fails how do you explain to user why it failed and  

47:11

how to nudge the user from where they are towards  the proof that that may succeed and I think we you  

47:19

know we this is a we should F* is not particularly  good at this I should be frank about it like we  

47:25

will try to get an error message out of the SMT  solver and tell you oh we fail to prove this  

47:31

particular property about you this this conjunct  in your formula that you tried to prove here   we were not able to prove it and that's it and  you know we don't and and sometimes if the SMT  

47:43

solver just times out it may not even tell you  why which part of the formula it was working on  

47:49

when it timed out and so sometimes we'll say well  here's this chunk of code you know we're trying  

47:54

to prove all these things in it but we could we  can't really zero in on like exactly which part   of it failed so I think this kind of diagnosis  of proof failures is it's it's a big thing  

48:05

that we that there's lots more that we should  be doing here that we got so so usually this  

48:12

this this often involves like you know you  once you gain some experience with the tool  

48:19

you kind of get a sense of like you know that the  solver is and and you know all the other things  

48:25

that starts doing this it's not just the SMT  solver it's also you know how does the unifier  

48:30

work and how you know when does when is it going  to be able to prove things by you know symbolic  

48:36

execution and you know things like this you get a  sense of like all these you have all these proof  

48:42

tools available to you, and you kind of know which work well in which situations. And when a proof fails, you say, all right, well, I know I'm trying to do a proof here about modular arithmetic,

48:52

and I know the SMT solver isn't great at doing that, so for this part of my proof I'm going to resort to some other technique, maybe, you know,

49:04

use a library of lemmas that we have around that are going to tell you specific facts about modular arithmetic, and do proofs that way. Or maybe at that point you realize that, hey,

49:16

I should just abstract away a lot of these  details like I'm not actually I don't really care  

49:21

to be reasoning about you know modular arithmetic  in integers I just care about I can just work  

49:26

in some much more abstract setting and my proofs become simpler, and then I can just instantiate it to the specific setting I have in mind.
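[Editor's note: a sketch of the "call a lemma where the solver needs help" pattern mentioned here, using a simple list fact rather than modular arithmetic so the example stays self-contained; the names are illustrative.]

```fstar
module AppendLen
open FStar.List.Tot

// A fact proved once by induction; invoking `append_len l1 l2` elsewhere
// brings this equality into the SMT solver's context at that point.
let rec append_len (#a:Type) (l1 l2:list a)
  : Lemma (length (append l1 l2) = length l1 + length l2)
  = match l1 with
    | [] -> ()
    | _ :: tl -> append_len tl l2
```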

49:39

So I think people learn, as in most tools, I guess; you learn sort of your way around, and how to

49:49

yeah how to navigate complex proofs [Joey] Right and  the and the question I think is really how do we  

49:56

how do we get the the tools to provide  a little more of that rather than   having humans need to rely on on instinct so  much but that was definitely well that was  

50:07

definitely a tricky question because I think  in my opinion no one's doing this very well so  

50:12

you know I would have been really excited if  you were like yeah we know the answer and you   all can can copy it because I don't think we're  doing great at it I think you know people I guess  

50:20

are referring to some of these tools as sort  of the auto active verification tools and   I don't know if anybody in in that space has a  great answer to this question of like how do we  

50:29

how do we bridge the gap and like switch over  from switch over from the SMT solver failing to  

50:35

the proof the person doing the proof like getting  some useful information out of that and go back   and forth in a in a good way but it's definitely  a really exciting question I think that a lot of  

50:44

people are struggling with right now [Nikhil] Yeah I  think it's a really important question it's   also the sort of thing where the incentives  in the academic community are not unless  

50:53

or at least in the core PL community are maybe not  structured well you know that well for exploring  

50:58

these kinds of questions because there's  no hard there's no clear answer there's no  

51:03

theorem you can prove about this it's you know the  you need usability studies and things like this  

51:09

you try to extract as much information as you  can from the solver and present it to the user   in some useful way like this problem is super  underspecified right and you just have to  

51:20

I think here's where you kind of have to sit in a  very tight loop with with with your users and you  

51:26

know figure out if what what you're doing is is  helping or not but it's a it's I think yeah it's  

51:35

a really important space and it's kind of wide  open. [Joey] cool well I think we'll leave our listeners   with that as a as a question to ponder and to  you know to reach out when they when they have  

51:44

the tool that is that has solved this once and for  all but yeah yeah definitely the definitely how to  

51:50

get the incentives right as a challenge in this  in this space as well as the the problem itself.  

51:55

But this has been a this has been a wonderful  episode thanks so much for joining us Nik.   [Nikhil] Thank you, yeah thanks Joey, thanks Shpat.  I had a really good time too. [Shpat] Glad to hear.  

52:07

This was another episode of Building  Better Systems. Join us again next time.