Today we're joined by Nikhil Swamy, Senior Principal Researcher in the RiSE group at Microsoft Research. We are very excited to hear about what he's been working on. In particular, we're going to discuss a language that he co-created and continually develops called F* (pronounced F star). F* is a dependently typed language in which you can both write programs and prove things about the programs that you write. We'll talk about what makes that language special and unique compared to other similar languages, as well as some of the applications of F*.
Watch all our episodes on the Building Better Systems YouTube channel
Nikhil Swamy: https://www.microsoft.com/en-us/research/people/nswamy/
F*: https://www.fstar-lang.org/
Joey Dodds: https://galois.com/team/joey-dodds/
Shpat Morina: https://galois.com/team/shpat-morina/
Galois, Inc.: https://galois.com/
Contact us: podcast@galois.com
[Shpat] Hello, and welcome to another episode of Building Better Systems. My name is Shpat Morina. [Joey] And I'm Joey Dodds. [Shpat] Joey and I work at Galois. Galois is a computer science R&D lab focused on hard problems in trust, cybersecurity, and a lot of other fun stuff.
[Joey] Today we're joined by Nikhil Swamy, who is a Senior Principal Researcher in the RiSE group at Microsoft Research. This is kind of an exciting one for me, because back when I was in grad school, Nik came in and visited, and it was really inspiring to see somebody doing industrial research like Nik was. So I'm very excited to be talking with him today and to hear a bit about what he's been working on. In particular, we're going to talk a bit about his background, and then we're going to discuss a language that he co-created and continually develops called F* (pronounced F star). F* is a dependently typed language that you can use to both program and prove things about the programs that you write. We'll talk about what makes that language special and unique compared to other similar languages, as well as some of the applications of F*. Let's dive in.
Designing, manufacturing, installing and maintaining the high-speed electronic computers — the largest and most complex computers ever built.
[Joey] Thanks for joining us, Nik. [Nikhil] Yeah, it's great to be here. Thanks for having me. [Joey] So before we dive into what you're working on now, I'd love to hear a little bit about your background and how you got interested in the field of programming languages.
[Nikhil] That's a good question. My background, broadly, is that I grew up in India and came to the US as an undergrad, to a school called Hampshire College in Massachusetts, which is a kind of liberal arts school: no grades, make-your-own-major kind of place. It was a lot of fun, and the school is very much oriented towards self-directed study. I found a couple of great professors there who got me interested in computing and physics and math and things like this. I worked with them, in the late 90s, on quantum computing: applying search algorithms to find better-than-classical quantum programs, and then trying to prove that the programs we found were correct and solved whatever problem we were trying to solve; that if it's a circuit to solve something, it was actually finding the right answer with the correct probability, this kind of stuff. So that's when I really got into, or understood, that "hey, it's actually interesting to try to prove stuff about programs."
I learned a bunch at that time about automated theorem proving techniques. Well, I thought I learned a bunch, but I actually learned surprisingly little, and later found out, "wow, there's so much more." I went to grad school at Maryland, initially thinking, "hey, I might try to do some things in the AI space," but then I had one of those moments you sometimes have in grad school, where I realized that being able to prove properties about programs was a really powerful idea for me. I worked with my advisor, Mike Hicks, at the University of Maryland on a programming language called Cyclone. Cyclone is a C-like programming language with a type system geared towards proving memory safety, and with ideas that were maybe somewhat ahead of their time: things like linear types, and being able to use custom memory management strategies, everything from unique pointers to reference counting to, in some cases, conservative GC. Lots of ideas from Cyclone, in region-based memory management and so on, have now found a home in languages like Rust. I learned a lot at that time from Mike and the other people working on Cyclone about programming languages and type systems and being able to prove things about programs before you run them, and yeah, that's how I got into this stuff. And then for my thesis, I was building on
what I'd learned working on Cyclone. I started to explore some ideas in dependent types, to see how you can add just a little bit of dependent types to a programming language and get it to prove the properties that you want to prove. At the time I was interested in security properties: how can you get away with adding just a little bit of dependent types and be able to prove, I don't know, things like the correctness of access control enforcement, or information flow control policies, things like this. [Joey] For anyone who isn't familiar, could you give us a super quick glimpse, because I think it's going to come up again in this conversation: what is a dependent type?
6:14
[Nikhil] All right it's kind of a profound question in some ways but
6:20
in a simple sense it's a way to kind of encode a logic in the type system for programming language
6:31
and the word dependent maybe you know maybe it's a little bit scary but it's actually a really simple thing. It's like you can have a function who’s you know, usually a function if you give it a type
6:42
signature it's a function that takes an integer and returns an integer. Let's say that's a common, it's like a simple type for a function but you can have functions that that are, you know,
6:52
whose return type can depend on the argument and it can do things like if you call it with
7:00
the integer 17 then it returns, I don't know a vector whose length is 17.
7:09
Or you could even be things like, "hey, you know you call it with a boolean
7:17
and if you call it with true, it returns a string and if you call it with false it returns
7:23
I don't know, an integer, let's say. So that's why the word has heading dependent comes from but this
7:31
mechanism which you can kind of think of as like the way the examples I gave you are very kind of
7:38
programming centric you know but this basic mechanism allows you in
7:45
a very kind of foundational way to encode within the type system
7:51
very powerful logics that let you build all of mathematics in some sense inside this logic which
8:01
doubles as a programming language. Now dependent types of because it is very powerful thing they
8:07
you know you don't see them that often in mainstream programming languages — you're beginning to — but they you know the kind of languages that me and my colleagues work
8:17
on you know there's I'm sure we'll get into more of them but you know they're fairly I don't know
8:26
they're pretty fancy, I mean to put it bluntly. So I was trying in my thesis work to sort of see
8:32
well you know I don't want full dependent types what can I you know can I just inject a tiny bit of dependent types into my language and you know how much mileage can I extract from that? Like
8:43
you know, what kinds of things can I prove if I get you know dependent types light
8:48
and then you know then I guess it's 10 - 15 years later.
8:57
I'm no longer doing dependent types light it's kind of you know the
9:05
heavy dependent types, yeah. The entry, the gateway drug, or something you know. You start doing it a little bit and then you know after a while you're like you're fully in
9:13
bed with dependent types and it's all you know [Shpat] Fully in bed with dependent types. I think we
9:20
got our title for the podcast. [Joey] Cool, and so now you're at Microsoft Research and have been for
9:27
a while and you've been working on increasingly more dependent types in a language called F*.
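As a rough illustration, the boolean example Nik describes can be written almost directly in F*. This is a sketch of ours (the names `dep_ty` and `f` are made up for illustration, not taken from any F* library):

```fstar
(* The result *type* is computed from the argument *value*. *)
let dep_ty (b:bool) : Type = if b then string else int

(* A dependent function: its return type mentions its own argument b.
   Called with true it returns a string; with false, an integer. *)
let f (b:bool) : dep_ty b =
  if b then "called with true" else 42
```

In a simply typed language the two branches would be a type error; here the typechecker uses the value of `b` to decide which type the result must have.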
[Nikhil] That's right. Yeah, I've been at Microsoft Research since about 2008, and I work there in a group called RiSE, Research in Software Engineering. I work on F*, but broadly on programming language and program verification techniques: to prove programs correct, to prove them secure, to prove things like resource bounds about them, things like this. Proving things about programs through programming language design, where these new languages facilitate writing programs and their proofs together, to make it possible, and maybe easier, to actually write programs that are correct.
[Shpat] A lot of this work, from where you started to your work at Microsoft, sounds like it has maybe culminated (culminated is the wrong word), but it has led to this language that you're working on, which is F*. I wonder if that would be a good place to start: for you to tell us a little bit about it. [Nikhil] Yeah, sure. So F* is a programming language that superficially looks like many other functional programming languages: languages in the ML family, things like OCaml or F#, or even Haskell to a certain extent. Which is to say that it's a mostly functional, statically typed language. What's maybe unique about it is that it has, I'd say, three unique features. The first is that it's dependently typed, meaning that the type system is more powerful than those of the other languages I've mentioned so far, things like OCaml or F#. It has a type system with full dependent types, similar to other languages that you may have heard of, things like Lean or Coq or Agda or other dependently typed languages. But it is different from the dependent types that you see in these other languages in that F* is based on an extensional type theory, which is slightly technical (we can talk about that at some point if you want), different from the type theories that are implemented by those other languages. I think this extensionality thing is kind of important, because it enables making F* a programmer-first kind of language. Some of the other languages are set up as proof assistants in which you can do mathematics at a very deep, abstract level; F* is more programmer-centric. You wouldn't necessarily fire up an F* session to, I don't know, formalize category theory or something. I mean, you can if you want, but that's not what people use it for. It's more that you use it for building programs that you want to execute. And in doing that, as you're writing software, it turns out, as you're probably both aware, that when you're writing a piece of software you want to prove correct, there's maybe a handful of interesting theorems that you want to prove, but the bulk of the things you have to prove are really, really tedious and low level. You're mired in details: hundreds or thousands of small proofs that build towards maybe some interesting insight about a big invariant, plus a handful of big proofs. So one thing that F* really aims to do is to reduce the human overhead of having to write these thousands of small proofs all over the place, and the way that it does this, and this is one of the unique things about F*, is by using an SMT solver.
An SMT solver is an automated theorem prover: you feed it a problem in some logic, usually something like first-order logic plus some reasoning capabilities about arithmetic and things like this, and the SMT solver tries to decide whether a particular formula is valid or not.
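To make that concrete, here is roughly the kind of obligation F* hands to the solver. This is a hedged sketch modeled on the standard list-indexing idiom, not verbatim library code:

```fstar
(* The refinement i < length l becomes a proof obligation at every call
   site; F* encodes it and asks the SMT solver (Z3) to discharge it, so
   the programmer writes no explicit proof. The same solver also proves
   that the empty-list case is unreachable and that the recursion
   terminates. *)
val nth : #a:Type -> l:list a -> i:nat{i < List.Tot.length l} -> a
let rec nth #a l i =
  match l with
  | hd :: tl -> if i = 0 then hd else nth #a tl (i - 1)
```

Each call like `nth [1; 2; 3] 2` silently generates the formula `2 < 3` for the solver; an out-of-bounds call is rejected at compile time.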
So what F* does is analyze your program in its dependently typed logic, produce the large number of proof obligations that pop up when you're trying to verify even, you know, 10 lines of code, and feed them to this SMT solver, which tries to do the proofs for you. Now, I mentioned at the beginning that extensionality was really important here, and one of the reasons is that the particular flavor of type theory that F* uses is designed in a way that enables this kind of dispatching of proofs to an external solver, which can implicitly prove many properties about your programs while you're writing them. So that's dependent types and SMT solving, two of the important things about F*. The third thing is that I, and others, want to write software in F* that is efficient, that can be low-level software doing, I don't know, things like efficient network communication or cryptography, things like this. This kind of software is in many ways inherently effectful, or imperative in a sense. Although I said at the beginning that F* is a mostly functional language, the way in which we use F* is by encoding, within the functional language, other sub-languages that allow you to reason about computational effects: things like mutable state, or raising exceptions, or doing I/O, things like this. You can write programs that do these things in F* and reason about them within F*. So F* has this effect system, and I think that's the other unique thing about F*: it's dependent types, SMT solving, and effects, and that package together is what we use to build a variety of things in the kinds of domains that I've mentioned.
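In use, the effect system looks roughly like this. A sketch under assumptions: we're recalling the `FStar.Ref` interface from memory, so treat module and function names as approximate:

```fstar
module Sketch

open FStar.Ref   (* heap-allocated refs; pulls in the ST effect *)

(* An effectful computation: the ST effect tracks the heap, and the
   type relates the heap before (h0) and after (h1) the call, so the
   mutation is both permitted and fully specified. *)
let incr (r:ref int) : ST unit
    (requires fun _ -> True)
    (ensures  fun h0 _ h1 -> sel h1 r == sel h0 r + 1)
  = r := !r + 1
```

The point of the encoding is that a pure caller cannot silently perform this mutation: the `ST` effect and its pre/postcondition are part of `incr`'s type.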
A focus for us recently has been low-level secure communication software, ranging from cryptographic protocols to, say, network packet parsing: things that we implement in F*, prove correct, and then execute. Out of F* you can get C code or assembly code or whatever, depending on which of these fragments or styles you program in; you get out this imperative, low-level software that you can then deploy and run.
[Shpat] So it's in somewhat active use for real things, it sounds like? If that's correct. [Nikhil] Well, I'd say there's a small community of active users. My colleagues at MSR and I, several of us, use F* to do things with it. Something we've done recently: there's a sub-language of F* specifically for writing low-level parsers correctly, parsers of things like network message formats, so we have a verified parser generator that we've implemented inside F*, and we use this... [Joey] Is this EverParse? [Nikhil] This is called EverParse, that's right, yeah. So we use this to build the network packet formats used in various parts of the Windows kernel, and we extract high-performance C code for parsing those packets. For the last maybe 18 months or so, we have had verified parsers produced by F* running inside Hyper-V, inside the Windows kernel, in several releases of Windows so far. So there are pockets of such use. Even outside MSR, there are some people who use F* to deploy verified cryptography in the Linux kernel, or in Firefox.
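EverParse has its own surface syntax, so what follows is not its actual API; it's just our sketch, in plain F*, of what "verified parser" means here: a parser and serializer packaged with a machine-checked round-trip lemma.

```fstar
(* A parser consumes input (bytes modeled as nats for self-containment)
   and returns a value plus the unconsumed remainder, or None on failure. *)
type parser (t:Type) = list nat -> option (t & list nat)

noeq type verified_format (t:Type) = {
  parse     : parser t;
  serialize : t -> list nat;
  (* Correctness, checked by F*: serializing then parsing round-trips
     exactly, consuming the whole input. *)
  roundtrip : v:t -> Lemma (parse (serialize v) == Some (v, []))
}
```

A generator in this style produces, for each message format, both the C-extractable `parse` function and the proof field, so malformed-input handling is verified rather than tested.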
[Shpat] I have a couple of questions about that, because that's awesome. At a higher level than F*: first of all, when we were talking about dependent types, you said they don't end up in programming languages that are more generic, or more widely used, because they're a little complicated; that's my simplified version of what you said. But it sounds like you've made this language suitable for programming in the large, and yet you've designed dependent type theory into it. How did you make that happen, and is it the case that this is more approachable?
[Nikhil] I mean, I think there's still a long way to go. I'd like to think that at least some of what we do in F* makes dependent types more approachable to certain kinds of people. Like I said, it's still a very small community of people who are using this stuff, but I'm optimistic. I think we're on a trajectory where we'll see more successes across the several dependently typed languages we have out there today, and where students coming out of universities around the world gain more experience with this kind of stuff. Even just from personal experience: if I think about the interns I had when I joined MSR in 2008, they were super smart students, but at the time, programming in a dependently typed language was the bleeding edge; not everybody had experience with it. Among the interns you get today, many of them have done some part of Software Foundations, or they've programmed in Coq or something, and it doesn't feel alien to many people anymore. It's just, "okay, I can hack my way through this proof, no problem." So I think the community of people who have a certain baseline skill to approach these things that were previously quite esoteric is growing, and growing quite fast, I think. So I hope that in a few years, languages like F* or Lean or so many others will no longer be as esoteric.
[Shpat] That's a great word for it. [Joey] Well, it's a really important perspective, because when we look at these things today, it's still a small community, right, by most metrics of programming language communities. But there is hope, because if you look 10 years ago, dependent types felt new, and now you have a talent pool to draw from, and there's no reason that growth should slow down. So that is a really important perspective: to say, "yeah, it's not huge yet, but we've made progress." Things like F* and Coq, and I think Lean, are bringing a whole new group of people into the dependent types world, and it's really exciting to see that growth, and definitely important to acknowledge all the progress that's been made in the last 10 to 15 years. [Nikhil] Yeah, yeah. One way in which I see more growth happening, and maybe as a functional programmer this is not that surprising, is that I really see domain-specific languages as a big part of the answer. You have these frameworks that are super powerful; you can formalize all of mathematics in them if you want, but most people don't want to formalize all of mathematics. They want to do some very specific thing. Taking EverParse again, for instance: I'm not going to unleash the full power of dependent types on someone who just wants to write a packet parser. I'll design some framework, some language within my very powerful framework, that's tailored to this very specific usage. That sort of reins in the complexity and allows a person to get value from it without having to understand all of it. [Joey] I'm laughing a little, because you sort of mentioned that, and I think Leo had this experience, where maybe
he thought the same, but then the entire math world showed up at his door and said, "we want to formalize all of mathematics with dependent types." [Nikhil] That's true, that's true. Fair enough. [Joey] Yes, but certainly not everybody wants to do that, and the uses seem to be fairly different. So I guess, on an unrelated question, since we're mentioning Lean, and I think maybe this gets back to the extensional/intensional kind of thing: we did talk to Leo a while back, and one of the things I've been wondering about since we talked with him is this. Leo obviously understands SMT solvers very well and knows what they can do. From the sounds of it, he started out trying to write something that could make use of SMT solvers, that could let you get more work done before the SMT solvers do their thing, but he ended up in a world where Lean doesn't speak to SMT solvers very well, or at all at the moment. Is that the extensional/intensional distinction there, or did something else happen, in your opinion, to cause them to move away from that? [Nikhil] I think you could make this SMT thing work inside an intensional type theory, in certain ways.
I think within Lean, too, it could well happen. I talk to Leo a lot; he talks a ton about both SMT solvers and type theories. [Joey] He was saying the same about you. [Nikhil] He's being kind; it's definitely the other way. But if I were to guess, I'd say we may still someday have an SMT solver inside Lean. I think it's just that Lean has really taken off in this math community, where the priorities have been to meet and serve the needs of that community, and maybe SMT solvers have not made it to the top of the priority list yet. But they may; I expect someday they will. And people have built SMT-assisted proof automation in other frameworks; there are some in a setting that is notably not a dependent type theory, where SMT solvers work very well, and there's also some work in Coq doing things like this. But the thing in F* that makes it different from, say, Lean or Coq or other dependently typed languages is this: proofs done by an SMT solver, or proofs done by any other means, by a tactic or whatever, can be used to implicitly convert types everywhere. That's the main thing. If I have a vector whose length is m plus n, I can implicitly treat it as a vector of length n plus m, because of course m plus n is equal to n plus m, and an SMT solver will tell me that instantly. But in an intensional type theory, if you want to do this simple manipulation of converting between these two types, there's a lot of... well, Joey, you're smiling, because you've probably had to confront this kind of thing before. It's kind of a mess. To do what seem like obvious conversions takes a bunch of bureaucracy in an intensional type theory: you have to explicitly have these equality-witnessing conversions and things like this. These things just go away in an extensional type theory, and that makes it more amenable for me to just call an SMT solver at this point, to decide whether n plus m equals m plus n, and be done with it, without having to mangle my program. [Joey] Gotcha. So it's not necessarily impossible in the intensional setting, but it's quite a bit easier in the extensional setting to just take what the solver says and go with it, and not have to do so much bookkeeping around why what the solver says is correct and how to propagate it through my terms, and things like that. Yeah, that makes a lot of sense.
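Nik's vector example, as a sketch in F* (our own definitions, for illustration):

```fstar
(* A length-indexed vector: a list refined by its length. *)
let vec (a:Type) (n:nat) = l:list a { List.Tot.length l == n }

(* Extensionality at work: a vec a (m + n) is usable where a
   vec a (n + m) is expected. The SMT solver proves m + n = n + m
   behind the scenes; no coercion or equality witness appears in the
   program, which is exactly the bureaucracy an intensional theory
   would require. *)
let commute (#a:Type) (#m:nat) (#n:nat) (v:vec a (m + n)) : vec a (n + m) = v
```

In an intensional setting, the body could not simply be `v`; one would have to rewrite along an explicit proof of commutativity before the two types match.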
[Shpat] One of the great things about being a podcast host is that when I am out of my depth or don't understand something, I can move us on, so here it is: me moving us on to another topic. [Nikhil] Thank you. [Shpat] But jokes aside, pseudo-jokes aside: there are probably a lot of people listening who work on their own languages, or who are doing research and trying new things with maybe toy languages that might one day gain a larger user base. I'm wondering if we can talk, especially since F* is getting some use, about how you evaluate language design through the lens of: is it useful, and will people use it? I wonder if you have thoughts there to share. [Nikhil] Yeah, it's a good question. I mean, I have some thoughts; I don't have any deep words of wisdom here. But one thing I can say is that somehow co-evolving your language design with a problem that you really want to solve is really important, so that you're not just building something that you can't keep evaluating. You need to be able to evaluate what you're doing, like, every day: is this actually helping somebody solve some problem? That often means pairing up with some domain expert, because as a language designer, for many of us, the aesthetics of lambda calculi and all this can be quite alluring, and you just go off and design something because it's beautiful or something. And that's fine, and it's fun, but I think finding a way to pair up with a domain expert from some other domain who's really trying to solve some problem is key, whether it's doing math, which in the Lean community has been a big driver, for instance. For F*, that driver has been, in large part, cryptography, and people who are interested in implementing cryptography; not necessarily cryptographers as such, but people who want to build efficient implementations of crypto protocols or crypto primitives. That sort of cycle has been really helpful for F*: we propose some new way of doing proofs, and immediately our friends at Inria or CMU will try it and tell us if it makes sense or not, and we'll iterate. And this is by no means a huge user community; maybe we're talking about a close circle of people who repeatedly give us feedback on new language features of something like 50 people, nothing crazy, right? But even that is enough, I think, to keep you honest in your language. [Joey] When
we interviewed Alastair Reid, he had, well, I guess he had a paper that called it this, but I like the idea as well: the idea of meeting developers where they are. And I think that's what DSLs are all about, to some extent. We have this language, it can express anything in the world; you can write any program in the world with it, and say anything you can express in a logic about it. And that's great, and that'll get you a POPL paper, maybe an ICFP paper, but that's not enough, of course, to get somebody using it. And the idea there was all about minimizing the time between when someone starts using this thing and when they see some sort of good results, basically. DSLs are a way of bringing that time way down and minimizing the things people need to learn, I guess. Have you done areas other than cryptography where you've seen success? Is it possible to layer up different sorts of DSLs on top of F*? [Nikhil] Yeah, so one of the things
that we're looking at is systems programming, but systems programming with concurrency, that you can prove correct in a separation logic: a way to write imperative, concurrent programs in F* and reason about them in this thing called separation logic. Separation logic is an active area of research in the PL community and the program verification community, and it's a way to structure proofs so that they're more modular, if I were to put it in a nutshell. From a purely language design perspective, we started to look into this seriously after our experience trying to do proofs for crypto protocols and so on in a framework that didn't have separation logic. We were able to get things done, and we have a large code base that's done without separation logic; all this code that's deployed in various places doesn't use separation logic. But we noticed that our proofs are way harder, and hard to maintain, because they lack this kind of modularity. So, motivated by that empirical observation, that we have this code base of many hard-fought victories, we've been asking: how can we make it a little simpler, maybe by enhancing the logic in which we work? That's something we've been looking at recently, building on what we've done with crypto and also parsers and so on, but trying to apply some of these ideas elsewhere. We have a project where we're trying to use this to do things like high-assurance monitoring of cloud services. Can you write a monitor for a service that's going to observe the interactions a client has with that service, and provide some guarantees about that interaction, cryptographic guarantees? Where, for instance, if we notice that the server is responding in a way that's incompatible with the trace of messages the server has seen so far, then, except for some small probability of breaking a cryptographic hash or something, our monitor will notice this and signal an error: signal that the service has deviated from expected behavior. Things like this. So there are still some cryptographic components to it, but it's moving beyond just crypto primitives or protocols. [Joey] Yeah, I guess that's sort of one of my favorite uses
37:03
formal methods, maybe almost ever: the state machine work you all did with TLS, where you did about the same thing. You made a verified implementation, and maybe it wasn't
37:14
quite high-performance enough, or there were some other demands it wasn't quite meeting, to run in production. But what you all were able to do is
37:22
know exactly what a perfect TLS conversation looks like, because you made a verified implementation, and then just have a bunch of TLS conversations, and whenever the other side messed up,
37:31
you said, ah, they messed up. And you all found a lot of serious TLS implementation bugs
37:38
in systems that are running in production, which was, I think, a really cool result and a really great way to make a lot of formal verification directly useful to production
37:49
systems, basically, even if people weren't ready to use the verified thing yet. [Nikhil] Yeah, that's really cool work. I wish I could take credit for it; it's not actually mine.
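The monitoring idea described here — replay an observed conversation against a state machine known to be correct and flag the first disallowed message — can be sketched roughly like this. To be clear, the states, message names, and transitions below are a made-up miniature handshake for illustration; they are not actual TLS, and a real monitor would be driven by a verified implementation rather than a hand-written table:

```python
# Toy sketch of protocol monitoring against a reference state machine.
# The handshake below is hypothetical, not real TLS.

# Allowed transitions: current state -> {message: next state}
TRANSITIONS = {
    "start":       {"client_hello": "hello_sent"},
    "hello_sent":  {"server_hello": "hello_done"},
    "hello_done":  {"finished": "established"},
    "established": {"app_data": "established", "close": "closed"},
}

def monitor(messages, state="start"):
    """Replay an observed conversation; report the first illegal message, if any."""
    for i, msg in enumerate(messages):
        nxt = TRANSITIONS.get(state, {}).get(msg)
        if nxt is None:
            return f"deviation at message {i}: {msg!r} not allowed in state {state!r}"
        state = nxt
    return "ok"

print(monitor(["client_hello", "server_hello", "finished", "app_data"]))  # prints "ok"
print(monitor(["client_hello", "finished"]))  # flags the skipped server_hello
```

The point of the technique is that the monitor never needs to implement the protocol itself; it only needs the transition relation, which in the work discussed here came from a verified implementation.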
37:58
It's work by my close colleagues and friends at MSR Cambridge and Inria in Paris: it's Antoine,
38:10
Karthik Bhargavan, and Cédric Fournet, and others. But yeah, that's really cool stuff. [Joey] They did do the
38:18
verification in F*, at least to some extent, right? So you can get second-degree credit. [Nikhil] I guess, sure.
38:26
[Joey] But this sounds like a similar sort of application, where, you know, as
38:33
nice as it is to deploy verified code directly, the existence of verified code allows you to do
38:39
a lot of different things, even if people aren't ready to run it right away. [Nikhil] Right, yeah, totally. I buy what you're saying, Joey,
38:51
but I also think we can get to a point where verified code
39:00
has no excuses: it's correct, it's as fast, and
39:11
there should be no reason for you not to deploy it in a production setting. I think we're getting there. I think the main barrier is:
39:25
is the component that you're trying to verify high-value enough? Is a flaw in that component severe enough that it warrants
39:38
the investment of so much human expertise into producing this verified component, producing and
39:44
then maintaining it? And so there are some places where this makes sense. Like, if there's a bug deep in the virtualization stack that's isolating
39:57
cloud tenants from each other, then that's a bug that could
40:04
cost billions, and maybe in those situations it's worth
40:11
strategically replacing some key component of that virtualization stack with a verified
40:17
component. And the same goes for cryptography. So there are, I think, these
40:24
select few areas where the economics is beginning to make sense.
40:31
[Joey] The other side of the economics, I guess, that people take into account here is:
40:37
if their team isn't the one that developed the code but they need to maintain it, they need a decent story there.
40:45
And one of the arguments, I guess, is that you're less likely to need to maintain it, because we're more likely to have gotten it right, but sometimes that's not sufficient. I'm curious, you know,
40:54
you all got those parsers deployed, and apparently teams other than your
41:00
teams at Microsoft are now sort of responsible for those in some sense, and I'm curious
41:07
how that's gone, and what was convincing in terms of "you should use
41:12
this code, you're going to be able to maintain it, or we'll be able to help you maintain it." I don't know how those conversations go. [Nikhil] Right, yeah, I think the maintenance thing is a big one. And I
41:27
mean, I think this argument that it's verified so the maintenance cost is lower
41:33
is a bit facetious; I don't really buy it. Things change: sure, the code is verified, but you're going to find that your spec will change, because the needs of
41:43
this component are going to change. So maintenance is a big issue.
41:51
I think in the specific case of the one that you mentioned, deploying these parsers,
41:56
the answer again for us was the DSL thing again. We're not going to —
42:03
if it's just an arbitrary piece of F* code that someone has to maintain, then if you're not an F* expert, or you don't have an F* expert
42:15
on your team, that's a really hard sell. But if instead
42:24
you write a format description in some C-like syntax and the
42:32
proofs happen automatically behind the scenes, then the maintenance story is much easier. The
42:39
verification tool is just a tool that's part of your compiler workflow; you're not
42:45
interacting with it directly. And so that's a bit of an easier sell: people don't have
42:52
to be F* experts to maintain and evolve this code. We still work closely with
43:02
the teams that use our code, and if the code has to be evolved we're usually part
43:09
of the code review to make sure that it's being done in a reasonable way, things like this. The
43:15
other context, I think, is cases where the component you're delivering has
43:22
a very clear spec, like you want to deliver an implementation of SHA or something, and
43:30
there my colleagues who have done this in many situations often deliver
43:37
C code that comes out of the toolchain, and the C code is designed — we take a lot of
43:44
pains to make it look pretty and human-readable and so on. My colleague Jonathan Protzenko is
43:55
responsible for a lot of this, like building a tool that emits clean-looking
44:01
C code from F*. That's maybe another way you deliver the artifact, and if things are stable enough, like you don't expect the spec of this kind of primitive to change, then
44:13
the maintenance question becomes easier as well. [Joey] Right, but if you deliver
44:19
reasonable-looking C code, the worst case for that team is that you all disappear, no one knows how to maintain it, and they've still got a good piece of code that was verified at one point. And
44:28
it's not the result we sort of dream of, but it's a good result still, right?
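As a toy illustration of the DSL workflow Nikhil described earlier — maintainers edit a declarative format description, and a checked parser is derived from it automatically, so nobody has to touch the proofs — here is a rough sketch. The description format and the `make_parser` helper below are invented for illustration; this is not the actual tool's syntax or API:

```python
# Toy analogue of deriving a parser-with-validation from a format description.
# The description format here is hypothetical, invented for this sketch.
import struct

# "Format description": field name, struct format code, optional validity check.
POINT_FORMAT = [
    ("x", "I", None),                 # 32-bit unsigned integer
    ("y", "I", None),
    ("color", "B", lambda v: v < 8),  # one byte, must be < 8
]

def make_parser(fields):
    """Derive a strict parser from the declarative description."""
    fmt = ">" + "".join(code for _, code, _ in fields)
    size = struct.calcsize(fmt)
    def parse(data):
        if len(data) != size:
            raise ValueError(f"expected {size} bytes, got {len(data)}")
        out = {}
        for (name, _, check), v in zip(fields, struct.unpack(fmt, data)):
            if check is not None and not check(v):
                raise ValueError(f"invalid value for {name}: {v}")
            out[name] = v
        return out
    return parse

parse_point = make_parser(POINT_FORMAT)
print(parse_point(struct.pack(">IIB", 10, 20, 3)))
```

The maintenance argument is visible even in this sketch: evolving the format means editing the description table, not the parsing (or, in the verified setting, the proof) machinery behind it.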
44:36
[Shpat] If people want to give F* a shot, where should they start? [Nikhil] fstar-lang.org.
44:43
We have an online tutorial; we actually just pushed a bunch of new chapters there last week. So
44:53
I think, with what's up there right now, if you wanted to learn how to write purely functional
44:59
programs in F* and prove them correct, there's a bunch of material up there that can help you get started. We also have tutorials out there that help
45:09
you write, say, C-like programs in F*, prove them correct, and get C code out of it, in this subset
45:20
of F* that we call Low*. But these things, I have to say, get substantially
45:25
harder. If you're writing pure F* programs, things can be relatively
45:32
straightforward, or maybe even if not straightforward, it can be
45:38
kind of pleasant and elegant, you know, if you set it up right. But as you try to do fancier
45:43
things with efficient low-level code and proofs about them, things are still hard, I would say. [Shpat] That seems to be the case with most languages that
45:53
are a little bit on the cutting edge like that. [Joey] So, relating to it being a little harder to
46:00
prove the sort of low-level imperative programs: one of the things we find with our tools a lot is that
46:06
one of the hardest parts of tool development is what you do when the person doing the proof is stuck. When we talk about tutorials it's kind of nice,
46:16
because you go through and there's the answer and you know it's going to work. Where we spend most of the time in our proofs is when it doesn't work and we have to figure out what to do next. Is that
46:26
something you've put some work into in F*, making that story reasonable for your users? [Nikhil] Yeah, that's
46:33
a great question. We've put some thought into it, but there's so much more to do. I mean,
46:45
I think that's also the double-edged sword of having lots of
46:50
SMT-based automation: you're trying to get this automated theorem
46:55
prover to solve undecidable problems, and sometimes it's just not going to do it.
47:01
And then, if a proof fails, how do you explain to the user why it failed, and
47:11
how do you nudge the user from where they are towards a proof that may succeed? And I think —
47:19
F* is not particularly good at this, I should be frank about it. We
47:25
will try to get an error message out of the SMT solver and tell you, oh, we failed to prove this
47:31
particular property, this conjunct in the formula that you tried to prove here; we were not able to prove it. And that's it. And sometimes, if the SMT
47:43
solver just times out, it may not even tell you which part of the formula it was working on
47:49
when it timed out. And so sometimes we'll say, well, here's this chunk of code, we're trying
47:54
to prove all these things in it, but we can't really zero in on exactly which part of it failed. So I think this kind of diagnosis of proof failures is a big thing,
48:05
and there's lots more we should be doing here. So usually this
48:12
often involves — once you gain some experience with the tool,
48:19
you kind of get a sense of what the solver is doing, and all the other things
48:25
that F* is doing; it's not just the SMT solver, it's also how the unifier
48:30
works, and when it's going to be able to prove things by symbolic
48:36
execution, things like this. You get a sense that you have all these proof
48:42
tools available to you, and you kind of know which work well in which situations. And when a proof fails you say, all right, well, I know I'm trying to do a proof here about modular arithmetic,
48:52
and I know the SMT solver isn't great at doing that, so for this part of my proof I'm going to resort to some other technique. Maybe
49:04
use a library of lemmas that we have around that give you specific facts about modular arithmetic, and do proofs that way. Or maybe at that point you realize that, hey,
49:16
I should just abstract away a lot of these details: I don't really care
49:21
to be reasoning about modular arithmetic on integers, I can just work
49:26
in some much more abstract setting, my proofs become simpler, and then I can instantiate it to the specific setting I have in mind. So I think people learn,
49:39
as in most tools, I guess — you learn your way around, and
49:49
yeah, how to navigate complex proofs. [Joey] Right, and the question, I think, is really how do we
49:56
get the tools to provide a little more of that, rather than having humans need to rely on instinct so much. But that was
50:07
definitely a tricky question, because, in my opinion, no one's doing this very well. So
50:12
I would have been really excited if you were like, yeah, we know the answer and you all can copy it, because I don't think we're doing great at it. I think people
50:20
are referring to some of these tools as auto-active verification tools, and I don't know if anybody in that space has a great answer to this question of how we
50:29
bridge the gap and switch over from the SMT solver failing to
50:35
the person doing the proof getting some useful information out of that, and go back and forth in a good way. But it's definitely a really exciting question that I think a lot of
50:44
people are struggling with right now. [Nikhil] Yeah, I think it's a really important question. It's also the sort of thing where the incentives in the academic community,
50:53
or at least in the core PL community, are maybe not structured that well for exploring
50:58
these kinds of questions, because there's no clear answer, there's no
51:03
theorem you can prove about this; you need usability studies and things like this.
51:09
You try to extract as much information as you can from the solver and present it to the user in some useful way. This problem is super underspecified, right? And
51:20
I think here's where you kind of have to sit in a very tight loop with your users and
51:26
figure out if what you're doing is helping or not. But I think it's
51:35
a really important space, and it's kind of wide open. [Joey] Cool, well, I think we'll leave our listeners with that as a question to ponder, and to reach out when they have
51:44
the tool that has solved this once and for all. But yeah, definitely, how to
51:50
get the incentives right is a challenge in this space, as well as the problem itself.
51:55
But this has been a wonderful episode. Thanks so much for joining us, Nik. [Nikhil] Thank you, yeah, thanks Joey, thanks Shpat. I had a really good time too. [Shpat] Glad to hear.
52:07
This was another episode of Building Better Systems. Join us again next time.