Building Better Systems

Episode #21: Nikhil Swamy — Fully In Bed With Dependent Types

Episode Summary

Today we're joined by Nikhil Swamy, Senior Principal Researcher in the RiSE group at Microsoft Research. We are very excited to hear about what he's been working on. In particular, we're going to discuss a language that he co-created and continually develops called F* (pronounced F star). F* is a dependently typed language in which you can both write programs and prove things about the programs that you write. We'll talk about what makes that language special and unique from other similar languages, as well as some of the applications of F*.

Episode Notes



Watch all our episodes on the Building Better Systems YouTube channel

Nikhil Swamy:


Joey Dodds:

Shpat Morina: 

Galois, Inc.: 

Contact us:



Episode Transcription

[Shpat] Hello, and welcome to another episode of Building Better Systems. My name is Shpat Morina. [Joey] And I'm Joey Dodds. [Shpat] Joey and I work at Galois. Galois is a computer science R&D lab focused on hard problems: trust, cyber security, and a lot of other fun stuff.


[Joey] Today we're joined by Nikhil Swamy, who is Senior Principal Researcher in the RiSE group at Microsoft Research. This is kind of an exciting one for me, because back when I was in grad school, Nik came in and visited, and it was really inspiring to see somebody doing industrial research like Nik was. So I'm very excited to be talking with him today and to hear a bit about what he's been working on. In particular, we're going to talk a bit about his background, and then we're going to discuss a language that he's co-created and continually developed called F* (pronounced F star). F* is a dependently typed language that you can use to both program and prove things about the programs that you write. We'll talk about what makes that language special and unique from other similar languages, as well as some of the applications of F*. Let's dive in.


Designing, manufacturing, installing and   maintaining the high-speed electronic computers — the largest and most complex computers ever built.


[Joey] Thanks for joining us, Nik. [Nikhil] Yeah, it's great to be here, thanks for having me. [Joey] So before we dive into what you're working on now, I'd love to hear a little bit about your background and how you got interested in the field of programming languages. [Nikhil] That's a good question. So my background, broadly, is that I grew up in India, and I came to the US as an undergrad. I went to a school called Hampshire College in Massachusetts, which is kind of a liberal arts school: no grades, make-your-own-major kind of place. It was a lot of fun, and the school is very much oriented towards self-directed study in some sense. I found a couple of great professors there who got me interested in computing and physics and math and things like this. I worked with them, this was in the late 90s, on quantum computing, applying sort of search algorithms to find better-than-classical quantum programs, and then trying to prove that the programs we actually found were correct and solved whatever problem we were trying to solve: that if it's a circuit to solve something, it was actually finding the right answer with the right probability, this kind of stuff.


So that's when I really got into, or understood, that hey, it's actually interesting to try to prove stuff about programs. And I learned a bunch at that time about automated theorem proving techniques. Well, actually, I thought I learned a bunch, but I learned surprisingly little, and then later learned that wow, there's so much more. I went to grad school at Maryland initially thinking, hey, I might try to do some things in the AI space, but then I had some sort of moment in grad school, I guess, as you do sometimes, where I realized that being able to prove properties about programs was a really powerful idea for me. And I worked with Mike Hicks, my advisor at the University of Maryland, on a programming language called Cyclone. Cyclone is a C-like programming language with a type system geared towards proving memory safety, with ideas that were maybe somewhat ahead of their time in some sense: things like linear types, and being able to use custom memory management strategies, everything from unique pointers to reference counting to, in some cases, conservative GC. Lots of ideas from Cyclone, region-based memory management and so on, I think now find a home in languages like Rust. I learned a lot at that time from Mike and the other people working on Cyclone, about programming languages and type systems and being able to prove things about programs before you run them. And yeah, that's how I got into this stuff. Then for my thesis I was building on what I'd learned working on Cyclone. I started to explore some ideas in dependent types, and to see how you can add just a little bit of dependent types to a programming language and get it to prove properties that you care about. I was at the time interested in security properties: how can you get away with adding just a little bit of dependent types and be able to prove, I don't know, things like the correctness of access control enforcement or information flow control policies, things like this. [Joey] In case anyone isn't familiar, could you give us a super quick glimpse, because I think it's going to come up again in this conversation: what is a dependent type?


[Nikhil] All right, it's kind of a profound question in some ways, but in a simple sense it's a way to encode a logic in the type system of a programming language. And the word dependent, maybe it's a little bit scary, but it's actually a really simple thing. Usually, if you give a function a type signature, it's something like a function that takes an integer and returns an integer; that's a common, simple type for a function. But you can have functions whose return type depends on the argument, and it can do things like: if you call it with the integer 17, then it returns, I don't know, a vector whose length is 17. Or it could even be things like: hey, you call it with a boolean, and if you call it with true it returns a string, and if you call it with false it returns, I don't know, an integer, let's say. So that's where the word dependent comes from. The examples I gave are very programming-centric, but this basic mechanism allows you, in a very foundational way, to encode within the type system very powerful logics that let you build all of mathematics, in some sense, inside this logic, which doubles as a programming language. Now, because dependent types are such a powerful thing, you don't see them that often in mainstream programming languages, though you're beginning to. The kinds of languages that me and my colleagues work on, and I'm sure we'll get into more of them, are fairly, I don't know, they're pretty fancy, to put it bluntly. So I was trying in my thesis work to see: well, I don't want full dependent types; can I just inject a tiny bit of dependent types into my language, and how much mileage can I extract from that? What kinds of things can I prove if I get dependent types light? And then, I guess, it's 10-15 years later.
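The boolean example above can be sketched concretely in Lean 4 (a dependently typed language that comes up later in this conversation; `Ret` and `pick` are illustrative names, not from any library):

```lean
-- The result *type* is computed from the argument's *value*.
def Ret : Bool → Type
  | true  => String
  | false => Nat

-- A dependent function: calling it with true yields a String,
-- calling it with false yields a Nat.
def pick : (b : Bool) → Ret b
  | true  => "a string"
  | false => 17

-- Both of these typecheck, with different result types:
example : String := pick true
example : Nat := pick false
```

The vector example works the same way: given a length-indexed vector type `Vec`, a function of type `(n : Nat) → Vec Nat n` must return a vector whose length is exactly the argument it was given.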


I'm no longer doing dependent types light; it's kind of, you know, the heavy dependent types, yeah. It's the gateway drug, or something, you know. You start doing it a little bit, and then after a while you're fully in bed with dependent types. [Shpat] Fully in bed with dependent types. I think we got our title for the podcast. [Joey] Cool. And so now you're at Microsoft Research, and have been for a while, and you've been working on increasingly more dependent types in a language called F*.


[Nikhil] That's right. Yeah, I've been at Microsoft Research since about 2008, and I work there in a group called RiSE, Research in Software Engineering. I work on F*, but broadly on programming language and program verification techniques: to prove programs correct, to prove them secure, to prove things like resource bounds about them, things like this. Proving things about programs through programming language design, where these new programming languages facilitate writing programs and their proofs together, to make it possible, and maybe easier, to actually write programs that are correct.


A lot of this work, from where you started to the work at Microsoft, sounds like it has maybe culminated, maybe that's the wrong word, but it has led to this language that you're working on, which is F*. I wonder if that would be a good place to start: for you to tell us a little bit about it. [Nikhil] Yeah, sure. So F* is this programming language that superficially looks like many other functional programming languages, things like languages in the ML family, things like OCaml or F# or even Haskell to a certain extent. Which is to say that it's a mostly functional, statically typed language,


where what's maybe unique about it is that it has, I'd say, three maybe-unique features. The first is that it's dependently typed, meaning that the type system is more powerful than the other languages that I've mentioned so far, things like OCaml or F# and so on. It's got full dependent types, similar to other languages that you may have heard of, things like Lean or Coq or Agda or other dependently typed languages. But it is different from the dependent types that you see in these other languages in that F* is based on an extensional type theory, which is slightly technical, we can talk about that at some point if you want, different from the type theories that are implemented by these other languages that I mentioned. I think this extensionality thing is kind of important because it enables making F* sort of a programmer-first kind of language. Some of the other languages are kind of set up as proof assistants, in which you can do mathematics at a very deep, abstract level; F* is more programmer-centric, where you wouldn't necessarily fire up an F* session to, I don't know, formalize category theory or something. I mean, you can if you want, but that's not what people use it for. It's more that you use it for building programs that you want to execute. And in doing that, as you're writing programs, writing software, it turns out, as you're probably both aware, that when you're writing a piece of software and you want to prove it correct, there's maybe a handful of interesting theorems that you want to prove, but the bulk of the things that you're trying to prove are really, really tedious and low-level. You're mired in details: hundreds or thousands of small proofs that build towards maybe some interesting insight about a big invariant, and a handful of big proofs. So one thing that F* really aims to do is to reduce the human overhead of having to write these thousands of small proofs all over the place, and the way that it does this, and this is one of the unique things about F*, is by using an SMT solver.


An SMT solver is an automated theorem prover: you feed it a problem in some logic, usually something like first-order logic plus some reasoning capabilities about arithmetic and things like this, and the SMT solver tries to decide whether a particular formula is valid or not. So what F* does is analyze your program in its dependently typed logic, produce the large number of proof obligations that pop up when you're trying to verify even, you know, 10 lines of code, and feed them to this SMT solver, which tries to do the proofs for you. Now, I mentioned at the beginning that extensionality was really important here, and one of the reasons is that the particular flavor of type theory that F* uses is designed in a way to enable this kind of dispatching of proofs to an external solver, which can implicitly prove many properties about your program while you're writing it. So dependent types and SMT solving, those are two


important things about F*. The third thing is that me and others want to write software in F* that is efficient, you know, low-level software that's doing, I don't know, things like efficient network communication or cryptography or things like this. This kind of software is in many ways inherently effectful or imperative, and although I said at the beginning that F* is a mostly functional language, the way in which we use F* is by encoding, within the functional language, other sub-languages that allow you to reason about computational effects: things like mutable state, or raising exceptions, or doing IO, things like this. You can write programs that do these things in F* and reason about them within F*. So F* has this effect system, and I think that's the other unique thing about F*: it's dependent types, SMT solving, and effects, and we use that package together to build a variety of things in the kinds of domains that I've mentioned.
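As a rough illustration of encoding an effectful sub-language inside a pure functional language (sketched here in Lean 4 rather than F*; F*'s actual effect system is considerably richer than this), mutable state can be modeled as state-passing functions:

```lean
-- Mutable state encoded purely: a computation is a function from the
-- initial state to a result paired with the final state.
def St (σ α : Type) : Type := σ → α × σ

def St.pure {σ α : Type} (a : α) : St σ α := fun s => (a, s)

-- Sequencing: run the first computation, thread its final state
-- into the continuation.
def St.bind {σ α β : Type} (m : St σ α) (f : α → St σ β) : St σ β :=
  fun s =>
    let (a, s') := m s
    f a s'

def St.get {σ : Type} : St σ σ := fun s => (s, s)

def St.put {σ : Type} (s : σ) : St σ Unit := fun _ => ((), s)

-- An "imperative" program written purely: read the counter, add one.
def incr : St Nat Unit :=
  St.bind St.get fun n => St.put (n + 1)

-- Running incr from state 0 ends in state 1, by pure computation.
example : (incr 0).2 = 1 := rfl
```

This is the classic state-monad encoding; the point of the transcript's remark is that such encodings let a mostly functional language both express and reason about imperative code.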


You know, a focus for us recently has been low-level secure communication software, ranging from things like cryptographic protocols to, say, network packet parsing: things that we implement in F*, prove correct, and then execute. Out of F* you can get C code or assembly code or whatever, depending on which of these fragments or styles you program in, and you get out imperative low-level software that you can then deploy and run. So it's in somewhat active use for real things, it sounds like? That's my question, if that's somewhat correct. [Nikhil] Well, you know, I'd say there's a small community of active users.


I mean, my colleagues at MSR, several of us use F* to do things like, you know, something that we've done recently: there's in F* a sub-language specifically for writing low-level parsers correctly, parsers of things like network message formats. So we have a verified parser generator that we've implemented inside F*, and we use this... Is this EverParse? This is called EverParse, that's right, yeah. So we use this to build the network packet formats used in various parts of the Windows kernel, and we extract, you know, high-performance C code for parsing those packets. For the last maybe 18 months or so we have had verified parsers produced by F* running inside Hyper-V, inside the Windows kernel, in several releases of Windows so far. So there are pockets of such people; even outside MSR there are some people who use F* to deploy verified cryptography in the Linux kernel or in Firefox, you know.


[Shpat] I have a couple of questions about that, which... that's awesome. At a higher level than F*: first of all, when we were talking about dependent types, you said they don't end up in programming languages that are more generic or more used because they're a little complicated; that's my simplified version of what you said. But it sounds like you've made this language suitable for programming, you know, in the large, and yet designed sort of dependent type theory into it. How did you make that happen, and is it the case that this is more approachable?


[Nikhil] I mean, I think there's still a long way to go. I'd like to think that at least some of what we do in F* makes dependent types more approachable to certain kinds of people. It's still, like I said, a very small community of people who are using this stuff, but I'm optimistic. I think we're on a trajectory where, as we see more successes across the several dependently typed languages that we have out there today, you see students coming out of universities around the world gaining more experience with this kind of stuff. Even just from personal experience: if I think about the interns that I had when I joined MSR in 2008, super smart students, but at the time programming in a dependently typed language was the bleeding edge; not everybody had experience with it. But today, many of the interns you get have done some part of Software Foundations, or they've programmed in Coq or something, and it doesn't feel alien to people anymore. It's just, okay, I can hack my way through this proof, no problem. So I think the community of people who have a certain kind of baseline skill to approach these things that were previously quite esoteric is growing, and growing quite fast, I think. So I hope that in a few years languages like F* or Lean or so many others will no longer be as esoteric.


[Shpat] That's a great word for it. [Joey] Well, it's a really important perspective, because sometimes when we look at these things today, it's still a small community, right, by most metrics of programming language communities. But there is kind of hope, because if you look 10 years ago, dependent types felt new, and now you have a talent pool to draw from, and there's no reason that growth should slow down. So that is a really important perspective, to say: yeah, it's not huge yet, but we've made progress. Things like F* and Coq, and I think Lean, are bringing a whole new group of people into the dependent types world, and it's really exciting to see that growth, and definitely important to acknowledge all the progress that's been made in the last 10-15 years, right? [Nikhil] Yeah, yeah. I think


maybe one way in which I see more growth happening, and maybe as a functional programmer this is not that surprising, is that I really see domain-specific languages as a big part of the answer. You have these frameworks that are super powerful, you can formalize all of mathematics in them if you want, but most people don't want to formalize all of mathematics; they want to do some very specific thing. Taking EverParse again, for instance: I'm not going to unleash the full power of dependent types on someone who just wants to write a packet parser. I'll design some framework, some language within my very powerful framework, that's tailored to this very specific usage, and that sort of reins in the complexity and allows a person to get value from it without having to understand all of it. [Joey] I'm laughing a little because, you sort of mentioned that, and I think Leo had this experience, where maybe he thought the same, but then the entire math world showed up at his door and said: we want to formalize all the mathematics with dependent types. [Nikhil] That's true, that's true, but


yeah, fair enough. [Joey] Yes, but certainly not everybody wants to do that, and the uses seem to be fairly different. So I guess an unrelated question, since we're mentioning Lean, and I think maybe this gets back to the extensional versus intensional kind of thing: we did talk to Leo a while back, and one of the things I've been wondering about since we talked with him is that Leo obviously understands SMT solvers very well and knows what they can do. From the sounds of it, he started out trying to write something that could make use of SMT solvers, that could let you get more work done before the SMT solvers do their thing, but he ended up in a world where Lean doesn't speak to SMT solvers very well, or at all, at the moment. Is that the extensional versus intensional distinction there, or did something else happen, in your opinion, to cause them to move away from that? [Nikhil] I think you could make this SMT thing work inside an intensional type theory in certain ways.


I think within Lean, too, it could well happen. I talk to Leo a lot, you know; he talks a ton about both SMT solvers and type theories. He was saying the same about you. He's being kind; it's definitely the other way. But yeah, if I was to guess, I'd say we may still someday have an SMT solver inside Lean. I think it's just that Lean has really taken off in this math community, where the priorities have been to meet and serve the needs of that community, and maybe SMT solvers have not made it to the top of the priority list yet, but I expect someday they will.


But, you know, people have built SMT-assisted proof automation in other frameworks too. There is some in Isabelle, notably; it's not a dependent type theory, but SMT solvers work very well there. And there's also some work in Coq doing things like this. But the thing in F* that makes it different from, say, Lean or Coq or other dependently typed languages is that proofs done by an SMT solver, and they could be done by any other means too, by a tactic or whatever, can be used to implicitly convert types everywhere. That's the main thing. If I have a vector whose length is n plus m, I can implicitly treat it as a vector of length m plus n, because of course n plus m is equal to m plus n, and an SMT solver will tell me that instantly. But in an intensional type theory, if you want to do this simple manipulation of converting one of these two types to the other, there's a lot of... well, Joey, you're smiling because you've probably had to confront this kind of thing before, because it's kind of a mess. To do what seem like obvious conversions takes a bunch of bureaucracy in an intensional type theory: you have to explicitly have these equality-witnessing conversions and things like this. These things just go away in an extensional type theory, and that maybe makes it more amenable for me to just call an SMT solver at this point to decide if n plus m equals m plus n, and just be done with it, without having to mangle my program. [Joey] Gotcha. So it's not necessarily impossible in the intensional setting, but it's quite a bit easier, I guess, in the extensional setting, to just take what the solver says and kind of go with it, and not have to do quite so much bookkeeping around why what the solver says is correct and how I propagate it through my terms, and things like that. Yeah, that makes a lot of sense.
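The vector example can be made concrete with a small Lean 4 sketch (illustrative names; Lean is intensional, which is exactly why the cast below must be written by hand, whereas per the discussion above F* discharges the same fact implicitly via the SMT solver):

```lean
-- A length-indexed vector: the Nat index is the vector's length.
inductive Vec (α : Type) : Nat → Type where
  | nil : Vec α 0
  | cons {n : Nat} (head : α) (tail : Vec α n) : Vec α (n + 1)

-- In an intensional theory, `Vec α (n + m)` and `Vec α (m + n)` are not
-- interchangeable: we must explicitly rewrite along a proof of
-- `n + m = m + n`, the "equality-witnessing conversion" described above.
def swapLen {α : Type} {n m : Nat} (v : Vec α (n + m)) : Vec α (m + n) :=
  Nat.add_comm n m ▸ v
```

In extensional F*, by contrast, no `swapLen` is needed at all: the two vector types are implicitly convertible once the solver validates the arithmetic fact.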


[Shpat] One of the great things about being a podcast host is that when I am out of my depth or don't understand something, I can move us on. So here it is: me moving us on to another topic. [Nikhil] Thank you. [Shpat] But jokes aside, pseudo-jokes aside: there are probably a lot of people listening who work on their own languages, or who are doing research and trying a few new things with maybe toy languages that might one day get a larger user base. I'm wondering if we can talk about, especially since F* is getting some use, how you think about evaluating language design through the lens of: is it useful, and will people use it? I wonder if you have thoughts there to share. [Nikhil] Yeah, it's a good question. I mean, I have some thoughts; I don't have any sort of deep words of wisdom here, but one thing I can say is that somehow co-evolving your language design with a problem that you really want to solve is really important, so that you're not just building something that you can't keep evaluating. You need to be able to evaluate what you're doing, like, every day: is this actually helping somebody solve some problem? That often means pairing up with some domain expert, because sometimes, as a language designer, for many of us the aesthetics of lambda calculi and all this can be quite alluring, and you just kind of go off and design something because it's beautiful or something. That's fine, and it's fun, but I think finding a way to pair up with a domain expert from some other domain who's really trying to solve some problem is important, whether it's doing math, which for instance in the Lean community has been a big driver. For F*, that driver has been a lot about cryptography: implementers of cryptography, not necessarily cryptographers as such, but people who want to build efficient implementations of crypto protocols or crypto primitives. And that sort of cycle has been really helpful for F*: we propose some new way of doing proofs, and immediately our friends at Inria or CMU will try it and tell us if it makes sense or not, and we'll iterate. And this is by no means a huge user community; maybe we're talking about the close circle of people who repeatedly give us feedback on new language features, something like, you know, maybe 50 people, nothing crazy, right? But even that is enough, I think, to keep you honest in your language. [Joey] When


we interviewed Alastair Reid, he had, well, I guess he had a paper about it, but I like the idea as well: the idea of meeting developers where they are. And that's really what DSLs are all about, to some extent. We have this language, it can express anything in the world, you can write any program in the world with it and say anything you can express in a logic about it, and that's great, and that'll get you a paper, maybe an ICFP paper, but that's not enough, of course, to get somebody using it. And the idea there was all about minimizing the time between when someone starts using this thing and when they see some sort of good results, basically. DSLs are a way of bringing that time way down and minimizing the things they need to learn, I guess. Have you done areas other than cryptography where you've seen success? Is it possible to layer up different sorts of DSLs on top of F*? [Nikhil] Yeah, so one of the things


that we're looking at or the sort of systems  programming but systems programming that you   with with concurrency and that you can sort  of prove correct in in a separation logic  


which is a way to write imperative programs in  concurrent programs in netstar and reason about  


them in this in this separate in this thing called  the separation logic separation logic is it's an   active area of research in the PL community  in the program verification community and  


it's a way to sort of structure proofs that   that are that's more modular if I were  to just put it in a in a nutshell and our  


things that we you know are from a purely sort  of language design perspective we we started  


to look into this seriously maybe you know  after our experience trying to do proofs in  


for crypto protocols and so on in a  framework that didn't have separation logic   and we were able to get things done and we  have a large code base that's done without  


separation logic and all these these this code  that's deployed in various places doesn't use   separation logic but we notice that our proofs  are way harder and hard to maintain because  


they lack this kind of modularity so we've been  looking at you know so motivated by that sort of  


you know empirical observation that we have this  code base it's kind of many hard-fought victories  


how can we make it a little simpler maybe by  enhancing the logic in which we work and so so  


that's something we've been looking at  recently and and using it to do things like  


you know it's it's building on on what we've done  with crypto and also parsers and so on but it's  


trying to apply some of these ideas in what we  have a project where we're trying to use this to  


do things like high-assurance monitoring of cloud services: can you write a


monitor for a service that's going to observe the  interactions that a client has with that service  


and provide some guarantees about that interaction, cryptographic guarantees about


that interaction? For instance, if we notice that the server is responding


in a way that's incompatible with the trace of messages that the server has seen so far,


then, except for some small probability of breaking a cryptographic hash or something,


our monitor will notice this and signal an error, signal that


the service has deviated from expected behavior, things like this. So


there are still some cryptographic components to it, but we're moving beyond just crypto primitives or protocols. [Joey] Yeah, I guess that's one of my favorite uses of


formal methods maybe ever: the state machine work you all did with TLS, where you did about the same thing. You made a verified implementation, and maybe it wasn't


quite high-performance enough, or there were some other demands it wasn't quite meeting, to run in production. But what you were able to do is


know exactly what a perfect TLS conversation looks like, because you made a verified implementation, and then just have a bunch of TLS conversations. Whenever the other side messed up,


you said, ah, they messed up. And you all found a lot of serious TLS implementation bugs


in systems that are running in production, which I think was a really cool result and a really great way to make a lot of formal verification directly useful to production


systems, basically, even if people weren't ready to use the verified thing yet. [Nikhil] Yeah, that's really cool work. I wish I could take credit for it; it's not actually mine,


it's work by my close colleagues and friends at MSR Cambridge and Inria in Paris: it's Antoine,


Karthik Bhargavan, and Cédric Fournet, and others. But yeah, that's really cool stuff. [Joey] They did do the


verification in F*, at least to some extent, right? So you can get second-degree credit. [Nikhil] I guess, sure.
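The trace-monitoring idea discussed above can be sketched in F* itself. The following toy state machine is purely illustrative (it is not the miTLS or TLS state machine, and the state and message names are invented for this sketch):

```fstar
module MonitorSketch

(* Invented protocol states and messages, for illustration only *)
type state =
  | Start
  | HelloSent
  | Established
  | Failed

type msg =
  | ClientHello
  | ServerHello
  | AppData

(* One transition of the monitor: anything off-protocol goes to Failed *)
let step (s:state) (m:msg) : state =
  match s, m with
  | Start,       ClientHello -> HelloSent
  | HelloSent,   ServerHello -> Established
  | Established, AppData     -> Established
  | _, _                     -> Failed

(* The monitor replays the observed trace of messages; a deviation
   from expected behavior shows up as the Failed state *)
let rec run (s:state) (trace:list msg) : state =
  match trace with
  | [] -> s
  | m :: rest -> run (step s m) rest
```

Replaying `run Start [ClientHello; ServerHello; AppData]` follows the happy path, while any out-of-order message lands in `Failed`, which is the condition a monitor would flag; the cryptographic-guarantee part of the real work (binding the trace to hashes) is not shown here.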


[Joey] But this sounds like a similar sort of application, where, yeah, as


nice as it is to deploy verified code directly, the existence of verified code allows you to do


a lot of different things, even if people aren't ready to run it right away. [Nikhil] Right, yeah, totally. Although, you know, I totally buy what you're saying, Joey,


but I also think we can get to a point where verified code


sort of has no excuses: it's correct, it's as fast, and maybe


there should be no reason for you not to deploy it in a production setting. I think we're getting there in several areas. I think the main barrier is:


is the component that you're trying to verify high-value enough? Is a flaw in that component severe enough that it warrants


the investment of so much human expertise into producing this verified component, producing and


then maintaining it? There are some places where this makes sense. If there's a bug deep in the virtualization stack that's isolating


cloud tenants from each other, then that's a bug that could


cost billions, and maybe in those situations it's worth


strategically replacing some key component of that virtualization stack with a verified


component. The same goes for cryptography. So there are, I think, these


select few areas where the economics is beginning to make sense.


[Joey] The other side of the economics, I guess, that people take into account here is:


if their team isn't the one that developed the code but they need to maintain it, they need a decent story there.


One of the arguments, I guess, is that you're less likely to need to maintain it because we're more likely to have gotten it right, but sometimes that's not sufficient. I'm curious:


you all got those parsers deployed, and apparently teams, likely teams other than your


teams, at Microsoft are now responsible for those in some sense.


I'm curious how that's gone, and what was convincing in terms of: you should use


this code, you're going to be able to maintain it, or we'll be able to help you maintain it. I don't know how those conversations go. [Nikhil] Right, yeah, I think the maintenance thing is a big one. I


mean, I think this argument, that it's verified so the maintenance cost is lower,


is a bit facetious; I don't really buy it. Things change: sure, the code is verified, but you're going to find that your spec will change, because the needs of


this component are going to change. So maintenance is a big issue.


I think in the specific case you mentioned, deploying these parsers,


the answer for us was the DSL thing again.


If it's just an arbitrary piece of code that someone has to maintain, and you're not an F* expert, or you don't have an F* expert


on your team, that's a really hard sell. But if instead


you write a format description in some C-like syntax, and the


proofs happen automatically behind the scenes, then the maintenance story is much easier.
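To give a flavor of the idea: behind the scenes, a C-like format description might elaborate to refinement-typed F* definitions like the following sketch (our own illustration, not EverParse's actual surface syntax or generated code):

```fstar
module FormatSketch

(* A hypothetical elaboration of a C-like format description: the
   refinement types carry the wire-format constraints, so any value
   of type header is well-formed by construction, and the generated
   parser's correctness proof is stated against this type. *)
type header = {
  version : v:FStar.UInt8.t  { FStar.UInt8.v v = 1 };    (* fixed protocol version *)
  length  : l:FStar.UInt16.t { FStar.UInt16.v l <= 512 } (* bounded payload length *)
}
```

The point of the DSL is that the user only ever writes the C-like description; types like this, and the proofs about the parser that produces them, stay out of sight.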


The verification tool is just a tool that's part of your compiler workflow; you're not


interacting with it directly. So that's a bit of an easier sell: people don't have


to be F* experts to maintain and evolve this code. We still work closely with


the teams that use our code, and if the code has to be evolved, we're usually part


of the code review to make sure that it's being done in a reasonable way, things like this. The


other context, I think, is cases where the component you're delivering has


a very clear spec, like you want to deliver an implementation of SHA or something, and


there my colleagues who have done this in many situations often deliver


C code that comes out of the toolchain, and the C code is designed to be readable: we take a lot of


pains to make it look pretty and human-readable and so on. My colleague Jonathan Protzenko is


responsible for a lot of this, like building a tool that emits clean-looking


C code from F*. That's maybe another way you deliver the artifact, and maybe, if things are stable enough, like you don't expect the spec of this kind of primitive to change, then


the maintenance question becomes easier as well. [Joey] Right, but if you deliver reasonable-


looking C code, the worst case for that team is that you all disappear and no one knows how to maintain it; they've still got a good piece of code that was verified at one point. And that's


not the result we dream of, but it's a good result still, right?
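Before the getting-started pointers that follow, here's a minimal taste of what "programming and proving" in pure F* looks like; the module, function, and lemma names below are our own toy example, not taken from the F* tutorial:

```fstar
module PureSketch

(* Ordinary functional programming: list append *)
let rec append (#a:Type) (xs ys:list a) : list a =
  match xs with
  | [] -> ys
  | x :: tl -> x :: append tl ys

(* ...plus a proof about it: appending adds the lengths. The proof
   is by induction on xs; the recursive call provides the inductive
   hypothesis, and the SMT solver discharges the rest. *)
let rec append_length (#a:Type) (xs ys:list a)
  : Lemma (List.Tot.length (append xs ys) =
           List.Tot.length xs + List.Tot.length ys)
  = match xs with
    | [] -> ()
    | _ :: tl -> append_length tl ys
```

This is the pleasant case Nikhil describes below: the program and its proof live side by side, and much of the proof is automated.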


[Shpat] If people want to give F* a shot, where should they start? [Nikhil] Well,


we have an online tutorial; we actually just pushed a bunch of new chapters there last week. So


I think what's up there right now, if you wanted to learn how to write purely functional


programs in F* and prove them correct, is a bunch of material that can help you get started. We also have tutorials out there that help


you write, say, C-like programs in F* and prove them correct and get C code out of it, in this subset


of F* that we call Low*. But these things, I have to say, get substantially


harder. If you're writing pure F* programs, things can be


relatively straightforward, or maybe even if not straightforward,


kind of pleasant and elegant, if you set it up right. But as you try to do fancier


things with efficient low-level code and proofs about them, things are still hard, I would say. [Shpat] That seems to be the case with most languages that


are a little bit on the cutting edge like that. [Joey] So, relating to it being a little harder to


prove low-level imperative programs: one of the things we find in our tools a lot is that


one of the hardest parts of tool development is what you do when the person doing the proof is stuck. When we talk about tutorials, it's kind of nice,


because you go through and there's the answer and you know it's going to work. Where we spend most of the time in our proofs is when it doesn't work and we have to figure out what to do next. Is that


something you've put work into in F*, making that story reasonable for your users? [Nikhil] Yeah, that's


a great question. I wish... I mean, we've put some thought into it, but there's so much more to do.


I think that's also kind of the double-edged sword of having lots of


SMT-based automation: you're trying to get this automated theorem prover to solve undecidable problems, and


improvement to solve undecidable problems and  it's not sometimes it's just not going to do it  


And then, if a proof fails, how do you explain to the user why it failed, and


how do you nudge the user from where they are towards a proof that may succeed?


I think F* is not particularly good at this, I should be frank about it. We


will try to get an error message out of the SMT solver and tell you: oh, we failed to prove this


particular property, this conjunct in the formula that you tried to prove here; we were not able to prove it. And that's it. And sometimes, if the SMT


solver just times out, it may not even tell you which part of the formula it was working on


when it timed out. And so sometimes we'll say, well, here's this chunk of code, we're trying


to prove all these things in it, but we can't really zero in on exactly which part of it failed. So I think this kind of diagnosis of proof failures is a big thing


where there's lots more that we should be doing. Usually this


often involves, once you gain some experience with the tool,


getting a sense of what the solver can do, and all the other things


that F* is doing. It's not just the SMT solver; it's also how the unifier


works, and when it is going to be able to prove things by symbolic


execution, and things like this. You have all these proof


tools available to you, and you kind of know which work well in which situations. When a proof fails, you say: all right, I know I'm trying to do a proof here about modular arithmetic,


and I know the SMT solver isn't great at doing that, so for this part of my proof I'm going to resort to some other technique. Maybe


I'll use a library of lemmas that we have around that state specific facts about modular arithmetic, and do proofs that way. Or maybe at that point you realize, hey,


I should just abstract away a lot of these details. I don't really care


to be reasoning about modular arithmetic on integers; I can just work


in some much more abstract setting, my proofs become simpler, and then I can instantiate it to the specific setting I have in mind.
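The lemma-library escape hatch mentioned above looks roughly like this in F*. The lemma name and argument order below are from memory of the `FStar.Math.Lemmas` module and may differ across F* versions; the wrapper function is our own:

```fstar
module ModArithSketch

open FStar.Math.Lemmas

(* Rather than hoping the SMT solver discovers a nonlinear fact on
   its own, we invoke a library lemma that states it. lemma_mod_plus
   in FStar.Math.Lemmas states (a + k * n) % n == a % n (name and
   signature as remembered; check your F* version). *)
let cancel_multiple (a k:int) (n:pos)
  : Lemma ((a + k * n) % n = a % n)
  = lemma_mod_plus a k n
```

Calling `cancel_multiple` wherever the fact is needed keeps the nonlinear reasoning in one explicitly proved place instead of scattered SMT timeouts.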


So I think, as in most tools, people learn their way around, and how to


navigate complex proofs. [Joey] Right, and the question, I think, is really how do we


get the tools to provide a little more of that, rather than having humans rely on instinct so much. That was


definitely a tricky question, because in my opinion no one's doing this very well.


I would have been really excited if you were like, yeah, we know the answer and you all can copy it, because I don't think we're doing great at it. People, I guess,


are referring to some of these tools as auto-active verification tools, and I don't know if anybody in that space has a great answer to this question of how we


bridge the gap and switch over from the SMT solver failing to


the person doing the proof getting some useful information out of that, and go back and forth in a good way. But it's definitely a really exciting question that I think a lot of


people are struggling with right now. [Nikhil] Yeah, I think it's a really important question. It's also the sort of thing where the incentives in the academic community,


or at least in the core PL community, are maybe not structured that well for exploring


these kinds of questions, because there's no clear answer; there's no


theorem you can prove about this. You need usability studies and things like this:


you try to extract as much information as you can from the solver and present it to the user in some useful way. This problem is super underspecified, right? And you just have to,


I think, here's where you kind of have to sit in a very tight loop with your users and


figure out whether what you're doing is helping or not. But I think it's


a really important space, and it's kind of wide open. [Joey] Cool. Well, I think we'll leave our listeners with that as a question to ponder, and to reach out when they have


the tool that has solved this once and for all. But yeah, definitely how to


get the incentives right is a challenge in this space, as well as the problem itself.


But this has been a wonderful episode. Thanks so much for joining us, Nik. [Nikhil] Thank you, yeah, thanks Joey, thanks Shpat. I had a really good time too. [Shpat] Glad to hear it.


This was another episode of Building  Better Systems. Join us again next time.