Building Better Systems

#19: Steve Weis — Security Shouldn't Be the Last Check Box

Episode Summary

In this episode, we talk with Steve Weis, a Senior Staff Security Engineer at Databricks with extensive knowledge of security, cryptography, and software engineering. Steve shares his experience working for large companies like Google and Facebook and how their security needs differ from start-ups and companies trying to scale. He talks about why he thinks companies should share more about how they design their infrastructure and how they can develop a “security mindset” so even non-security-related roles can contribute to building secure systems.

Episode Notes


Watch all our episodes on the Building Better Systems YouTube channel.

Steve Weis: https://www.linkedin.com/in/stephenweis/

Joey Dodds: https://galois.com/team/joey-dodds/

Shpat Morina: https://galois.com/team/shpat-morina/ 

Galois, Inc.: https://galois.com/ 

Contact us: podcast@galois.com

Episode Transcription

Shpat Morina (00:00):

Welcome to another episode of Building Better Systems, where we chat with people in industry and academia who work on hard problems around building safer and more reliable software and hardware. My name is Shpat Morina.

Joey Dodds (00:12):

And I'm Joey Dodds.

Shpat Morina (00:14):

Joey and I work at Galois, an R&D lab that focuses on high-assurance systems development and, broadly, hard problems in computer science.

Joey Dodds (00:21):

Today we're gonna be talking with Steve Weis. Steve's a senior staff security engineer at Databricks, a company that helps people store, visualize, and manipulate data sets. Steve has a wealth of experience in security, cryptography, and software engineering at a wide range of companies, including Google and Facebook. In this episode we discuss how companies should approach security as their needs develop. Steve shares what he's observed at the larger companies he's worked for, and how their needs differ from both startups and companies that are trying to scale up. We follow that up with a discussion about what every employee can do to help build secure systems, and talk about some practices that will help in developing a security mindset, even for people who aren't trained in security.

Intro (01:02):

Designing, manufacturing, installing, and maintaining the high-speed electronic computer, the largest and most complex computers ever built.

Joey Dodds (01:21):

Thanks for joining us, Steve.

Steve Weis (01:22):

Thank you.

Shpat Morina (01:24):

Right now you work at Databricks. For the people who don't know what that is and what the company does, I wonder if you could give us a primer, and also maybe tell us a little bit about what makes Databricks an exciting place to work in security.

Steve Weis (01:41):

Yeah, sure thing. This is always a challenge, because it kind of does a lot, and I need to boil it down to a good catchphrase. But you can think of Databricks as a way of working with and visualizing the different data sources that you already have. So imagine you've got data in an S3 bucket: you can run the Databricks runtime, which is built on Apache Spark and can log into all these different data sources. And from that, it gives you a way of using Python notebooks or Scala code to do your data science and data engineering on top of that. It's interesting from a security perspective because the product has kind of gone from almost an on-prem deliverable software into a cloud service, like a software as a service.

Steve Weis (02:27):

And we're in this state now where it's a hybrid model: there's a data plane and a control plane. The data plane's running on the customer side, so it's their responsibility, and the control plane is the Databricks-operated website service that controls the data plane. So it's an interesting model where we don't have access to the underlying data, but we have the ability to run jobs and analysis on the customer's data on their behalf, and then present the results in a nice unified interface. It creates some interesting settings and different complexity. I also think the long-term trend will be moving more and more into a software-as-a-service model, where more and more is handled by Databricks, the company, versus having things in your own account.

Shpat Morina (03:17):

Yeah, got it. So the customer's data, and the servers or wherever they store it, have nothing to do with Databricks, though you access it for the analytics and for the accessibility of it.

Steve Weis (03:31):

I would say the customer would be running software in their own accounts. So imagine you have an AWS account: you would spin up some EC2 instances, and those would be running the Databricks runtime, and then the control plane is able to interact with that and tell that runtime to go look over this table and return your results here. And then the UI that's presented to the user is a nice interface that lets you run Python notebooks, or whatever you wanna do; there's a nice dashboard interface to build dashboards and provide a SQL interface. So from a security perspective, this creates a lot of questions about providing access control, customers bringing their own key material to encrypt that data, and making sure that the control plane doesn't have access to your actual underlying data. It's an interesting security model, and I think it has some advantages and disadvantages.

Joey Dodds (04:24):

So you've been with a number of companies working in security, previously at some rather mature companies, and now at an earlier-stage company. Could you take us briefly through some of the differences, and how security matures as companies mature?

Steve Weis (04:42):

Sure. So just for a little background on myself: I had previously been at Facebook and Google, which are probably the two mature companies you're talking about, and had also worked at some very small startups. And now I'm at Databricks, which is kind of a startup becoming a big company. I think one of the big differences is that at more established companies, a lot of the security infrastructure and problems have, if not been solved, at least been addressed in some form. Whereas today, at a company like Databricks, we're kind of transitioning from the point where you may have had something that was a stopgap solution, or hadn't been addressed at all. So now we're building out some of the infrastructure that you would need in the future. It's really looking at what it takes to go from a product that has been working, has customers, and maybe has some features that need to be improved, to figuring out what needs to be solved for the next 10 years, and building out the infrastructure that you'd want to have in the future.

Steve Weis (05:45):

And that's the kind of infrastructure that I, as an employee, was able to use at those big companies.

Joey Dodds (05:52):

So at the small-to-medium company, you don't have that infrastructure. It's presumably a juggling act between trying to build that stuff up on the fly and dealing with existing issues, basically.

Steve Weis (06:07):

I think that's right. I think it's trying to look at where you have common problems that come up. A general example would be web frameworks: most companies today use some sort of web framework that will avoid common web app security issues. That's the type of thing that solves a class of problems that, if you don't have it in place, will get recreated by developers again and again, and not to their fault. The reason you need these frameworks is to prevent people from making common mistakes. Another great example is cryptographic keys and crypto tools. Left to their own devices, people need to solve problems, and they will go to Google, Stack Overflow, or whatever forum post to figure out how to encrypt some data or sign some data. And there's no way to know that the thing you're looking at might be 10 years out of date, or have a vulnerability, or have a typo. The solution to this is to build the infrastructure that gives people safe options by default. Whether it's a key management service that handles your keys for you, a cryptographic library, a cryptographic service, whatever it is, it just makes your internal developers' jobs easier and safer.
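The "safe options by default" idea Steve describes can be sketched in a few lines. This is a hypothetical illustration, not any particular company's tool: a thin wrapper around Python's standard-library `hmac` module that signs and verifies data, so callers never pick algorithms or write their own comparison logic.

```python
import hashlib
import hmac
import secrets

def new_key() -> bytes:
    # 32 random bytes; a real deployment would fetch this from a
    # key management service instead of generating it locally.
    return secrets.token_bytes(32)

def sign(key: bytes, data: bytes) -> str:
    # Callers cannot choose a weak hash: SHA-256 is fixed here.
    return hmac.new(key, data, hashlib.sha256).hexdigest()

def verify(key: bytes, data: bytes, tag: str) -> bool:
    # compare_digest is constant-time, avoiding timing side channels
    # that a naive `==` comparison could leak.
    return hmac.compare_digest(sign(key, data), tag)
```

The point is that the safe path (modern hash, constant-time comparison) is the only path the internal developer can take.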

Joey Dodds (07:25):

Gotcha. So you can say that when things have been done the way we set them up, we're probably okay, and then you can focus on the exceptions as things that maybe need a closer eye to security, rather than having to worry about everything, if you set that up appropriately.

Steve Weis (07:40):

Yeah. I think if you see a pattern of common mistakes happening again and again, you might sit back and say: you know what, it's about time we provide a tool that does this for people, so they don't have to keep figuring it out for themselves and can do the right thing. Whether that's web app design, token generation, credential management, whatever it is, you get to the point where you've observed it being a problem multiple times, and it's time to put something in place to make it safe.

Joey Dodds (08:09):

And presumably sometimes you get to the point where that's gonna be win-win, in the sense that the confusion is actually costing you: at a certain point, people are taking longer to figure out how to do this stuff, even incorrectly. So if you can provide that framework, it would help them quite a bit in getting to a solution faster, as well as increasing security at the same time.

Steve Weis (08:31):

Yeah, I think so. A good example of that is something like access control list management. A lot of companies will have some sort of internal access control system, whether it's user-to-service or service-to-service. If you have to reinvent that every time, say, when you have two services talking to each other and figuring out how to authorize access to one another, you're wasting a lot of time, you have a lot of redundant code, you're probably not doing it right all the time, and you're making it hard to maintain, hard to manage, hard to audit. So putting something like that into either a centralized service or some sort of common library does save developer cycles, and at the end of the day it makes your system more secure, and also just more understandable. That's really what it comes down to: being able to answer questions about how this thing works, why I think it's secure, and how I know that the right people or systems are accessing what they're supposed to be accessing.
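A rough sketch of the centralized, auditable access-control check Steve is describing might look like this. All names here (principals, actions, resources) are illustrative:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    principal: str   # a user or service identity
    action: str      # e.g. "read", "write"
    resource: str    # e.g. "billing-db"

class AccessControl:
    """One shared place every service asks: may P do A on R?"""

    def __init__(self):
        self._rules: set[Rule] = set()

    def grant(self, principal: str, action: str, resource: str) -> None:
        self._rules.add(Rule(principal, action, resource))

    def is_allowed(self, principal: str, action: str, resource: str) -> bool:
        # Deny by default: access exists only if explicitly granted,
        # which is what makes the system easy to audit.
        return Rule(principal, action, resource) in self._rules
```

Because every grant is an explicit record in one place, "who can access what" becomes a query instead of an archaeology project across redundant per-service code.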

Shpat Morina (09:27):

To go back to Databricks: it sounds like it's pretty big right now. I think it has a multiple-billion-dollar valuation, so it's not small anymore. We talk to a lot of smaller companies starting out that might end up in that situation, and often there's this question. We meet thoughtful people who want to think about security early on, but there always seems to be a trade-off when you're starting to build something, because you need to validate that there's actually a market for it, or validate that the tech will do what you want it to do for customers. So, small question, right? But how do you think about companies that are starting out? How would you suggest they think about when it is time to think about security, and in which dimensions? How much security, and when?

Steve Weis (10:33):

It's a good question. I think security people in general, at least myself in the past, erred on the side of: hey, this has to be secure, security is really important, your company will die if you don't have this in place. And the reality is that it's one of many different needs. If you're starting out as a small company, at the end of the day you need a functional business to survive; your company has to make money and profit to continue. Security won't matter if you don't have customers and you're not able to serve those customers. So I see security as a support system for the product and the underlying business. You can build the most secure data storage, crypto, whatever, and if there's nobody using it, it doesn't matter. So for those early companies, the things I would try to do are to avoid creating problems in the future and avoid making early decisions that come back to bite you.

Steve Weis (11:29):

So there are incremental things you can do that can put you in a good place. One would be looking ahead and figuring out: hey, if I'm gonna sell to enterprise customers, they're gonna have their own compliance checklist. What can I do today to make answering that easier? I think one of the people you interviewed on the show, Dan, has a how-to-get-SOC 2-as-a-startup guide. I might be attributing it to the wrong person, but somebody out there has a good guide: the first thing you do is use an SSO provider, so all your logins are through one thing and it's easy to account for what your employees have access to. Simple things that most companies are probably doing today, but doing them consciously, then documenting them, and then having those answers in the future when somebody asks: how do you know that malicious source code isn't checked in?

Steve Weis (12:20):

And you've got the process: here's the repo, here's our review process, here it is. So yeah, that would probably be my advice: look ahead to what you'll need to enable that process, make sure you can answer the questions of how your system works and what you're doing, and make decisions today that will be easier to convey to somebody later. If you take shortcuts, like, yeah, we dump this in a plaintext table somewhere, then when somebody asks, you have to explain that to them. Anticipate that that might be an issue. So that's probably a good frame to look at it through: how do you explain to somebody else what you did and why you did it? And that can put things in the light of: yeah, maybe we shouldn't take this shortcut where the single developer has root on their laptop and can see all the data, or whatever scenario you wanna think of.

Joey Dodds (13:15):

So when you talk about something like SOC 2, it's gonna help you with some technical things and some people things. The justification, I guess at the end of the day, is: we won't be able to make a specific sale unless we're able to get through this checklist. Is SOC 2 gonna help with a lot more than making the sale, at the end of the day?

Steve Weis (13:40):

I think it's a good driver. Obviously, with any sort of compliance regime, some parts of it are not gonna be relevant to your underlying business. PCI compliance might be talking about individual network configurations or firewalls, and you might have a completely cloud model where the things it asks for don't make exact sense. If you're looking at it from a startup that may not have that first customer yet, I would just be anticipating it. I wouldn't necessarily go through the effort of getting yourself compliant out of the box, but say: we may have a customer in the next three months, six months, who is gonna ask for this. What will we need to do to get there for them? It doesn't have to happen all at once, but you should at least know how long it will take and what documents you need to produce.

Steve Weis (14:29):

You know, what are the steps we would have to take to meet this? And look at it as a continuous process, not just a one-time thing where: all right, we did X, Y, and Z, and now we're good, we're secure, and we can sell to people. It's more that this is gonna keep coming up. You're gonna have different customers with different checklists; they're gonna ask for different things, and you just need to be able to answer their questions. And then also, in terms of growth: how are you gonna grow the regions you're in, the number of employees working on this, how are you gonna be able to do support? Those are things to think about early on so you don't make bad decisions. I think a lot of companies can get into a situation from the startup days where, okay, everybody had root, everybody could read the database, everyone could do anything they wanted in production.

Steve Weis (15:20):

And that's a hard habit to break. Once you get a set of people who've been in the company and have had their development process for years, where they're used to logging in and deleting a row from a table, that's really hard to break over the long run, especially as people get used to it. So think about it as: let's not give everyone full access to start with. What can we do to make this a little more controlled, balanced against the fact that you actually do need to develop and get stuff done? A good example would be having multiple environments to start with. That's something I think a lot of companies could do: have a dev, staging, and prod environment, or some sort of very realistic testbed environment, because it lets your engineers do what they need to do to build and debug the product without having the gates open from the get-go.
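A minimal sketch of the dev/staging/prod separation Steve mentions: code asks its environment for a data source and defaults to the safe tier. The connection strings and the environment-variable name are made up for illustration.

```python
import os

# One tier per environment; only "prod" points at real customer data.
DATA_SOURCES = {
    "dev": "sqlite:///dev-fixtures.db",       # synthetic fixtures only
    "staging": "postgres://staging-replica",  # scrubbed, realistic testbed
    "prod": "postgres://prod-primary",        # real data, access gated
}

def data_source() -> str:
    # Default to the safe tier, and refuse unknown environments rather
    # than silently falling through to something sensitive.
    env = os.environ.get("APP_ENV", "dev")
    if env not in DATA_SOURCES:
        raise ValueError(f"unknown environment: {env}")
    return DATA_SOURCES[env]
```

With this shape, an engineer debugging locally physically cannot reach prod data by accident: the prod connection only exists when the deployment sets `APP_ENV=prod`.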

Steve Weis (16:10):

In my time at Google, I was never there at a time when it was wide open; it was always pretty well controlled, and they had these tiers of environments by the time I was there. So that's a good model in my mind. I never had access to customer data there, and I don't even know what the process was, because I never had to, because I was able to develop fully without doing that. I mean, other people would need to go into production systems to debug things, but I never really had to.

Shpat Morina (16:40):

Yeah, I guess that's ideal, right? That's the ideal case.

Joey Dodds (16:45):

It can be liberating as an engineer, in the sense that you lose the pressure of being able to accidentally break prod. People find it thrilling to be able to get things done quickly, the "I know this is right, I'm just gonna put it out there" feeling, and that is nice. But at the same time, it's also nice to know there's a backstop: someone might catch things, there are tiers, there are other people watching, there are systems watching. In some sense it lets you experiment a little more, I think, than feeling "everything I push could ruin a customer's day if I put it out there wrong," which is gonna change the way you think. So it's hard to say what's better just from the individual developer's side, because I know some people get accustomed to that "it's going straight to prod" kind of thing, but it's not clear that it's all that great.

Steve Weis (17:44):

I think having good access control in place means that, exactly like you said, something you do is not gonna accidentally wipe out a whole database, because you don't have the rights; you have to go through some process to get the rights to do whatever that destructive action is. And I think this speaks to something else: security doesn't have to be antagonistic to regular engineering and development. There are a lot of places where it's aligned. A good example is these multiple environments: having a non-sensitive environment to develop in, where you don't have any customer data, is actually great for stability, because I can test all my stuff in there, I can have shadow traffic, whatever it is, knowing this is not prod data and I'm not gonna break anything.
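The "rights to perform a destructive action" gate could look something like the following hypothetical helper, where the set of grants would come from some time-limited approval workflow rather than being hard-coded:

```python
class MissingGrant(Exception):
    """Raised when a destructive action has no approval on file."""

def require_grant(grants: set, action: str) -> None:
    # In a real system, `grants` would be fetched from an approval
    # service and each check would be logged for auditing.
    if action not in grants:
        raise MissingGrant(f"no approval on file for {action!r}")

def delete_rows(grants: set, table: str) -> str:
    # The guard runs before the destructive operation, so an engineer
    # without an explicit grant simply cannot wipe the table.
    require_grant(grants, f"delete:{table}")
    return f"deleted rows from {table}"  # stand-in for the real delete
```

The failure mode changes from "oops, the table is gone" to "go get an approval first," which is exactly the backstop discussed above.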

Steve Weis (18:32):

And that's good for development in general, and also for the speed at which things can be released. If you have a release cycle that is very long, it makes it hard to fix security issues: when you do have to do a security patch, it's disruptive, because it's out of the regular release cycle. So if you get very short release cycles, that's great for an engineer who wants to get features out faster, and it's also great for security, because we can patch really quickly. If there is a security issue and you have a daily release, or an hourly release, it'll just go out with that; you don't need to do anything special.

Shpat Morina (19:05):

This idea of security being antagonistic to engineers: I guess especially in smaller companies, as they evolve toward this and haven't thought about it before, more often than not there's gonna be a security person, or a little team, that comes in, and it might feel like that, right? Because they're probably asking you to change things. How do you think about not splitting things that way when you have to bring in security as a later addition? Given your experience, and I'm sure you've been in situations like that, what would you do to make it more smooth?

Steve Weis (19:55):

Yeah, it's a good question, because I feel like at a lot of companies that first security hire will be more on the compliance side and more on the operations side. Several places I've been kind of have a split between the people on the engineering and product side, who are working on the security of the product or security features in the product, versus the security of the whole organization. That might include IT and operations of the infrastructure: how do you manage your cloud services accounts, who's got access to that, who's got access to the corporate SSO systems. And I think often at companies the original or early hires will be more on that ops side: who's setting up our firewalls, who's setting up our key management system.

Steve Weis (20:43):

The engineering side comes on later. And I think at an early company, it may make sense to take existing engineers and train them up to be security representatives. You don't necessarily have to bring in a person who's gonna run security for engineering work; if you have senior engineers who are familiar with the system and can think about it, have them be responsible for some of these things. I think it's okay to have that responsibility diffused among different engineering teams. It really depends on the phase of the company: if you're a company with 10 engineers, somebody's gonna be responsible for it, but there are not gonna be five engineers working on security. So it depends on the phase of the company and the market you're talking about. There are all different needs. If you're working in the health space, for example: I know a startup with about five employees, and they're super security-oriented because they're working with health data. That was the first thing out of the gate: they can't operate their business unless they're HIPAA compliant, and they have to have that as a selling point for their product. So it really depends on the company size, the maturity, and the market itself.

Joey Dodds (21:56):

But some companies do still end up in this situation, I think, where there's a security team that is viewed as antagonistic. Like: okay, great, the security guys are coming, we've gotta get through this review so we can push our stuff to prod, and it's just gonna be a headache. I don't think that's desirable, but do you have any ideas on how companies can avoid that kind of situation?

Steve Weis (22:22):

I think that can happen, especially if there's a product release process where you have to have security sign-off and it's kind of the last checkbox: security, legal, privacy. It needs to be earlier in the process, and be kind of a co-development process. You don't want to just go at the end of the day, throw your product over the fence, and say, check it out. A lot of people think there's a magic "pen test this" button, where you throw the finished product out the door and somebody pen tests it and gives you back a couple of things to fix. Really, it has to happen earlier on. So I would say: have design reviews early on to look for security issues. And again, this doesn't need to be a dedicated security person, just somebody thinking about it. Just sit down and say: all right, what are we trying to protect against?

Steve Weis (23:13):

What's new, what's changed? How do you address, say, these five or six issues on the new components? Really, it's just somebody thinking about it; that's the main thing, and it doesn't need to be perfect. Just the act of sitting down, writing down what the threat model is, and thinking: okay, what has actually changed here? What happens if somebody hits us with a ton of traffic? How is authorization provided here? A lot of times people say: oh, it's not, we didn't think about that. So the earlier in the process you can do that, the better. Now, that statement is kind of baking in that there's actually a design and a review process. Early startups may not have a design process; it might be: oh yeah, we gotta do this thing.

Steve Weis (23:56):

And then they build it the same day. So this is kind of where it's a little more organic: companies need to find the balance where they're getting some sort of security oversight, but also not burdening teams with too much pressure. And like I said, I think the best thing is to view it as co-creation, where you're not antagonizing the product team; you're there to actually help the product team get out the door, thinking through what needs consideration and bringing up issues that may happen. Like: by the way, this new API has no rate limiting, so if you open this up, you might get slammed. What happens if somebody does that? And then somebody might think: oh, okay, yeah, we gotta put rate limiting on.

Steve Weis (24:40):

Or maybe we don't need to expose this, or maybe we need something else. Things like that: just being the person who thinks about it. Because day to day, somebody building something is thinking about getting it to actually work; they're not thinking about all the ways somebody out there might try to break it. And again, I think anybody can do this. I don't think it's some special skill. It's sitting down, drawing the picture of what it's supposed to look like, then looking at the new things and the edges and saying: okay, what happens here? What's sent over this connection? How is this actually authorized? It's a good exercise for any engineer, and not even necessarily engineers. This is something somebody from a non-technical background can do. If you can talk about some of these things and reason about them, you don't need to know how the code works underneath; you can look at a diagram and understand the basics of what's going on.
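The rate-limiting example is concrete enough to sketch. Here is a minimal token-bucket limiter; the rate and burst numbers are arbitrary, and a production version would typically sit at the API gateway rather than in application code:

```python
import time

class TokenBucket:
    """Allow `rate` requests/second with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int):
        self.rate = rate                # tokens added per second
        self.capacity = capacity        # maximum burst size
        self.tokens = float(capacity)   # start full
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller should return HTTP 429 or queue the request
```

Even a crude limiter like this turns "somebody slams the new API" from an outage into a stream of rejected requests, which is the kind of question a design review is meant to surface.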

Joey Dodds (25:38):

I think people underestimate how valuable the step of writing it down or typing it out actually is, as well. A lot of people are sort of like: yeah, I thought about it, I don't think anybody could break this, this feels pretty good. But a lot of times, when you take even that step of "I'm gonna write down what I think," it forces you to think about it that much harder and get that much more intentional about what you're doing. But I think what really stands out is your statement that you don't have to hire a security person to do this. We've all seen systems break; anybody who's spent time engineering has seen systems and code break. So you have an understanding of the kinds of things that can go wrong, and then it's just a matter of thinking: could this happen to the system I'm building now? And if you write that out, you're gonna do a better job than if you don't, basically. In the long run you can talk about security people, but if you don't have those resources, it doesn't mean it's impossible to have any semblance of security, or to build systems up in a secure way.

Steve Weis (26:44):

Yeah. I think just somebody sitting down, taking your design, and explaining your own design back to you. For me, that's the first thing I do: I read somebody's design and I say, okay, this is what I think this says, and then they can correct me if I'm wrong. That's how I understand how it works: let me explain to you what you just told me and see if I got it right. Then at that point it's, like you said, looking at what's new, what's being protected, and what the new interfaces are on whatever this is: a storage system, an API, a user interface, whatever we're talking about. And just asking the same questions: how is this validated? How is this authorized? How is this protected in transit?

Steve Weis (27:28):

All the basics. I think a lot of people could do this. It's not a skill that's impossible to learn or takes a whole career to learn. After going through it a few times, you understand what to look for, and it doesn't need to be perfect. The act of talking about it and thinking about it is really the benefit, more than the output of the threat model. You're probably not going to look back at that document. Maybe it's there for later, if you want to explain to somebody how it works, so it's good to have. But going through the act is really the benefit.
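The basic questions Steve lists lend themselves to a reusable checklist. As a rough sketch (the names and question phrasings here are illustrative, not from any real tool), a review could simply expand each new interface into the questions a reviewer should answer:

```python
# Minimal threat-model checklist: for each new interface in a design,
# emit the basic review questions described above.

REVIEW_QUESTIONS = [
    "How is input to {name} validated?",
    "How are calls to {name} authorized?",
    "How is data handled by {name} protected in transit?",
]

def punch_list(new_interfaces):
    """Expand each new interface into concrete review questions."""
    return [q.format(name=iface)
            for iface in new_interfaces
            for q in REVIEW_QUESTIONS]

for question in punch_list(["/api/v2/upload", "BlobStore.put"]):
    print("- " + question)
```

Even a list this small captures the point: the value is in forcing someone to answer each question in writing, not in the tool itself.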

Joey Dodds (28:07):

Yeah, we say the same thing with formal methods a lot of the time. It's great that at the end of the day you end up with this proof, but the journey is really what you're there for. Spending the time with the code, and having the tools behind you to double-check that you're doing a solid job, is where a lot of the value comes from. The tools make sure you don't get lazy, and in the same way, just writing things down makes sure you don't get lazy. This really seems like something you can build into the culture of a company. At first you might have to remind people to do it, but once people see the value in it, it seems like something you could build as a habit that would have positive effects across an organization.

Steve Weis (28:55):

Yeah, I think formal methods is a good example, because it's the act of defining what the system is supposed to do and trying to write out what you're trying to prove. The actual proof is just, "Okay, great, it does what we think it does." Having to say, "This is the behavior we actually want it to have," that's the whole benefit, because then you have to be very precise and think about what you don't want it to do, all the positive and negative behaviors you want it to have.

Shpat Morina (29:20):

Yeah. In a way, you want the people who are going to be developing it to write that out, because the further apart those two things are, the less valuable it is. Right?

Steve Weis (29:32):

Riffing on hypotheticals, what I would love to see is more of this tied together with the actual code, whether it's documentation that sits side by side with the code or a specification of the code, and have more of it done programmatically. If I'm going to do a threat model, I could generate all the new stuff: "Okay, here's the new interface, here are the new calls, let's go through them and trace what's been tainted and what's been sanitized," and have that be more formalized. Today a lot of that is manual, and tools do exist that do some of these things, but having a development process where this stuff is checked into the repo would be a nice thing too: the decisions living alongside the actual implementation, saying, "These are the threats we've looked at."

Steve Weis (30:28):

Here's our understanding of it, and if this changes, this should get changed too. Having that without a huge manual burden to update would be nice. Having some of these things autogenerated, where the tool just generates your punch list to review, would be a nice improvement. Like I said, today a lot of this is manual. I think the tools are getting better, but there's no magic tool to do this yet.
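The "autogenerated punch list" idea can be sketched with the standard library alone. This is a hypothetical illustration, not an existing tool: it diffs a class's public surface against the surface recorded at the last review, so anything added since then shows up as an item to threat-model.

```python
import inspect

def public_surface(cls):
    """Names of public methods defined in Python on a class."""
    return {name for name, member in inspect.getmembers(cls, inspect.isfunction)
            if not name.startswith("_")}

def review_punch_list(cls, reviewed):
    """Methods added since the last recorded threat-model review."""
    return sorted(public_surface(cls) - set(reviewed))

class UserService:                        # toy service, invented for illustration
    def create_user(self): ...
    def get_user(self): ...
    def delete_user(self): ...            # added after the last review

baseline = ["create_user", "get_user"]    # surface recorded at the last review
print(review_punch_list(UserService, baseline))
```

A real version would record the baseline in the repo next to the code, exactly as discussed above, so the diff updates automatically as the implementation changes.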

Shpat Morina (30:54):

Sounds like a good product idea.

Joey Dodds (30:57):

Yeah, and I think calling it a product is really important here, because one of the missing links in something like this is user experience. We can kind of imagine how it might work, but user experience is going to be critical, and the amount of engineering that goes into making something like that, something people don't dislike using every single day, is a massive effort that, honestly, a lot of formal methods people are just unqualified for. How do we make this nice to use? How do we help people enjoy using it, feel like it's valuable, and feel like it's not in the way? I think there's a lot of crossover research that needs to be done there. It very much needs to be a product. It can't be a research tool if you want people to adopt it.

Shpat Morina (31:44):

And that's true regardless of formal methods.

Joey Dodds (31:47):

Yeah. Whether it's threat modeling or high-level modeling, or anything that comes out of research.

Steve Weis (31:55):

Yeah, definitely. The degenerate case is where you get this huge list of completely irrelevant things to check through. I've definitely seen that with some of the tools out there, and it is a balance: making sure you're not generating a bunch of noise that wastes people's time and doesn't really improve the underlying product. There's something to be said for bringing some sense to this. It's more of an art than a science, knowing what to focus on and not getting too bogged down in details which are not as relevant.

Joey Dodds (32:26):

Yeah. On the other hand, we've seen all the pushback on things like UML, which is just a consistent way to model things, and people say, "Well, what does that mean? It doesn't mean anything. It's just next to my code, and it doesn't apply to the code." So when we talk about what's left in research, it's the sweet spot between checking something, providing some checkable value, and tying things to code, but not being too annoying. There's a just-right level of annoying. Linters feel like they've found this to some extent: with most of the things they suggest, people say, "Yeah, that's great, I can do that, that's going to make things better." And type checkers, I think, have hit the same sweet spot. But for slightly heavier-weight specifications, I'm not sure anybody's really nailed it yet.

Steve Weis (33:14):

Yeah. There's some of the stuff on the OpenAPI front too, where it's just a standard to specify what the API is, and you can potentially generate the actual interface. Whether it's OpenAPI or protocol buffer definitions, some of those are nice, because they give you a clear picture: these are the APIs. What's going on behind them obviously isn't covered by that, but you can at least see the messages that could be sent between different services. A lot of what I'm talking about is very infrastructure focused, reviewing internal infrastructure. There's a whole different world of externally facing web app features. So it really depends on what you're building and what you're trying to achieve.
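As a toy illustration of why a machine-readable spec helps review (the spec fragment below is invented, and real OpenAPI documents carry far more detail), a few lines of code can flatten a spec into the list of operations a reviewer needs to walk through:

```python
import json

# A tiny OpenAPI-style fragment, hand-written for this example.
SPEC = json.loads("""
{
  "paths": {
    "/users":      {"post":   {"summary": "Create user"}},
    "/users/{id}": {"get":    {"summary": "Fetch user"},
                    "delete": {"summary": "Delete user"}}
  }
}
""")

def list_operations(spec):
    """Flatten the spec into (METHOD, path) pairs for review."""
    return sorted((method.upper(), path)
                  for path, ops in spec["paths"].items()
                  for method in ops)

for method, path in list_operations(SPEC):
    print(method, path)
```

The same enumeration is what lets you generate client interfaces, docs, or the review punch lists mentioned earlier, all from one source of truth.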

Joey Dodds (33:58):

Although things like OpenAPI can help with generating nice docs that actually match what your API does. That takes a lot of effort, and it's all too often not done well. We've all loaded up an API and found out the docs were two versions behind where the code is now, so you have to search the GitHub issues to figure out what the API is today. Companies struggle with this stuff every day. It's a thing that needs to be done, but it's not the next feature. So how do we prioritize updating docs? How do we prioritize being secure when we actually need to do something new? That's a question that applies across the board.

Steve Weis (34:41):

Yeah, that reminds me of some things I've seen on this too. Once you have API specs that are machine readable, you can do things like start fuzzing APIs. I've seen some work and some implementations of this, but it doesn't seem that widely used. It's a way of stress testing an API: you stand it up, have something look at the fields it takes, then go through and throw a bunch of junk values at it and make sure your back end is handling it as expected. The other thing is versioning: throwing old versions of the same message at it and making sure you can handle unexpected fields and things like that. Those are the kinds of things you could easily automate with a nice protobuf definition or some API definition.
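A minimal sketch of the spec-driven fuzzing described here, with an invented toy handler standing in for a real back end. The property being checked is the one Steve states: junk input should come back as an error status, never as an unhandled exception.

```python
import random
import string

def handle_create_user(payload):
    """Toy back-end handler standing in for a real API endpoint."""
    if not isinstance(payload, dict):
        return 400
    name = payload.get("name")
    if not isinstance(name, str) or not (0 < len(name) <= 64):
        return 400
    return 200  # accepted

def junk_values(rng):
    """Junk inputs of the kind a spec-driven fuzzer might derive."""
    yield None                                              # wrong type entirely
    yield []                                                # wrong container
    yield {"name": None}                                    # wrong field type
    yield {"name": rng.choice(string.ascii_letters) * 10_000}  # oversized value
    yield {"name": "ok", "unexpected_field": 42}            # forward compatibility
    yield {"n\u00e0me": "unicode key confusion"}            # lookalike field name

def fuzz(handler, seed=0):
    """Every junk input must produce a status code, not an exception."""
    rng = random.Random(seed)
    return [handler(value) for value in junk_values(rng)]

print(fuzz(handle_create_user))
```

With a real protobuf or OpenAPI definition, the `junk_values` generator could be derived mechanically from the declared field names and types instead of being hand-written.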

Joey Dodds (35:28):

So, having seen lots of environments that presumably have APIs but aren't doing things like this, what's your insight into why this stuff just doesn't happen?

Steve Weis (35:39):

I think the examples I've seen are because of the complexity. It's really an integration test, because you can't just test a binary like you would with a fuzzing framework. You're actually calling a service that's running, and that depends on other services, so you can't just spin this up in one environment and test it in isolation. That's a challenge in testing in general: you have to spin up the whole environment to do a realistic integration test. Having ways to automatically mock out parts of it, or to make that simpler, would actually benefit developers too: "Hey, we don't need to start the entire environment to test the one thing you added. You can start some subsection of it and simulate the rest of the environment, or use replayed versions of the data." And also being able to throw simulated or recorded traffic at some of these things, so that if you do make changes, you can replay realistic workloads and make sure everything is working as expected. Those, I think, are some improvements that could happen around this area.
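The "simulate the rest of the environment" idea can be sketched with Python's standard `unittest.mock`: replace the downstream dependency with a mock that replays recorded responses. The service and data here are invented for illustration.

```python
from unittest import mock

class BillingClient:
    """The real client would make a network call to the billing service."""
    def get_balance(self, user_id):
        raise RuntimeError("no network available in tests")

def can_download(billing, user_id):
    """Feature under test: only paid-up users may download."""
    return billing.get_balance(user_id) >= 0

# Replay recorded responses instead of spinning up the billing service.
recorded = {"alice": 10, "bob": -5}

fake = mock.Mock(spec=BillingClient)           # spec= keeps the interface honest
fake.get_balance.side_effect = lambda uid: recorded[uid]

print(can_download(fake, "alice"))  # paid up
print(can_download(fake, "bob"))    # negative balance
```

Passing `spec=BillingClient` means the mock rejects attribute names the real client doesn't have, so the test breaks loudly if the real interface changes underneath it.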

Joey Dodds (36:54):

Yeah, that matches up exactly, I think, with the episode we did with Dan. The main insight from blockchain smart contracts is that everything is testable and everything is reproducible in that space, and that lets you do a lot. Anytime there's a bug, you have the exact same environment at your fingertips, and anybody can reproduce it right away. That's really part of what's enabled that ecosystem to be as effective as it is. Testability is worth asking about. "Do you have tests?" is a question, and sometimes the answer is no, but "How can you have tests?" is a very valid question for your system. Does a developer have a way to actually write a test? That applies at different levels of your system, and it's a really good question to ask as you build things up. I think this is another one of those things where, if you build it up from the start as your system grows, you're going to have a much better time than if you come in once you have a full-fledged system and tack it on, because it's work that just accumulates.

Steve Weis (38:01):

And this is somewhere I wish companies would share more about how they've designed their infrastructure, because if you're starting from scratch, nobody knows how to do it. Say you're a startup and you're doing well. In five years you may be five times bigger, but you don't know how to design your system today to make it scale for 10x the traffic in the future, and there aren't many examples to model on. I know Google's done a good job of starting to publish some things, like their SRE books, on how to do this. But this is where I feel like industry could share more, because there's not really a lot of reason for a Facebook or Google or Stripe or whoever not to share how they design their infrastructure.

Steve Weis (38:42):

Because it's not competitive. I could copy your infrastructure perfectly and it wouldn't give me an advantage over your established business. So I wish companies would share more about this, especially security infrastructure, because I feel like people have reinvented the same building blocks over and over, and nobody really shares them or talks about them that much. If you go look at these companies, everyone's got their own crypto library or crypto tool, or they're using the same tools, but we don't really share that: "Hey, this is exactly what we're doing, this is how we do key management, we're using these vendor products together." That's where I feel like there could be more discussion about what we're doing in our infrastructure, how it's scaled up, and how we do these things. The security community focuses a lot on the offensive side, and there are some things that talk about the defensive side, but I haven't seen much on how you build security infrastructure: what works, what doesn't, success stories, failure stories. That would be something I'd like to see.

Joey Dodds (39:50):

Yeah. Security is such a house of cards, right? As you're writing your threat model, good luck finding one that doesn't say something like, "We're just not going to worry about Google authentication getting broken so that anybody can log into anybody's Google account." In that case, everybody in the world's security has just been blown up, because they can all reset passwords through Gmail. And there are a lot more subtle cases where somebody else's security failings result in cascading security failings. So sharing makes a lot of sense in this world, I guess, is the point, because everybody's threat model includes a number of other companies. If we can raise the bar across the security industry, it's going to help everyone in some sense.

Steve Weis (40:36):

I think so. You also get to learn from other people and not have to go through the same trial-and-error process of learning what scales and what doesn't. One way this knowledge gets transferred is people moving between companies, engineers who say, "Oh, Twitter does this, Facebook does this, and this is their build system." So there is knowledge that circulates that way. One lesson from that is that it helps to build teams with diverse backgrounds from different companies and different industries. You get somebody from, say, the aeronautical industry, and they may have great ideas on reproducibility and testability, because they're dealing with systems where planes will crash if they do it wrong. Having people from different industries and different backgrounds is probably a big strength, because you're going to learn there are solutions out there for problems you're reinventing the wheel on, that somebody already knows how to solve.

Shpat Morina (41:34):

If you happen to hire them, right?

Steve Weis (41:36):

If you happen to

Shpat Morina (41:37):

Hire them onto your team. And this idea of an open-source mentality for security and infrastructure design, which is what I'm hearing, would probably open that up to even more people.

Steve Weis (41:50):

Yeah, and also open sourcing some of the things that companies have built internally. The challenge with some of them is that they're so tied into bespoke infrastructure that you can't just lift a Facebook service, open source it, and have it be usable by anybody. But not everyone is a Facebook or a Google or one of these big companies, and there are probably a lot of companies that could benefit from more open source. Google's Tink crypto library is one I think is great, because it says: use this as your crypto library. It's backed by a big company, and it's a good model for doing the right things by default. You don't need to worry about your engineers using an insecure mode or an insecure key size, and it handles a lot of the key rotation for you. Things like that are great developments, because, again, Google sharing it doesn't hurt Google at all. It just benefits more people who use it, and if you do use it, you avoid a lot of problems.
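This is not Tink's actual API, but the "right things by default" design praised here can be illustrated with the standard library alone: the sketch below fixes the algorithm, key size, and comparison internally, so callers have no insecure option to reach for.

```python
import hmac
import secrets

# Illustration of secure-by-default API design (not Tink's real interface):
# the caller cannot choose an algorithm, mode, or key size.

_ALGO = "sha256"      # fixed internally, never a parameter
_KEY_BYTES = 32       # fixed internally, never a parameter

def new_key():
    """Generate a fresh random key of the library-chosen size."""
    return secrets.token_bytes(_KEY_BYTES)

def sign(key, message):
    """Authenticate a message with the library-chosen algorithm."""
    return hmac.new(key, message, _ALGO).digest()

def verify(key, message, tag):
    """Constant-time verification, chosen on the caller's behalf."""
    return hmac.compare_digest(sign(key, message), tag)

key = new_key()
tag = sign(key, b"hello")
print(verify(key, b"hello", tag))     # genuine message
print(verify(key, b"tampered", tag))  # modified message
```

The design point is the shape of the API, not the primitives: because there is no `algorithm=` or `key_size=` argument, there is no way to hold it wrong.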

Shpat Morina (42:53):

Great. We should forward this podcast to them after this.

Steve Weis (42:58):

Yeah.

Shpat Morina (42:59):

Well, this has been a fun conversation. Steve, thanks for chatting with us today.

Steve Weis (43:04):

Thank you both. This has been a good conversation.

Shpat Morina (43:07):

Agreed. All right, this has been another episode of Building Better Systems. See you all next time.