Building Better Systems

#17: Iain Whiteside — The Twists and Turns of Validating Neural Networks for Autonomous Driving (Part 2)

Episode Summary

In this two-part episode, we speak with Iain Whiteside about the challenges of making autonomous vehicles safer and easier to program, and some of the more novel solutions. In part 1, we discuss how Iain and his team formalize and check the different actions and situations that a car finds itself in while on the road. In part 2, we discuss how you might validate the accuracy of the neural networks that sense the world, and how to mitigate issues that might arise.

Episode Notes

In this two-part episode, we speak with Iain Whiteside about the challenges of making autonomous vehicles safer and easier to program, and some of the more novel solutions. In part 1, we discuss how Iain and his team formalize and check the different actions and situations that a car finds itself in while on the road. In part 2, we discuss how you might validate the accuracy of the neural networks that sense the world, and how to mitigate issues that might arise.

Watch all our episodes on the Building Better Systems YouTube channel.

Iain Whiteside: https://www.linkedin.com/in/iainjw

Joey Dodds: https://galois.com/team/joey-dodds/

Shpat Morina: https://galois.com/team/shpat-morina/ 

Galois, Inc.: https://galois.com/ 

Contact us: podcast@galois.com

Episode Transcription

Shpat (00:00):

This is part two of a two-part episode. Find the first part linked in the description.

Intro (00:07):

Designing, manufacturing, installing, and maintaining the high-speed electronic computer, the largest and most complex computers ever built.

Shpat (00:25):

So with signal temporal logic, it sounds like you can encode some of these more complex, abstract rules of the road in a way that makes sense to autonomous cars. But whether our set of rules to be followed is reacting to the right things depends on whether the sensors and the machine learning part of the car are perceiving the world correctly. As I understand it, there's a lot to get right there as well, and it sounds like you work a little bit on that too, right?

Iain Whiteside (00:59):

Yeah. If anything, that's the harder problem to get right.

Shpat (01:06):

What makes it hard? What are the challenges there?

Iain Whiteside (01:10):

The main challenge is that it's sort of well understood that deep neural networks, convolutional neural networks or whatever, are by far the best approach we have for deriving an understanding from large quantities of sensor data, in particular cameras and LIDAR. You can do some things using classical techniques, but deep learning is sort of taking over the world, right? And as a result of that, you lose this standard concept of requirements. Well, the requirement is: detect every car in the scene. And of course, in machine learning, that's basically expressed implicitly with a bunch of training data and a cost function, where the training data has bounding boxes drawn around all of the cars, and you're basically evaluating your generalization error once you've trained one of these neural networks.

Iain Whiteside (02:19):

So your challenge is: I've got this distribution, which is all my training data. Do I do a good job of fitting to that distribution with the network, and is that distribution representative of the world in which I drive? The most obvious example is that all of your training data is from the summer, when everybody is wearing shorts and dresses and there's a lot of exposed skin, and then you come into winter and it gets a little bit darker and people are wearing hats and scarves, and all of a sudden your detection accuracy for pedestrians drops really badly. Or there's less lighting, or the sun is lower. In fact, this is a problem we had: at a very particular time of the year, the angle of the sun in the UK was such that you would quite often get lens flares on our camera, which tended to look a little bit like traffic lights.

Iain Whiteside (03:24):

And when the angle of the sun was pointing in that general direction, you would get detections of traffic lights at around the same height in the scene where there would be traffic lights, but they were not traffic lights; they were lens flares. And of course, because we were building a production autonomous system, there are various different modalities helping you determine whether this is in fact a traffic light or not, including maps. But you get these challenges that you don't anticipate, and you turn out to have not quite the right distribution. So one of the big challenges in autonomous systems is not just training neural networks with such huge amounts of data to be as accurate as you want them to be.

Iain Whiteside (04:12):

It's also making sure that you have all the data for the world in which you're driving. And this is difficult. So let me actually start with the challenge of training these networks, because it's a good place to start. In the traditional deep learning world, where you would use ImageNet or something like that, the performance of these systems is: what percentage of images did I classify correctly? When you move to networks which are, say, 3D object detectors, you traditionally have these metrics on the bounding box. Well, let's take a 2D object detector. It puts a box around something in a 2D image and it says "car." The idea is that the box has to be very close to the car and get the dimensions of the car correct. But traditionally the metric that the deep learning community uses for determining whether or not you've accurately classified this car as a car is something called intersection over union, which is basically a metric on how much the bounding boxes overlap with each other.

Iain Whiteside (05:38):

So: how much of the ground truth, the perfect information, overlaps with the region being proposed by the neural network. And some of the time that threshold is actually quite coarse, like 50%. So you could have a box which thinks a car is bigger than it is, or thinks the car is slightly shifted in some way. And so the first thing you have to do in autonomy, for dealing with neural networks, is start to get metrics for the performance of neural networks that are much more domain relevant. It's actually okay to do pretty badly at detecting the size of a car that is a hundred meters away from you if you're driving at 30 miles an hour, because you've got time to get a better view of it; it's not okay to make the same error on a car that's 20 meters away from you.
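
For concreteness, here is a minimal sketch of the intersection-over-union metric Iain describes, for axis-aligned 2D boxes in an illustrative (x1, y1, x2, y2) format rather than any particular detector's output:

```python
def iou(box_a, box_b):
    """Intersection over union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Overlapping region (empty if the boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# A noticeably oversized prediction can still clear a coarse 0.5 threshold,
# which is the point about the raw metric being blunt on its own.
ground_truth = (10, 10, 50, 30)   # the real car
prediction = (10, 10, 62, 34)     # detector thinks the car is bigger than it is
print(round(iou(ground_truth, prediction), 2))  # 0.64 -- still "correct" at IoU >= 0.5
```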

Iain Whiteside (06:32):

And in certain situations, if the car is in your path of travel, you need to be pretty accurate. But if it's on a completely different carriageway, well, we should probably still be accurate, but we don't have such stringent requirements. So interestingly, what we do at Five, and you can almost anticipate this by now, is we use signal temporal logic to also express properties of a perception system. You can express simple properties that don't necessarily need the temporal aspect, but we try to formalize exactly what properties are relevant for us.

Shpat (07:17):

What's an example of something that applies to a neural network that might get things wrong?

Iain Whiteside (07:22):

For example, one of the things that we do is evaluate the domain-relevant information in a given image. Where are objects with respect to the road? Are they in the same lane as us? Are they not in the same lane? Are they in front of us? Are they behind us? Are they parked on the side of the road? What types of vehicles are they? What time of day is it? What speed are we driving? And the simplest thing is to say that when vehicles are further away from you, we can accept, say, 10% larger error than when vehicles are close in. You can also use the temporal aspect. The next thing that perception systems tend to do is fuse raw detections over time.

Iain Whiteside (08:10):

So instead of just going frame by frame, you end up classifying this as object one, and you see object one forwards and backwards in time, and you can write properties that express that you don't forget about object one, or something like that. So you can use some temporal properties there. That's the sort of first, easy problem, as it were. I call it, based on a paper by some researchers from Waterloo, partial specifications. It's doing your best to write some sort of specification for a neural network, and that's what we do there. Then the challenge is to basically show that the network meets that specification in the real world. Because traditionally you say, okay, I've worked on Faster R-CNN or one of these other super cool CNNs, and I've trained it on this data set.

Iain Whiteside (09:12):

And it gets me this mean average precision, or whatever metric you want to use, that's super good. I write my research paper, go to the conference, and we're good; I've advanced the state of the art. When it comes to practical uses of these networks, that's basically just the first step, right? The way to view the training and that sort of verification is: I'm convinced it's good enough that I can take it into the world to validate it, and so on and so forth. And then, to me, that's where the learning journey actually starts, where you're doing validation on the road or validation in simulation and finding where your network is deficient. In particular, because it's validation, what that's really telling you is that your requirement specification isn't sufficient, and that's because you're missing images of people in winter-style clothing or something like that.
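
As a rough illustration of the partial-specification idea, the sketch below checks two domain-relevant properties over hypothetical detection records: a distance-dependent accuracy bound (distant objects may be sloppier than near ones) and a simple temporal "don't forget the object" property. The record format, thresholds, and track ids are invented for the example and are not Five's actual specification language.

```python
# Hypothetical per-frame records: (frame, track_id, distance_m, iou_with_ground_truth).
detections = [
    (0, 1, 95.0, 0.52),
    (1, 1, 80.0, 0.55),
    (2, 1, 18.0, 0.58),   # close-in object with a sloppy box
]

def required_iou(distance_m):
    """Domain-relevant bound: distant objects are allowed roughly 10% more box error."""
    return 0.5 if distance_m > 50.0 else 0.6

def accuracy_violations(dets):
    return [(frame, tid) for frame, tid, dist, iou in dets if iou < required_iou(dist)]

def persistence_violations(dets, max_gap=1):
    """A crude 'never forget an object' property: once seen, a track id should
    reappear within max_gap frames -- the temporal half of the specification."""
    frames_by_track = {}
    for frame, tid, _, _ in dets:
        frames_by_track.setdefault(tid, []).append(frame)
    violations = []
    for tid, frames in frames_by_track.items():
        frames.sort()
        violations += [(tid, a, b) for a, b in zip(frames, frames[1:]) if b - a > max_gap]
    return violations

print(accuracy_violations(detections))     # [(2, 1)]: the near object missed the tighter bound
print(persistence_violations(detections))  # []: track 1 appears in every frame
```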

Joey  (10:20):

What do you do? I mean, the neural network feels like sort of a black box where you put training data in and hopefully a good understanding, for lack of a better word, of that data comes out. But if you recognize the deficiency, is there anything you can do other than attempt to improve the training data?

Iain Whiteside (10:40):

Yeah, so there are a few different things. I'll talk about efficiently finding the deficiencies first, I guess, and then we can maybe touch on ways in which you can be a little bit more introspective about the black box.

Joey  (10:59):

So the first part of the problem, you're saying, is just realizing that you weren't getting the quality of data back that you had hoped for.

Iain Whiteside (11:06):

Yeah. So this is the interesting thing. Let's keep it really simple. We plunk our neural networks onto the car and we go driving autonomously. And as we said before, these networks are making mistakes all the time, right? What you're ideally hoping is that these mistakes stay within the contract; they're not too extreme. It's not misclassifying a pedestrian as a bus, it's not having a ghost detection, seeing something that doesn't exist, or, potentially worse, not seeing something that does exist, or thinking something has a very different orientation to what it really has. These are all the problems you find. And a lot of the time, actually, if you go out and drive these systems, there will be problems with the neural networks that are quite extreme, but they don't bubble up to a system-level problem, either because it happened too far away, or it was a very transient problem, or something to do with the color of the moon.

Iain Whiteside (12:19):

And the flapping of a butterfly's wings means that this complex system of systems didn't do something bad. The historical way, the traditional way in which companies first started testing these vehicles, is with safety drivers, where if the car did something unexpected, it would be reported by the safety driver, either as an event of interest or a disengagement. Then that data would be triaged by engineers and maybe even annotated by humans to give you ground truth, and then you can look at the results from the perception system or the neural network and see if they did something wrong that led to that failure. And that's very good, because you'll find these problems, but it's also extremely inefficient, because, I'll claim, basically 99% of perception failures don't, in that situation, bubble up to a system-level failure.

Iain Whiteside (13:25):

And the argument that I would make is: because we're driving thousands and hundreds of thousands of miles with these systems in test, once you scale that up to a production system doing millions and millions of miles, what you would end up with is some of these situations happening again in a slightly different configuration that does lead to a failure. So one of the things that you actually want to do is find these surprises a little bit more efficiently, even if they don't necessarily bubble up to a system-level thing. And there's one very easy way to do that, which is: every time your car goes out and drives, you get an army of humans to manually annotate the raw data that came in, to get perfect information, and then you can compare the network against it.

Iain Whiteside (14:16):

And you can see that I'm painting two ends of a spectrum: one being extremely precise but extremely costly, and one being relatively cheap but quite imprecise. Some of the work that I've been doing in my research is basically trying to find a middle ground, where you can get as much information about neural network failures out of the system automatically, in an unsupervised way, to help find these edge cases and weaknesses in the system. Some of the ways we do that is by using other neural networks that are bigger, and by taking advantage of the fact that once you've actually gone and done the drive, you can sweep backwards and forwards in time, so you don't just have to work with one image. We call it pseudo ground truth: you can automatically get something close to ground truth, which you can then use as a way to find where there are particular failures.
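
The following toy sketch illustrates the "sweep backwards and forwards in time" idea, not Five's actual pipeline: offline, gaps in an online track are filled by interpolating through past and future frames, and the result is used as pseudo ground truth to flag frames where the online output was surprising. All the values are made up.

```python
import numpy as np

# Per-frame online estimates of one tracked object's forward distance (metres);
# None marks frames where the online detector dropped the object entirely.
online = [12.0, 12.4, None, None, 13.6, 14.0, 14.5]

def pseudo_ground_truth(track):
    """Offline 'sweep backwards and forwards' pass: interpolate through gaps using
    the frames on both sides, which an online system never gets to see."""
    xs = np.array([i for i, v in enumerate(track) if v is not None], dtype=float)
    ys = np.array([v for v in track if v is not None], dtype=float)
    return np.interp(np.arange(len(track), dtype=float), xs, ys)

def surprises(track, tolerance_m=0.5):
    """Frames where the online output diverges from the pseudo ground truth,
    including frames where the object was silently dropped."""
    pgt = pseudo_ground_truth(track)
    flagged = []
    for i, v in enumerate(track):
        if v is None or abs(v - pgt[i]) > tolerance_m:
            flagged.append((i, v, round(float(pgt[i]), 2)))
    return flagged

print(surprises(online))  # frames 2 and 3 are flagged as missed detections
```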

Shpat (15:23):

And then there are other ways, I'm assuming, that you're thinking about how to fix some of this sensing stuff as well.

Iain Whiteside (15:30):

Yeah. The broad way I like to look at it is to think about leading indicators. From a statistical perspective, if you don't have enough data about the lagging indicator, an actual crash or a bad thing, can you use something that's a leading measure of that? That could be an unexpected failure of the perception system, where it fails to meet the contract that you defined for it, which I think is the key thing. So one of the other techniques that we use is to take all of the data from a vehicle and make small modifications to it, for example, add a little bit of noise to the image, and see how the neural network performs on that slightly noisy image.

Iain Whiteside (16:25):

And in general, you'll find that most of the time the neural network is quite robust, maintaining its prediction under small amounts of noise. You can gradually turn one, three, four, up to 10% of the pixels arbitrarily black or white, but the bounding box for a given vehicle will stay roughly the same, because the features in the convolutions in the early stages of the network are not so brittle that they can be thrown off entirely by small changes. But where the network is actually quite weak in a certain area, such as, we find, dark vans against a dark background, or pedestrians leaning up against a pole or standing against another type of vertical object, we can find that you get quite divergent behavior of the network once you add noise.

Iain Whiteside (17:33):

It doesn't take very much for that detection to disappear altogether or to change shape quite dramatically. And you can use that as some sort of leading indicator that there is probably something about this image, something about this pedestrian or this van, that makes it quite difficult for the detector. The challenge, of course, is then first of all detecting this type of thing, second of all figuring out exactly what it is that's making it a weak detection, and then figuring out how to iterate a new version of the model that doesn't have that deficiency and doesn't introduce any more deficiencies. So you can start thinking about building some sort of regression test suite for failures of your neural networks and making sure it never regresses once you find something.
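
A hedged sketch of that kind of noise probe is below: corrupt a small fraction of pixels, rerun a detector, and measure how far the predicted box drifts from the clean-image prediction. The detector here is a deliberately crude stand-in (a threshold on bright pixels) so the example runs on its own; a real perception stack would behave very differently.

```python
import numpy as np

def iou(a, b):
    """Same overlap metric as sketched earlier, for (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def salt_and_pepper(image, fraction, rng):
    """Flip a small fraction of pixels to pure black or white."""
    noisy = image.copy()
    idx = rng.choice(image.size, size=int(fraction * image.size), replace=False)
    noisy.flat[idx] = rng.choice([0, 255], size=idx.size)
    return noisy

def robustness_profile(image, detect, fractions=(0.01, 0.03, 0.05, 0.10), seed=0):
    """How stable is the detector's box as we corrupt more and more of the image?"""
    rng = np.random.default_rng(seed)
    clean_box = detect(image)
    return [(f, round(iou(clean_box, detect(salt_and_pepper(image, f, rng))), 2))
            for f in fractions]

def toy_detect(image):
    """Stand-in 'detector': bounding box of all bright pixels in a grayscale image."""
    ys, xs = np.nonzero(image > 128)
    return (int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max()))

img = np.zeros((100, 100), dtype=np.uint8)
img[40:60, 30:70] = 200                      # one bright "vehicle"
print(robustness_profile(img, toy_detect))   # the brittle toy detector collapses even at 1% noise
```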

Joey  (18:27):

Do the perception systems communicate a sense of... I guess, do they communicate their own certainty about things to the rest of the system? I mean, I'm thinking about, like, I've got some junk on my desk, right? There's a cup over there, and since I'm looking at you, it reads as a blue shape. And then the rest of my mind says, well, that's a cup, because it was a cup before. It doesn't matter that maybe it reads as a blue ball or something; I know it's a cup because it used to be. Is this just the reality of sensors? It holds true for our eyes; is it going to hold true for neural networks that are processing various inputs, or is it reasonable to hope to do better in those scenarios?

Iain Whiteside (19:21):

There are various ways in which you can, and should, train your neural network to be a little bit more Bayesian, right, to have something like a certainty or confidence in its predictions. And some of the techniques are actually quite inefficient. One technique is called concrete dropout, which basically works from the perspective of sampling many passes through the neural network for a given image. What you do is randomly turn off nodes in the neural network, as it were, which is what dropout is; it's part of a training process that probably we shouldn't get into. But basically the idea is that if turning off various neurons in the neural network doesn't have a large impact on the classification or bounding box, then that's a relatively high certainty.

Iain Whiteside (20:17):

But if it has quite a large impact, that's a relatively low certainty. There's actually been a lot of research in the community on how you can get uncertainties out of neural networks, and a lot of research on whether you can actually trust those uncertainties. People have shown that the uncertainties can be utterly meaningless; a network can be extremely certain about something that it is extremely wrong about, and people are still trying to figure out why. Neural networks are black magic some of the time.
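
To illustrate the sampling idea behind this family of techniques (Monte Carlo dropout; concrete dropout additionally learns the dropout rate, which this sketch ignores), here is a toy NumPy example: leave dropout active at inference, run many stochastic passes, and treat the spread of the outputs as an uncertainty signal. The tiny untrained "network" and the dropout rate are placeholders.

```python
import numpy as np

rng = np.random.default_rng(42)

# A tiny fixed two-layer regression "network" standing in for a trained detector head.
W1, b1 = rng.normal(size=(16, 4)), np.zeros(16)
W2, b2 = rng.normal(size=(1, 16)), np.zeros(1)

def forward(x, drop_p=0.0):
    """One pass; with drop_p > 0, hidden units are randomly zeroed, which is what
    a dropout layer left active at inference time does."""
    h = np.maximum(0.0, W1 @ x + b1)          # ReLU hidden layer
    if drop_p > 0.0:
        mask = rng.random(h.shape) >= drop_p
        h = h * mask / (1.0 - drop_p)         # inverted-dropout scaling
    return (W2 @ h + b2).item()

def mc_dropout_predict(x, n_samples=100, drop_p=0.2):
    """Sample many stochastic passes; the spread of the outputs is the uncertainty
    signal (small spread = the prediction barely depends on which units are on)."""
    samples = np.array([forward(x, drop_p) for _ in range(n_samples)])
    return samples.mean(), samples.std()

mean, std = mc_dropout_predict(np.array([0.5, -1.0, 0.3, 2.0]))
print(f"prediction ~ {mean:.2f} +/- {std:.2f}")
```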

Joey  (20:54):

Well, that also sounds like people, though.

Iain Whiteside (20:58):

Yes, a lot of people are. It's very true.

Joey  (21:01):

Yeah. And I just keep thinking about optical illusions, basically, right? There are some optical illusions where you can kind of grasp how they work, and then there are some where it's so wired into the way we perceive the world that we're just going to get it wrong. Some of the color ones in particular, how we interpret color relative to shadow, for example. And this stuff matters for cars as well. But our built-in model of the way the world works forces us to get that wrong, and there's no amount of additional reasoning that we can apply to override it, which is wild. I'm sure lots of people are thinking about whether this is the reality of all perceptual systems, or something we will be able to overcome in the world where we control the neural networks.

Iain Whiteside (21:50):

Yeah, absolutely. Basically, what we are doing from the point of view of verification and validation is wielding as many tools as we possibly can, and building novel tools, to get your confidence to a certain level. But at the end of the day, we are attempting to build a safety-critical system, and you need to have different sensing modalities, so you're not always putting complete faith in your neural network. For example, you can use free-space detectors, or you can use LIDAR or radar or various other modalities, in order to figure out: is everything in sync, is there definitely something there? And my research path from formal verification has sort of led me through the safety assurance community.

Iain Whiteside (22:49):

And one of the things the safety assurance community does is build and write these things called safety cases, which argue that the system as a whole is safe because there are enough mitigations and barriers to stop an individual single point of failure from propagating through. So when it comes to deep neural networks, you will have redundancy in different sensors, so you never just rely on one, and then you will have monitors in place. For example, you can have monitors checking that the predictions from the neural network have the general shape of cars. If it says it's a car, does it have the general proportions of a car? Is it driving at a reasonable speed for a car? And so you can build more and more of these mitigations in place to help you build more confidence.
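
A monitor of that kind can be as simple as a few plausibility bounds on what the perception system claims about a track labelled "car." The bounds and the track format below are invented for illustration, not anyone's production thresholds.

```python
# Rough plausibility bounds for a track labelled "car" (illustrative numbers only).
CAR_LENGTH_M = (2.5, 6.5)
CAR_WIDTH_M = (1.4, 2.3)
CAR_SPEED_MS = (0.0, 60.0)   # ~216 km/h; faster than this and we stop trusting the label

def plausible_car(track):
    """track: dict with 'length_m', 'width_m', 'speed_ms' estimated by perception."""
    return (CAR_LENGTH_M[0] <= track["length_m"] <= CAR_LENGTH_M[1]
            and CAR_WIDTH_M[0] <= track["width_m"] <= CAR_WIDTH_M[1]
            and CAR_SPEED_MS[0] <= track["speed_ms"] <= CAR_SPEED_MS[1])

print(plausible_car({"length_m": 4.5, "width_m": 1.8, "speed_ms": 13.0}))    # True
print(plausible_car({"length_m": 11.0, "width_m": 2.5, "speed_ms": 13.0}))   # False: bus-sized
```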

Iain Whiteside (23:41):

And the challenge is what you do when you end up in a situation where you think something's wrong and you have to fail operational; that's a whole different ballgame. But what I'm most interested in and focusing on is how to get confidence in these neural networks, how to start building something a bit more like introspection into how they're operating, and how to find the failures, understand why that is a particular failure for the network, and then actually start to improve, other than just plunking the image that previously failed into your training set with an annotated version of it so that you almost certainly won't have that point failure again. Instead, you discover that your network is weak against objects against a dark background, so you solve the general problem rather than the specific problem.

Joey  (24:42):

Right. So we're sort of early on in these neural network applications, and a big part of the value you're looking for is just to define where they fall down and where they don't, so that we can try to build reasonable systems around them.

Iain Whiteside (24:56):

Yeah, we're definitely in the early days of applying formal methods and other techniques from the verification community in deep learning, and anybody who has any ideas, send them to me.

Shpat (25:13):

I had a quick question for people who might be listening who are putting together quick neural networks from a bunch of data, especially visual data, where maybe it's not as critical in terms of safety as what you do. What are some tips for getting those to function a little bit better?

Iain Whiteside (25:32):

The one thing I would say is: think carefully about your specification. Really think carefully about exactly what good performance looks like for your neural network, what domain you're attempting to apply it to, and how you can help the network know when it's outside of its distribution, because that becomes more and more important as you rely on it for any downstream task.

Shpat (26:09):

And once you've specified that, your suggestion is to use it how? In terms of selecting your input data, your training data, or changing the weights accordingly, or wrapping the output in something that can filter out the results?

Iain Whiteside (26:29):

It's a little bit of all three things, I think. One is understanding your training data a little bit better and understanding its distribution. One is having the right cost function when you're training the network. And the third is having the right evaluation function when you're taking your verification set.

Joey  (26:53):

But you don't need signal temporal logic to write these things down, right? If I have a pen and paper, or I have a markdown file, and I set out to say, what do I want out of this system, that's going to help me make appropriate decisions along those three axes, even if I don't have all of the framework that you all have built up. Just being explicit is almost never a bad thing, and neither is taking time to think about your systems.

Iain Whiteside (27:22):

For sure. Though, of course, I encourage everybody listening to your podcast to embrace signal temporal logic for specifying their life.

Shpat (27:33):

Their life?

Iain Whiteside (27:34):

Their life, everything. No, no, I'm just kidding.

Shpat (27:38):

I wanna see that. Always, eventually... that sounds like a cool document. Yeah. Eventually flustered, always eventually.

Joey  (27:49):

A little better than just "eventually."

Shpat (27:53):

So, well, eventually we should end this podcast. Yeah. Excellent. Thanks, Iain, for talking to us today. This was so fun. I could talk about this all day, but we have to stop, and I know it's almost bedtime over there. So thanks for spending this time with us.

Iain Whiteside (28:13):

Yeah, much appreciated. Thanks for chatting with me, folks. Later.

Shpat (28:18):

Bye. Yeah, thanks again. This was another episode of Building Better Systems. See you all next time.