Building Better Systems

#12: Alex Malozemoff & Marc Rosen – Censorship Circumvention with ROCKY Balboa

Episode Summary

We chat with Alex Malozemoff and Marc Rosen about a recently published paper, "Balboa: Bobbing and Weaving Around Network Censorship," on a novel system for censorship circumvention and its corresponding implementation. The paper authors also include James Parker.

Episode Notes

We chat with Alex Malozemoff and Marc Rosen about a recently published paper on a novel system for censorship circumvention and its corresponding implementation. The paper authors also include James Parker.

Watch all our episodes on the Building Better Systems YouTube channel.

Joey Dodds: https://galois.com/team/joey-dodds/ 

Shpat Morina: https://galois.com/team/shpat-morina/  

Alex Malozemoff: https://galois.com/team/alex-malozemoff/

Marc Rosen: https://galois.com/team/marc-rosen/ 

Paper referenced: Balboa: Bobbing and Weaving Around Network Censorship: https://arxiv.org/abs/2104.05871

Episode Transcription

Intro (00:02):

Designing, manufacturing, installing and maintaining the high-speed electronic computer, the largest and most complex computers ever built.

Shpat (00:22):

Hello everybody, and welcome to another episode of Building Better Systems, the podcast where we explore tools and approaches that make us more effective engineers and make our systems safer and more reliable. My name is Shpat Morina.

Joey (00:36):

And I'm Joey Dodds.

Shpat (00:37):

Joey and I, um, work at Galois, a computer science R&D lab focused broadly on hard computer science problems. Um, today's episode is a little bit different. We have with us Alex Malozemoff, who's a principal researcher at Galois as well, and Marc Rosen, a research engineer at Galois. Marc and Alex have recently published a paper on a novel system for censorship circumvention, and we thought we'd hear about it 'cause it's really interesting stuff. Alex and Marc, welcome. Thanks for having us. Thanks for having us. It's good to get to have you here. If you're okay with it, I'd love to hear, whoever wants to start, kind of in your own words: first of all, what the project is called, and, at a high level, what it is.

Alex (01:28):

Sure. So, uh, this project is about building a resilient and anonymous messaging system. This has a lot of connections with censorship in general. So the goal of the messaging system is to be able to operate in a country or an environment in which there's a censor trying to block certain types of communication. The project that we are working on now is called ROCKY, and this particular system that we're going to talk about today is called Balboa. And Balboa is a system for circumventing censorship. Broadly, it operates by working under the TLS protocol, essentially. So it uses existing communications that are operating over TLS and does various tricks to reuse that communication to send covert data.

Marc (02:29):

The issue with some existing, uh, point-to-point censorship circumvention messaging systems is that it's very, very difficult to actually make a communication that looks exactly like something else. So the goal of Balboa is to basically take some communication that the censor doesn't want to allow and make it look identical to something that the censor does want to allow, so that they can't block one without blocking the other.

Shpat (03:07):

I see. So this basically piggybacks on TLS, you said, yes?

Alex (03:11):

Exactly. For censorship in general, the censor could easily censor content by blocking everything, right? If you don't allow any communication through, then it's very easy for a censor to censor. But obviously there's a trade-off here, because the censor would want some communication to go through. Otherwise the censor doesn't get the benefits of existing in the broader world and being able to connect using the internet, while also being able to censor content. So there's a trade-off that the censor has, and that's really what we're exploiting. The censor needs to let some communication through, but it only wants to let through the communication that it doesn't want to censor. And so if we can hide the covert data that we're sending in communication that looks innocuous, that looks like something the censor is fine letting through, then that's a way we can kind of exploit this concern that the censor has.

Shpat (04:13):

So presumably there's something in Balboa that didn't exist in previous censorship circumvention ways and mechanisms, something that you're proud of or excited about. What makes it special?

Marc (04:27):

Yeah. So the key difference between Balboa and previous similar approaches shows up if you want to try to make network traffic look like something in particular. So let's say you want to, uh, make browsing the New York Times look like just listening to music streaming. A lot of previous approaches decided to manually code some sort of model of what internet radio would look like. So they would say, oh, first you send a packet of this length, with this particular timing. And the problem is that actually getting that exactly right, and exactly mimicking all of the little changes, some of which the censor can actually influence, is incredibly difficult. So the key thing that Balboa does that's unique is that Balboa is a framework which lets us take an existing unmodified application binary, for example VLC, and use this unmodified binary by dynamically injecting some code into it. VLC runs just as normal, generating the traffic that it would normally generate, but we stick some extra data in there that we can use to communicate messages.

Alex (05:50):

Yeah. And that's a key point: using unmodified existing applications. So there are several censorship circumvention tools that, say, use a specific version of Firefox. However, Firefox gets updated all the time. And actually there's been evidence that certain censors will look for a particular version of Firefox, because that is the version that is used for these censorship circumvention tools. It's maybe two years out of date relative to the current version of Firefox, so a very small number of users are using it, and they tend to be the ones using this censorship circumvention tool. So there's all of these subtle things that the censor is picking up on to detect these tools. And really the key insight of Balboa is: let's run the applications as is, so that the behavior of the application is exactly the behavior of the application without any covert data going through. And then we use these tricks of dynamic library injection and carefully swapping out the data, such that the traffic patterns and everything about the communication patterns of the application are identical, and yet there's covert data going through.

Shpat (07:01):

So you're sending data, but if somebody is looking at the data coming out of, let's say, I mean, VLC doesn't necessarily stream, but let's say, um,

Alex (07:10):

Yeah, let's take Firefox, right? So you're browsing some website. That involves, you know, GET requests, and then you get the text, you get the images and whatnot. And some of those images, when being sent out by the server, can be swapped out with the covert data. Now, from the point of view of the adversary, again, because we are using TLS, because all of this communication is encrypted, they see exactly what they would see if the user were browsing the normal website. But under the hood, instead of the browser, instead of the Firefox user, getting that image, they get covert data.

Shpat (07:48):

Interesting. Um, I don't quite know how TLS works, but is there a way to analyze whether the amount of data was different from the image that you were supposed to receive? Like, I'm assuming you've done the homework so that, if somebody's looking at this, it looks exactly the same.

Marc (08:05):

Yeah. Uh, so TLS, just like many other encryption protocols, doesn't hide metadata about the content, in particular the lengths of TLS records and also the times at which these records are sent. And it turns out this metadata can actually leak information. So for example, there are papers showing that if you can just observe the packet timings and lengths, you can, in some cases, figure out what website someone's browsing, even though you can't actually see the content at all. Similarly with, like, Skype audio encryption, people have done work to show that even though the audio data is itself encrypted, the timing and length of the audio packets that are being sent let you figure out what people are saying. So one of the things that we make sure to do with Balboa is make sure that we don't change any of the metadata that the adversary can see.

Marc (09:04):

So, either the sizes or the timings. The timings we do change, right? We have to do some extra processing, so the timings are going to be slightly different, but we've put in a lot of effort to make sure that these timing differences are in the microsecond range. And beyond that, we've been working on various evaluation techniques using machine learning, random forest classifiers, to look at various features of the TLS connection that the adversary can see, this metadata, to try to see if an adversary would be able to tell the difference between Balboa and not Balboa.

Alex (09:42):

And one key point I left out in the description: I mentioned that the Firefox user, instead of getting the image, gets the covert data, right? But actually what we do under the hood is we swap back in the original image that was meant to be sent. And so this gets to something that we haven't touched on before, which is that we require that the two users have what we've been calling a traffic model. And this is: they essentially know ahead of time some of the data that is going to be sent between the two parties. So imagine you're connecting to the New York Times. There's the New York Times logo, right? The user can know that ahead of time. Uh, maybe there are some specific blurbs or something that always appear, and again, that's something that the user can be aware of ahead of time. So under the hood, when the web server is sending out data, instead of sending out the New York Times logo, it sends the covert data. Under the hood on the client side, it receives the covert data and then swaps back in the New York Times logo. So from the point of view of the Firefox user, they see exactly what they would expect to see when browsing the New York Times.
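
The swap Alex describes can be sketched roughly as follows. This is only an illustration under stated assumptions, not Balboa's actual wire format: the `TRAFFIC_MODEL` dictionary, the `/logo.png` path, the 4-byte length header, and both function names are hypothetical, and the sketch leaves out encryption of the covert chunk and any signaling between the two sides. The point it shows is that the replacement has exactly the same length as the asset it replaces, and that the client hands the application the original bytes.

```python
import struct

# Hypothetical traffic model: content both parties already hold, keyed by path.
TRAFFIC_MODEL = {"/logo.png": bytes(range(256)) * 4}  # 1024 placeholder bytes

# Assumed framing: a 4-byte big-endian length, then the covert chunk, then padding.
HEADER = struct.Struct(">I")


def server_swap_out(path: str, covert: bytes) -> bytes:
    """Replace a known asset with covert data of identical total length."""
    original = TRAFFIC_MODEL[path]
    capacity = len(original) - HEADER.size
    if len(covert) > capacity:
        raise ValueError("covert chunk does not fit in this asset")
    padding = bytes(capacity - len(covert))  # zero padding up to the asset size
    return HEADER.pack(len(covert)) + covert + padding


def client_swap_in(path: str, received: bytes) -> tuple[bytes, bytes]:
    """Recover the covert chunk and return the original asset for the application."""
    (length,) = HEADER.unpack_from(received)
    covert = received[HEADER.size:HEADER.size + length]
    return covert, TRAFFIC_MODEL[path]


wire = server_swap_out("/logo.png", b"meet at noon")
covert, shown_to_browser = client_swap_in("/logo.png", wire)
```

Because `wire` is the same length as the logo, the encrypted record carrying it would be the same size either way, which is the property the conversation keeps returning to.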

Marc (10:52):

This is important also in terms of making sure that the adversary can't see what's going on. You could imagine, for example, if I'm streaming a video, the rate at which I'm going to request new frames of video is going to depend on the rate at which I'm consuming those frames, which in turn depends on the amount of, like, CPU processing I'm doing to decode them. So if we were to just say, okay, we're going to throw away all the video data and just be like, here's a black square, then it's possible that the adversary could basically see, oh, this machine's consuming data at a weird rate, and then detect that Balboa was running. So as a result, we make sure that the original data is swapped back into place.

Joey (11:37):

Can you help me understand this, I guess, a little more from the user side? So some pair of individuals, it sounds like, is who would likely want to use this. And maybe we have a channel where we can say some stuff, but not so much, and sort of say, we're going to use this technology somehow on this other channel. And then is it going to look like installing something on our... Could you just sort of take me through how that might look? I don't know how much of the work you've done to make this usable, but just to understand, I guess, how this would feel as a user, I think, would help me understand what's going on a little more.

Marc (12:15):

Yeah. So while we're talking about the user using Firefox, in practice, if we were actually to give someone an application and say, here is a censorship circumvention tool, the user wouldn't be manually going in and clicking links and doing stuff in sort of the shadow version of Firefox. Instead, there's a program that we have which simulates, oh, you want to go to this particular special website and click various links and interact with it, to make Firefox end up sending and receiving data. So in terms of what the actual experience would be like for a user, basically, we're just building a communication tool. You could totally imagine that, if you wanted to, you could build this into Matrix or some other front end for actual communication, but you can think of this as basically the TCP of censorship circumvention.

Alex (13:14):

Yeah. So from the point of view of how you would deploy this: we do rely on this traffic model, the shared information that the two parties have. And additionally, we rely on a symmetric key that the two parties both share. From there, one of the parties, say for web browsing, can launch, you know, a web server that hosts, let's say, a computer science blog or something. And another nice feature of Balboa is that my blog can be accessed by anyone, because we are doing all of this swapping out under the hood. Any user that's not running Balboa can access that blog, read the blog, whatnot. And then the user on the other side that is running Balboa and has the shared information could then, either manually or through, like Marc said, some automated process, access this blog, manually clicking through the links or using a script of prescripted Firefox user actions to make it look like a normal user, and from that, get covert data. So there is some setup required. There is some symmetric key setup and this deployment of these traffic models that is required. But from there it really looks like any normal connection to that website.

Joey (14:37):

And that's necessary to some extent, because the more people use different types of blogs, for example, or different settings, the harder it's going to be to figure out what's going on in this setting. Presumably, um, if everybody used the same pre-packaged blog,

Alex (14:51):

Yeah. You can even imagine hosting this on a very popular website. I imagine the New York Times would be blocked in many of the places that could use something like Balboa, but let's use the New York Times as an example, pretending it's not blocked, right? You could imagine running Balboa even on the New York Times website, and the parties that have the particular information required to get covert data from the New York Times website can, and every other user can access the New York Times website as they normally would. So it could even be, I guess, a high-value website that the censor is not willing to block.

Joey (15:28):

So you can broadcast information, essentially, to people. Yes. Um, although I guess there's the key exchange, right? 'Cause there's two sides of this. One is that you don't want people to see the thing that you want to say that they might censor, but the other is, presumably, you really don't want anybody to know you're doing censorship evasion. So keeping things secret is also important from that standpoint.

Alex (15:56):

Yeah. But even if the censor knows. So let's suppose there's a website, again, let's just use the New York Times as an example, that the censor does not want to block for whatever reason. Even in that case, where the censor knows that the New York Times is running Balboa under the hood, it still cannot know which particular users are accessing the New York Times and are running Balboa, even if it's monitoring all of the connections to the New York Times.

Shpat (16:25):

All of the traffic looks exactly the same, it sounds like. So short of blocking all of the internet, you know, presumably, um, that's at least

Alex (16:36):

Yeah. Or the particular website, yeah. Because one of the downsides of Balboa versus some other tools is that it's a direct link. So the two parties have a direct TCP connection to each other. Other tools now use third-party services to redirect traffic and whatnot, to make it harder to block. So if the censor, say, knows that the New York Times is running Balboa and is willing to block that website, then that's one way of defeating Balboa. But if either they don't know that the website is running Balboa, or they're not willing to block that website, um, those are the two settings where I think Balboa would work best.

Shpat (17:21):

But is there a way to detect whether their server is running it?

Alex (17:25):

Hopefully no. Um, I guess Marc can go into some details, but Marc carefully designed the pieces to, um, avoid these types of attacks. And there's a lot of subtle attacks that we identified in the process of developing Balboa.

Joey (17:45):

I mean, of course, one way to know would be to get access to a key, which is kind of bound to happen eventually, but it sounds like even in that case, other people using the channel, wouldn't, wouldn't be, um, exposed, which, which is still a nice property to have.

Marc (17:58):

Yeah. And in fact, our protocol inherits the perfect forward secrecy properties of TLS. So even if, uh, today you compromised a server running Balboa, you couldn't go back in the past and figure out which connections were using Balboa. All you would be able to do is, uh, detect future connections running Balboa. I see.

Joey (18:21):

So if a key were compromised, then you would be able to distinguish users that were running Balboa from users that weren't running Balboa,

Marc (18:29):

But, uh, only if you also compromise the server. So if, let's say, I am a client and I'm given the key so I can talk to the New York Times, I have no way of knowing which of the other connections are also using the same key to talk to the New York Times. I see. So you would need to compromise the server and to know the key.

Joey (18:49):

That sounds, I mean, that sounds, that sounds pretty robust. Um, it seems like a reasonable security property, for sure.

Shpat (18:55):

Uh, so there's a research paper that's been published, right?

Alex (18:59):

Yeah. USENIX Security this year, and we will be making the implementation open source as well. Oh, great.

Shpat (19:05):

So this will be open source. And the implementation, um, I presume it's a prototype, like a lot of the things that we do. It sounded like there was an interesting kind of mix in terms of implementing it, right? 'Cause some of our viewers, you know, would probably be interested in that. Um, it sounds like there were some unique challenges in implementing something that's, um, kind of this mix of crypto and systems programming and networking. I wonder if you could tell us a little bit about that.

Marc (19:38):

So I think, uh, for one thing, right, part of the emphasis of Balboa was realizing that while in theory you could say, oh, I'll just code something that looks exactly like something else, in practice that's really hard. So part of the design goal of Balboa was to make it so that you could realistically write code that is secure without needing to get everything perfect. In particular, the way in which we achieve this is by using dynamic library injection into an existing binary. So the way this works is that if you have something like VLC, VLC talks to the operating system to say, hey, open a connection, or send data along this connection. And we intercept those calls to a particular network socket, and we can go in and say, oh, this is TLS data, extract the TLS keys, and basically do a surgical manipulation of this data before it hits the operating system.

Marc (20:48):

Uh, and that's how we can insert our data. That sounds hard. Yeah, it's difficult, in part because, for one thing, we need to make sure that the timings are fast, right? We're doing rewriting on the hot path. We are rewriting every single read/write system call. So if you want to stream data at 10 megabits a second, that's a lot of reads and writes. There's a lot of data there. So we've put a lot of effort into making sure that the timings are fast. We're looking at between a double- and triple-digit number of microseconds for each rewrite, and we're still working on bringing that down even lower. And beyond that, one of the other complications is that because we are rewriting just the read and write system calls, there's a bunch of sort of extraneous reads and writes that we need to filter out. The filtering part is less difficult, but filtering safely is, because there's all sorts of contexts in which you might want to read or write. So for example, if, in a signal handler, a process is like, hey, I want to write something to the screen, we need to be sure that we're not doing anything in our read and write operations that's not safe to do from a signal handler. So the filtering itself can actually be very difficult to do safely.
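
As a rough Python analogue of the filtering Marc describes (not dynamic library injection itself, and not Balboa's code), the sketch below wraps a write function so that only tracked file descriptors are rewritten and every other write passes through untouched. The `WriteInterposer` class, its `track` method, and the toy byte-reversing rewriter are all hypothetical names chosen for illustration; the real system does this inside the target process and has to respect constraints such as signal-handler safety, which this sketch ignores.

```python
import os
from typing import Callable, Dict

Rewriter = Callable[[bytes], bytes]


class WriteInterposer:
    """Stand-in for a hooked write(): rewrite tracked fds, pass the rest through."""

    def __init__(self, real_write: Callable[[int, bytes], int] = os.write):
        self._real_write = real_write
        self._tracked: Dict[int, Rewriter] = {}

    def track(self, fd: int, rewriter: Rewriter) -> None:
        """Mark a descriptor (e.g. a TLS socket) as one whose writes get rewritten."""
        self._tracked[fd] = rewriter

    def write(self, fd: int, data: bytes) -> int:
        rewriter = self._tracked.get(fd)
        if rewriter is None:
            return self._real_write(fd, data)  # extraneous write: leave it alone
        rewritten = rewriter(data)
        if len(rewritten) != len(data):
            raise ValueError("rewrite must preserve the number of bytes written")
        return self._real_write(fd, rewritten)


# Two pipes stand in for a tracked socket and an unrelated descriptor.
r_tracked, w_tracked = os.pipe()
r_plain, w_plain = os.pipe()

hook = WriteInterposer()
hook.track(w_tracked, lambda d: d[::-1])  # toy same-length rewrite

hook.write(w_tracked, b"abc")
hook.write(w_plain, b"abc")
tracked_out = os.read(r_tracked, 3)
plain_out = os.read(r_plain, 3)

for fd in (r_tracked, w_tracked, r_plain, w_plain):
    os.close(fd)
```

The length check mirrors the requirement from earlier in the conversation that record sizes visible to the adversary must not change.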

Alex (22:19):

Yeah. So, right, the high-level idea of Balboa is kind of obvious in retrospect. It's like: swap out data with covert data and then swap it back in. But actually getting that right, and getting that to a point where you don't introduce signals that a censor can pick up on, is surprisingly challenging, because you have to deal with timing. You have to make sure, essentially, that you're not changing the behavior of the application in any way. And you're fiddling with some very low-level stuff. And that's surprisingly difficult.

Shpat (22:51):

I presume some of this stuff you've covered in the paper, um, some of these details. We'll make sure to link to that.

Joey (22:59):

And so I guess we talked about, I think you briefly mentioned this earlier, but I want to revisit it. You talked about all these challenges and making sure you get it right. It sounded like you're doing some things to check your own work and see that you're doing it right, to sort of examine these timing things and make sure that, you know, for example, a future commit doesn't introduce a timing thing that didn't used to be there. Um, are you doing some work in that direction?

Marc (23:24):

Yeah. So in part we have, uh, like, unit tests and integration tests, continuous integration set up with all these, which we found very helpful for development, especially on tight deadlines. If we need to make a change, it's nice to know that this small change that really shouldn't break anything didn't actually break everything. Um, beyond that, for specifically the detectability side, we also have this augmented test suite, which acts as the adversary in varied simulated network conditions, captures packets, and analyzes the packets to extract various features: statistics for TCP packet size, delay between packets, and various other TCP statistics. And then it feeds that into a random forest classifier to try to determine: is it possible to distinguish these two data sets with any amount of accuracy? Um, so we've put a lot of time and effort into that, especially because this particular detectability model is very much a best-case scenario for the adversary.
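
The kind of per-connection features Marc mentions can be sketched as below. This is a minimal illustration, assuming a capture has already been reduced to `(timestamp, size)` pairs; the `flow_features` function, the feature names, and the sample capture are hypothetical and are not the feature set from the paper. Feature vectors like this, computed for traffic with and without Balboa, are what would then be handed to a classifier such as a random forest, which is left out here.

```python
from statistics import mean, pstdev


def flow_features(packets):
    """Summarize one capture.

    packets: list of (timestamp_seconds, size_bytes) tuples sorted by time.
    """
    sizes = [size for _, size in packets]
    times = [t for t, _ in packets]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "count": len(packets),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }


# Made-up capture: four packets, 10 ms apart.
capture = [(0.000, 1500), (0.010, 1500), (0.020, 500), (0.030, 500)]
features = flow_features(capture)
```

Only sizes and timings appear here because, as discussed earlier, those are what remain visible to an observer of an encrypted connection.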

Marc (24:37):

This is a supervised learning technique, which means we have a data set of: this is music streaming without Balboa, this is music streaming with Balboa. Can you tell a difference between these two? In practice, we've actually been looking at: what does music streaming look like? And that's a very hard question to answer. So this detectability analysis is very much a worst-case scenario for Balboa. Uh, but the good news is that Balboa does pass. So we are able to see low rates of detectability, even in this worst case.

Alex (25:14):

Yeah. And we found some kind of interesting things that I maybe wouldn't have expected. So we've tried this, say, in the audio streaming case. We have Balboa working for audio streaming and web browsing, and in the audio streaming case we ran these classifiers against four different audio streaming clients: VLC, MPlayer, Audacious, and mpv. And, uh, kind of interestingly, the detectability differs across these four. Like, it's never very high. It's in the, I don't know, the 70% range, where 50% is kind of the worst case for detecting, 'cause you're randomly guessing, essentially. Um, but it's kind of an interesting observation that different audio clients have different enough traffic patterns that they affect the detectability, which, I dunno, I did not expect going in.

Marc (26:16):

Building off that, the classifier is able to detect with near a hundred percent accuracy the difference between, for example, the VLC and MPlayer traffic patterns. So that's one of the cool things about Balboa: because we just operate on this unmodified binary, we can use the same code base. Literally the way we tested this was just, like, let's just pull down a bunch of random media players and just plug them in. And they all worked. But again, interestingly, they all produce very distinct traffic patterns, which means that if an adversary was like, oh, I heard the censorship circumvention people are using MPlayer, we can just, like, plug in something else, and we're good to go.

Shpat (26:48):

Fascinating. Amazing. Um, what's next with this? I mean, it sounds like the implementation is gonna be open source. The paper's going to be in USENIX Security. What else do you have in mind for the future of Balboa?

Alex (27:02):

Yeah. So there's, um, a lot we can do to improve the, I guess, performance and the size of these shared models. So one of the big downsides, or I guess two of the big downsides, of Balboa: we rely on only being able to swap in covert data for specific selected items, like the New York Times logo or whatnot. So if that is not a big portion of the data that's being transferred, you're not going to have very high covert throughput. And if you want to have very high covert throughput, you need these traffic models to be very large. So let's take the audio streaming case, right? You can imagine having, uh, a jingle that the client knows, and whenever that jingle plays between songs, covert data gets sent. That would have a very low covert throughput. But the alternative is you have the collection of all of the music that will be played on that audio stream ahead of time.

Alex (28:04):

And then you can swap out all of that with covert data, but now your traffic models are huge. And so you need some method to get these traffic models to the appropriate parties. So those are really the two big challenges to focus on going forward. For the traffic model piece: can we dynamically generate these traffic models in some way? Start with, like, the jingle, and then as the client collects songs that it's been listening to, notify the server in some way that, hey, I now have these songs that I can use in the traffic model. And then supporting higher-throughput protocols, so something like DASH for video over HTTP, so you can send more covert data over, basically, just a higher-throughput covert channel.
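
The trade-off Alex describes can be put into back-of-the-envelope arithmetic. The sketch below assumes, as a simplification, that covert throughput is just the stream bitrate times the fraction of streamed bytes the traffic model can replace, ignoring any framing overhead; the function name and every number in it (a 128 kbit/s stream, a 10-second jingle after each 200 seconds of music) are illustrative assumptions, not figures from the paper.

```python
def covert_throughput_kbps(stream_kbps: float, replaceable_fraction: float) -> float:
    """Estimate covert throughput as bitrate times the replaceable share of bytes."""
    if not 0.0 <= replaceable_fraction <= 1.0:
        raise ValueError("replaceable_fraction must be between 0 and 1")
    return stream_kbps * replaceable_fraction


# Jingle-only model: 10 s of known audio in every 210 s of stream.
jingle_only = covert_throughput_kbps(128, 10 / 210)

# Full-library model: every song is known ahead of time, so everything is replaceable.
full_library = covert_throughput_kbps(128, 1.0)
```

Under these made-up numbers the jingle-only model carries roughly 6 kbit/s against 128 kbit/s for the full library, at the cost of a traffic model that has to contain every song.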

Marc (28:54):

Uh, a couple of interesting things there as well. So first, in terms of dynamically generating traffic, one of the approaches we've been exploring is that there's a large number of pieces of software which can dynamically generate music. Uh, I don't think it's going to win any awards, but, you know, it's all right music for computer-generated music. So if the server basically just picks a random seed to use to generate the music, it can communicate just that seed to the client. And then the client can basically generate music using the same seed. They don't necessarily have to have prearranged the exact music they're going to send. They just need to prearrange this small, concise seed, but it still generates new music each time.
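
The seed idea Marc describes can be sketched as below, with a toy note sequence standing in for real generated music. The `generate_melody` function, the `NOTES` list, and the seed value are hypothetical; the sketch also assumes both sides run the same generator implementation, since a seed only reproduces the same output when the generating code matches.

```python
import random

NOTES = ["C", "D", "E", "F", "G", "A", "B"]


def generate_melody(seed: int, length: int = 16) -> list[str]:
    """Deterministically derive a note sequence from a shared seed."""
    rng = random.Random(seed)  # private generator, unaffected by global random state
    return [rng.choice(NOTES) for _ in range(length)]


# The server picks a seed and sends only the seed; each side regenerates the content.
server_melody = generate_melody(seed=2021)
client_melody = generate_melody(seed=2021)
```

Both parties end up holding identical content without ever transferring it, which is what lets the generated stream serve as a traffic model.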

Alex (29:40):

Yeah. You could even take it to the next level and imagine using, like, um, AI, like GPT-2 for generating text, right? GPT-3, I guess, is the latest and greatest. But yeah, you can have a blog that's AI generated, where that initial seed to seed that model is shared between the two parties. This is,

Joey (30:01):

This is all just going to end up being used by metalheads that don't want their friends to know they're listening to Taylor Swift, isn't it? That's where this is all... No shame. That's all right. This is all going to start out with the best of intentions. And it's just

Shpat (30:15):

No, no shame in that. I could talk about this all day. Super interesting work, and I'm glad the world will get to see it. Um, Alex and Marc, it was a pleasure hearing about this. Um, and I guess we'll see you next time you're writing about some groundbreaking research that's going to change the world. Hopefully we'll have more on the way. Fantastic. Yeah. Thanks for having us. It was a pleasure. Thanks for being here. This was another episode of Building Better Systems. We'll see you next time.