388: Durable Objects

Robert and I jump on to chat about Cloudflare’s product Durable Objects. It’s part of their Workers platform, which we already use at CodePen a good bit, but with Durable Objects…

Global Uniqueness guarantees that there will be a single instance of a Durable Object class with a given ID running at once, across the world. Requests for a Durable Object ID are routed by the Workers runtime to the Cloudflare data center that owns the Durable Object.

In their intro blog post a few years back, they call the “killer app” real-time collaborative document editing, which is obviously of interest to us. So we’ve been tinkering and playing with how that might work with CodePen’s future technology.

Time Jumps

00:25 What Robert’s been up to
01:08 What is a durable object?
02:52 Using Workers with durable objects?
08:13 Sponsor: Equinix Metal
08:51 How does clientside work with Workers?
15:58 What if the durable object dies?
19:47 Cost of durable objects

Equinix Metal’s Startup Partner Program helps early stage companies level up. Their experts work with startups like Koord and INVISV to build their competitive edge with infrastructure. Equinix Metal provides real time guidance and support to help startups grow faster. With up to $100,000 in infrastructure credit, access to Equinix’s global ecosystem of over 10,000 customers and 1,800 networks, they might just be what you need to take your startup global.

Visit metal.equinix.com/startups to take your startup to the next level.

Transcript

[Radio channel adjustment]

Announcer: Today, on CodePen Radio.

Chris Coyier: All right! This is CodePen Radio #388. I have with me here Robert from Team CodePen. Hey, Robert.

Robert Mion: Hey, Chris. Good to be back.

Chris: Yeah, be back on the show. Yeah, it's been a while since we've had you on, but you're a great podcast guest because you have plenty to say about things. You've been doing tons of different work at CodePen since I last talked to you, and we just picked one out of a hat, really. There are all kinds of things we could talk about.

But recently, we've been exploring some more of Cloudflare's offerings. There's one of them in particular called Durable Objects that we'll talk about in this show that we are eyeballing up in heavy ways and plan to use in production, although are not yet. So, we can't guarantee this is the best thing, but it's looking pretty good. So, I thought, nobody knows more about them on our team than Robert does, so let's do this. What the heck is a durable object, Robert?

00:01:18

Robert: Yeah, well, I've actually been looking forward to talking about these because they're a really interesting technology, especially for me in the context of stuff I've worked on in the past. I'm going to do a terrible job of describing Cloudflare technology because I'm sure they do it much better. Durable Object is basically a pattern for maintaining shared state across lots of clients at the edge of the network where I think Cloudflare really specializes with the developer offerings that they've been putting forth.

Chris: Is the whole point real-time connectivity of sorts? Like you wouldn't use this if you didn't need any aspect of real-time-ness?

Robert: I don't know. That's actually a great question, and it is, certainly, in the context of what we've been using them for, which is really collaboration. I think that's a fair statement, but I wouldn't restrict it to just that. I'm sure there are other applications out there where you have workers, Cloudflare workers, which are functions that run at the edge. But you want to coordinate state between those workers or, rather, between a set of clients out there and a more persistent mechanism like the key-value store that they offer doesn't quite meet your needs.

Chris: Fit the bill. Yeah, okay. Well, you mentioned workers, which we can't avoid here because I think I had this screwed up in my head for a while is that it's not like these things are -- they're hierarchical, in a way. I can't just talk to a durable object. I have to have a worker talk to a durable object.

Robert: Exactly. Yeah, a durable object is a resource that you would take advantage of in your worker. You have your worker functions out at the edge, and I don't know how deep we want to go for our listeners.

00:03:23

Chris: What's the very quickest way of doing a worker? Not only is it a cloud function, but it's like a cloud function that intercepts a request, which is a little unique, right? You don't have to call it. It just is in the way of a request. [Laughter]

Robert: Yeah, exactly. I mean it's a JavaScript function that lives out on the cloud and that's really triggered (most of the time) by an HTTP request that comes in from somebody's client. There are other ways of triggering them, but that's really the first order.
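To make that concrete, here is a minimal sketch of a Worker in TypeScript. The handler shape (an object with a `fetch` method that takes a `Request` and returns a `Response`) follows Cloudflare's module Worker format; the `worker` constant name and the response text are made up for illustration.

```typescript
// A Worker is an object with a fetch handler that the runtime calls
// for each incoming HTTP request.
const worker = {
  async fetch(request: Request): Promise<Response> {
    const { pathname } = new URL(request.url);
    // Hypothetical behavior: echo the requested path back as plain text.
    return new Response(`Hello from the edge, you asked for ${pathname}`, {
      headers: { "Content-Type": "text/plain" },
    });
  },
};

// In a deployed Worker this object would be the module's default export:
// export default worker;
```

Nothing calls this function explicitly. Once it is deployed on a route, requests to that route pass through it.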

You have these functions that are really sitting out there designed to respond to HTTP requests. But if you've got multiple clients that all want to collaborate and work together, well, like the KV store is really a data store. One of the things that distinguishes a durable object from a worker or a KV store is that you're actually running code that really sort of maintains this state for you.

So, the great and classic example of an app where you would use this is a chat application. So, you'd have lots of clients out there all connecting to Cloudflare's edge infrastructure, and you want them to talk to each other.

Well, a KV store is literally just a key-value store. You could have a worker function, but those worker functions are sort of coordinating through the KV store, which is this--

Chris: It doesn't feel very natural, does it? Would each chat message be an individual KV pair or something? I don't know. Maybe. But that doesn't feel very good.

Robert: It's a little square-peg-round-hole-ish. You know? What you do, instead, is you basically have your worker connect to a durable object, which is shared across all the clients. Then that durable object is a chunk of JS code that you provide.

You're now effectively running JS code right on top of the state you want to manage because, by the name, it's a durable object, so it's an object much like an instance of a class. It actually is an instance of a class. That's what you're writing is a class that gets instantiated, and now you have an object that's out there in the wild.

I'll sort of ramble on here, but one of the things that I find really exciting about these is that they solve a really nasty problem in software architecture, which is, how do you scale connectivity and state out at the edge across lots and lots of clients?

One of the projects I did early on in my career was I worked on the AOL Instant Messenger team.

Chris: We all remember that!

00:06:10

Robert: Yeah, right. There was this sort of story that went around about how AOL trying to scale the Instant Messenger platform had a guy that they'd hired whose job was to write the network interface driver so that they could support 60,000 users connecting to one server out on the edge. Then they had a farm of these servers. But God forbid the server that you as a user happened to connect to went down. You'd be offline, and that was one of the hallmarks of the AOL Instant Messenger system was you'd have these waves of disconnection that would go through when one of the servers went down.

That just doesn't happen anymore. With Cloudflare, with durable objects, you can write a durable object that's a chat client and push it out there in, like, half an hour. You don't have to worry about scale.

Chris: To put a point on this chat thing, though, there's not just one durable object that is Robert's chat app. It would probably be like one durable object per chatroom, right? Me and you are going to chat. We'd make a URL called robertschathouse.com/12345, and 12345 would somehow map to the durable object. That's one way you could architect it such that the storage and chattiness is scoped to just the people at that URL. Right?

Robert: Yeah, exactly. Yeah, so when you access a durable object, you basically use an identifier, and that mapping of what you're mapping from to get a durable object is sort of up to you. In our case, it would be, we need a durable object for Robert and Chris's chatroom. And so, we would get one of those. That lives out, and that lives in the infrastructure, in the cloud infrastructure, which is interesting in its own right.
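The mapping Chris and Robert describe could look roughly like the sketch below. `idFromName`, `get`, and the stub's `fetch` are the Durable Object namespace calls Cloudflare documents; the small interfaces are local stand-ins for the runtime types, and the `CHAT_ROOMS` binding name and the URL scheme are assumptions for illustration.

```typescript
// Local stand-ins for the Workers runtime types used below.
interface DurableObjectStubLike {
  fetch(request: Request): Promise<Response>;
}
interface DurableObjectNamespaceLike {
  idFromName(name: string): unknown;
  get(id: unknown): DurableObjectStubLike;
}
interface Env {
  CHAT_ROOMS: DurableObjectNamespaceLike; // hypothetical binding name
}

// Pull the room name out of a URL such as https://robertschathouse.com/12345.
function roomNameFromUrl(url: string): string | null {
  const [first] = new URL(url).pathname.split("/").filter(Boolean);
  return first ?? null;
}

const worker = {
  async fetch(request: Request, env: Env): Promise<Response> {
    const room = roomNameFromUrl(request.url);
    if (room === null) return new Response("Missing room ID", { status: 400 });
    // The same name always yields the same ID, so every visitor to /12345
    // reaches the same object, whichever Worker instance handled them.
    const id = env.CHAT_ROOMS.idFromName(room);
    return env.CHAT_ROOMS.get(id).fetch(request);
  },
};
```

Deriving the ID from the path is only one option. Any string the application can compute consistently would work as the name.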

00:08:15

[Guitar music starts]

Chris: Equinix Metal Startup Partner Program helps early-stage companies level up. Their experts work with startups like Koord and INVISV to build their competitive edge with infrastructure.

Equinix Metal provides real-time guidance and support to help startups grow faster with up to $100,000 in infrastructure credit, access to Equinix's global ecosystem of over 10,000 customers and 1,800 networks. They might just be what you need to take your startup global. Visit metal.equinix.com/startups to take your startup to the next level.

[Guitar music ends]

00:09:00

Chris: I do have another interesting question about this. [Laughter] It's almost like tech therapy for me, like, I need to understand this tech better so let's do a podcast about it.

You could have a worker respond to a URL, and the worker isn't injecting JavaScript. The worker could just - I don't know - do nothing or manipulate one little piece of HTML before it gets to the browser or anything. So, then it does get to the browser. That page might not have any JavaScript on it at all.

How do I say I want some real-time action here? Does the worker respond with, like, "Hey, if you want to connect to the WebSocket - or whatever - in which to have real-time communication with this, you should probably put this script on your page too." How does the client-side JavaScript aspect of it work?

Robert: Well, the thing about workers is that you're kind of writing a little mini server that sits out at the edge of your system. Workers, it provides a URL that your clients can connect to. What it does at that point is really up to it, but you're making a fetch request from the browser if you want to get a standard HTTP request that this worker does.

But if you want to do something real-time, then you have to create a WebSocket. Really, that's the go-to for real-time communication is instead of doing the fetch in URL, you'd create a new WebSocket and give it a URL. That request goes to a worker, but then when the worker gets that request, it actually creates a Web -- well, it basically creates its own WebSocket handler on the server.

I'm hesitating a little bit just because the code for that is kind of going through my head, and it's a little interesting because Cloudflare has their own way of supporting WebSocket connections.
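On the browser side, the difference between the two cases is small. The sketch below uses the standard `WebSocket` API; the function names and the greeting message are hypothetical, and it assumes the socket endpoint lives at the same URL as the page.

```typescript
// Turn a page URL into the matching WebSocket URL (https -> wss, http -> ws).
function socketUrlFor(pageUrl: string): string {
  return pageUrl.replace(/^http/, "ws");
}

// Open a socket to the room and log whatever the other participants send.
function joinRoom(pageUrl: string): WebSocket {
  const socket = new WebSocket(socketUrlFor(pageUrl));
  socket.addEventListener("open", () => socket.send("Hi, Robert"));
  socket.addEventListener("message", (event) => console.log("received:", event.data));
  return socket;
}
```

A page would call something like `joinRoom(location.href)` from its own script. The Worker does not inject that script for you.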

00:11:03

Chris: Okay, so it is a WebSocket then. That is the technology they recommend using.

Robert: Yeah. Oh, very much so.

Chris: Yeah. Again, in this case, it's to the worker, not the durable object, because the durable object is just a piece of technology that's attached to the worker.

Robert: Well, yes and no. I mean, behind the scenes, what happens is when your browser makes that WebSocket request, it lands in the worker. The worker is like, "Oh, I've got a request here," and it can either create its own WebSocket handle, so you can have a worker that handles WebSockets directly. But in the case of a durable object, what you would do is actually pass control to the durable object and have it handle the WebSocket at that point. You can terminate WebSockets either directly in your worker or in the durable object code itself. Does that answer your question?
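Here is a rough sketch of the second option, where the Durable Object terminates the WebSocket. `WebSocketPair`, `accept()`, and the status 101 response carrying a `webSocket` property are Cloudflare runtime features; the `ChatRoom` class, the `ServerSocket` interface, and the broadcast-to-everyone behavior are assumptions made for this example.

```typescript
// The parts of a server-side socket this sketch relies on.
interface ServerSocket {
  accept(): void;
  send(data: string): void;
  addEventListener(type: string, listener: (event: { data?: unknown }) => void): void;
}

// WebSocketPair is a global in the Workers runtime; declared here so the
// sketch type-checks outside that runtime.
declare const WebSocketPair: new () => { 0: unknown; 1: ServerSocket };

class ChatRoom {
  private sessions = new Set<ServerSocket>();

  // Called for every request a Worker forwards to this object.
  async fetch(request: Request): Promise<Response> {
    if (request.headers.get("Upgrade") !== "websocket") {
      return new Response("Expected a WebSocket upgrade", { status: 426 });
    }
    const pair = new WebSocketPair();
    this.join(pair[1]); // keep the server end here
    // Hand the client end back; `webSocket` is Cloudflare-specific, hence the cast.
    return new Response(null, { status: 101, webSocket: pair[0] } as ResponseInit);
  }

  // Track the socket and relay anything it sends to every open session.
  join(socket: ServerSocket): void {
    socket.accept();
    this.sessions.add(socket);
    socket.addEventListener("message", (event) => this.broadcast(String(event.data)));
    socket.addEventListener("close", () => this.sessions.delete(socket));
  }

  broadcast(message: string): void {
    for (const socket of this.sessions) socket.send(message);
  }
}
```

Because there is a single instance per room, the `sessions` set sees every participant, which is what makes the relay possible.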

Chris: Yeah. [Laughter] Yeah, yeah.

00:12:05

Robert: [Laughter] One of the things that, for me, was a bit of -- that has helped me to understand this is to sort of keep in mind that workers and durable objects are really designed for working in this global network infrastructure that Cloudflare has and that any time you have a worker or a durable object, it has to exist at some data center somewhere.

Chris, you're in Bend here (alongside me), so we're talking to the same data center. But if Rachel were on this call, she's in the Sunshine Coast of Australia. When she connects to our collaboration session or our chat session, she's talking to a different data center on the other side of the world. The worker function that she talks to is somewhere in Australia, whereas ours is probably in Seattle, Portland, or somewhere.

Chris: But durable objects are different, right? You're saying that those don't -- workers are edged out, but durable objects kind of aren't.

Robert: Exactly, so if you and I, or if Rachel and you and I, all hit the same worker URL, we're all talking to different worker instances that are handling that. But if that worker that we set up needs to talk to a durable object, then when you and I and Rachel all connect to our separate workers, those workers all connect to the same durable object in one data center somewhere, like one server in one data center.

Chris: But at least it's doing that last hop on Cloudflare's network, not your wi-fi or whatever. You know? I don't know.

00:13:52

Robert: Right. Right. I mean I've actually run a couple of little tests on the responsiveness. When a durable object is created in, for example, the Seattle data center or Portland (I forget which one it is), the worst case is if Rachel's on the Sunshine Coast of Australia is connecting to that same durable object in Portland, there's like a 250 to 300-millisecond latency in there. I mean it's not great, but it's not terrible.

Chris: Mm-hmm. Mm-hmm.

Robert: You're not going to be running real-time video games through this. But for a chat app, it's fine.

Chris: Yeah. If that chat app had no -- yeah, right. It could even take three seconds for a message to arrive and nobody would care.

Robert: Yeah.

Chris: Maybe that's an exaggeration, but it won't be that long anyway. Let's say you also don't care about logs. You come back into Room 12345, and the expectation of your app you've built up is that it's a blank, nothing, and that only when somebody else arrives and you're both there at the same time, and you send messages back and forth, that's when things show up on the screen. That makes this really easy, right? I'd send a message that says, "Hi, Robert." It goes to the sky. It comes down to your machine, and it says, "Hi, Robert," on there.

It sounds weird in a chat app because we're so used to opening Slack, Discord, or something and seeing the history of it there. In that case, surely there's some data store in the world that's being populated with this. But you don't have to store data, right?

Robert: No. It's a JavaScript object. Simplest case, you just have a little chatroom.history and make that an array. Any time a message comes in, you stick it in there. If somebody else connects to the room, it's still in memory. It's an object that persists that all of your workers are talking to. Now that object may go away at some point, and you'd lose that history. But--
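The in-memory version Robert describes is about as small as it sounds. A sketch, with the class and method names invented for illustration:

```typescript
class ChatRoomState {
  // Lives only in memory: gone if the object is evicted or restarted.
  readonly history: string[] = [];

  // Record an incoming message and return it so it can be broadcast.
  addMessage(message: string): string {
    this.history.push(message);
    return message;
  }

  // What a newly connected client would be sent to catch up.
  replay(): string[] {
    return [...this.history];
  }
}
```

Anyone who joins while the object is alive gets the backlog from `replay()`. Once the object goes away, so does the array.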

Chris: Well, in my little architecture, who cares if the durable object dies because there's no expectation of history. Right? But as soon as there is an expectation of history, then you'd have to just start architecting in a different way, right? You'd get some history from a persistent object but not forever.

00:16:17

Robert: There is a persistent data store API available for durable objects. You can stick data in a persistent data store from your durable object and still have access to it. I forget the name of that store off the top of my head, but it's cool.

Chris: Oh, that's cool. So, they've anticipated this, and they say, "Oh, if you need longer term storage for this thing, don't expect the exact durable object to hang around forever. But when it gets spun up again, it can access this storage thing." That's pretty clever, actually.

Robert: Yeah. Yeah. Now the trick is -- and I haven't really got the answer to this yet. But the trick is knowing when to put stuff in there because if you don't want to do that on every keystroke or every message (just because persistent data stores tend to be a little bit lower latency and more expensive - or is it higher latency [laughter]), it'd be nice to know when your durable object is going to go away.

So, like, "Okay, well, I've got all this message history in memory. Let me just stuff that in the persistent store and pull it back later."

Chris: Yeah, or even have that be part of the class like on, before, or unmount; store all your crap.

Robert: Right. This sort of gets back to that notion that you've got to keep in mind that this technology is designed to live in data centers at the edge, and one of the problems is that (just the way these things are set up) it's hard to know when they're going to die. And so, Cloudflare doesn't actually offer a "Hey, this object is about to go away" event. [Laughter]

It's one of the things that, personally, I have to kind of wrap my head around. It's like, all right, what's the right model for really managing state that you really want to have stick around?
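Since there is no "about to go away" event to hook into, one possible compromise is to write the history out every few messages and reload it when the object comes back. The sketch below assumes the key-value style storage interface a Durable Object receives (exposed as `state.storage`, with `get` and `put`); `PersistentHistory`, the `"history"` key, and the `flushEvery` threshold are hypothetical choices, not Cloudflare settings.

```typescript
// Subset of the Durable Object storage API used here.
interface StorageLike {
  get<T>(key: string): Promise<T | undefined>;
  put<T>(key: string, value: T): Promise<void>;
}

class PersistentHistory {
  private storage: StorageLike;
  private flushEvery: number;
  private messages: string[] = [];
  private unsaved = 0;
  private loaded = false;

  constructor(storage: StorageLike, flushEvery = 20) {
    this.storage = storage;
    this.flushEvery = flushEvery;
  }

  // Restore saved history the first time a (possibly re-created) object is used.
  async load(): Promise<string[]> {
    if (!this.loaded) {
      this.messages = (await this.storage.get<string[]>("history")) ?? [];
      this.loaded = true;
    }
    return this.messages;
  }

  // Keep the message in memory and write to storage only every few messages.
  async add(message: string): Promise<void> {
    await this.load();
    this.messages.push(message);
    this.unsaved += 1;
    if (this.unsaved >= this.flushEvery) await this.flush();
  }

  async flush(): Promise<void> {
    await this.storage.put("history", this.messages);
    this.unsaved = 0;
  }
}
```

The trade-off is explicit: fewer writes, but up to `flushEvery - 1` recent messages can be lost if the object disappears between flushes. A long transcript would likely also need to be split across several keys rather than stored as one value.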

Chris: Yeah. Right. Hmm. I don't know. Part of me is like, well, just don't. Bring your own data store. Use your own database or something. But it's not for lack of trust of what Cloudflare is doing. We have stuff in Cloudflare that we put a decent amount of trust in. Not like 1000% trust, but we stick stuff in their KV that has never failed us, which is neat.

Robert: Yes.

00:18:35

Chris: They have all kinds of storage stuff. It's funny, isn't it? You think of a chat transcript. You're like, "Where do you put it? Do you put it in the fancy DO storage? Do you at one point actually use KV because you could just pluck it all into the V part of a KV? Do you save it to a text file and use their R2?"

They're starting to be a real cloud offering where it starts to be confusing what the hell to even use.

Robert: Well, you know it's interesting. There's certainly a lot of types of applications where you don't need the structure and rigor that comes with traditional SQL or relational database or even the NoSQL databases like a KV store. I've got a key, and I've got a value, and I just want to be able to store a bajillion of those.

That works really well for a lot of stuff. One of our workers is the snapshots worker where we use the KV store to basically keep track of what the URLs for snapshot images are or for screenshot images, and we have that for every Pen in CodePen. It's not a big, fancy database. It's just a KV store.
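As an illustration of how little is involved, a lookup like the one Robert mentions might be sketched as below. This is not CodePen's actual code: the key scheme and function names are invented, and `KVLike` is a local stand-in for the `get` and `put` methods of a Workers KV binding.

```typescript
// Subset of the Workers KV binding used here.
interface KVLike {
  get(key: string): Promise<string | null>;
  put(key: string, value: string): Promise<void>;
}

// Hypothetical key scheme: one entry per Pen.
const snapshotKey = (penId: string): string => `snapshot:${penId}`;

// Remember where the screenshot image for a Pen lives.
async function saveSnapshotUrl(kv: KVLike, penId: string, imageUrl: string): Promise<void> {
  await kv.put(snapshotKey(penId), imageUrl);
}

// Look the URL up again; null means no snapshot has been recorded.
async function getSnapshotUrl(kv: KVLike, penId: string): Promise<string | null> {
  return kv.get(snapshotKey(penId));
}
```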

Chris: Yeah. Right. Not relational. Okay, and you've also done a little research into the cost of this. Cloudflare produced this product because they want to sell it to you for money, just like every other company in the world's product.

They have a unique storage model for it, I think. I think, with workers, it was unique because they limit the execution time and such on them. Strictly, it's basically like how many times do they get hit. That's how much money you owe.

But this is a little different almost because of the expectation of persistent connection and shit. They actually start a little timer, don't they, and charge you?

00:20:26

Robert: Yeah, so they bill. There's sort of two knobs or dials that affect the billing for durable objects. One is the duration. It's just how long your durable object has been active. Then the other one is what they call a request, but a request can be a variety of things, including a message across a WebSocket.

The cost of a durable object is a product of both of those things. And depending on how you use your object, one or the other may become the most dominant factor.

That said, the costs are very reasonable. It's pennies per millions of - pick your unit. The initial sort of estimation that I've done for CodePen, if we're running 100,000 collaboration sessions for hours on end, those costs do add up. That's one of the things that we're going to have to really figure out as we move forward is what exactly, like, how exactly are users using the service? Where can we mitigate the costs where they do kind of get out of hand?
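A back-of-the-envelope estimate along these lines can be sketched as a small function. It assumes the bill is the sum of a duration charge (metered in GB-seconds of active time) and a request charge (per million requests). All rates are parameters rather than real prices, so the current numbers would need to come from Cloudflare's pricing page.

```typescript
interface Rates {
  perMillionRequests: number; // dollars per million requests
  perMillionGbSeconds: number; // dollars per million GB-seconds of active time
  billedMemoryGb: number; // memory charged per active object, in GB
}

interface Usage {
  sessions: number;
  hoursPerSession: number;
  requestsPerSession: number; // includes WebSocket messages counted as requests
}

// Estimate the two charges separately so the dominant one is easy to spot.
function estimateCost(usage: Usage, rates: Rates): { duration: number; requests: number; total: number } {
  const activeSeconds = usage.sessions * usage.hoursPerSession * 3600;
  const duration = ((activeSeconds * rates.billedMemoryGb) / 1e6) * rates.perMillionGbSeconds;
  const requests = ((usage.sessions * usage.requestsPerSession) / 1e6) * rates.perMillionRequests;
  return { duration, requests, total: duration + requests };
}
```

Returning the two components separately shows which dial dominates for a given usage pattern: long idle sessions push up duration, chatty sessions push up requests.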

Chris: Yeah. Right. It's one thing if it's like, are your users specifically doing things in your application that trigger an event, like they hit the save button and the save button calls a worker once?

Robert: Right.

Chris: There's a one-to-one between usage, and you're probably going to be favorable. What's not so favorable is if ... wrote a set timeout for every 300 milliseconds, and so that way if your users just have the tab open, it's hitting this worker. Not that you would ever write code like that, but I'm thinking of some other technology we've played with that is chatty on our behalf, in a way, in that just the way that it operates is kind of hammering a durable object in a way that a user has no idea. They're not at fault for how chatty the thing is.

Robert: Yeah. The main feature that we've been looking at this technology for, at least what we started with, is collaboration where we need to synchronize the documents or the files that users are looking at in real-time. Every keystroke ends up being one or more messages in and out of these objects, potentially to multiply by however many people are actually connected to these sessions.
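The multiplication Robert mentions is easy to put numbers on. The sketch below assumes the simplest model, where each keystroke is one message into the object and one relayed message out to every other participant. Which of those messages count toward billing is a pricing detail to check separately.

```typescript
// Messages generated when every keystroke is sent to the object (inbound)
// and relayed to each other participant (outbound).
function messagesForSession(
  keystrokes: number,
  participants: number,
): { inbound: number; outbound: number; total: number } {
  const inbound = keystrokes;
  const outbound = keystrokes * Math.max(participants - 1, 0);
  return { inbound, outbound, total: inbound + outbound };
}
```

Under this model, a session where three people type 1,000 keystrokes between them produces 1,000 inbound and 2,000 outbound messages.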

Chris: Right. Think of a keystroke at a time or more, which is just a lot, so we have to watch that and make sure that it's an affordable technology too. A lot of times these cloud services, you look at the pricing and you're like, "Oh, my! This is insanely cheap. How do you possibly do that?"

Then you get into real usage and you're like, "Yeah, that's actually -- you can tip the expensive-ometer pretty quick." [Laughter]

Robert: Yeah. Well, I mean, we're going to all have to type a little slower, too.

[Laughter]

Robert: Just to keep costs down.

Chris: Opening memo.

00:23:38

Robert: Well, one of the things that's sort of interesting about Cloudflare and workers and durable objects, in general -- this may be getting into the weeds -- I was reading up on just how they actually implement this and keep things so affordable. One of the things that distinguishes workers and durable objects is that they don't run in Docker containers, which is what I think a lot of us typically think of for cloud virtualization stuff. They're actually using JavaScript isolates, which are really lightweight runtimes for JavaScript. I don't know if people want to read up on that. It's an interesting read.

Chris: Yeah. I'm sure, for speed and cost reasons on their side. And it fits the zeitgeist. You're like, "Oh, a little JavaScript runtime in the sky. That's the hotness right now."

Robert: Yeah.

Chris: Surely partially the hotness because that's what they offer, but hey, you know, it's all cyclical. Yeah, so the point being just because it makes possible real-time, even though that's not the only use case we're saying, it also doesn't necessarily solve the--

You don't just get to, like, put a text area on a webpage connected through these technologies and say that you built Google Docs, right? [Laughter] That doesn't work.

Robert: Yeah. There's a lot more to it, as we have discovered recently. Keeping documents in sync is its own challenge and its own set of technologies. But if you want to connect some people and have them interact and have them play tic-tac-toe or chess in real time, it's a pretty small stack needed to do that these days. It didn't use to be that way. That's for sure. So, it's pretty cool.

Chris: Yeah, that is pretty cool. Well, hey, thanks for the tour here. Is there anything else you can think of at the end here? I mostly just wanted to wet everybody's whistle on the kind of technologies we look at and evaluate and think about and measure and things.

Robert: No, I'm excited about it. I wish I was better versed in how Cloudflare actually has all this stuff implemented and the full range of use cases that they perceive for this stuff. But for what we're doing at CodePen and other projects I've been working on, workers and durable objects are pretty exciting.

Chris: Yeah. Yeah, it must be hard for them because it seems like they innovate on the tech stuff a lot, but there's not exactly repos sitting there of five different applications you could build with it with the exact code or anything. It's a little bit like you've got to use your imagination and give it a shot.

I might be short-selling them a little bit. Maybe it's out there, but I don't seem to run across them very often.

Okay. Thanks, Robert. We'll talk to you again soon.

Robert: No problem. All right, guys.

Chris: Bye.

Robert: Take care.

[Radio channel adjustment]
