373: Script Injection with Cloudflare Workers

This week Shaw and Chris dig into some deepnerd tech stuff: manipulating HTML. In a perfect world, perhaps we wouldn’t need to, but today, and even more so in the foreseeable future of CodePen, we need to do a smidge of HTML manipulation on the HTML that you write or that is generated by code you write on CodePen. A tiny example is removing the autofocus attribute when a Pen is shown in a grid view <iframe>. A more significant example is that we need to inject some of our own JavaScript into your Pen, to power features of CodePen itself, like the console, which receives information from your rendered page (like logs, errors, etc) and can push commands to execute as well.

So how do we inject a <script> into absolutely 100% arbitrary HTML? Well, it’s tricky. We’re starting to do it with Cloudflare Workers and the HTMLRewriter stuff they can do. Even then, it’s not particularly easy, with lots of edge cases. Thank gosh for Miniflare for the ability to work on this stuff locally and write tests for it.

Time Jumps

00:22 Let’s talk Messing with HTML
03:07 Reasons for messing with HTML
05:48 How and when to inject a script
10:14 Where we show your profile page
14:17 Using Cloudflare Workers
18:52 Testing

Transcript

[Radio channel adjustment]

Announcer: Today, on CodePen Radio.

Chris Coyier: Hey, everybody. CodePen Radio #373. We're going to be talking about HTML just a little bit, kind of. We really are. Anyway, Shaw is with me.

Stephen Shaw: Hello.

Chris: How are you doing? Yeah, man. What we're specifically going to talk about is kind of like messing with HTML on the fly.

Shaw: Your HTML.

Chris: Your HTML. That's right, not even our HTML. How do I set this up? Let's say you write code on CodePen and we show it to you. It's about as simple as that, except for that it's the world's most complicated thing, apparently.

Shaw: Ah, yeah.

Chris: But let's imagine the Pen editor of today, which so many people are familiar with. You write some code, including HTML, and CSS and JavaScript, and you adjust settings, and you add in external resources, and yadda-yadda-yadda. We take all that stuff, and we smash it together into one HTML document and then serve it.

You can see that HTML right in the editor, so that's where we'll start. We have some level of control over that in the Pen editor because there's only just that one HTML file. Just keep that in mind a little bit.

But then let's say, okay, where else might you see this Pen? Well, you might see it on your profile. Everybody has a profile on CodePen, so it's there, but it's in a slightly different context, right? You're looking at it on some page that anybody else might be able to see too.

In that case, we actually mess with your HTML a little bit - a little bit differently than we would in the editor. For example, if you used the autofocus attribute in HTML, we kind of try to not have that there so it doesn't steal your focus when you're just browsing some profile or something. We call that the grid.

Shaw: Or alert, or audio - all those kind of annoying APIs.

Chris: Geolocation. Yeah. Anything that's annoying, we try to stop. Okay, so there's that. Let's go back to the editor for a moment. There's another thing that we offer on CodePen that's a console. It's actually pretty cool, pretty useful. You can pop it open. You can share a Pen with the console open in case the point of your Pen is logging or showing other things in the console. How the heck does that work?

[Laughter]

00:02:47

Shaw: Yeah. For the built-in browser console, it has access to the page. It can read all of those things. All of those console functions are native.

Chris: Mm-hmm.

Shaw: But in order for us to get that actually working in the browser, there's kind of a lot of intervention we have to do. Ultimately, it comes down to a little script that we inject on that page.
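As a rough illustration of what such an injected script might do, here is a minimal sketch that wraps console methods and forwards each call somewhere else. The function name `installConsoleRelay` and the message shape are hypothetical and not CodePen's actual implementation.

```typescript
// Hypothetical message shape; not CodePen's actual protocol.
type ConsoleMessage = { level: string; args: string[] };

// Anything with console-style methods (the real `console`, or a fake in tests).
type ConsoleLike = Record<string, (...args: unknown[]) => void>;

// Wrap the chosen console methods so each call is forwarded through `send`
// and then passed on to the original method unchanged.
function installConsoleRelay(
  target: ConsoleLike,
  send: (message: ConsoleMessage) => void,
  levels: string[] = ["log", "warn", "error"],
): void {
  for (const level of levels) {
    const original = target[level];
    if (typeof original !== "function") continue; // skip methods that do not exist
    target[level] = (...args: unknown[]) => {
      send({ level, args: args.map((a) => String(a)) });
      original.apply(target, args);
    };
  }
}
```

Inside a preview iframe, `send` could plausibly be something like `(m) => window.parent.postMessage(m, "*")`, with the editor listening for those messages; that wiring is an assumption here.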

Chris: A script, right. Totally, so there are at least two things we're doing. We're occasionally messing with code (a little bit). Our goal is to do it as absolutely little as possible.

Shaw: Yeah.

Chris: If we didn't have to do anything, that would be very ideal.

Shaw: We don't want to touch it. We don't.

Chris: We really don't want to, but if it's security or UX related or offering a very useful feature, sometimes we just have to.

At the moment, let's keep this on the, like, okay, in order to make that console work (that's a pretty cool feature), we need to inject a script. We kind of just internally refer to that as script injections. It makes kind of sense, right?

Shaw: [Laughter] [Indiscernible]

Chris: Yeah. Yeah, sometimes it's easier than others. You know? Let's imagine that we didn't have as much control over the HTML output as we do in a typical Pen, and that could be easily true.

Shaw: Well, think about the project editor.

Chris: Projects. Yeah.

Shaw: Yeah. We've got multi-files. We really don't mess with the content of that. It's pretty direct from what you see in the file editor to what's shown in the preview.

Chris: Yeah. All right, so we have no idea. Just less control, so we're kind of thinking, what's a way to approach this in a kind of scalable way? There are a number of ways we could do it. But we have some, and we've mentioned this a number of times on the show before.

We use Cloudflare, and we use Cloudflare workers. We've used them for all kinds of different things on CodePen.

I guess one of the features of a Cloudflare worker is that it has an API as part of it called HTMLRewriter. It's just this fancy API. I'll leave it to you to go look at the docs. If I may mouth blog it for one second, if you choose to use that API, it's kind of like looking at your HTML as a stream, and you can write little matchers that say, "Oh, when you see this element, do this callback function," essentially. It's your opportunity to look at the content of the HTML, perhaps change it and put it back.

00:05:25

Shaw: Mm-hmm. Yeah, and you could use this for translation or injecting things.

Chris: A/B testing.

Shaw: Yeah.

Chris: Yeah.

Shaw: Removing content, like all that kind of stuff. You essentially have a document.querySelector kind of power of finding these elements and then messing with them.
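To make the "matcher plus callback" idea concrete, here is a small sketch of an HTMLRewriter handler that strips the autofocus attribute mentioned earlier. The names `stripAutofocus` and `rewriteForGrid` are made up for illustration, and the global is read off `globalThis` only so the snippet also loads outside the Workers runtime.

```typescript
// Minimal slice of the element object that HTMLRewriter passes to handlers.
interface RewriterElement {
  removeAttribute(name: string): unknown;
}

// Element handler: remove `autofocus` so an embedded Pen cannot steal focus.
const stripAutofocus = {
  element(el: RewriterElement): void {
    el.removeAttribute("autofocus");
  },
};

// Only meaningful inside the Workers runtime, where HTMLRewriter is a global.
// `response` is the upstream Response whose HTML body gets streamed through.
function rewriteForGrid(response: unknown): unknown {
  const Rewriter = (globalThis as any).HTMLRewriter;
  return new Rewriter().on("*[autofocus]", stripAutofocus).transform(response);
}
```

The selector plays the role of the `document.querySelector` string Shaw mentions, except that it runs against the HTML stream before the browser ever sees it.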

Chris: Yeah. A service worker can do this, which has been affectionately referred to as a man-in-the-middle attack on yourself.

Shaw: [Laughter]

Chris: It's kind of like you're your own man in the middle attack. Imagine how dangerous this would be if a bad person could do what this is, you know, intercept the HTML before it even gets to the browser and change it. But this is secure because it's as secure as the rest of your website is and gives you the opportunity to do these kinds of things.

Well, what could you do then? Well, you could put a script tag there. Just as simple as that. You could say, "Oh, we need to power our console as a feature on this website. We'll just put a script tag there." That's what we've been working on and kind of successfully did.

Shaw: Mm-hmm.

Chris: It's kind of a scalable way to get that script tag in there and kind of open the door for other manipulations we might need to do. Remember that autofocus attribute and little things like that.

But it was a little trickier than we thought, right? You might be like, "Oh, well, just look for the body then and replace the body with itself plus a script tag," or something like that. You know? It turns out there are some complications with that, I guess.

00:07:01

Shaw: Yeah. In order to do it properly, we want to inject the script as soon as possible in the document in a correct place. We don't want it to break the validation of the HTML or anything like that or just have it before the HTML has even started or anything like that. But we need it in there as soon as possible so that anything that's running in the head or any external scripts or things like that that are logging to the console, you want to be able to intercept those and report those back to the in-editor console.

Chris: Right.

Shaw: Really, getting it after the first line in the head is ideal.

Chris: Ideal. Okay. Yeah, that's a good point. Not necessarily in the body. In fact, the body is an okay place to put it as a fallback in case the document is so wildly malformed that it doesn't even have a head. In this case, I think some astute listeners may be like, "How can it not have a head? I thought the browser forced an HTML document to have a head?" Even if you're sending over HTML that doesn't have a head, it'll put a head in there. But remember, it hasn't even hit the browser yet.

Shaw: Right. We don't have that cleaned-up browser render like validated version.

Chris: No.

Shaw: It's like the raw HTML coming from the server request.

Chris: I mean it does occur to me. I wonder if there's some weird, secret API to chrome-ize your HTML before you start messing with it.

[Laughter]

Chris: But that would probably add some overhead.

Shaw: Right. Go ahead and make it browser ready. Yeah, but we can't trust that, that HTML. It's all user input. Somebody may just put text in there. They may not understand HTML.

Chris: Right.

Shaw: Or they may have some very specific purpose for it, and so we can't just expect there to be a head or perfectly valid HTML all over the place.

Chris: In fact, we've spent a decade now teaching people, "Hey, don't worry about putting the whole doc type in the HTML. We'll inject that for you," which is very true in the Pen editor and not true in the project editor.

Shaw: Right.

00:09:14

Chris: Okay. Interesting stuff there. It turns out we were actually able to craft this worker such that it tries for the head first. Then it tries for the body. There's even a fallback beyond that, I think, that's like, "I don't know. Just put it somewhere," if you can't find--

Shaw: [Laughter] Yeah. The document at all. It still has a document even if there's not an HTML element or anything like that in there. There's still technically a document that Cloudflare's HTMLRewriter detects. Then we can still inject that script, just at the very end. It's not ideal, but it still gets it in there so that we can do as much as possible.
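The head-first, then body, then end-of-document fallback could be sketched roughly as follows. This assumes one set of handlers per response sharing an "already injected" flag; `createInjector` and `injectScript` are hypothetical names, not CodePen's code.

```typescript
// Minimal slices of the objects HTMLRewriter hands to the handlers.
interface Insertable {
  prepend(content: string, options?: { html?: boolean }): unknown;
}
interface DocEnd {
  append(content: string, options?: { html?: boolean }): unknown;
}

// Build handlers for one response. They share a flag so the script is
// inserted exactly once, at the earliest place the stream offers.
function createInjector(scriptTag: string) {
  let injected = false;
  const firstChance = {
    element(el: Insertable): void {
      if (injected) return;
      el.prepend(scriptTag, { html: true }); // right after the start tag
      injected = true;
    },
  };
  const lastResort = {
    end(end: DocEnd): void {
      if (injected) return;
      end.append(scriptTag, { html: true }); // very end of the document
      injected = true;
    },
  };
  return { head: firstChance, body: firstChance, document: lastResort };
}

// Only meaningful inside the Workers runtime, where HTMLRewriter is a global.
function injectScript(response: unknown, scriptTag: string): unknown {
  const Rewriter = (globalThis as any).HTMLRewriter;
  const handlers = createInjector(scriptTag);
  return new Rewriter()
    .on("head", handlers.head)
    .on("body", handlers.body)
    .onDocument(handlers.document)
    .transform(response);
}
```

Because the HTML is processed as a stream, the head handler fires before the body handler, and the document-end handler fires last, so the flag naturally picks the earliest available spot.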

Chris: Yep.

Shaw: At least, in addition to console messages, like runtime errors and different things like that that the browser reports, we can capture through that script as well and report that back to the editor so that, in case the user does have some malformed HTML, we can catch some of those things and report it.

Chris: Totally. Cool. We already kind of talked about how we show your code in different contexts. There's the profile page or the trending page or any place that you can imagine, like one of those mini items (I think we call them internally at CodePen), a grid of items that we don't need the console script in there. There's also a little logic that's like, "Yeah, maybe let's avoid putting the full-blown console script in there in those contexts. We don't really need to do that."

We're just thinking. Not even all of this is even done yet. [Laughter]

Shaw: Yeah.

Chris: We're just thinking it out and talking about it. There's that. Then there's, well, what about this thing that we offer on CodePen called debug view? Well, we're not going to inject anything there because that's holy. That's a "don't touch it" code, entirely.

But as we're going forward into the future of CodePen, we're thinking, gosh, that's a useful view.

Shaw: [Laughter]

00:11:23

Chris: What a nice one, the fact that we don't have to -- that we're not touching your code mostly at all. But you know there's almost like - I don't know - arguably too many views on CodePen.

Shaw: Yeah.

Chris: Why is live view so different than debug view? It's like, "Well, live view auto-refreshes for you." We're like, "Hmm. I wonder if there's a way to kind of combine those." I'm not promising anything. We're just kind of thinking this stuff out. But maybe we could actually inject code in debug view that basically does what live view did today, which would be nice because then there's one less view on CodePen that still behaves really nicely - yadda-yadda-yadda.

That view also really wouldn't need to power the console because there is no console. You just use your browser console in that context. But it would need some kind of automatic refresher machine script.

Shaw: A little listener, you know, communicating with--

Chris: Yeah.

Shaw: --some kind of service worker or a firebase, something like that.

Chris: A listener, a Web socket listener of some kind. Yeah, it's just kind of theoretical at this point, but not entirely. We're exploring code like this.
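Purely as a sketch of that theoretical listener, assuming a socket that sends a plain "refresh" message when the code changes. The message format and the name `listenForRefresh` are invented for illustration.

```typescript
// Minimal slice of a WebSocket-style object.
interface SocketLike {
  addEventListener(type: string, listener: (event: { data: unknown }) => void): void;
}

// Call `reload` whenever the socket delivers the (hypothetical) "refresh" message.
function listenForRefresh(socket: SocketLike, reload: () => void): void {
  socket.addEventListener("message", (event) => {
    if (event.data === "refresh") reload();
  });
}
```

In a browser, the socket would be a real WebSocket and `reload` would likely be `() => location.reload()`.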

The point is, that's a different set of scripts to be injected than the one that's in the editor itself. That's so interesting that this HTMLRewriter is going to have different jobs depending on different contexts of CodePen. Pretty interesting stuff.
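One way to picture "different jobs per context" is a simple lookup from view to scripts. The context names and script paths below are placeholders, and the debug entry reflects the idea being floated in the conversation rather than anything shipped.

```typescript
// Hypothetical view contexts.
type ViewContext = "editor" | "grid" | "debug";

// Hypothetical script URLs, for illustration only.
const SCRIPTS_BY_CONTEXT: Record<ViewContext, string[]> = {
  editor: ["/injected/console.js"], // powers the in-editor console
  grid: [],                         // small previews skip the console script
  debug: ["/injected/refresh.js"],  // speculative auto-refresh listener
};

function scriptsFor(context: ViewContext): string[] {
  return SCRIPTS_BY_CONTEXT[context];
}

// Turn a list of URLs into the HTML string a rewriter would insert.
function toScriptTags(urls: string[]): string {
  return urls.map((url) => `<script src="${url}"></script>`).join("");
}
```

The resulting string is what a head or body handler would insert, so the rewriter itself stays the same while the payload changes per view.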

And okay, so maybe -- I don't know. Do you have any other thoughts on that before I shift gears a tiny bit?

Shaw: No. I mean it's a complicated problem. We want to touch as little as possible, but we want to provide these little conveniences and make things as good as possible. By intercepting and modifying little bits, it's so much better than the Rails templates that we have and the 30 different URLs for Pens.

Chris: [Laughter] Yeah, quite literally. Even internally, there's even more that you all never even see.

Shaw: Thankfully.

Chris: It's like, okay. There are too many. Too many!

Shaw: [Laughter]

00:13:34

Chris: You know one of the things even occurred to me and the reason why you would inject into the head instead of the body, too, is just little, silly stuff that, at CodePen scale, actually comes up sometimes. People are like, "My body space colon last child selector isn't selecting the last paragraph on the page like I expect it to," or whatever. You're like, "Oh, that's because we put a script there, so it's actually selecting the script, not your paragraph element." If that was different between views, that could get really confusing.

Shaw: Yeah.

Chris: Anyway, we definitely try to avoid situations like that. Just some insight into how weird things can get in CodePen land sometimes.

Here's another thing, though. As we're working on all of this, Cloudflare workers are really very easy to work on in production because Cloudflare themselves makes it easy. They have an online editor you can use. You're like, "Oh, if your DNS is coming through Cloudflare already, just throw a worker up and give it a test and see how it goes and deploy it if you want to."

Sure. Very good. I mean I actually quite like that DX for experimentation and stuff, but for most of us who work on really important production stuff, you can't do that. That's a totally unacceptable way to work on a website.

First, you need to write the code for these workers (locally), and it needs to be part of your repo. Asterisk here, this is why people talk about mono-repos because that little chunk, that little worker, it's just some other thing in your code base. But wouldn't it be nice if it's not another repo? That's the kind of thing, I think, if you're ever confused about why people are like, "Why is there so much talk about repos?" Well, it's stuff like this that's a little piece of a side of a chunk of a production website that has nothing to do with the rest of your website, but you kind of want to have it live in the same repo.

Shaw: And share constants, share utilities, all that kind of stuff. It's so much easier if it's mono-repo.

00:15:47

Chris: You set up deployment for it too. They have a CLI tool called Wrangler that helps deploy them, which if you're all a grownup website like CodePen, you probably are writing a GitHub action to use that CLI to deploy it when you commit to branches and such. That's some work to get ready. It's not too bad. Pretty okay, you know. I think we had some hiccups and stuff, but largely, it works.

If you commit a worker, it runs that thing and deploys and all that. But that's still just production then. I mean you still have some work to do to be like, "Well, what about staging?" We're not going to test this thing on production, so that's a whole can of worms in itself. Fine. Overcome-able.

But what about dev? Now that's when it gets really weird.

Shaw: You've got to test it out.

Chris: Now you don't have any URLs. You can't -- Cloudflare doesn't know about your localhost.

Shaw: [Laughter]

Chris: How does Cloudflare intercept your localhost URLs? Well, technology, that's how. Right?

[Laughter]

00:16:54

Shaw: Yeah. There's a little package called Miniflare that kind of helps all this happen in local dev. I feel like it's relatively new, right? Within the past year.

Chris: I think that's about right. It's probably about a year old. I don't intimately know all the details, but I believe it was written by a third party, probably somewhat of a fan of Cloudflare workers or something. Cloudflare had Wrangler, which was--

Shaw: It did some of this. Yeah.

Chris: But definitely wasn't the full monty like Miniflare is. Then I think it got kind of subsumed -- or whatever the right word is there -- by Cloudflare itself. Now it's an official project, which is great because, for us, we spin it up. We make it part of our local development environment such that this works literally identically to how it's going to work on staging and production, which is just awesome.

Shaw: Mm-hmm.

Chris: You know?

Shaw: Yeah, and it even has some helpers and things like that for local tests, like Jest environments and things like that, so that each little bit of the worker can be tested and run without even the full--

Chris: Yeah. That's right. What were some of the hiccups there? I remember it's weird because a word like HTMLRewriter, like I said, that's actually kind of a global that won't error when your worker runs because that's just magically available in your worker.

00:18:25

Shaw: Yeah. Workers have to be completely self-contained. They can't import anything else. If you do have imports for multiple files, it has to be compiled before it's run as a worker.

Chris: Hmm.

Shaw: You have to have some kind of build step if it's more complicated than that. Then there's access to those globals in the worker's own environment, like HTMLRewriter and other utilities. Miniflare helps set all that up, and it also sets that up for the Jest environment too so that you can test little utilities like, "Does this HTMLRewriter actually work on malformed HTML?"

Chris: Yeah.

Shaw: Are these URLs being detected correctly? All those little tests that we were able to get in.

Chris: It's the perfect sort of thing you should be testing. You're literally manipulating HTML. You better put ten pretty whacky tests together to make sure it's doing what you expect it to be doing.
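A set of "pretty whacky" inputs might look something like the sketch below. The cases, the marker string, and the helper `injectedExactlyOnce` are invented for illustration; in a real suite each case would be run through the worker (for example via Miniflare in a Jest test) and the output checked.

```typescript
// Hypothetical marker for the injected script tag.
const MARKER = '<script src="/injected/console.js"></script>';

// Deliberately odd documents a user might type into an editor.
const malformedCases: Array<{ name: string; html: string }> = [
  { name: "plain text only", html: "just some words" },
  { name: "no head", html: "<body><p>hi</p></body>" },
  { name: "unclosed tags", html: "<html><head><title>x<body><p>hi" },
  { name: "empty document", html: "" },
  { name: "two heads", html: "<head></head><head></head><body></body>" },
];

// True when the marker appears in the output exactly one time.
function injectedExactlyOnce(output: string, marker: string = MARKER): boolean {
  return output.split(marker).length - 1 === 1;
}

// Run every case through `transform` and return the names that fail.
function failingCases(transform: (html: string) => string): string[] {
  const failures: string[] = [];
  for (const testCase of malformedCases) {
    if (!injectedExactlyOnce(transform(testCase.html))) failures.push(testCase.name);
  }
  return failures;
}
```

The useful property to assert is "exactly once": zero means the fallback chain missed, and two means more than one handler fired.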

Shaw: [Laughter]

Chris: You can't just -- I don't know -- test it once on your local machine and be like, "Yeah, that seemed to work okay." As you all know, you've got to actually write real tests. And it was able to do that, which was great because you want to use your Jest - or whatever - that I'm sure has other bindings. Maybe not. I don't know. But we use Jest anyway, so thanks. [Laughter]

Shaw: [Laughter]

Chris: You don't want your Jest test to be like, "Bonk! What is HTMLRewriter? Not defined!" You know? It needs to be able to know what that global is, and it does. Yay! Good job.

I'm so glad this all came together. This is a real need we really have right now, and it's just a miracle that all this tech is just kind of ready and stable for us the minute we need it, even though it's less than a year old, essentially.

Shaw: Yeah.

Chris: Magical. All right, well, that's the thing, I guess. No need to drag it out so much. Good job.

[Laughter]

Chris: Good job, Cloudflare. Cool tech.

Shaw: Thank you. Yeah.

Chris: Yeah. Check out Miniflare for your local Cloudflare worker needs.

[Laughter]

Chris: Upgrade to Pro on CodePen immediately.

Shaw: This is not a paid advertisement, but it can be.

Chris: No, it really isn't, which is confusing, I think, because we have worked with them for paid endorsements before. This is not one of them. Anyway--

[Laughter]

Chris: We'll let you have your day back. People like short podcasts anyway, right?

Shaw: A little disclaimer there, yeah.

Chris: Cool. Take care.

Shaw: Bye.

[Radio channel adjustment]