342: Sky Icing

Episode 342 · June 14th, 2022 · 43 mins 42 secs

About this Episode

Another toaster strudel debate?! Plus, the results are in for the most listened-to podcast in the RoR community! :: drum roll ::

Steph has a "Dear Gerrit" message to share. Chris has a follow-up on mobile app strategy.

The Bike Shed: 328: Terrible Simplicity
When To Fetch: Remixing React Router - Ryan Florence
Virtual Event - Save Time & Money with Discovery Sprints

Become a Sponsor of The Bike Shed!


STEPH: thoughtbot's next virtual event "Save Time & Money with Discovery Sprints" is coming up on June 17th, from 2 - 3 PM Eastern. It's a discussion with team members from product management, design and development. From a developer perspective, topics will include how to plan a product's architecture, both the MVP and future version, how to lead a tech spikes into integrations and conduct a build vs buy reviews of third party providers. Head to thoughtbot.com/events to register, the event is June 17th 2 - 3 PM ET. Even if you can't make it, registering will get you on the list for the recording.

CHRIS: We're the second-best. We're the second-best. Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Chris Toomey.

STEPH: And I'm Steph Viccari.

CHRIS: And together, we're here to share a bit of what we've learned along the way. So, Steph, what's new in your world?

STEPH: I'm very happy to report that I picked up a treat from the store recently. So while I was in Boston and we were hanging out in person, we talked about Pop-Tarts because that always comes up as a debate, as it should. And then also Toaster Strudels came up, so I now have a package of Toaster Strudels, and those are legit. Pop-Tart or Toaster Strudel, I am team Toaster Strudel, which I know you're going to ask me about icing and if I put it on there, so go ahead. I'm going to pause. [laughs]

CHRIS: It sounds like I don't even need to say anything. But yes, inquiring minds want to know.

STEPH: I think that's also my very defensive response because yes, I put icing on my Toaster Strudel.

CHRIS: How interesting. [laughs]

STEPH: But it feels like a whole different class of pastry. So I'm very defensive about my stance on Pop-Tarts with no icing put Strudel with icing.

CHRIS: A whole different class of pastry. Got it. Noted. Understood. So did you travel? Like, were these in your luggage that you flew back with?

STEPH: [laughs] Oh no. They would be all gooey and melty. No, we bought them when we got back to North Carolina. Oh, that'd be a pro move; just pack little individual Strudels as your airplane snack. Ooh, I might start doing that now. That sounds like a great airplane snack.

CHRIS: You got to be careful though if the icing, you know, if it's pressurized from ground level and then you get up there, and it explodes. And you gotta be careful. Or is it the reverse? It's lower pressure up in the plane. So it might explode.

STEPH: [laughs] Either way, it might explode.

CHRIS: Well, yeah. If you somehow buy a packet of icing that is sky icing that is at that pressure, and you bring it down, then...but if you take it up and down, I think it's fine. If you open it at the top, you might be in danger. If you open icing under the ocean, I think nothing's going to happen. So these are the ranges that we're playing with.

STEPH: I will be very careful sky icing and probably pack two so that way I have a backup just in case. So if one explodes, we'll be like, all right, now I know what I'm working with and be more prepared for the next one.

CHRIS: That's just smart.

STEPH: I try to make smart travel decisions, Toaster Strudels on the go. Aside from travel treats and sky icing, I have some news regarding Planet Argon, who is a Ruby on Rails consultancy regarding their latest published this year's Ruby on Rails community survey results. And so they list a lot of fabulous different topics in there. And one of them includes a learning section that highlights most listened to podcasts in the Ruby on Rails community as well as blogs and some other resources. And Bike Shed is listed as the second most listened to podcast in the Ruby on Rails community, so whoo, golf clap.

CHRIS: Fantastic.

STEPH: And in addition to that, the thoughtbot blog got a really nice shout-out. So the thoughtbot blog is in the number two spot for the most visited blogs in the community. In the first spot is Ruby Weekly, which is like, you know, okay, that feels fair, that feels good. So it's really exciting for the thoughtbot blog because a lot of people work really hard on curating and creating that content. So that's wonderful that so many people are enjoying it.

And then I should also highlight that for the podcast in first place is Remote Ruby, so congrats to Chris, Jason, and Andrew for grabbing that number one spot. And Brittany Martin, host of the Ruby on Rails Podcast, along with Brian Mariani, Jemma Issroff, and Nick Schwaderer, are in the number three spot. And some people say that Ruby is losing steam but look at all that content and all those highly ranked podcasts. I mean, we like Ruby so much we're spending time recording ourselves talking about it. So I say long live Ruby, long live Rails.

CHRIS: Yes. Long live Ruby indeed. And yeah, it's definitely an honor to be on the list and to be amongst such other wonderful shows. Certainly big fans of the work of those other podcasts. We even did a joint adventure with them at one point, and that was a really wonderful experience, so yeah, honored to be on the list alongside them. And to have folks out there in the world listening to our tech talk and nonsense always nice to hear.

STEPH: Yeah. You and I show up and say lots of silly things and technical things into the podcast. The true heroes are the ones that went and voted. So thank you to everybody who voted. That's greatly appreciated. It's really nice feedback. Because we get listener responses and questions, and those are wonderful because it lets us know that people are listening. But I have to say that having the survey results is also really nice. It lets us know people like the show.

Oh, but I did go back and look at some of the previous stats because then I was like, huh, so I'm paying attention. I looked at this year's, and I was like, I wonder what last year’s was or the year before that. And I think this survey comes out every two years because I didn't see one for 2021. But I did find the survey results for 2020, which we were in the number one spot for 2020, and Remote Ruby was in the second spot.

So I feel like now we've got a really nice, healthy podcasting war situation going on to see who can grab the first spot. We've got two years, everybody, to see who [laughs] grabs the number one spot. That's a lot of prep time for a competition.

CHRIS: Yeah, I feel like we should be like, I don't know, planning elaborate pranks on them or something like that now. Is that where this is at? It's something like that, I think.

STEPH: I think so. I think this is where you put like sky frosting inside someone's suitcase, and that's the type of prank that you play. [laughs]

CHRIS: The best of pranks.

STEPH: We'll definitely put together a little task force. And we'll start thinking of pranks that we all need to start playing on each other for the podcasting wars that we're entering for the next few years. But anywho, what's going on in your world?

CHRIS: Let's see, what's going on in my world? A fun thing happened recently. I had a chance to reflect back on some architectural choices that we've made in the Sagewell platform. And one of those specific choices is how we've approached building our native mobile apps. We made what some listeners may remember is an interesting set of choices. In particular, in Episode 328, which we'll include a link to in the show notes, I shared with you the approach that we're doing, which is basically like, Inertia is great, web user great. We like the web as a platform. What if we were to wrap it in a native shell and find this interesting and somewhat unique hybrid trade-off point?

And so, at that point, we were building it. We had most of it built out, and things were going quite well. I think we maybe had the iOS app in the store and the Android app approaching the store or something like that. At this point, both apps have been released to the store, so they are live. Production users are signing in. It's wonderful. But I had a moment in the past couple of weeks to reassess or look at that set of choices and evaluate it. And thankfully, I'm happy with the choices that we've made. So that's good.

But to get into the specifics, there were two things that happened that really, really framed the choice that we made, so one was we introduced a major new feature. We basically overhauled the first-run experience, the onboarding that users experience, and added a new, pretty fundamental facet to the platform. It's a bunch of new screens, and flows, and error states, and all of this complexity. And in the process, we iterated on it a bunch. Like, first, it looked like this, and then we changed the order of the screens and switched out the error messages, and et cetera, et cetera.

And I'll be honest, we never even thought about the mobile apps. It just wasn't even a consideration. And interestingly, we did as a final check before going fully live and releasing this out to the full production audience; we did spot check it in the mobile apps, and it didn't work. But it didn't work for a very specific, boring, technical reason that we were able to resolve. It has to do with iframes and WebViews and embedded something, something. And we had to set a flag. Thankfully, it was solvable without a deploy of the native mobile apps. And otherwise, we never thought about the native apps.

Specifically, we were able to add this fundamental set of features to our platform. And they just worked in native mobile. And they were the same as they roughly are if you're on a mobile WebView or if you're on a desktop web, you know, slightly different in terms of form factor. But the functionality was all the same. And critically, the error states and the edge cases and the flow, there's so much to think about when you're adding a nontrivial feature to an app. And the fact that we didn't have to consider it really spoke to the choice that we made here.

And again, to name it, the choice that we made is we're basically just reusing the same WebViews, the same Rails controllers, and the same what are Svelte components under the hood but the same essentially view layer as well. And we are wrapping that in a native iOS. It's a Swift application shell, and on Android, it's a Kotlin application shell. But under the hood, it's the same web stuff. And that was really great.

We just got these new features. And you know what? If we have to rip that whole set of functionality out, again, we won't need to deploy. We won't need to rethink it. Or, if we want to subtly tweak it, we can do that. If we want to think about feature flags or analytics, or error states or error reporting, all of this just naturally falls out of the approach that we took. And that was really wonderful.

STEPH: That's super nice. I also love this saga of like, you made a choice, and then you're coming back to revisit and share how it's going. So as someone who's never done this before, in regards of wrapping an application in the manner that you have and then publishing it and distributing it that way, what does that process look like? Is this one of those like you run a command, and literally, it's going to wrap the application and then make it hostable on the different mobile app stores? Or what's that? Am I oversimplifying the process? What does that look like?

CHRIS: I think there are a lot of platforms or frameworks I think would probably be the better word like Capacitor is something that comes to mind or Ionic or Expo. There are a handful of them that are a little more fully featured in what they provide. So you just point us at your React Views and whatnot, and we'll wrap that up, and it'll be great. But those are for, I may be overgeneralizing here, but my understanding is those are for more heavy client-side bundles that are talking to a common API. And so you're basically taking your same rich client-side application and bundling that up for reuse on the native app, the native app platforms.

And so I think those do have some release to the store sort of thing. In our case, we went a little bit further with that integration wrapper thing that we built. So that is a thing that we maintain. We have a Sagewell iOS repo and a Sagewell Android repo. There's a bunch of Swift and Kotlin code, respectively, in each of them, and we deploy to the stores manually. We're doing that whole process. But critically, the code that is in each of those repositories is just the bridge glue code that says, oh, when this Inertia navigation event happens, I'm going to push a WebView to the navigation stack. And that's what that is.

I'm going to render the tab bar of buttons at the bottom with the navigation elements that I get from the server. But it's very much server-driven UI, is the way that I would describe it. And it's wrapping WebViews versus actually having the whole client bundle wrapped up in the thing. It's unfortunately subtle to try and talk through on the radio, but yeah. [laughs]

STEPH: You're doing great; this is helping. So if there's a change that you want to make, you go to the Rails application, and you make that change. And then do you need to update anything on that iOS repo? It sounds like you don't, which then you don't have to push a new update to the store.

CHRIS: Correct. For the vast majority of things, we do not need to make any changes. It's very rare for us to deploy the iOS or the Android app is a different way to put it or to push new releases to the store. It happens we may want to add a new feature to the sort of bridge layer that we built, but increasingly, those are rare. And now it's basically like, yeah, we're just wrapping those WebViews, and it's going great. And again, to name it, it's a trade-off. It's an intentional trade-off that we've made.

We're never going to have the richest, most deep platform integration, smooth experience. We are making a small trade-off on that front. But given where we're at as an organization, given how early we are, how much iteration and change, we chose an architecture that optimizes for that change. And so again, like what you just said, yeah, I can...you know how it's really nice to be able to deploy six times a day on a web app, and that's a very straightforward thing to do? It is not so straightforward in the native mobile world. And so, we now have afforded ourselves the ability to do that.

But critically, and this is the fun part in my mind, have the trade-offs in the controls. So if we were just like, it's just a WebView, and that's it, and we put it in the stores, and we're done, that is too far of an extreme in my mind. I think the performance trade-offs, the experience trade-offs, it wouldn't feel like a native app like in a deep way, in a problematic way. And so as an example, we have a navigation bar at the top of our app, particularly on iOS, that is native iOS navigation. And we have a tab bar at the bottom, which is native tab UI element. I forget actually what it's called, but it's those elements.

And we hide the web application navigation when we're in the mobile context. So we actually swap those out and say, like, let's actually promote these to formal native functionality. We also, within our UI on the web, have a persistent button in the top right corner of your screen that says, "Need help? Reach out to your retirement advocate." who is the person that you get to work with. You can send questions, et cetera, et cetera. It's this little help sidebar drawer thing that pops out. And we have that as a persistent HTML button in the top corner of the web frame.

But when we're on native, we push that up as a distinct element in the native UI section. And then again, the bridge that I'm talking about allows for bi-directional communication between the JavaScript side and the native side or the native side and the JavaScript side. And so it's those sorts of pieces that have now afforded us all of the freedom to tinker, and we don't need to re-release where we're like, oh, we want to add a new weird button that does a thing in the WebView when you click on a button outside the WebView. We now just have that built-in.

STEPH: Yeah, I really like the flexibility that you're describing. When you promoted those elements to be more native-friendly so, like the navigation or the footer or the little get help chat, is that something that then your team implemented in like the iOS or the Kotlin repo? Okay, I see you nodding, but other people can't see that, so...[laughs]

CHRIS: Yeah. I was going to also say the words, but yes, those are now implemented as native parts. So the thing that we built isn't purely agnostic decoupled. It is Sagewell-specific; a lot of it is low-level. Like, let's say we want to wrap an Inertia app in a native mobile wrapper. Like, 90% of the code in it is that, but then there are little bits that are like, and put a button up there. And that button is the Sagewell button. And so it's not entirely decoupled from us. But it mostly is this agnostic bridge to connect things together.

STEPH: Yeah, the way you're describing it sounds really nice in terms of you're able to get out the app quickly and have a mobile app quickly that works on both platforms, and then you're still able to deploy changes without having to push that. That was always my biggest mental, or emotional hurdle with the idea of mobile development was the concept of that you really had to batch everything together and then submit it for review and approval and then get it released.

And then you got to hope people then upgrade and get the newest version. And it just felt like such a process, not that I ever did much of it. This was all just even watching like the mobile team and all the work that they had to do. And I had sympathy pains for them. But the fact that this approach allows you to avoid a lot of that but still have some nice, customized, more native elements. Yeah, I'm basically just recapping everything you said because I like all of it.

CHRIS: Well, thank you, friend. Like I said, I've really enjoyed it, and similar to you, I'm addicted to the feedback loop of the web. It's beautiful. I can deploy ten times or however many I want. Anytime I want, I can push out a new version. And that ability to iterate, to test, to explore, to tweak, to not have to do as much formal testing upfront because I'm terrified that if a bug sneaks out, then, it'll take me two weeks to address it; it just is so, so freeing. And so to give that up moving into a native context.

Perhaps I'm fighting too hard to hold on to my dream of the ability to rapidly iterate. But I really do believe in that and especially for where we're at as an organization right now. But, and a critical but here, again, it's a trade-off like anything else. And recently, I happened to be out about in the town, and I decided, oh, you know what? Let me open up the app. Let me see what it's like. And I wasn't on great internet. And so I open the app, and it loads because, you know, it's a native app, so it pops up.

But then the thing that actually happened is a loading spinner in the middle of the screen and sort of a gray nothing for a little while until the server request to fetch the necessary UI elements to render the login screen appeared. And that experience was not great. In particular, that experience is core to the experience of using the app every single time. Every time you use it, you're going to have a bad time because we're re-downloading that UI element. And there's caching, and there's things that could happen there to help with that. But fundamentally, that experience is going to be a pretty common one. It's the first thing that you experience when you're opening the app.

And so I noticed that and I chatted with the team, and I was like, hey, I feel like this is actually something that fixing this I think would really fundamentally move us along that spectrum of like, we've definitely made some trade-offs here. But overall, it feels snappy and like a native app. And so, we opted to prioritize work on a native login screen for both platforms. This also allows us to more deeply integrate. So particularly, we're going to get biometric logins like fingerprints or face scans, or whatever it is.

But critically, it's that experience of like, I open the Sagewell native app on my iOS phone, and then it loads immediately. And then I show it my face like we do these days, and then it opens up and shows me everything that I want to see inside of it. And it's that first-run experience that feels worth the extra effort and the constraints. Because now that it's native mobile, that means in order to change it, we have to do a deploy, not a deploy, release; that's what they call it in the native world. [laughs] You can tell I'm well-versed in this ecosystem.

But yeah, we're now choosing that trade-off. And what I really liked about this sort of set of things like the feature that we were able to just accidentally get for free on native because that's how this thing is built. And then likewise, the choice to opt into a fully native login screen like having that lever, having that control over I'm going to optimize for iteration generally, but where it's important, we want to optimize for performance and experience. And now we have this little slider that we can go back and forth.

And frankly, we could choose to screen by screen just slowly replace everything in the app with true native WebViews backed by APIs. And we could Ship of Theseus style replace every element of the app with true native mobile things until none of the old bridge code exists. And our users, in theory, would never know. Having that flexibility is really nice given the trade-off and the choice that we've made.

STEPH: You said a word there that I missed. You said ship something style.

CHRIS: Ship of Theseus.

STEPH: What is that?

CHRIS: It's like an old biblical story, I want to say, but it's basically the idea of, like, you have the ship. And then some boards start to rot out, so replace those boards. And then the mast breaks, you replace the mast. And slowly, you've replaced every element on the ship. Is it still the same ship at that point? And so it's sort of a philosophical question. So if we replace every single view in this app with a native view, is it still the same map? Philosophers will philosophize about it forever, but whatever. As long as we get to keep iterating and shipping software, then I'm happy.

STEPH: [laughs] Y’all philosophize. That's that word, right?

CHRIS: Yeah.

STEPH: And do your philosopher thing. We'll just keep building and shipping.

CHRIS: I don't know if I pronounced it right. It's like either Theseus or Theseus, and I'm sure I said the wrong one. And now that I've said the other, I'm sure both of them are wrong somehow. It's like a USB where there's up and down, and yet somehow it takes three tries. So anyway, I may have mispronounced it, and I may be misattributing it, but that's the idea I was going for.

STEPH: Well, given I wasn't even familiar with the word until just now, I'm going to give both pronunciations a thumbs up. I also really like how you decided that for the login screen, that's the area that you don't want people to wait because I agree if you're opening an application or opening...maybe it's the first time, maybe it's the 100th time. Who knows? But that feels important. Like, that needs to be snappy. I need to know it's responsive. And it builds trust from the minute that I clicked on that application. And if it takes a long time, I just immediately I'm like, what are y'all doing? Are y'all real? Do you know what you're doing over there?

So I like how you focused on that experience. But then once I log in, like if something is slow to log me in, I will make up excuses for the application all day where I'm like, well, you know, maybe it's my connection. It's fine. I can wait for the next screen to load. That feels more reasonable. And it doesn't undermine my trust nearly as much as when I first click on the app. So that feels like a really nice trade-off as well, or at least a nice area that you've improved while still having those other trade-offs and benefits that you mentioned.

CHRIS: To highlight it, you used a phrase there which I really liked. Like, it's building trust. If something's a little bit off in that first run experience every single time, then it kind of puts a question in the back of your head, maybe not even consciously. But you're just kind of looking at it, and you're like, what are you doing there? What are you up to, friend? Humans say to the apps they use on their phone. That's normal, right? When you talk...

But to name it, we've also done a round of performance work throughout the app. And so there are a couple of layers to it. But it was work that we had planned for a while, but we kept deferring. But now that we're seeing more usage of the native apps, the native apps experience the same surface area of performance stuff but all the more so because they may be on degraded network connections, et cetera.

And so this is another example where this whole thing kind of pays off. The performance work that we did affects everything. It affects the web. It's the same under the hood. It's let's reduce the network requests that we're making in the payloads that we're sending, particularly the network requests to upstream things, so like the banking partner that we're using and those APIs, like, collating all the data to then render the screen.

Because of Inertia, we only have a single sort of back and forth conversation via the API as opposed to I think it's pretty common to have like seven different APIs and four different spinners on the screen. We're not doing that, none of that on my watch. [chuckles] But we minimize the background calls to the other parties that we're integrating with. And then, we reduce the payload of data that we're sending on each request.

And each of those were like, we had to think about things and tweak and poke, but again it's uniform. So mobile web has that now, desktop web has that now. Android, iOS, they all just inherited it sort of that just happened one day without a deploy or release, without a release of either of the native mobile apps. We did deploy to the web to make that happen, but that's easy. I can do that a bunch of times a day.

One last thing I want to share as we're on this topic of trade-offs and levers, there was a really great conference talk that I watched recently, which was Ryan Florence of remix.run also React Router fame if you're familiar with him from that. But he was talking about the most recent version of Remix, which is their meta framework on top of React.

But they've done some really interesting stuff around processing data, fetching data, when and how to sequence that. And again, that thing that I talked about of nine different loading spinners on the screen, Remix is taking a very different approach but is targeting that same thing of like, that's not great for user experience. Cumulative layout shift being the actual number that you can monitor for this.

But in that talk, there are features that they've added to Remix as a framework where you can just decide, like, do we wait for this or do we not? Do we make sure we have all of the data, or do we say, you know what? Actually, this is going to be below the fold. So it's okay to defer loading this until after we send down the first payload. And then we'll kick in, and we'll do it from the client-side. But it's this wonderful feature of the framework that they're adding in where there's basically just a keyword that you can add to sort of toggle that behavior.

And again, it's this idea of like trade-offs. Are we okay with more layout shift, or are we okay with more waiting? Which is it that we're going to optimize for? And I really love that idea of putting that power very simply in the hands of the developers to make those trade-off decisions and optimize over time for what's important. So we'll share a link to that talk in the show notes as well.

But it was very much in the same space of like, how do I have the power to decide and to change my mind over time? That's what I want. But yeah, with that, I think that's enough of me updating on the mobile app. I'll continue to share as new things happen. But again, I'm at this point very happy with where we're at. So yeah, it's been fun. But yeah, what else is up in your world?

STEPH: I have a dear Gerrit message that I wrote earlier, so I want to share that with you. Gerrit is the system that we're using for when we push up code changes that then manages very similar in the competitive space of like GitHub and GitLab, and Bitbucket. And so the team that I'm working with we are using Gerrit. And Gerrit and I, you know, we get along for the most part. We've managed to have a working relationship. [chuckles] But this week, I wrote my dear Gerrit letter is that I really miss being able to tell a story with my commit messages. That is the biggest pain that I'm feeling right now.

So for anyone that's less familiar or if you already are familiar with Gerrit, each change that Gerrit shows represents a single commit that's under review. And each change is identified by a Change-Id. So the basic concept of Gerrit is that you only have one commit per review. So if you were to translate that to GitHub terminology, every pull request is only going to have one commit, and so you really can't push up multiple.

And so, where that has been causing me the most pain is I miss being able to tell a story. So like even simple stories that are like, hey, I removed something that's not used. I love separating that type of stuff into its own commit just so then people can see that as they're going through review. Now, before I merge, I'm likely to squash, and that doesn't feel important that it needs to be its own commit. That's really just for the reviewer so they can follow along for the changes.

But the other one, I can slowly get over that one. Because essentially, the way I get around that is then when I do push up my code for review, is I then go through my change request, and then I just add comments. So I will highlight that line and say, "Hey, I'm removing this because it's not in use." And so, I found a workaround for that one.

But the one I haven't found a workaround for is that I don't push up my local work very often because I love having lots of local, tiny, green commits so that way I can know the progress that I'm at. I know where I'm headed. Also, I have a safe space to roll back to, but then that means that I may have five or six commits that I have locally, but I haven't pushed up somewhere. And that is bothering me more and more hour by hour the more I think about it that I can't push stuff up because it makes me nervous.

Because, I mean, usually, at least by the end of the day, I push everything up, so it's stored somewhere. And I don't have to worry about that work disappearing. Now I am working on a dev machine. So there is that aspect of it's technically...it's not even on my local machine. It is stored somewhere that I should still be able to access.

CHRIS: What's a dev machine? The way you're saying it, it sounds like it's a virtual machine, not like a laptop. But what's a dev machine?

STEPH: Good question. So the dev machine is a remote server or remote machine that then I am accessing, and then that's where I'm performing. That's where I'm writing all of my work. And then that's also kind of the benefit is everything is not local; it's controlled by the team. So then that also means that other teams, other individuals can help set up these environments for future developers.

So then you have that consistency across everyone's working with the same Rails version, or gems, or has access to the same tools. So in that sense, my work isn't just on my laptop because then that would really worry me because then I've got nowhere...it's not backed up anywhere. So at least it is somewhere it's being stored that then could be accessed by someone.

So actually, now, as I'm talking this through, that does help alleviate my concern about this a bit. [laughs] But I still miss it; I still miss being able to just push up my work and then have multiple commits. And I looked into it because I was like, well, maybe I'm misunderstanding something about Gerrit, and there's a way around this. And that's still always a chance. But from the research that I've done, it doesn't seem to be. And there are actually two very fiery takes that I saw that I have to share because they made me laugh.

When I was Googling, the question of like, "Can I push up multiple commits to one single Gerrit CR? Or is there just a way to, like, can I have this concept of like a branch and then I have many commits, but then I turn it into one CR? Whatever the world would give me. What do they have? [laughs] I'm laughing just looking at this now. One of the responses was, have you tried squashing your commits into one commit? And I was like, [laughs] "Yeah, that's not what I had in mind, but sure."

And then the other one, this is the more fiery take. They were very defensive about Gerrit, and they wrote that "People who don't like Gerrit usually just hack shit together. They cut corners and love squashing commits or throwing away history. And those people hate Gerrit. Developers who care love it. It's definitely possible and easy to produce agile software." And I just...that made me laugh. I was like, cool, I'm a developer that cuts corners and loves squashing commits. [laughs]

CHRIS: So you don't care is what that take says.

STEPH: I'm a developer who does not care.

CHRIS: You know, Steph, I've worked with you for a while. And I've been looking for the opportunity to have this hard conversation with you. But I just wish you cared a little more about the software that you're writing, about the people that you're working with, about the commits that you're authoring. I just see it in every facet of your work. You just don't care. To be very clear for anyone listening at home, that is the deepest of sarcasm that I can make. Steph cares so very much. It's one of the things that I really enjoy about you.

STEPH: I mean, we had the episode about toxic traits. This would have been the perfect time to confront me about my lack of caring about software and the processes that we have. So winding down on that saga, it seems to be the answer is no, friend; I cannot push up multiple commits. Oh, I tried to hack it. I am someone that tries to hack shit together because I tried to get around it just to see what would happen. [laughs] Because the docs had suggested that each change is identified by a Change-Id. And I was like, hmm, so what if there were two commits that had the same Change-Id, would Gerrit treat those as patch sets?

Because right now, when you push up a change, you can see all the different patch sets, so that's nice. So that's a nice feature of Gerrit as you can see the history of, like, someone pushed up this change. They took in some feedback. They pushed up a new change. And so that history is there for each push that someone has provided. And I wondered maybe if they had the same Change-Id that then the patch sets would show the first commit and then the second commit. And so I manually altered the commits two of them to reference the same Change-Id.

And I have to say, Gerrit was on to me because they gave me a very nice error message that said, "Same Change-Id and multiple changes. Squash the commits with the same Change-Ids or ensure Change-Ids are unique for each commit. And I thought, dang, Gerrit, you saw me coming. [laughs] So that didn't work either. I'm still in a world of where I now wait. I wait until I'm ready for someone to review stuff, and I have to squash everything, and then I go comment on my CRs to help out reviewers.

CHRIS: I really like the emotional backdrop that you provided here where you're spending a minute; you're like, you know what? Maybe it's me. And there's the classic Seymour Skinner principle from The Simpsons. Am I out of touch? No, it's the children who are wrong. [laughs] And I liked that you took us on a whole tour of that. You're like, maybe it's me. I’ll maybe read up. Nope, nope. So yeah, that's rough.

There's a really interesting thing of tools constraining you. And then sometimes being like, I'm just going to yield control and back away and accept this thing that doesn't feel right to me. Like, Prettier does a bunch of stuff that I really don't like. It shapes code in a way, and I'm just like, no, that's not...nope, you know what? I've chosen to never care about this again. And there's so much utility in that choice. And so I've had that work out really well. Like with Prettier, that's a great example whereby yielding control over to this tool and just saying, you know what? Whatever you produce, that is our format; I don't care. And we're not going to talk about it, and that's that.

That's been really useful for myself and for the teams that I'm on to just all kind of adopt that mindset and be like, yeah, no, it may not be what I would choose but whatever. And then we have nice formatted code; it's great. It happens automatically, love it. But then there are those times where I'm like; I tried to do that because I've had success with that mindset of being like, I know my natural thing is to try and micromanage and control every little bit of this code.

But remember that time where it worked out really well for me to be like, I don't care, I'm just going to not care about this thing? And I try to not care about some stuff, which it sounds like that's what you're doing right here. [laughs] And you're like, I tried to not care, but I care. I care so much. And now you're in that [chuckles] complicated space. So I feel for you, Steph. I'm sorry you're in that complicated space of caring so much and not being able to turn that off [laughs] nor configure the software to do the thing you want.

STEPH: I appreciate it. I should also share that the team that I'm working with they also don't love this. Like, they don't love Gerrit. So when I shared in the Slack channel my dear Gerrit message, they're both like, "Yeah, we feel you. [laughs] Like, we're in the same spot," which was also helpful because I just wanted to validate like, this is the pain I'm feeling. Is someone else doing something clever or different that I just don't know about? And so that was very helpful for them to say, "Nope, we feel you. We're in the same spot. And this is just the state that we're in."

I think they have started transitioning some other repos over to GitLab and have several repos in Gitlab, but this one is still currently using Gerrit. So they very much commiserate with some of the things that I'm feeling and understand. And this does feel like one of those areas where I do care deeply.

And frankly, this is one of those spaces that I do care about, but it's also like, I can work around it. There are some reasonable things that I can do, and it's fine as we just talked through. Like, the fact that my commits are not just locally on my machine already makes me feel better now that I've really processed that. So there are lower risks. It is more of just like a workflow. It's just, you know, it's crushing my work vibe.

CHRIS: Harshing your buzz.

STEPH: In the great words of Queen Elsa, I gotta let it go. This is the thing I'm letting go. So that's kind of what's going on in my world. What else is going on in your world?

CHRIS: Well, first and foremost, fantastic reference and segue. I really liked that. But yeah, let's see, [laughs] what else is going on in my world? We had an interesting thing happen last week. So we had an outage on the platform last week. And then we had an incident review today, so a formal sort of post-mortem incident review. There are a couple of different names that folks have given to these. But this is a practice that we want to build within our engineering culture is when stuff goes wrong, we want to make sure that we have meaningful conversations around to try to address the root causes.

Ideally, blameless is a word that gets used often in this context. And I've heard folks sort of take either side of that. Like, it's critical that it's blameless so that it doesn't feel like it's an attack. But also, like, I don't know, if one person did something, we should say that. So finding that gentle middle ground of having honest, real conversations but in a context of safety. Like, we're all going to make mistakes. We're all going to ship bugs; let's be clear about that. And so it's okay to sort of...anyway, that's about the process.

We had an outage. The specific outage was that we have introduced a new process. This is a Sidekiq process to work off a specific queue. So we wanted that to have discrete treatment. That had been running, and then it stopped running; we still don't know why. So we never got to the root-root cause. Well, we know what the mechanism was, which was the dyno count for that process was at zero. And so, eventually, we found a bunch of jobs backed up in the Sidekiq admin. We're like, that's weird.

And then, we went over to Heroku's configuration dashboard. And we saw, huh, that's weird. There are zero dynos processing this. That wasn't true yesterday. But unfortunately, Heroku doesn't log or have an audit trail around changes to those process counts. It's just not available. So that's unfortunate. And then the actual question of like, how did this happen? It probably had to be someone on the team. So there is like, someone did a thing. But that is almost immaterial because, again, people are going to do things, bugs will get shipped, et cetera.

So the conversation very quickly turned to observability and understanding. I think we've done a pretty good job of instrumenting error reporting and being quite responsive to that, making sure the signal-to-noise ratio is very actionable. So if we see a bug or a Sentry alert come through, we're able to triage that pretty quickly, act on it where it is a real bug, understand where it's a bit of noise in the system, that sort of thing. But in this case, there were no errors. There was no Sentry. There was nothing; there was the absence of something.

And so it was this really interesting case of that's where observability, I think, can really come in and help. So the idea of what can we do here? Well, we can monitor the count of jobs backed up in Sidekiq queue. That's one option. We could do some threshold alerting around the throughput of processed events coming from this other backend. There are a bunch of different ways, but it basically pushed us in the direction of doubling down and reinforcing the foundation of our observability within the platform.

So we're just kicking that mini-project off now, but it is something we're like, yeah, we feel like we could add some here. In particular, we recently added Datadog to the stack. So we now have Datadog to aggregate our logs and ideally do some metric analysis, those sort of things, build some dashboards, et cetera. I haven't explored Datadog much thus far. But my sense is they've got the whiz-bang things that we need here. But yeah, it was an interesting outage. That wasn't fun. The incident conversation was actually a good conversation as a team. And then the outcome of like, how do we double down on observability? I'm actually quite excited for.

STEPH: This is a fun moment for me because I have either joined teams that didn't have Datadog or have any of that sort of observability built into their system or that sort of dashboard that people go to. Or I've joined teams, and they already have it, and then nobody or people rarely look at it. And so I'm always intrigued between like what's that catalyst that then sparked a team to then go ahead and add this? And so I'm excited to hear you're in that moment of like, we need more observability. How do we go about this?

And as soon as you said Datadog, I was like, yeah, that sounds nice because then it sounds like a place that you can check on to make sure that everything is still running. But then there's still also that manual process where I'm presuming unless there's something else you have in mind. There's still that manual process of someone has to check the dashboard; someone then has to understand if there's no count, no squiggly lines, that's a bad thing and to raise a concern.

So I'm intrigued with my own initial reaction of, like, yeah, that sounds great. But now I'm also thinking about it still adds a lot of...the responsibility is still on a human to think of this thing and to go check it. Versus if there's something that gets sent to someone to alert you and say like, "Hey, this queue hasn't been processed in 48 hours. There may be a concern that actually feels nicer." It feels safer.

CHRIS: Oh yeah, definitely. I think observability is this category of tools and workflows and whatnot. But I think what you're describing of proactive alerting that's the ideal. And so it would be wonderful if I never had to look at any of these tools ever. And I just knew if I got, let's say, it's PagerDuty connected up whatever, and I got a push notification from PagerDuty saying, "Hey, go look at this thing." That's all I ever need to think about. It’s like, well, I haven't gotten a PagerDuty in a while, so everything must be fine, and having a deep trust in that.

Similar to like, if we have a great test suite and it's green, I feel confident deploying the sort of absence of an alert being the thing that I can trust. But right now, we're early enough in this journey that I think what we need to do is stand up a bunch of these different graphs and charts and metric analysis and aggregations and whatnot, and then start to squint at it for a while and be like, which of these would I be really concerned if it started to wibble?

And then you can figure the alerting around said wibble rate. And that's the dream. That's where we want to get to, but I think we've got to crawl, walk, run on this. So it'll be an adventure. This is very much the like; we're starting a thing. I'll tell you about it more when we've done it. But what you're describing is exactly what we want to get to.

STEPH: I love wibble rate. That's my new measurement I'm going to start using for everything. It's funny, as you're bringing this up, it's making me think about the past week that Joël Quenneville and I have had with our client work. Because a somewhat similar situation came up in regards where something happened, and something was broken. And it seemed it was hard to define exactly what moment caused that to break and what was going on.

But it had a big impact on the team because it essentially meant none of the bills were going through. And so that's a big situation when you got 100-plus people that are pushing up code and expecting some of the build processes to run. But it was one of those that the more we dug into it, the more it seemed very rare that it would happen.

So, in this case, as a sort of a juxtaposition to your scenario, we actually took the opposite approach of where we're like; this is rare. But we did load up a lot of contexts. Actually, I was thinking back to the advice that you gave me in a previous episode where I was talking about at what point do you dig in versus try to stay at surface level? And this was one of those, like, we've spent a couple of days on getting context for this and understanding. So it felt really important and worthwhile to then invest a little bit more time to then document it.

But then we still went with the simplest approach of like, this is weird. It shouldn't happen again. We think we understand it but then let's add a little bit of documentation or wiki page around like, hey, if you do run into this, here are some steps that will fix everything. And then, if you need to use this, let somebody know because this is so odd it shouldn't happen. So we took that approach in this case where we didn't increase the observability. It was more like we provided a fire extinguisher very close to the location in case it happens. And so that way, it's there should the need arise, but we're hoping it just never gets used.

We're also in the process of changing how a lot of that logic works. So we didn't really want to optimize for observability into a system that is actively being changed because it should look very different in upcoming months. But overall, I love the conversations that you bring about observability, and I'm excited to hear about what wibble rates you decide to add to your Datadog dashboard.

CHRIS: There's a delicate art and science to the selection of the wibble rates. So I will certainly report back as we get into that work. But with that, shall we wrap up?

STEPH: Let's wrap up.

CHRIS: The show notes for this episode can be found at bikeshed.fm.

STEPH: This show is produced and edited by Mandy Moore.

CHRIS: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review on iTunes, as it really helps other folks find the show.

STEPH: If you have any feedback for this or any of our other episodes, you can reach us at @_bikeshed or reach me on Twitter @SViccari.

CHRIS: And I'm @christoomey.

STEPH: Or you can reach us at hosts@bikeshed.fm via email.

CHRIS: Thanks so much for listening to The Bike Shed, and we'll see you next week.

ALL: Byeeeeeeee!!!!!!!!

ANNOUNCER: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.

Support The Bike Shed