359: Serializers
Episode 359 · October 25th, 2022 · 44 mins 10 secs
About this Episode
Chris Toomey is back! (For an episode.) He talks about what he's been up to since handing off the reins to Joël. He's been playing around with something at Sagewell that he enjoys. At the core of it? Serializers.
Primalize gem
Derek's talk on code review
Inertia.js
Phantom types
io-ts
dry-rb
parse don't validate
value objects
broader perspective on parsing
Enumerable#tally
RubyConf mini
where.missing
Transcript:
JOËL: Hello and welcome to another episode of The Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Joël Quenneville. And today, I'm joined by a very special guest, former host Chris Toomey.
CHRIS: Hi, Joël. Thanks for having me.
JOËL: And together, we're here to share a little bit of what we've learned along the way. So, Chris, what's new in your world?
CHRIS: Being on this podcast is new in my world, or everything old is new again, or something along those lines. But, yeah, thank you so much for having me back. It's a pleasure. Although it's very odd, it feels somehow so different and yet very familiar.
But yeah, more generally, what's new in my world? I think this was probably in development as I was winding down my time as a host here on The Bike Shed, but I don't know that I ever got a chance to talk about it. There has been a fun sort of deep-in-the-weeds technical thing that we've been playing around with at Sagewell that I've really enjoyed.
So at the core of it, we have serializers. So we take some data structures in our Ruby on Rails code base, and we need to serialize them to JSON to send them to the front end. In our case, we're using Inertia, so it's not quite a JSON API, but it's fine to think about it in that way for the context of this discussion.
And what we were finding is our front end has TypeScript. So we're writing Svelte, which is using TypeScript. And so we're stating or asserting that the types like, hey, we're going to get this data in from the back end, and it's going to have this shape to it. And we found that it was really hard to keep those in sync to keep, like, what does the user mean on the front end? What's the data that we're going to get? It's going to have a full name, which is a string, except sometimes that might be null. So how do we make sure that those are keeping up to date?
And then we had a growing number of serializers on the back end and determining which serializer we were actually using, and it was just...it was a mess, to put it lightly. And so we had explored a couple of different options around it, and eventually, we found a library called Primalize. So Primalize is a Ruby library. It is for writing JSON serializers. But what's really interesting about it is it has a typing layer. It's like a type system sort of thing at play.
So when you define a serializer in Primalize, instead of just saying, here are the fields; there is an ID, a name, et cetera, you say, there is an ID, and it is a string. There is a name, and it is a string, or an optional string, which is the even more interesting bit. You can say array. You can say object. You can say an enum of a couple of different values. And so we looked at that, and we said, ooh, this is very interesting. Astute listeners will know that this is probably useless in a Ruby system, which doesn't have types or a compilation step or anything like that.
But what's really cool about this is when you use a Primalize serializer, as you're serializing an object, if there is ever a type mismatch, so the observed type at runtime and the authored type if those ever mismatch, then you can have some sort of notification happen. So in our case, we configured it to send a warning to Sentry to say, "Hey, you said the types were this, but we're actually seeing this other thing." Most often, it will be like an Optional, a null sneaking through, a nil sneaking through on the Ruby side.
But what was really interesting is as we were squinting at this, we're like, huh, so now we're going to write all this type information. What if we could somehow get that type information down to the front end? So I had a long weekend, one weekend, and I went away, and I wrote a bunch of code that took all of those serializers, ran through them, and generated the associated TypeScript interfaces. And so now we have a build step that will essentially run that and assert that we're getting the same thing in CI as we have committed to the codebase.
But now we have the generated serializer types on the front end that match to the used serializer on the back end, as well as the observed run-time types. So it's a combination of a true compilation step type system on the front end and a run-time type system on the back end, which has been very, very interesting.
JOËL: I have a lot of thoughts here.
CHRIS: I figured you would. [laughs]
JOËL: But the first thing that came to mind is, as a consultant, there's a scenario with especially smaller startups that generally concerns me, and that is the CTO goes away for a weekend and writes a lot of code...
CHRIS: [laughs]
JOËL: And brings in a new system on Monday, which is exactly what you're describing here. How do you feel about the fact that you've done that?
CHRIS: I wasn't ready to go this deep this early on in this episode.
JOËL: [laughs]
CHRIS: But honestly, that is a fantastic question. It's a thing that I have been truly not struggling with but really thinking about. We're going to go on a slight aside here, but I am finding it really difficult to engage with the actual day-to-day coding work that we're doing and to still stay close to the codebase and not be in the way.
There's a pattern that I've seen happen a number of times now where I pick up a piece of work that is, you know, one of the tickets at the top of the backlog. I start to work on it. I get pulled into a meeting, then another meeting, then three more meetings. And suddenly, it's three days later. I haven't completed this piece of work that was defined to be the next most important piece of work. And suddenly, I'm blocking the team.
JOËL: Hmmm.
CHRIS: So I actually made a rule that I'm not allowed to own critical path work, which feels weird because it's like, I want to be engaged with that work. So the counterpoint to that is I'm now trying to schedule pairing sessions with each of the developers on the team once a week. And in that time, I can work on that sort of stuff with them, and they'll then own it and run with it. So it makes sure that I'm not blocking on those sorts of things, but I'm still connected to the core work that we're doing.
But the other thing that you're describing of the CTO goes away for the weekend and then comes back with a new harebrained scheme; I'm very sensitive to that, having worked on; frankly, I think the same project. I can think of a project that you and I worked on where we experienced this.
JOËL: I think we're thinking of the same project.
CHRIS: So yes. Like, I'm scarred by that and, frankly, a handful of experiences of that nature. So we actually, I think, have a really healthy system in place at Sagewell for capturing, documenting, prioritizing this sort of other work, this developer-centric work. So this is the feature and bug work that gets prioritized and one list over here that is owned by our product manager. Separately, the dev team gets to say, here are the pain points. Here's the stuff that keeps breaking. Here are the things that I wish was better. Here is the observability hard-to-understand bits.
And so we have a couple of different systems at play and recurring meetings and sort of unique ceremonies around that, and so this work was very much a fallout of that. It was actually a recurring topic that we kept trying a couple of different stabs at, and we never quite landed it. And then I showed up this one Monday morning, and I was like, "I found a thing; what do we think?" And then, critically, from there, I made sure I paired with other folks on the team as we pushed on the implementation.
And then, actually, I mentioned Primalize, the library that we're using. We have now since deprecated Primalize within the app because we kept just adding to it so much that eventually, we're like, at this point, should we own this stuff? So we ended up rewriting the core bits of Primalize to better fit our use cases. And now we've actually removed Primalize, wonderful library. I highly recommend it to anyone who has that particular use case but then the additional type generation for the front end.
Plus, we have some custom types within our app, Money being the most interesting one. We decided to model Money as our first-class consideration rather than just letting JavaScript have the sole idea of a number. But yes, in a very long-winded way, yes, I'm very sensitive to the thing you described. And I hope, in this case, I did not fall prey to the CTO goes away for the weekend and made a thing.
JOËL: I think what I'm hearing is the key difference here is that you got buy-in from the team around this idea before you went out and implemented it. So you're not off doing your own things disconnected from the team and then imposing it from on high. The team already agreed this is the thing we want to do, and then you just did it for them.
CHRIS: Largely, yes. Although I will say there are times that each developer on the team, myself included, have sort of gone away, come back with something, and said, "Hey, here's a WIP PR exploring an area." And there was actually...I'm forgetting what the context was, but there was one that happened recently that I introduced. I was like; I had to do this. And the team talked me out of it, and I ended up closing that PR. Someone else actually made a different PR that was an alternative implementation. I was like, no, that's better; we should absolutely do that.
And I think that's really healthy. That's a hard thing to maintain but making sure that everyone feels like they've got a strong voice and that we're considering all of the different ways in which we might consider the work. Most critically, you know, how does this impact users at the end of the day? That's always the primary consideration. How do we make sure we build a robust, maintainable, observable system, all those sorts of things?
And primarily, this work should go in that other direction, but I also don't want to stifle that creative spark of I got this thing in my head, and I had to explore it. Like, we shouldn't then need to never mind, throw away the work, put it into a ticket. Like, for as long as we can, that more organic, intuitive process if we can retain that, I like that. Critically, with the ability for everyone to tell me, "No, this is a bad idea. Stop it. What are you doing?" And that has happened recently. I mean, they were kinder about it, but they did talk me out of a bad idea. So here we are.
JOËL: So you showed up on Monday morning, not with telling everyone, "Hey, I merged this thing over the weekend." You're showing up with a work-in-progress PR.
CHRIS: Yes, definitely. I mean, everything goes through a PR, and everything has discussion and conversation around it. That's a strong, strong like Derek Prior's wonderful talk Building a Culture of Code Review. I forget the exact name of it. But it's one of my favorite talks in talking about the utility of code review as a way to share ideas and all of those wonderful things. So everything goes through code review, and particularly anything that is of that more exploratory architectural space.
Often we'll say any one review from anyone on the team is sufficient to merge most things but something like that, I would want to say, "Hey, can everybody take a look at this? And if anyone has any reservations, then let's talk about it more." But if I or anyone else on the team for this sort of work gets everybody approving it, then cool, we're good to go. But yeah, code review critical, critical part of the process.
JOËL: I'm curious about Primalize, the gem that you mentioned. It sounds like it's some kind of validation layer between some Ruby data structure and your serializers.
CHRIS: It is the serializer, but in the process of serializing, it does run-time type validation, essentially. So as it's accessing, you know, you say first name. You have a user object. You pass it in, and you say, "Serializer, there's a first name, and it's a string." It will call the first name method on that user object. And then, it will check that it has the expected type, and if it doesn't, then, in our case, it sends to Sentry.
We have configured it...it's actually interesting. In development and test mode, it will raise for a type mismatch, and in production mode, it will alert Sentry so you can configure that differently. But that ends up being really nice because these type mismatches end up being very loud early on. And it's surprisingly easy to maintain and ends up telling us a lot of truths about our system because, really, what we're doing is connecting data from many different systems and flowing it in and out. And all of the inputs and outputs from our system feel very meaningful to lock down in this way. But yeah, it's been an adventure.
JOËL: It seems to me there could almost be two sets of types here, the inputs coming into Primalize from your Ruby data structures and then the outputs that are the actual serialized values. And so you might expect, let's say, an integer on the Ruby side, but maybe at the serialization level, you're serializing it to a string. Do you have that sort of conversion step as part of your serializers sometimes, or is the idea that everything's already the right type on the Ruby side, and then we just, like, to JSON it at the end?
CHRIS: Yep. Primalize, I think, probably works a little closer to what you're describing. They have the idea of coercions. So within Primalize, there is the concept of a timestamp; that is one of the types that is available. But a timestamp is sort of the union of a date, a time, or I think they might let through a string; I'm not sure if there is as well. But frankly, for us, that was more ambiguity than we wanted or more blurring across the lines. And in the implementation that we've now built, date and time are distinct. And critically, a string is not a valid date or time; it is a string, that's another thing.
And so there's a bunch of plumbing within the way you define the serializers. There are override methods so that you can locally within the serializer say, like, oh, we need to coerce from the shape of data into this other shape of data, even little like in-line proc, so we can do it quickly. But the idea is that the data, once it has been passed to the serializer, should be up the right shape. And so when we get to the type assertion part of the library, we expect that things are in the asserted type and will warn if not. We get surprisingly few warnings, which is interesting now.
This whole process has made us pay a little more intention, and it's been less arduous simultaneously than I would have expected because like this is kind of a lot of work that I'm describing. And yet it ends up being very natural when you're the developer in context, like, oh, I've been reading these docs for days. I know the shape of this JSON that I'm working with inside and out, and now I'll just write it down in the serializer. It's very easy to do in that moment, and then it captures it and enforces it in such a useful way.
As an aside, as I've been looking at this, I'm like, this is just GraphQL, but inside out, I'm pretty sure. But that is a choice that we have made. We didn't want to adopt the whole GraphQL thing. But just for anyone out there who is listening and is thinking, isn't this just GraphQL but inside out? Kind of. Yes.
JOËL: I think my favorite part of GraphQL is the schema, which is not really the selling point for GraphQL, you know, like the idea that you can traverse the graph and get any subset of data that you want and all that. I think I would be more than happy with a REST API that has some kind of schema built around it. And someone told me that maybe what I really just want is SOAP, and I don't know how to feel about that comment.
CHRIS: You just got to have some XML, and some WSDLs, and other fun things. I've heard people say good things about SOAP. SOAP seems like a fine idea. If anything, I think a critical part of this is we don't have a JSON API. We have a very tightly coupled front end and back end, and a singular front end, frankly. And so that I think naturally...that makes the thing that I'm describing here a much more comfortable fit.
If we had multiple different downstream clients that we're trying to consume from the same back end, then I think a GraphQL API or some other structured JSON schema, whatever it is type of API, and associated documentation and typing layer would be probably a better fit. But as I've said many a time on this here, Bike Shed, Inertia is one of my favorite libraries or frameworks (They're probably more of a framework.) one of my favorite technological approaches that I have ever found.
And particularly in buildings Sagewell, it has allowed us to move so rapidly the idea that changes are, you know, one fell swoop changes everything within the codebase. We don't have to think about syncing deploys for the back end and the front end and how to coordinate across them. Our app is so much easier to understand by virtue of that architecture that Inertia implies.
JOËL: So, if I understand correctly, you don't serialize to JSON as part of the serializers. You're serializing directly to JavaScript.
CHRIS: We do serialize to JSON. At the end of the day, Inertia takes care of this on both the Rails side and the client side. There is a JSON API. Like, if you look at the network inspector, you will see XHR requests happening. But critically, we're not doing that. We're not the ones in charge of it. We're not hitting a specific endpoint.
It feels as an application coder much closer to a traditional Rails app. It just happens to be that we're writing our view layer. Instead of an ERB, we're writing them in Svelte files. But otherwise, it feels almost identical to a normal traditional Rails app with controllers and the normal routing and all that kind of stuff.
JOËL: One thing that's really interesting about JSON as an interchange format is that it is very restrictive. The primitives it has are even narrower than, say, the primitives that Ruby has. So you'd mentioned sending a date through. There is no JSON date. You have to serialize it to some other type, potentially an integer, potentially a string that has a format that the other side knows how it's going to interpret. And I feel like it's those sorts of richer types when we need to pass them through JSON that serialization and deserialization or parsing on the other end become really interesting.
CHRIS: Yeah, I definitely agree with that. It was a struggling point for a while until we found this new approach that we're doing with the serializers in the type system. But so far, the only thing that we've done this with is Money. But on the front end, a while ago, we introduced a specific TypeScript type. So it's a phantom type, and I believe I'm getting this correct. It's a phantom type called Cents, C-E-N-T-S. So it represents...I'm going to say an integer. I know that JavaScript doesn't have integers, but logically, it represents an integer amount of cents. And critically, it is not a number, like, the lowercase number in the type system. We cannot add them together. We can't --
JOËL: I thought you were going to say, NaN.
CHRIS: [laughs] It is not a number. I saw a n/a for not applicable somewhere in the application the other day. I was like, oh my God, we have a NaN? It happened? But it wasn't, it was just n/a, and I was fine. But yeah, so we have this idea of Cents within the application. We have a money input, which is a special input designed exactly for this. So to a user, it is formatted to look like you're entering dollars and cents. But under the hood, we are bidirectionally converting that to the integer amount of cents that we need. And we strictly, within the type system, those are cents.
And you can't do math on Cents unless you use a special set of helper functions. You cannot generate Cents on the fly unless you use a special set of helper functions, the constructor functions. So we've been really restrictive about that, which was kind of annoying because a lot of the data coming from the server is just, you know, numbers.
But now, with this type system that we've introduced on the Ruby side, we can assert and enforce that these are money.new on the Ruby side, so using the Money gem. And they come down to the front end as capital C Cents in the type system on the TypeScript side. So we're able to actually bind that together and then enforce proper usage sort of on both sides.
The next step that we plan to do after that is dates and times. And those are actually almost weirder because they end up...we just have to sort of say what they are, and they will be ISO 8601 date and time strings, respectively. But we'll have functions that know this is a date string; that's a thing. It is, again, a phantom type implemented within our TypeScript type system.
But we will have custom functions that deal with that and really constrain...lock ourselves down to only working with them correctly. And critically, saying that is the only date and time format that we work with; there is no other. We don't have arbitrary dates. Is this a JSON date or something else? I don't know; there are too many date syntaxes.
JOËL: I like the idea of what you're doing in that it sounds like you're very much narrowing that sort of window of where in the stack the data exists in the sort of unstructured, free-floating primitives that could be misinterpreted. And so, at this point, it's almost narrowed to the point where it can't be touched by any user or developer-written code because you've pushed the boundaries on the Rails side down and then on the JavaScript side up to the point where the translation here you define translations on one side or, I guess, a parser on one side and a serializer on the other. And they guarantee that everything is good up until that point.
CHRIS: Yep, with the added fun of the runtime reflection on the Ruby side. So it's an interesting thing. Like, TypeScript actually has similar things. You can say what the type is all day long, and your code will consistently conform to that asserted type. But at the end of the day, if your JSON API gets in some different data...unless you're using a library like io-ts, is one that I've looked at, which actually does parsing and returns a result object of did we parse to the thing that you wanted or did we get an error in that data structure?
So we could get to that level on the client side as well. We haven't done that yet largely because we've essentially pushed that concern up to the Ruby layer. So where we're authoring the data, because we own that, we're going to do it at that level. There are a bunch of benefits of defining it there and then sort of reflecting it down.
But yeah, TypeScript, you can absolutely lie to yourself, whereas Elm, a language that I know you love dearly, you cannot lie to yourself in Elm. You've got to tell the truth. It's the only option. You've got to prove it. Whereas in TypeScript, you can just kind of suggest, and TypeScript will be like, all right, cool, I'll make sure you stay honest on that, but I'm not going to make you prove it, which is an interesting sort of set of related trade-offs there.
But I think we found a very comfortable resting spot for right now. Although now, we're starting to look at the edges of the Ruby system where data is coming in. So we have lots of webhooks and other external partners that we're integrating with, and they're sending us data. And that data is of varying shapes. Some will send us a payload with the word amount, and it refers to an integer amount of cents because, of course, it does. Some will send us the word amount in their payload, and it will be a floating amount of dollars. And I get a little sad on those days.
But critically, our job is to make sure all of those are the same and that we never pass dollars as cents or cents as dollars because that's where things go sad. That is job number one at Sagewell in the engineering team is never get the decimal place wrong in money.
JOËL: That would be a pretty terrible mistake to make.
CHRIS: It would. I mean, it happens. In fintech, that problem comes up a lot. And again, the fact that...I'm honestly surprised to see situations out there where we're getting in floating point dollars. That is a surprise to me because I thought we had all agreed sort of as a community that it was integer cents but especially in a language that has integers. JavaScript, it's kind of making it up the whole time. But Ruby has integers. JSON, I guess, doesn't have integers, so I'm sort of mixing concerns here, but you get the idea.
JOËL: Despite Ruby not having a static type system, I've found that generally, when I'm integrating with a third-party API, I get to the point where I want something that approximates like Elm's JSON decoders or io-ts or something like that. Because JSON is just a big blob of data that could be of any shape, and I don't really trust it because it's third-party data, and you should not trust third parties. And I find that I end up maybe cobbling something together commonly with like a bunch of usage of hash.fetch, things like that. But I feel like Ruby doesn't have a great approach to parsing and composing these validators for external data.
CHRIS: Ruby as a language certainly doesn't, and the ecosystem, I would say, is rather limited in terms of the options here. We have looked a bit at the dry-rb stack of gems, so dry-validation and dry-schema, in particular, both offer potentially useful aspects. We've actually done a little bit of spiking internally around that sort of thing of, like, let's parse this incoming data instead of just coercing to hash and saying that it's got probably the shape that we want. And then similarly, I will fetch all day instead of digging because I want to be quite loud when we get it wrong.
But we're already using dry-monads. So we have the idea of result types within the system. We can either succeed or fail at certain operations. And I think it's just a little further down the stack. But probably something that we will implement soon is at those external boundaries where data is coming in doing some form of parsing and validation to make sure that it conforms to unknown data structure. And then, within the app, we can do things more cleanly.
That also would allow us to, like, let's push the idea that this is floating point dollars all the way out to the edge. And the minute it hits our system, we convert it into a money.new, which means that cents are properly handled. It's the same type of money or dollar, same type of currency handling as everywhere else in the app. And so pushing that to the very edges of our application is a very interesting idea. And so that could happen in the library or sort of a parsing client, I guess, is probably the best way to think about it. So I'm excited to do that at some point.
JOËL: Have you read the article, Parse, Don't Validate?
CHRIS: I actually posted that in some code review the other day to one of the developers on the team, and they replied, "You're just going to quietly drop one of my favorite articles of all time in code review?" [laughs] So yes, I've read it; I love it. It's a wonderful idea, definitely something that I'm intrigued by. And sort of bringing dry-monads into Ruby, on the one hand, feels like a forced fit and yet has also been one of the other, I think strongest sort of architectural decisions that we've made within the application.
There's so much imperative work that we ended up having to do. Send this off to this external API, then tell this other one, then tell this other one. Put the whole thing in a transaction so that our local data properly handles it. And having dry-monads do notation, in particular, to allow us to make that manageable but fail in all the ways it needs to fail, very expressive in its failure modes, that's been great. And then parse, don't validate we don't quite do it yet. But that's one of the dreams of, like, our codebase really should do that thing. We believe in that. So let's get there soon.
JOËL: And the core idea behind parse, don't validate is that instead of just having some data that you don't trust, running a check on it and passing that blob of now checked but still untrusted data down to the next person who might also want to check it. Generally, you want to pass it through some sort of filter that will, one, validate that it's correct but then actually typically convert it into some other trusted shape.
In Ruby, that might be something like taking an amorphous blob of JSON and turning it into some kind of value object or something like that. And then anybody downstream that receives, let's say, money object can trust that they're dealing with a well-formed money value as opposed to an arbitrary blob of JSON, which hopefully somebody else has validated, but who knows? So I'm going to validate it again.
CHRIS: You can tell that I've been out of the podcasting game for a while because I just started responding to yes; I love that blog post without describing the core premise of it. So kudos to you, Joël; you are a fantastic podcast host over there.
I will say one of the things you just described is an interesting...it's been a bit of a struggle for us. We keep sort of talking through what's the architecture. How do we want to build this application? What do we care about? What are the things that really matter within this codebase, and then what is all the other stuff? And we've been good at determining the things that really matter, thinking collectively as a group, and I think coming up with some novel, useful, elegant...I'm saying too many positive adjectives for what we're doing.
But I've been very happy with sort of the thing that we decide. And then there's the long-tail work of actually propagating that change throughout the rest of the application. We're, like, okay, here's how it works. Every incoming webhook, we now parse and yield a value object. That sentence that you just said a minute ago is exactly what I want. That's like a bunch of work. It's particularly a bunch of work to convert an existing codebase. It's easy to say, okay, from here forward, any new webhooks, payloads that are coming in, we're going to do in this way. But we have a lot of things in our app now that exist in this half-converted way.
There was a brief period where we had three different serializer technologies at play. Just this week, I did the work of killing off the middle ground one, the Primalized-based thing, and we now have only our new hotness and then the very old. We were using Blueprinter as the serializer as the initial sort of stub. And so that still exists within the codebase in some places. But trying to figure out how to prioritize that work, the finishing out those maintenance-type conversions is a tricky one. It's never the priority. But it is really nice to have consistency in a codebase. So it's...yeah, do you have any thoughts on that?
JOËL: I think going back to the article and what the meaning of parsing is, I used to always think of parsing as taking strings and turning them into something else, and I think this really broadened my perspective on the idea of parsing. And now, I think of it more as converting from a broader type to a narrower type with failures.
So, for example, you could go from a string to an integer, and not all strings are valid integers. So you're narrowing the type. And if you have the string hello world, it will fail, and it will give you an error of some type. But you can have multiple layers of that. So maybe you have a string that you parse into an integer, but then, later on, you might want to parse that integer into something else that requires an integer in a range. Let's say it's a percentage. So you have a value object that is a percentage, but it's encoded in the JSON as a string.
So that first pass, you parse it from a string into an integer, and then you parse that integer into a percentage object. But if it's outside the range of valid percentage numbers, then maybe you get an error there as well. So it's a thing that can happen at multiple layers. And I've now really connected it with the primitive obsession smell in code. So oftentimes, when you decide, wait, I don't want a primitive here; I want a richer type, commonly, there's going to be a parsing step that should exist to go from that primitive into the richer type.
CHRIS: I like that. That was a classic Joël wildly concise summary of a deeply complex technical topic right there.
JOËL: It's like I'm going to connect some ideas from functional programming and a classic object-oriented code smell and, yeah, just kind of mash it all together with a popular article.
CHRIS: If only you had a diagram. Podcast is not the best medium for diagrams, but I think you could do it. You could speak one out loud, and everyone would be able to see it in their mind's eye.
JOËL: So I will tell you what my diagram is for this because I've actually created it already. I imagine this as a sort of like pyramid with different layers that keep getting smaller and smaller. So the size of type is sort of the width of a layer. And so your strings are a very wide layer. Then on top of that, you have a narrower layer that might be, you know, it could be an integer, or you could even if you're parsing JSON, you first start with a string, then you parse that into a Ruby hash, not all strings are valid hashes. So that's going to be narrower.
Then you might extract some values out of that hash. But if the keys aren't right, that might also fail. You're trying to pull the user out of it. And so each layer it gets a richer type, but that richer type, by virtue of being richer, is narrower. And as you're trying to move up that pyramid at every step, there is a possibility for a failure.
CHRIS: Have you written a blog post about this with said diagram in it? And is that why you have that so readily at hand? [laughs]
JOËL: Yes, that is the case.
CHRIS: Okay. Yeah, that made sense to me. [laughs]
JOËL: We'll make sure to link to it in the show notes.
CHRIS: Now you have to link to Joël blog posts, whereas I used to have to link to them [chuckles] in almost every episode of The Bike Shed that I recorded.
JOËL: Another thing I've been thinking about in terms of this parsing is that parsing and serializing are, in a sense, almost opposites of each other. Typically, when you're parsing, you're going from a broad type to a narrow one. And when you're serializing, you're going from a narrow type to a broader one. So you might go from a user into a hash into a string. So you're sort of going down that pyramid rather than going up.
CHRIS: It is an interesting observation and one that immediately my brain is like, okay, cool. So can we reuse our serializers but just run them in reverse or? And then I try and talk myself out of that because that's a classic don't repeat yourself sort of failure mode of, like, actually, it's fine. You can repeat a little bit. So long as you can repeat and constrain, that's a fine version. But yeah, feels true, though, at the core.
JOËL: I think, in some ways, if you want a single source of truth, what you want is a schema, and then you can derive serializers and parsers from that schema.
CHRIS: It's interesting because you used the word derive. That has been an interesting evolution at Sagewell. The engineering team seems to be very collected around the idea of explicitness, almost the Zen of Python; explicit is better than implicit. And we are willing to write a lot of words down a lot of times and be happy with that. I think we actually made the explicit choice at one point that we will not implement an automatic camel case conversion in our serializer, even though we could; this is a knowable piece of code.
But what we want is the grepability from the front end to the back end to say, like, where's this data coming from? And being able to say, like, it is this data, which is from this serializer, which comes from this object method, and being able to trace that very literally and very explicitly in the code, even though that is definitely the sort of thing that we could derive or automatically infer or have Ruby do that translation for us.
And our codebase is more verbose and a little noisier. But I think overall, I've been very happy with it, and I think the team has been very happy. But it is an interesting one because I've seen plenty of teams where it is the exact opposite. Any repeated characters must be destroyed. We must write code to write the code for us. And so it's fun to be working with a team where we seem to be aligned around an approach on that front.
JOËL: That example that you gave is really interesting because I feel like a common thing that happens in a serialization layer is also a form of normalization. And so, for example, you might downcase all strings as part of the serialization, definitely, like dates always get written in ISO 8601 format whenever that happens. And so, regardless of how you might have it stored on the Ruby side, by the time it gets to the JSON, it's always in a standard format. And it sounds like you're not necessarily doing that with capitalization.
CHRIS: I think the distinction would be the keys and the values, so we are definitely doing normalization on the values side. So ISO 8601 date and time strings, respectively that, is the direction that we plan to go for the value. But then for the key that's associated with that, what is the name for this data, those we're choosing to be explicit and somewhat repetitive, or not even necessarily repetitive, but the idea of, like, it's first_name on the Ruby side, and it's first capital N name camel case, or it's...I forget the name. It's not quite camel case; it's a different one but lower camel, maybe.
But whatever JavaScript uses, we try to bias towards that when we're going to the front end. It does get a little tricky coming back into the Ruby side. So our controllers have a bunch of places where they need to know about what I think is called lower camel case, and so we're not perfect there. But that critical distinction between sort of the names for things, and the values for things, transformations, and normalizations on the values, I'm good with that. But we've chosen to go with a much more explicit version for the names of things or the keys in JSON objects specifically.
JOËL: One thing that can be interesting if you have a normalization phase in your serializer is that that can mean that your serializer and parsers are not necessarily symmetric. So you might accept malformed data into your parser and parse it correctly. But then you can't guarantee that the data that gets serialized out is going to identically match the data that got parsed in.
CHRIS: Yeah, that is interesting. I'm not quite sure of the ramifications, although I feel like there are some. It almost feels like formatting Prettier and things like that where they need to hold on to whitespace in some cases and throw out in others. I'm thinking about how ASTs work. And, I don't know, there's interesting stuff, but, again, not sure of the ramifications.
But actually, to flip the tables just a little bit, and that's an aggressive terminology, but we're going to roll with it. To flip the script, let's go with that, Joël; what's been up in your world? You've been hosting this wonderful show. I've listened in to a number of episodes. You're doing a fantastic job. I want to hear a little bit more of what's new in your world, Joël.
JOËL: So I've been working on a project that has a lot of flaky tests, and we're trying to figure out the source of that flakiness. It's easy to just dive into, oh, I saw a flaky Test. Let me try to fix it. But we have so much flakiness that I want to go about it a little bit more systematically. And so my first step has actually been gathering data. So I've actually been able to make API requests to our CI server. And the way we figure out flakiness is looking at the commit hash that a particular test suite run has executed on.
And if there's more than one CI build for a given commit hash, we know that's probably some kind of flakiness. It could be a legitimate failure that somebody assumed was flakiness, and so they just re-run CI. But the symptom that we are trying to address is the fact that we have a very high level of people re-verifying their code. And so to do that or to figure out some stats, I made a request to the API grouped by commit hash and then was able to get the stats of how many re-verifications there are and even the distribution.
The classic way that you would do that is in Ruby; you would use the GroupBy function from enumerable. And then, you would transform values instead of having, like, say; each commit hash then points to all the builds, an array of builds that match that commit hash. You would then thumb those. So now you have commit hashes that point to counts of how many builds there were for that commit hash. Newer versions of Ruby introduced the tally method, which I love, which allows you to basically do all of that in one step.
One thing that I found really interesting, though, is that that will then give me a hash of commit hashes that point to the number of builds that are there. If I want to get the distribution for the whole project over the course of, say, the last week, and I want to say, "How many times do people run only one CI run versus running twice in the same commit versus running three times, or four times, or five or six times?" I want to see that distribution of how many times people are rerunning their build. You're effectively doing that tally process twice.
So once you have a list of all the builds, you group by hash. You count, and so you end up with that. You have the Ruby hash of commit SHAs pointing to number of times the build was run on that. And then, you again group by the number of builds for each commit SHA. And so now what you have is you'll have something like one, and then that points to an array of SHA one, SHA two, SHA three, SHA four like all the builds.
And then you tally that again, or you transform values, or however, you end up doing it. And what you end up with is saying for running only once, I now have 200 builds that ran only once. For running twice in the same commit SHA, there are 15. For running three times, there are two. For running four times, there is one. And now I've got my distribution broken down by how many times it was run. It took me a while to work through all of that. But now the shortcut in my head is going to be you double tally to get distribution.
CHRIS: As an aside, the whole everything you're talking about is interesting and getting to that distribution. I feel like I've tried to solve that problem on data recently and struggled with it. But particularly tally, I just want to spend a minute because tally is such a fantastic addition to the Ruby standard library. I used to have in sort of like loose muscle memory transform value is grouped by ampersand itself, transform values count, sort, reverse to H. That whole string of nonsense gets replaced by tally, and, oof, what a beautiful example of Ruby, and enumerable, and all of the wonder that you can encapsulate there.
JOËL: Enumerable is one of the best parts of Ruby. I love it so much. It was one of the first things that just blew my mind about Ruby when I started. I came from a PHP, C++ background and was used to writing for loops for everything and not the nice for each loops that a lot of languages have these days. You're writing like a legit for or while loop, and you're managing the indexes yourself. And there's so much room for things to go wrong.
And being introduced to each blew my mind. And I was like, this is so beautiful. I'm not dealing with indexes. I'm not dealing with the raw implementation of the array. I can just say do a thing for each element. This is amazing. And that is when I truly fell in love with Ruby.
CHRIS: I want to say I came from Python, most recently before Ruby. And Python has pretty nice list comprehensions and, in fact, in some ways, features that enumerable doesn't have. But, still, coming to Ruby, I was like, oh, this enumerable; this is cool. This is something. And it's only gotten better. It still keeps growing, and the idea of custom enumerables. And yeah, there's some real neat stuff in there.
JOËL: I'm going to be speaking at RubyConf Mini this fall in November, and my talk is all about Enumerators and ranges in enumerable and ways you can use those to make the APIs of the objects that you create delightful for other people to use.
CHRIS: That sounds like a classic Joël talk right there that I will be happy to listen to when it comes out. A very quick related, a semi-related aside, so, tally, beautiful addition to the Ruby language. On the Rails side, there was one that I used recently, which is where.missing. Have you seen where.missing?
JOËL: I have not heard of this.
CHRIS: So where.missing is fantastic. Let's assume you've got two related objects, so you've got like a has many blah, so like a user has many posts. I think you can...if I'm remembering it correctly, it's User.where.missing(:posts). So it's where dot missing and then parentheses the symbol posts. And under the hood, Rails will do the whole LEFT OUTER JOIN where the count is null, et cetera. It turns into this wildly complex SQL query or understandably complex, but there's a lot going on there. And yet it compresses down so elegantly into this nice, little ActiveRecord bit.
So where.missing is my new favorite addition into the Rails landscape to complement tally on the Ruby side, which I think tally is Ruby 2.7, I want to say. So it's been around for a while. And where.missing might be a Ruby 7 feature. It might be a six-something, but still, wonderful features, ever-evolving these tool sets that we use.
JOËL: One of the really nice things about enumerable and family is the fact that they build on a very small amount of primitives, and so as long as you basically understand blocks, you can use enumerable and anything in there. It's not special syntax that you have to memorize. It's just regular functions and blocks.
Well, Chris, thank you so much for coming back for a visit. It's been a pleasure. And it's always good to have you share the cool things that you're doing at Sagewell.
CHRIS: Well, thank you so much, Joël. It's been an absolute pleasure getting to come back to this whole Bike Shed. And, again, just to add a note here, you're doing a really fantastic job with the show. It's been interesting transitioning back into listener mode for the show. Weirdly, I wasn't listening when I was a host. But now I've regained the ability to listen to The Bike Shed and really enjoy the episodes that you've been doing and the wonderful spectrum of guests that you've had on and variety of topics. So, yeah, thank you for hosting this whole Bike Shed. It's been great.
JOËL: And with that, let's wrap up.
The show notes for this episode can be found at bikeshed.fm.
This show is produced and edited by Mandy Moore.
If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.
If you have any feedback, you can reach us at @_bikeshed, or reach me at @joelquen on Twitter, or at hosts@bikeshed.fm via email. Thank you so much for listening to The Bike Shed, and we'll see you next week. Byeeeeeeeeeee!!!!!!!!
ANNOUNCER: This podcast was brought to you by thoughtbot. thoughtbot is your expert design and development partner. Let's make your product and team a success.
Support The Bike Shed