410: All About Documentation

Episode 410 · December 12th, 2023 · 32 mins 2 secs

About this Episode

Joël shares his experiences with handling JSON in a Postgres database. He talks about his challenges with ActiveRecord and JSONB columns, particularly the unexpected behavior of storing and retrieving JSON data. Stephanie shares her recent discovery of bookmarklets and highlights a bookmarklet named "Check This Out," which streamlines searching for books on Libby, an ebook and audiobook lending app.

The conversation shifts to using constants in code as a form of documentation. Stephanie and Joël discuss how constants might not always accurately reflect current system behavior or logic, leading to potential misunderstandings and the importance of maintaining accurate documentation.


STEPHANIE: Hello and welcome to another episode of the Bike Shed, a weekly podcast from your friends at thoughtbot about developing great software. I'm Stephanie Minn.

JOËL: And I'm Joël Quenneville. And together, we're here to share a bit of what we've learned along the way.

STEPHANIE: So, Joël, what's new in your world?

JOËL: What's new in my world is JSON and how to deal with it in a Postgres database. So, I'm dealing with a situation where I have an ActiveRecord model, and one of the columns is a JSONB column. And, you know, ActiveRecord is really nice. You can just throw a bunch of different data at it, and it knows the column type, and it will do some conversions for you automatically.

So, if I'm submitting a form and, you know, form values might come in as strings because, you know, I typed in a number in a text field, but ActiveRecord will automatically parse that into an integer because it knows we're saving that to an integer column. So, I don't need to do all these, like, manual conversions.

Well, I have a form that has a string of JSON in it that I'm trying to save in a JSONB column. And I expected ActiveRecord to just parse that into a hash and store it in Postgres. That is not what happens. It just stores a raw string, so when I pull it out again, I don't have a hash. I have a raw string that I need to deal with. And I can't query it because, again, it is a raw string. So, that was a bit of an unexpected behavior that I saw there.

STEPHANIE: Yeah, that is unexpected. So, is this a field that has been used for a while now? I'm kind of surprised that there hasn't been already some implementations for, like, deserializing it.

JOËL: So, here's the thing: I don't think you can have an automatic deserialization there because there's no way of knowing whether or not you should be deserializing. The reason is that JSON is not just objects or, in Ruby parlance, hashes. You can also have arrays. But just raw numbers not wrapped in hashes are also valid JSON as are raw strings.

And if I just give you a string and say, put this in a JSON field, you have no way of knowing, is this some serialized JSON that you need to deserialize and then save? Or is it just a string that you should save because strings are already JSON? So, that's kind of on you as the programmer to make that distinction because you can't tell at runtime which one of these it is.

STEPHANIE: Yeah, you're right. I just realized it's [laughs] kind of, like, an anything goes [laughs] situation, not anything but strings are JSON, are valid JSON, yep [laughs]. That sounds like one of those things that's, like, not what you think about immediately when dealing with that kind of data structure, but...

JOËL: Right. So, the idea that strings are valid JSON values, but also all JSON values can get serialized as strings. And so, you never know: are you dealing with an unserialized string that's just a JSON value, or are you dealing with some JSON blob that got serialized into a string? And only in one of those do you want to then serialize before writing into the database.

STEPHANIE: So, have you come to a solution or a way to make your problem work?

JOËL: So, the solution that I did is just calling a JSON parse before setting that attribute on my model because this value is coming in from a form. I believe I'm doing this when I'm defining the strong parameters for that particular form. I'm also transforming that string by parsing it into a hash with the JSON dot parse, which then gets passed to the model. And then I'm not sure what JSONB serializes as under the hood. When you give it a hash, it might store it as a string, but it might also have some kind of binary format or some internal AST that it uses for storage. I'm not sure what the implementation is.

STEPHANIE: Are the values in the JSONB something that can be variable or dynamic? I've seen some people, you know, put that in getter so that it's just kind of done for you for anyone who needs to access that field.

JOËL: Right now, there is a sort of semi-consistent schema to that. I think it will probably evolve to where I'll pull some of these out to be columns on the table. But it is right now kind of an everything else sort of dumping ground from an API.

STEPHANIE: Yeah, that's okay, too, sometimes [laughs].

JOËL: Yeah. So, interesting journey into some of the fun edge cases of dealing with a format whose serialized form is also a valid instance of that format. What's been new in your world?

STEPHANIE: So, I discovered something new that has been around on the internet for a while, but I just haven't been aware of it. Do you know what a bookmarklet is?

JOËL: Oh, like a JavaScript code that runs in a bookmark?

STEPHANIE: Yeah, exactly. So, you know, in your little browser bookmark where you might normally put a URL, you can actually stick some JavaScript in there. And it will run whenever you click your bookmark in your browser [chuckles]. So, that was a fun little internet tidbit that I just found out about. And the reason is because I stumbled upon a bookmarklet made by someone. It's called Check This Out.

And what it does is there's another app/website called Libby that is used to check out ebooks and audiobooks for free from your local public library. And what this Check This Out bookmarklet does is you can kind of select any just, like, text on a web page, and then when you click the bookmarklet, it then just kind of sticks it into the query params for Libby's search engine. And it takes you straight to the results for that book or that author, and it saves you a few extra manual steps to go from finding out about a book to checking it out.

So, that was really neat and cute. And I was really surprised that you could do that. I was like, whoa [laughs]. At first, I was like, is this okay? [laughs] If you, like, you can't read, you know, you don't know what the JavaScript is doing, I can see it being a little sketchy. But –-

JOËL: Be careful of executing arbitrary JavaScript.

STEPHANIE: Yeah, yeah. When I did look up bookmarklets, though, I kind of saw that it was, you know, just kind of a fun thing for people who might be learning to code for the first time to play around with. And some fun ideas they had for what you could do with it was turning all the font on a web page to Comic Sans [laughs]. So yeah, I thought that was really cute.

JOËL: Has that inspired you to write your own?

STEPHANIE: Well, we did an episode a while ago on productivity tricks. And I was thinking like, oh yeah, there's definitely some things that I could do to, you know, just stick some automated tasks that I have into a bookmarklet. And that could be a really fun kind of, like, old-school way of doing it, as opposed to, you know, coding my little snippets or getting into a new, like, Omnibar app [laughs].

JOËL: So, something that is maybe a little bit less effort than building yourself a browser extension or something like that.

STEPHANIE: Yeah, exactly.

JOËL: I had a client project once that involved a...I think it was, like, a five-step wizard or something like that. It was really tedious to step through it all to manually test things. And so, I wrote a bookmarklet that would just go through and fill out all the fields and hit submit on, like, five pages worth of these things. And if anything didn't work, it would just pause there, and then you could see it. In some way, it was moving towards the direction of, like, an automated like Capybara style test. But this was something that was helping for manual QA. So, that was a really fun use of a bookmarklet.

STEPHANIE: Yeah, I like that. Like, just an in-between thing you could try to speed up that manual testing without getting into, like you said, an automated test framework for your browser.

JOËL: The nice thing about that is that this could be used without having to set up pretty much anything, right? You paste a bit of JavaScript into your bookmark bar, and then you just click the button. That's all you need to do. No need to make sure that you've got Ruby installed on your machine or any of these other things that you would need for some kind of testing framework. You don't need Selenium. You don't need ChromeDriver. It just...it works.

So, I was working...this was a greenfield startup project. So, I was working with a non-technical founder who didn't have all these things, you know, dev tooling on his machine. So, he wanted to try out things but not spend his days filling out forms. And so, having just a button he could click was a really nice shortcut.

STEPHANIE: That's really cool. I like that a lot. I wasn't even thinking about how I might be able to bring that in more into just my daily work, as opposed to just something kind of fun. But that's an awesome idea. And I hope that maybe I'll have a good use for one in the future.

JOËL: It feels like the thing that has a lot of potential, and yet I have not since written...I don't think I've written any bookmarklets for myself. It feels like it's the kind of thing where I should be able to do this for all sorts of fun tooling and just automate my life away. Somehow, I haven't done that.

STEPHANIE: Bring back the bookmarklet [laughs]. That's what I have to say.

JOËL: So, I mentioned earlier that I was working with a JSONB column and storing JSON on an ActiveRecord model. And then I wanted to interact with it, but the problem is that this JSON is somewhat arbitrary, and there are a lot of magic strings in there. All of the key names might change. And I was really concerned that if the schema of that JSON ever changed, if we changed some of the key names or something like that, we might accidentally break code in multiple parts of the app.

So, I was very careful while building that model to quarantine any references to any raw strings only within that model, which meant that I leaned really heavily on constants. And, in some way, those constants end up kind of documenting what we think the schema of that JSON should be. And that got me thinking; you were telling me recently about a scenario where some code you were working with relied heavily on constants as a form of documentation, and that documentation kind of lied to you.

STEPHANIE: Yeah, it did. And I think you mentioned something that I wanted to point out, which is that the magic strings that you think might change, and you wanted to pull that out into a constant, you know, so at least it's kind of defined in one place. And if it ever does change, you know, you don't have to change it in all of those places.

And I do think that, normally, you know, if there's opportunities to extract those magic strings and give a name to them, that is beneficial. But I was gripping a little bit about when constants become, I guess, like, too wieldy, or there's just kind of, like, too much of a dependency on them as the things documenting how the app should work when it's constantly changing. I realized that I just used constant and constantly [laughs].

JOËL: The only constant is that it is not constant.

STEPHANIE: Right. And so, the situation that I found myself in—this was on a client project a little bit ago—was that the constants became, like, gatekeepers of that logic where dev had to change it if the app's behavior changed, and maybe we wanted to change the value of it. And also, one thing that I noticed a lot was that we, as developers, were getting questions about, "Hey, like, how does this actually work?"

Like, we were using the constants for things like pricing of products, for things like what is a compatible version for this feature. And because that was only documented in the code, other people who didn't have access to it actually were left in the dark. And because those were changing with somewhat frequency, I was just kind of realizing how that was no longer working for us.

JOËL: Would you say that some of these values that we stored as constants were almost more like config rather than constants or maybe they're just straight-up application data? I can imagine something like price of an item you probably want that to be a value in the database that can be updated by an admin. And some of these other things maybe are more like config that you change through some kind of environment variable or something like that.

STEPHANIE: Yeah, that's a good point. I do think that they evolved to become things that needed to be configured, right? I suppose maybe there wasn't as much information or foresight at the beginning of like, oh, this is something that we expect to change. But, you know, kind of when you're doing that first pass and you're told, like, hey, like, this value should be the price of something, or, like, the duration of something, or whatever that may be. It gets codified [chuckles]. And there is some amount of lift to change it from something that is, at first, just really just documenting what that decision was at the time to something that ends up evolving.

JOËL: How would you draw a distinction between something that should be a constant versus something that maybe would be considered config or some other kind of value? Because it's pretty easy, right? As developers, we see magic numbers. We see magic strings. And our first thought is, oh, we've seen this problem before—constant. Do you have maybe a personal heuristic for when to reach for a constant versus when to reach for something else?

STEPHANIE: Yeah, that's a good question. I think when I started to see it a lot was especially when the constants were arrays or hashes [laughs]. And I guess that is actually kind of a signal, right? You will likely be adding more stuff [laughs] into that data structure [laughs]. And, again, like, maybe it's okay, like, the first couple of times. But once you're seeing that request happen more frequently, that could be a good way to advocate for storing it in the database or, like, building a lightweight admin kind of thing so that people outside of the dev team can make those configuration changes.

I think also just asking, right? Hey, like, how often do we suspect this will change? Or what's on the horizon for the product or the team where we might want to introduce a way to make the implementation a bit more flexible to something that, you know, we think we know now, but we might want to adjust for?

JOËL: So, it's really about change and how much we think this might change in the future.

STEPHANIE: Speaking of change, this actually kind of gets into the broader topic of documentation and how to document a changing and evolving entity [chuckles], you know, that being, like, the codebase or the way that decisions are made that impact how an application works. And you had shared, in preparation for this topic, an article that I read and enjoyed called Hierarchy of Documentation.

And one thing that I liked about it is that it kind of presented all of the places that you could put information from, you know, straight in the code, to in your commit messages, to your issue management system, and to even wikis for your repo or your team. And I think that's actually something that we would want to share with new developers, you know, who might be wondering, like, where do I find or even put information? I really liked how it was kind of, like, laid out and gave, like, different reasons for where you might want to put something or not.

JOËL: We think a lot about documentation as code writers. I'm curious what your experience is as a code reader. How do you tend to try to read code and understand documentation about how code works? And, apparently, the answer is, don't read the constants because these constants lie.

STEPHANIE: I think you are onto something, though, because I was just thinking about how distrustful I've become of certain types of documentation. Like, when I think of code comments, on one hand, they should be a signal, right? They should kind of draw your attention to something maybe weird or just, like, something to note about the code that it's commenting on, or where it's kind of located in a file.

But I sometimes tune them out, I'm not going to lie. When I see a really big block of code [chuckles] comment, I'm like, ugh, like, do I really have to read all of this? I'm also not positive that it's still relevant to the code below it, right? Like, I don't always have git blame, like, visually enabled in my editor. But oftentimes, when I do a little bit of digging, that comment is left over from maybe when that code was initially introduced. But, man, there have been lots of commits [chuckles] in the corresponding, you know, like, function sense, and I'm not really sure how relevant it is anymore.

Do you struggle with the signal versus noise issue with code comments? How much do you trust them, and how much do you kind of, like, give credence to them?

JOËL: I think I do tend to trust them with maybe some slight skepticism. It really depends on the codebase. Some codebases are really bad sort of comment hygiene and just the types of comments that they put in there, and then others are pretty good at it. The ones that I tend to particularly appreciate are where you have maybe some, like, weird function and you're like, what is going on here? And then you've got a nice, little paragraph up top explaining what's going on there, or maybe an explanation of ways you might be tempted to modify that piece of code and, like, why it is the way it is.

So, like, hey, you might be wanting to add an extra branch here to cover this edge case. Don't do that. We tried it, and it causes problems for XY reasons. And sometimes it might be, like, a performance thing where you say, look, the code quality person in you is going to look at this and say, hey, this is hard to read. It would be better if we did this more kind of normalized form. Know that we've particularly written this in a way that's hard to read because it is more performant, and here are the numbers. This is why we want it in this way. Here's a link to maybe the issue, or the commit, or whatever where this happened.

And then if you want to start that discussion up again and say, "Hey, do we really need performance here at the cost of readability?", you can start it up again. But at least you're not going to just be like, oh, while I'm here, I'm going to clean up this messy code and accidentally cause a regression.

STEPHANIE: Yeah. I like what you said about comment hygiene being definitely just kind of, like, variable depending on the culture and the codebase.

JOËL: I feel like, for myself, I used to be pretty far on the spectrum of no comments. If I feel the need to write a comment, that's a smell. I should find other ways to communicate that information. And I think I went pretty far down that extreme, and then I've been slowly kind of coming back. And I've probably kind of passed the center, where now I'm, like, slightly leaning towards comments are actually nice sometimes. And they are now a part of my toolkit. So, we'll see if I keep going there.

Maybe I'll hit some point where I realize that I'm putting too much work into comments or comments are not being helpful, and I need to come back towards the center again and focus on other ways of communicating. But right now, I'm in that phase of doing more comments than I used to. How about you? Where do you stand on that sort of spectrum of all information should be communicated in code tokens versus comments?

STEPHANIE: Yeah, I think I'm also somewhere in the middle. I think I have developed an intuition of when it feels useful, right? In my gut, I'm like, oh, I'm doing something weird. I wish I didn't have to do this [chuckles]. I think it's another kind of intuition that I have now. I might leave a comment about why, and I think that is more of that signal, right?

Though I also recently have been using them more as just, like, personal notes for myself as I'm, you know, in my normal development workflow, and then I will end up cleaning them up later. I was working on a codebase where there was a soft delete functionality. And that was just, like, a concern that was included in some of the models. And I didn't realize that that's what was going on. So, when I, you know, I was calling destroy, I thought it was actually being deleted, and it turns out it wasn't. And so, that was when I left a little comment for myself that was like, "Hey, like, this is soft deleted."

And some of those things I do end up leaving if I'm like, yes, other people won't have the same context as me. And then if it's something that, like, well, people who work in this app should know that they have soft delete, so then I'll go ahead and clean that up, even though it had been useful for me at the time.

JOËL: Do you capture that information and then put it somewhere else then? Or is it just it was useful for you as a stepping-stone on the journey but then you don't need it at the end and nobody else needs to care about it?

STEPHANIE: Oh, you know what? That's actually a really great point. I don't think I had considered saving that information. I had only thought about it as, you know, just stuff for me in this particular moment in time. But that would be really great information to pull out and put somewhere else [chuckles], perhaps in something like a wiki, or like a README, or somewhere that documents things about the system as a whole. Yeah, should we get into how to document kind of, like, bigger-picture stuff?

JOËL: How do you feel about wikis? Because I feel like I've got a bit of a love-hate relationship with them.

STEPHANIE: I've seen a couple of different flavors of them, right? Sometimes you have your GitHub wiki. Sometimes you have your Confluence ecosystem [laughs]. I have found that they work better if they're smaller [laughs], where you can actually, like, navigate them pretty well, and you have a sense of what is in there, as opposed to it just being this huge knowledge base that ends up actually, I think, working against you a little bit [laughs].

Because so much information gets duplicated if it's hard to find and people start contributing to it maybe without keeping in mind, like, the audience, right? I've seen a lot of people putting in, like, their own personal little scripts [laughs] in a wiki, and it works for them but then doesn't end up working for really anyone else. What's your love-hate relationship to them?

JOËL: I think it's similar to what you were saying, a little bit of structure is nice. When they've just become dumping grounds of information that is maybe not up to date because over the course of several years, you end up with a lot of maybe conflicting articles, and you don't know which one is the right thing to do, it becomes hard to find things.

So, when it just becomes a dumping ground for random information related to the company or the app, sometimes it becomes really challenging to find the information I need and to find information that's relevant, to the point where oftentimes looking something up in the wiki is my last resort. Like, I'm hoping I will find the answer to my question elsewhere and only fallback to the wiki if I can't.

STEPHANIE: Yeah, that's, like, the sign that the wiki is really not trustworthy. And it kind of is diminishing returns from there a bit. I think I fell into this experience on my last project where it was a really, really big wiki for a really big codebase for a lot of developers. And there was kind of a bit of a tragedy of the commons situation, where on one hand, there were some things that were so manual that the steps needed to be very explicitly documented, but then they didn't work a lot of the time [laughs].

But it was hard to tell if they weren't working for you or because it was genuinely something wrong with, like, the way the documentation laid out the steps. And it was kind of like, well, I'm going to fix it for myself, but I don't know how to fix it for everyone else. So, I don't feel confident in updating this information.

JOËL: I think that's what's really nice about the article that you mentioned about the hierarchy of documentation. It's that all of these different forms—code, comments, commit messages, pull requests, wikis—they don't have to be mutually exclusive. But sometimes they work sort of in addition to each other sort of each adding more context.

But also, sometimes it's you sort of choose the one that's the highest up on that list that makes sense for what you're trying to do, so something like documenting a series of steps to do something maybe a wiki is a good place for that. But maybe it's better to have that be executable. Could that be a script somewhere? And then maybe that can be a thing that is almost, like, living documentation, but also where you don't need to maybe even think about the individual steps anymore because the script is running, you know, 10 different things.

And I think that's something that I really appreciated from the book Sustainable Rails is there's a whole section there talking about the value of setup scripts and how people who are getting started on your app don't want to have to care about all the different things to set it up, just run a script. And also, that becomes living documentation for what the app needs, as opposed to maybe having a bulleted list with 10 elements in it in your project README.

STEPHANIE: Yeah, absolutely. In the vein of living documentation, I think one thing that wikis can be kind of nice for is for putting visual supplements. So, I've seen them have, like, really great graphs. But at the same time, you could use a gem like Rails-ERD that generates the entity relationship diagram as the schema of your database changes, right? So, it's always up to date.

I've seen that work well, too, when you want to have, like I said, those, like, system-level documentation that sometimes they do change frequently and, you know, sometimes they don't. But that's definitely worth keeping in mind when you choose, like, how you want to have that exist as information.

JOËL: How do you feel about deleting documentation? Because I feel like we put so much work into writing documentation, kind of like we do when writing tests. It feels like more is always better. Do you ever go back and maybe sort of prune some of your docs, or try to delete some things that you think might no longer be relevant or helpful?

STEPHANIE: I was also thinking of tests when you first posed that question. I don't know if I have it in my practice to, like, set aside time and be like, hmm, like, what looks outdated these days? I am starting to feel more confident in deleting things as I come across them if I'm like, I just completely ignored this or, like, this was just straight up wrong [laughs]. You know, that can be scary at first when you aren't sure if you can make that determination.

But rather than thrust that, you know, someone else going through that same process of spending time, you know, trying to think about if this information was useful or not, you can just delete it [laughs]. You can just delete tests that have been skipped for months because they don't work. Like, you can delete information that's just no longer relevant and, in some ways, causing you more pain because they are cluttering up your wiki ecosystem so that no one [laughs] feels that any of that information is relevant anymore.

JOËL: I'll be honest, I don't think I've ever deleted a wiki article that was out of date or no longer relevant. I think probably the most I've done is go to Slack and complain about how an out-of-date wiki page led me down the wrong path, which is probably not the most productive way to channel those feelings. So, maybe I should have just gone back and deleted the wiki page.

STEPHANIE: I do like to give a heads up, I think. It's like, "Hey, I want to delete this thing. Are there any qualms?" And if no one on your team can see a reason to keep it and you feel good about that it's not really, like, serving its purpose, I don't know, maybe consider just doing it.

JOËL: To kind of wrap up this topic, I've got a spicy question for you.

STEPHANIE: Okay, I'm ready.

JOËL: Do you think that AI is going to radically change the way that we interact with documentation? Imagine you have an LLM that you train on maybe not just your code but the Git history. It has all the Git comments and maybe your wiki. And then, you can just ask it, "Why does function foo do this thing?" And it will reference a commit message or find the correct wiki article. Do you think that's the future of understanding codebases?

STEPHANIE: I don't know. I'm aware that some people kind of can see that as a use case for LLMs, but I think I'm still a little bit nervous about the not knowing how they got there kind of part of it where, you know, yes, like I am doing this manual labor of trying to sort out, like, is this information good or trustworthy or not? But at least that is something I'm determining for myself. So, that is where my skepticism comes in a little bit. But I also haven't really seen what it can do yet or seen the outcomes of it. So, that's kind of where I'm at right now.

JOËL: So, you think, for you, the sort of the journey of trying to find and understand the documentation is a sort of necessary part of building the understanding of what the code is doing.

STEPHANIE: I think it can be. Also, I don't know, maybe my life would be better by having all that cut out for me, or I could be burned by it because it turns out that it was bad information [laughs]. So, I can't say for sure.

On that note, shall we wrap up?

JOËL: Let's wrap up.

STEPHANIE: Show notes for this episode can be found at bikeshed.fm.

JOËL: This show has been produced and edited by Mandy Moore.

STEPHANIE: If you enjoyed listening, one really easy way to support the show is to leave us a quick rating or even a review in iTunes. It really helps other folks find the show.

JOËL: If you have any feedback for this or any of our other episodes, you can reach us @_bikeshed, or you can reach me @joelquen on Twitter.

STEPHANIE: Or reach both of us at hosts@bikeshed.fm via email.

JOËL: Thanks so much for listening to The Bike Shed, and we'll see you next week.

ALL: Byeeeeeeee!!!!!!


Did you know thoughtbot has a referral program? If you introduce us to someone looking for a design or development partner, we will compensate you if they decide to work with us.

More info on our website at: tbot.io/referral. Or you can email us at: referrals@thoughtbot.com with any questions.

Support The Bike Shed