Code refactoring — The Lost chapter

Coding is fun, but coding without thinking through is only fun at the moment you write it (It’s not even that fun). Bad code is very scary…

Do Tran

10 Aug 2023 — 10 min read

Coding can be enjoyable, but it’s only fun if you take the time to think it through. Writing code without planning ahead can be tempting, but it’s a recipe for disaster. Bad code is hard to maintain and debug, and it can come back to haunt you months or even years later. It can also make it difficult for other developers to work on your code, which can slow down the entire project.

In this post, I will use pseudocode to illustrate my point because it is a plain language description of an algorithm that is easy to understand for anyone with basic programming knowledge. Do not worry if the pseudocode does not compile, as it is not intended to be executable code.

Your function is misleading in several ways.

The image should illustrate the main points of this section.

Can you give me a brief overview of this function?

Based on the name and the arguments, I would guess that

the ProductMatchId() function takes a product ID and a list of products as input.
It then compares the product ID to the product IDs in the list. If the product ID is found in the list, the function returns the corresponding product.
Otherwise, the function returns null or throws an exception.

Let’s look at the implementation and see if it matches our expectations.

Our initial guess was partially correct. We correctly identified that the function takes a product ID and a list of products as input, and that it compares the product ID to the product IDs in the list. However, we did not know that the function also accesses the global variable _user to check for admin permission, or that it can access the database from its parent scope and perform an additional query. We are curious how you knew about these details, as you are not the author of the function. We also wonder if the author will still remember these details after 6 months.

What if the network fails and the _database.Products.Find(productId) method throws an exception? The function signature does not reveal this information. Additionally, I am not sure why comparing a product ID against a list of products in memory would throw a DatabaseConnectionException. Perhaps the list of products is stored in a database, and the Find method tries to access the database but fails due to a network error.

Let’s describe the function’s behavior and the additional work required to use it.

I can get a general idea of what the function does by looking at its name, but I need to read the function’s code to understand exactly what it does. It usually takes me about 5 minutes to read the function and figure out what it does.
I will need to check the result for null and use a try/catch block to handle any exceptions that the function might throw. I do not know if the function throws exceptions, so I need to be prepared for that possibility.
I need to be aware that this function accesses the database. If the database fails, this function will also fail and throw an exception. I need to be prepared to handle this possibility by using a try/catch block.
If I do not have administrator privileges and I invoke this function, it will throw an exception. This is because the function checks the user’s privileges before executing. If the user does not have the necessary privileges, the function will not be able to complete its task and will throw an exception.

In each of the following sections, I will eliminate the unknowns one by one so that we can make the function description closer to our intuition. This will help us create code that is more predictable and less likely to surprise us.

It has a non-descriptive name.

The name of a function is like the title of a book. Just like a book title, the name of a function should give you a brief description of what the function does.

I can guess Jin/Hong/Kim/Chen come from Asia, James/Thomas/David comes from US or UK but i have no clue where this guy comes from

When you name something, you can remember what it does, even if you give it a bad name. But this only works for a short period of time. If you or someone else reads the code 6 months later, you won’t have a clue what it does. You’ll have to skim through the whole thing again, which is a waste of time.

The solution is to use meaningful names and avoid nonsense names. This will make your code more readable and understandable, and it will also make it easier to maintain and update.

Go ahead and read it out during the code review session. If folks can kinda get an idea of what your function does just from its name, then you’re good to go.

Rather than using ProductMatchId, we could opt for FindProductInList as the name.

Implicit input/implicit output

The implicit nature of this segment arises due to the requisite comprehension of the function’s contents. However, an alternative course of action involves its transformation into an explicit element. To illustrate, consider the following approach that could be employed to establish explicitness.

The function’s output lacks explicitness on the surface. However, upon deeper consideration, envision the situation in which the _db entity represents an Object-Relational Mapping (ORM) framework, such as the likes of Entity Framework, incorporating an active state of entity tracking. In this context, one could reasonably construe the behavior as constituting an implicit output, albeit veiled. Importantly, it is worth acknowledging that this clandestine aspect could potentially lead to complications within specific scenarios.

I’ve come across functions such as StartBackgroundJob and StopBackgroundJob that perform their stated actions, albeit in a manner that catches one off guard. Specifically, StartBackgroundJob, which you’d assume initiates a job, actually takes an unforeseen approach. Prior to commencing the job, it discreetly clears any ongoing jobs by invoking StopBackgroundJob. I found myself having to peruse the function to fathom the vanishing and reemerging of background jobs. It would have been prudent to label it as RestartBackgroundJob, thereby alerting readers to this behavior.

It does more than it states

The provided code accepts a product list and a product ID for cross-referencing within the list. However, it also extends its search to the database if the item is not located within the list. Using this code without delving into its implementation could yield unanticipated outcomes. Consequently, to ascertain its behavior, one would be compelled to peruse the entirety of the function.

It’s seriously frustrating when you realize you only need the first part of something, but you’re stuck dealing with the second part too. So, you decide to fix it, but guess what? You’ve got a bunch of other functions happily chugging along, depending on that thing you’re trying to change. They’re all doing fine, no complaints. Now you’re in a bind — you either tweak your function in a weird way just to make it work with the old one, or you split it up, ending up with duplicate code all over the place. Ugh, what a headache!

On the flip side, we could break it down into several smaller functions, each handling just one task.

Returning to the earlier mentioned StartBackgroundJob and StopBackgroundJob functions, an additional optimization would involve a clear-cut approach of explicitly invoking StopBackgroundJob followed by StartBackgroundJob within the higher-level function (the one that triggers StartBackground).

Return `NULL` is not good

“My mama always said, a function is like a box of chocolates. You never know what you’re gonna get.”

Return NULL is very common, it’s fine as long as you don’t forget to check NULL before using it

Remember to check NULL before using something

Therefore, working with NULL might be straightforward when jotting it down, but managing it can quickly become unwieldy. As an alternative, we can formulate a function that wraps up the outcome of an action, as demonstrated below.

Just an idea here, No optimization + won’t compile

Then, we can update our code accordingly

At this point, we’ve introduced a pair of fresh functions. Notably, they’re not only designed for singular purposes, but they’re also devoid of any unexpected twists for users. Furthermore, we’ve strategically compelled consumers to engage with the Result type initially, as they must invoke an Unwrap to access and utilize the actual outcome. If you are interested, you should have a look at Rust programming language Result type and Option type as well.

Restructure the code and quick summarize

We position “Authorize” at the forefront for consumers at the top level, as the other two functions are straightforward and transparent, making them accessible to everyone. Additionally, these functions maintain an honest representation of their actions.

Let’s describe our functions again

It named FindProductInListand FindProductInDb, takes a productId and a list of Product.
Given the function’s name and signature, it’s quite straightforward to infer its purpose. The initial one involves comparing the productId with the list, while the second one involves searching for it in the database.
Essentially, what I’m thinking is based on my gut feeling.

And it doesn’t demand any additional exertion.

Now that I’m aware my function is truthful, it takes less time to comprehend its purpose.
I’m spared from inspecting NULL or employing try/catch blocks since the Result encapsulates all of those concerns.
It doesn’t access the database implicitly; in fact, the database exception is also contained within the Result type.
The validation for Admin authorization has been elevated to a higher tier and rendered overt.

In essence, we’ve adjusted our function to align more closely with our intuition, eradicating any unexpected elements.

More and more

Your function fails to accommodate the entirety of conceivable inputs.

Let’s look at the below function

It’s a straightforward division, right? However, what if b equals 0? Even if we were to trigger an exception when b is 0, the issue would remain implicit for the user. The function’s signature wouldn’t be able to convey this constraint.

Instead, we could fashion a novel data type in the following manner.

We fabricate a fresh type that embodies NonZero. The sole means of crafting it is through the static method Create, utilizing a non-zero numerical value.

Now, we have the capacity to adapt our function to exclusively accommodate the NonZero type for parameter b. This adjustment ensures that our function operates seamlessly across the entire spectrum of conceivable values.

You can read more about this via Type Driven API design and Primitive obsession codesmell

It has a cousins

In many cases, we nail it with names like GetProductById or GetTopSellerProducts. However, as complexity ramps up, we start naming functions based on the context, like GetSpecialProduct or GetTopSellerSpecialProduct. The issue isn’t the business rule itself, rather, we’re missing a universal approach to articulate our intent here.

On the flip side, we have the option to draft a versatile function that takes in a filter callback and another callback to remap the initial result into the intended outcome.

Subsequently, you can encapsulate your business logic using code originating from a different location. For instance:

And here’s the cool part: you can now keep the code that deals with querying the database separate from your actual logic. Plus, you can even mix and match your logic to come up with some awesome new stuff!

Note the NewHomemadeProduct is a combination of NewProduct and NewHomeMadeProduct

You should have used more callback

Take a peek at the next two functions we’ve got here.

These two functions actually handle two distinct tasks, but they’re copying code from the infrastructure. Now, picture this: if we ever decide to switch up our database, we’ll have to hunt down and alter all those instances of using (var connection = new DbConnection()) scattered throughout various files. Now we have the Shotgun Surgery codesmell.

You miss 100% the shot you don’t take

Rather than repeatedly opening the connection, it’s more prudent to abstract it into a separate function.

All we do is move the shared code into a separate function and make the callback a required input parameter. Issue resolved.

Conclusion

There’s quite a bit to cover, so I’m thinking we might need to split this into a series of posts.

If you find my post enjoyable, perhaps you’d consider treating me to a cup of coffee.

Code refactoring — The Lost chapter

Do Tran