Wired magazine published an article about why Musk’s Plan to Reveal The Twitter Algorithm Won’t Solve Anything.
Several of my non-programmer friends were interested in this, and we started chatting. Because the idea itself is obviously shockingly out of left field, I discovered this was a perfect opportunity to explain the properties of Coupling & Cohesion and why they matter.
Coupling & cohesion are defined in terms of a change you want to make to a system. In this case, Elon Musk would like to open source “the algorithm”, which he defines as all the bits of code that “make any changes to people's tweets, if they're emphasized or de-emphasized”.
I want to be clear that nothing here is based on my experience with the Twitter code base. I wouldn't speak to any private information, and my experience was nearly a decade ago. Things have most certainly changed.
The information from current developers in the article is plenty for us to speculate about how coupled and cohesive the system is with regards to this particular change.
Wired magazine reports that there is “no single algorithm that guides the way Twitter decides to elevate or bury content”. Like many high-traffic platforms, Twitter uses a microservice architecture
. According to the Wired article, multiple places in the code may promote or hide content, scattered across a multitude of services.
This is an example of low cohesion
. We want to change all the code related to promoting or suppressing tweets at once. That code is scattered in many locations & mixed in with totally unrelated code. It isn't cohesive
In order to understand all the ways a tweet might be promoted or hidden, every piece of the system involved in any of those would need to be open sourced and the reader would need to understand how those components interacted. This makes the change Musk wants expensive & error-prone.
To make the change easier, Twitter would need to rearchitect their system. This would involve moving all the related behavior together in one place. It would also involve separating any behavior in those components that isn’t about promoting or hiding a tweet. A service or a group of services that only handled promoting or hiding tweets would be high cohesion and possible to open-source.
There are two sources of coupling: code that the code being changed relies on, and code that relies on the code being changed. (Anyone know better words to distinguish those two? Let me know, because that is a mouthful.)
Luckily for Twitter, from Wired’s description it sounds like they are mostly dealing with only one of those two kinds of coupling. If not much else depends on which tweets are promoted or hidden, it makes the change a lot easier.
Wired reports that the scattered pieces of code “perform a complex dance atop mountains of data and a multitude of human actions. Results are also tailored to each user based on their personal information and behavior.” That is to say, the code that promotes or hides tweets is highly coupled to many different parts of the current system.
This coupling could prevent Twitter from extracting the behavior into a cohesive unit. Even if the code was centralized, it would still require understanding code that had nothing to do with promoting or hiding tweets in order to understand what is happening. If it is particularly tightly coupled, it might even be impossible to separate without an intermediate step.
Reducing coupling is less straight-forward than increasing cohesion. Twitter would need to consider why those dependencies were needed & what the purpose the data served. They would then turn that understanding into an interface of some kind, with names that reflect that understanding. Twitter’s current data could then be swapped out for some other source of data that satisfied the same purpose. That would let the system be loosely coupled with respect to this change.
I want to be clear here that nothing I describe here is the result of Twitter’s code being “bad”. Twitter’s code is built in a way that would make this particular change hard, but all code makes some kinds of changes hard.
With respect to different changes, Twitter’s system is already highly cohesive
and loosely coupled
. Twitter grew revenue 16% yoy last quarter. It has recently made obvious strides in reducing harassment & abuse on the platform. All of that involved hundreds of engineers safely evolving running software.
Writing "good" code isn't enough. Even the best code in the universe isn’t loosely coupled and highly cohesive with regards to every possible change.
I don’t think this goal is useful or plausible or something that will happen even if the sale does go through. I’m just using it here as a concrete example of a sweeping change that someone might want to make to an existing system.
Its very absurdity is useful. It clearly demonstrates why You Aren’t Going To Need It is a powerful approach to managing coupling & cohesion.
It is impossible to predict what billionaire in a midlife crisis will get angry your company didn’t let him post Hitler memes in the wake of his breakup & decide to buy your company to make one specific change. Attempting to anticipate that eventuality would have been a massive waste of time & money.
But when something like that happens, regardless of what the system looks like today we can adapt it. And as long as the system solved the previous goals as gracefully as possible, supporting all the existing features plus this one new change will be as easy as we could make it.
Imagine that Twitter had guessed that a billionaire would get mad about an ad they showed him. They might have spent a similar amount of time & effort as this project will take today making the ad targeting logic cohesive & decoupled. The code still wouldn't be any more cohesive or loosely coupled with regards to the change that a billionaire actually wants. It would have cost a bunch of money to do, making all the other work over the years harder, and it still wouldn't make this change any easier.
Attempting to anticipate the future doesn't help us build systems that can adapt to it.
If Musk follows through on wanting to publish all the code that contributes to promoting or hiding a tweet it will cost Twitter a great deal of money. The result is unlikely to be particularly useful, especially when the greatest factor in whether a tweet is "promoted" is whether other human beings hit the “retweet” button.
But by employing these principles, by first increasing cohesion and reducing coupling with regards to the specific change, it would be possible.