Yeah. OP’s overly complicated explanation doesn’t convey the reason why this happens, which is really kinda just human psychology. People are probably thinking it’s like observability in quantum mechanics or some shit.
Goodhart’s Law Example:
1. App has poor testing and low quality. New bugs are introduced weekly. Customers complain.
2. Management sets a test coverage KPI of 90% for the app’s codebase.
3. Dev team focuses on tests that hit as many lines of code as possible, NOT on requirements, business logic, or anything that would improve quality or prevent bugs.
4. Quality does not improve because the KPI was an arbitrary statistic that holds no value in isolation. Productivity drops because the dev team wasted hundreds of hours writing useless tests to achieve the KPI. Codebase is worse, and less maintainable. Company wasted millions of dollars. Customers still not happy. Devs hate their lives.
It sounds like this is an example you chose from first hand experience. If so, I’m very sorry. That sounds incredibly frustrating.
Thanks, but I only joined when the company’s several dozen codebases were already at step 4, and that example wasn’t even in the top 5 of their worst problems. I completed a small greenfield project to high praise. They wanted me to fix other codebases. I handed them a laundry list of problems (with suggested solutions), told them their problems were due to incompetent management, and left.
Not my monkeys. Not my circus.
Actually this is a pretty common thing in software development. It’s become a bit of a trope. So much so that when management proposes things like KPIs tied to code (which they often do!) the devs are like, “can we get bonuses based on these KPIs? 🤑”
“If they’re such a good measure, clearly we should be compensated when we do a fantastic job! How about $5k extra for exceeding KPIs!”
“Oh, you don’t have enough trust in the system for that? Then why would you trust it for improving quality? I mean, you’re the one that made them…”
I never heard of this before, but I now know the word to describe exercise problems!
Body builders who are judged by how bulging their muscles are feel like garbage despite supposedly being “peak”.
People who are tall or have high muscle mass being screwed over by BMI targets.
People who are told weight indicates health ignore everything else in exchange for lowering calories.
Even in high school I remember how they would judge you based on like how many push ups you could do… no one who did a ton did proper push ups. Which led to them not helping at all as actual exercise, and even possibly leading to injury.
Heck, we can even use this for stupid Dog Shows, where because they measure specific things for the “goodness” of the dog, they screw over the dog in every way imaginable that isn’t being judged.
This is a good law to know. I like knowing this law. It’s sad how often it’s used, but it’s good to know.
Plenty of good examples listed in your post, but I disagree with the bodybuilding one. The point isn’t to make you feel good. It’s to play this game where you compete against others to best accomplish a specific task. Just like any other sport, when you compete at the elite level, it’s never going to feel good, and it’s never going to be good for your health.
Hmm okay I just am not a fan of sports that destroy people’s bodies.
“Cut” diets that focus on looking great by not eating or drinking water before showing off just sound awful, and I’m not a fan of them. But you’re right, it’s not much worse than any other sport which damages athletes’ bodies.
Lol I really don’t think anyone was thinking that
I was, at first
I don’t think anyone other than this guy was thinking that
Still, an interesting take, same terms mean different things to different people
A big part of it seems to be manipulation of the results? So, like, devs writing tests for more parts of the code base, but ones that are written to always pass.
Yes, of course. Fundamentally the end goal is to improve the app’s quality. However “quality” is not a measurable thing. Therefore, someone observed that as test coverage goes up, bugs tend to decrease, and as bugs decrease app quality tends to go up. So they make code coverage a KPI, and start putting pressure on developers to increase it.
The problem is that once people are pressured into optimizing a certain number, they will get very creative at doing so. And this creativity often breaks the measure’s relationship with the actual underlying quality we were trying to improve.
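To make the "creative" part concrete, here is a minimal sketch of how a test can raise coverage without protecting anything. The function `apply_discount` and both check functions are hypothetical names made up for this illustration, and the bug is planted on purpose.

```python
def apply_discount(price, percent):
    """Hypothetical business rule: take `percent` off `price`."""
    # Bug planted on purpose: adds the discount instead of subtracting it.
    return price + price * percent / 100


def coverage_only_check():
    # Executes every line of apply_discount, so a coverage tool would
    # count it as fully covered, but nothing is asserted, so this
    # check passes no matter what the function returns.
    apply_discount(100, 10)


def behaviour_check():
    # Checks the actual requirement: 10% off 100 should be 90.
    # This fails against the buggy implementation above.
    assert apply_discount(100, 10) == 90
```

Both checks produce the same coverage number for `apply_discount`, but only the second one would ever catch the bug. The KPI can't tell them apart.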
Could anyone explain what “test coverage” means?
Test coverage is defined as the percentage of your application’s functionality that is being covered by the automated tests.
Usually this is measured in lines of code. You run the automated tests, then for every line of code, you track whether it’s executed or not. If 20% of lines were never executed during the test run, your test coverage is 80%.
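As a rough sketch of that arithmetic: assume the coverage tool hands you a hit count per executable line. The `hits` data and the `line_coverage` helper below are made up for illustration and are not the API of any real coverage tool.

```python
def line_coverage(hits):
    """Return line coverage as a percentage.

    `hits` maps each executable line number to how many times
    the test run executed it (0 means never executed).
    """
    if not hits:
        raise ValueError("no executable lines to measure")
    executed = sum(1 for count in hits.values() if count > 0)
    return 100 * executed / len(hits)


# Hypothetical test run over a 10-line module: lines 4 and 9 were never hit.
hits = {1: 3, 2: 3, 3: 1, 4: 0, 5: 2, 6: 2, 7: 1, 8: 1, 9: 0, 10: 3}
print(f"{line_coverage(hits):.0f}% line coverage")
```

Note that the number only says a line ran at least once. It says nothing about whether any test checked what that line did.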
Software teams will often aspire to reach high coverage, because lines that are never executed during testing are a good place where bugs can hide. However it’s generally acknowledged that this isn’t a foolproof method to get rid of bugs, and reaching 100% coverage can be more effort than it’s worth. Often you have critical code sections that should be covered by multiple tests, and unimportant sections that are unlikely to fail.
Thanks! TIL =)
ya, code testing doesn’t actually increase code complexity, nor worsen the code base, and tends to actually reduce and avoid bugs