Just a little coding oopsie
Mars Climate Orbiter holds the record, I think, for a spacecraft failure caused by a coding problem. That one cost $460m.
A great runner-up would be the loss of the maiden flight of the new Ariane 5 rocket at $370m:
"On June 4th, 1996, the very first Ariane 5 rocket ignited its engines and began speeding away from the coast of French Guiana. 37 seconds later, the rocket flipped 90 degrees in the wrong direction, and less than two seconds later, aerodynamic forces ripped the boosters apart from the main stage at a height of 4km. This caused the self-destruct mechanism to trigger, and the spacecraft was consumed in a gigantic fireball of liquid hydrogen.
The disastrous launch cost approximately $370m, led to a public inquiry, and through the destruction of the rocket’s payload, delayed scientific research into workings of the Earth’s magnetosphere for almost 4 years. The Ariane 5 launch is widely acknowledged as one of the most expensive software failures in history. What went wrong?
The fault was quickly identified as a software bug in the rocket’s Inertial Reference System. The rocket used this system to determine whether it was pointing up or down, which is formally known as the horizontal bias, or informally as a BH value. This value was represented by a 64-bit floating variable, which was perfectly adequate.
However, problems began to occur when the software attempted to stuff this 64-bit variable, which can represent billions of potential values, into a 16-bit integer, which can only represent 65,535 potential values. For the first few seconds of flight, the rocket’s acceleration was low, so the conversion between these two values was successful. However, as the rocket’s velocity increased, the 64-bit variable exceeded 65k, and became too large to fit in a 16-bit variable. It was at this point that the processor encountered an operand error, and populated the BH variable with a diagnostic value."
The kicker on this one was that the bug was copied from the code of the previous, successful Ariane 4 rocket. The Ariane 4 never experienced it because its first stage was dropped in each flight before the bug would show itself, so it was never an issue there. Because the Ariane 5 had a slightly different flight profile, it was in the air for a longer period of time: enough time to hit the bug and lose the rocket in flight.
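For anyone wondering what that failure mode looks like in code, here is a minimal Python sketch of a float-to-16-bit conversion with a range guard. It is only an illustration: the real flight software was written in Ada, and the names `to_int16_checked` and `to_int16_saturating` are made up for this example. Note that a signed 16-bit integer tops out at 32,767.

```python
# Illustrative only: hypothetical helpers, not the Ariane flight code.
INT16_MIN = -32768
INT16_MAX = 32767


def to_int16_checked(value: float) -> int:
    """Truncate a float to an int and refuse values outside the signed 16-bit range."""
    truncated = int(value)  # drops the fractional part
    if not INT16_MIN <= truncated <= INT16_MAX:
        raise OverflowError(f"{value} does not fit in a signed 16-bit integer")
    return truncated


def to_int16_saturating(value: float) -> int:
    """Truncate a float to an int and clamp it to the signed 16-bit range."""
    return max(INT16_MIN, min(INT16_MAX, int(value)))
```

With these, `to_int16_checked(40000.0)` raises `OverflowError` instead of silently producing garbage, while `to_int16_saturating(40000.0)` clamps to 32767. Which behaviour is right depends on what the caller does with the value; the point is that the out-of-range case gets an explicit decision.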
Static type checking ftw
I'll keep it going:
Don't forget about the time Initech had its credit union hacked with a virus that was supposed to only take a negligible percentage of each transaction, but the programmer figured he must have "put the decimal in the wrong place or something."
The group got away under pretty mysterious circumstances...
Didn't their corporate office burn down afterwards? Suspicious indeed...
Why the fuck is/was NASA using the US customary system? Science is always done in metric, even in the US.
IIRC they had outsourced to a contractor, and that contractor was using imperial units.
It was just a simple transposition right? 2.45 (wrong) vs 2.54 (right)
E: never mind, I was wrong
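On the static typing point above: one way to make this kind of unit mix-up harder is to give each unit its own type, so a value in pound-force seconds can't be passed where newton-seconds are expected without an explicit conversion. A rough Python sketch, assuming the mismatch was in impulse values; the names `PoundForceSeconds`, `NewtonSeconds`, and `apply_impulse` are hypothetical and not from any NASA or contractor code:

```python
from dataclasses import dataclass

# 1 lbf = 0.45359237 kg * 9.80665 m/s^2 = 4.4482216152605 N (exact by definition)
LBF_S_TO_N_S = 4.4482216152605


@dataclass(frozen=True)
class NewtonSeconds:
    value: float


@dataclass(frozen=True)
class PoundForceSeconds:
    value: float

    def to_si(self) -> NewtonSeconds:
        """Convert explicitly to SI so the unit change is visible at the call site."""
        return NewtonSeconds(self.value * LBF_S_TO_N_S)


def apply_impulse(impulse: NewtonSeconds) -> float:
    """Hypothetical consumer that only accepts SI impulse values."""
    # A type checker such as mypy flags a wrong argument statically;
    # this runtime check is a backstop for untyped callers.
    if not isinstance(impulse, NewtonSeconds):
        raise TypeError("impulse must be given in newton-seconds")
    return impulse.value
```

Calling `apply_impulse(PoundForceSeconds(1.0))` fails loudly, while `apply_impulse(PoundForceSeconds(1.0).to_si())` goes through with the converted value. A bare `float` carries no unit, which is how a factor of roughly 4.45 can slip by unnoticed.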
Always loved the story of what they saw in the source code of software they used in historic NASA missions from decades past.
https://interestingengineering.com/science/code-moon-landings-released-surprising-hilarious
Turns out, the programmers back then were just as unsure about what they were doing as programmers are today ... except the guys back then had computers less powerful than a modern smart watch controlling a missile that was aimed at the moon.
I also heard about a fuckup with the European Space Agency, who had hired an American to work on a particular bit of the project. He used an imperial measurement somewhere and it caused the whole thing to fail.
That's why there are SI units.
That man's name? Filbert Einstein. No relation.
this is why I hate working with hardware.
Let's be clear, this isn't the single programmer's fault. Everybody will eventually make a mistake. The fact that it wasn't caught by mitigating measures such as reviews, tests, and audits is the real error we can learn from here.
A Proton-M booster carrying a GLONASS satellite crashed shortly after takeoff at Baikonur in 2013. The failure was caused by a gyroscope package that had been installed upside down. The receptacle had a metal indexing pin that should've prevented the incorrect installation. The worker simply pushed so hard that it bent out of the way.
When you make a foolproof design, God makes a better fool.
How did someone like this land a job at NASA?
I think it was a different era, to borrow an awful phrase. In 1962 they were still figuring out best practices for reviews, tests, and audits. Even today, lone hero outputs can get pretty far when processes aren't followed.
Which they did learn from!
I guarantee every mistake like this at any good company leads to a leap forward in tooling for simulation, testing, code building, review, merging, local dev environments etc.
The good companies share their work (by open-sourcing their solution or blogging their learnings) or contribute to existing solutions.
NASA's ROI cannot be measured. The number of industries their R&D has touched is massive.