NoSQL Zone is brought to you in partnership with:

John Cook is an applied mathematician working in Houston, Texas. His career has been a blend of research, software development, consulting, and management. John is a DZone MVB and is not an employee of DZone and has posted 172 posts at DZone. You can read more from them at their website. View Full User Profile

Fault-Tolerant Systems Are Faulty

08.08.2012
| 4575 views |
  • submit to reddit

Richard Cook wrote an excellent article How Complex Systems Fail. The author packs a lot of ideas into four pages, divided into 18 points. Here are his first five points.

  1. Complex systems are intrinsically hazardous systems.
  2. Complex systems are heavily and successfully defended against failure.
  3. Catastrophe requires multiple failures – single-point failures are not enough.
  4. Complex systems contain changing mixtures of failures latent within them.
  5. Complex systems run in degraded mode.

Cook (no relation) elaborates his fifth point:

A corollary to the preceding point is that complex systems run as broken systems. The system continues to function because it contains so many redundancies and because people can make it function, despite the presence of many flaws.

Complex systems are necessarily fault-tolerant. If they weren’t fault-tolerant, they likely wouldn’t survive long enough to become complex. Unfortunately, the down side is that fault-tolerant systems are always faulty.

We want our software to be fault-tolerant, but it is very difficult to tolerate faults without encouraging and concealing them at the same time. (Think of badly formed HTML, for example.) Fault tolerance can work smoothly if you have a well-defined range of faults that your system is designed to tolerate. But misguided attempts to tolerate errors can mask problems, delaying but not preventing failure. This in turn makes the failure harder to diagnose and repair.

Some systems must be complex and fault-tolerant. But when we decide to make software fault-tolerant, especially if some realistic alternative could make it simpler, we should be aware of the consequences.

Published at DZone with permission of John Cook, author and DZone MVB. (source)

(Note: Opinions expressed in this article and its replies are the opinions of their respective authors and not those of DZone, Inc.)

Tags: