Cascading Failures


When something goes wrong, the surrounding supporting infrastructure must suddenly accept a new load.

The affected area

If that load spike exceeds the supporting infrastructure’s capacity, then it fails, and you get a cascading failure. Extreme examples of cascading failures don’t stop failing until there is nothing left to fail. [nyt] writes about the blackout of 2003:

A surge of electricity to western New York and Canada touched off a series of power failures and enforced blackouts yesterday that left parts of at least eight states in the Northeast and the Midwest without electricity. The widespread failures provoked the evacuation of office buildings, stranded thousands of commuters and flooded some hospitals with patients suffering in the stifling heat.

I remember that day; my phone started ringing as soon as the power came back on, and it was journalists asking “is this the Chinese?” What had actually happened was what is often described as a “software glitch,” though I consider it more of a “design error.” The power grid shutdown was a result of systems behaving the way they were supposed to behave – the consequences, however, were unforeseen.

Wikipedia explains it better than I need to [wik]. The short form is that a high-voltage line began to draw an unexpected load because it had drooped into wet branches. The unexpected load caused the local station to switch so that power was drawn from the local grid instead of being sent across the questionable line. The local grid systems’ software saw an unusual surge, which triggered their load alarms, and they re-routed power from nearby stations. Suddenly, the smart-grid software saw a large surge and unusual load, and a wave of overload shutdowns swept through the whole system. My guess is that the initial “shut it down and draw current from elsewhere” smart-grid parameter was chosen to be within engineering tolerances (a factor of 2 or 3 overhead), and that the “fix” was to tweak the system parameters to make them less sensitive to a system-wide problem.

It seemed to me that an attack from the Chinese or whoever would first bring the system down, then tear down its telemetry and command/control. That would be bad. Remember when Terry Childs locked the City of San Francisco out of its own fiber-WAN backbone? [wired] He changed the administrators’ passwords to random junk and then left. A serious attack would look more like the Shamoon attacks against Saudi Aramco: system BIOS wipes and hard drive calibration wipes, and poof! All your computers are bricks. I suppose the most elegant cyberattack would be to trigger a cascading failure, but why bother? If your point is to demonstrate that you control someone’s system, smoking wreckage is as eloquent a message as any other.
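To make the load-redistribution mechanism concrete, here is a toy sketch in Python. It is a hypothetical model, not how grid protection software actually works. Nodes sit on a line, and each can carry `headroom` times its normal load (compare the “factor of 2 or 3 overhead” guessed above). When a node fails, its load is split between its nearest surviving neighbours. The function name `simulate_cascade` and its parameters are made up for illustration.

```python
from collections import deque


def simulate_cascade(loads, headroom, first_failure):
    """Toy load-redistribution cascade on a line of nodes (hypothetical model).

    Each node can carry up to `headroom` times its initial load. When a node
    fails, its current load is split evenly between the nearest surviving
    neighbour on each side; any neighbour pushed past its capacity fails next.
    Returns the number of nodes that end up failed.
    """
    n = len(loads)
    load = [float(x) for x in loads]
    capacity = [headroom * x for x in load]
    alive = [True] * n
    pending = deque([first_failure])

    while pending:
        i = pending.popleft()
        if not alive[i]:
            continue  # already failed via an earlier entry in the queue
        alive[i] = False

        # Nearest surviving neighbour to the left and to the right, if any.
        neighbours = []
        for step in (-1, 1):
            j = i + step
            while 0 <= j < n and not alive[j]:
                j += step
            if 0 <= j < n:
                neighbours.append(j)

        if not neighbours:
            continue  # nothing left to fail: the load has nowhere to go

        share = load[i] / len(neighbours)
        load[i] = 0.0
        for j in neighbours:
            load[j] += share
            if load[j] > capacity[j]:
                pending.append(j)  # overloaded: this node trips next

    return n - sum(alive)


if __name__ == "__main__":
    print(simulate_cascade([1.0] * 10, 3.0, 5))  # 1: neighbours absorb the extra load
    print(simulate_cascade([1.0] * 10, 1.4, 5))  # 10: the whole line goes down
    print(simulate_cascade([1, 1, 1, 4, 1, 1, 1], 2.0, 3))  # 7: one heavy node tips a 2x margin
```

Capacity in this sketch is relative to each node’s own normal load. A margin that easily absorbs an ordinary neighbour’s failure can therefore still be overwhelmed when a heavily loaded node drops out, and the failure then runs until there is nothing left to fail.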

There are lots of places where cascading failures can occur; I used to spend a lot of subconscious time trying to identify them. Perhaps that’s why I am so worried about the command/control systems for nuclear weapons: there appear to be a lot of places where failures can escalate (basically, all the failure modes seem to boil down to ‘and then there’s a full exchange’).

Anyhow, I saw this and immediately recognized a cascading failure: [it gets interesting at 0:35]

You can clearly see what’s going on here: each rack is capable of holding a tremendous amount of weight bearing directly down through its supports. As soon as even a fraction of that weight takes the form of a sideways push or a pull on the upper corner, the whole situation unravels.

------ divider ------

I have to wonder if perhaps the racks were not installed correctly. With steel truss systems like that, it can be a matter of one bolt being missing.

Comments

  1. johnson catman says

    WOW! Was anyone killed or seriously injured in that crash? If not, that is truly amazing. I would definitely hate to have to clean that up!

  2. rq says

    I’ll have to watch the video later, but you really made my weekend with this:

    all the failure modes seem to boil down to ‘and then there’s a full exchange’

    Talk about peace-of-mind.

  3. kestrel says

    I have a bad feeling about how that forklift operator fared in this. Looks like he got buried. I hope the other guys were OK.

  4. militantagnostic says

    Kestrel @3
    His chances were fairly good since he had a roof and rollover protective structure.

  5. Jazzlet says

    militantagnostic @#4
    I’m not so sure; there is no protection apart from the roof and rollover structure, and you could see that the forklift was knocked over, so it was the open side that all of the boxes fell on. I guess how well the forklift driver did will have depended on what was in the boxes.

  6. komarov says

    all the failure modes seem to boil down to ‘and then there’s a full exchange’

    Not being able to take part in said exchange is probably considered the worst possible failure mode in a “proper” military mindset. Better to end a thousand worlds over silly mistakes than live in one where you can’t take part in the fun. Risk mitigation to be prioritised accordingly.

  7. Sunday Afternoon says

    @Jazzlet – the forklift was knocked sideways, not over. At the end of the clip you can still see the top of the yellow mast of the forklift poking vertically out of the pile. I’m hoping this means that the roof provided protection.

    What’s astonishing to me is how easily everything came down. A mechanically secure structure should be able to survive receiving a glancing blow from machinery that operates within it.

  8. bryanfeir says

    I remember the blackout, yes; that was a bad year in Toronto, given SARS and West Nile Virus also being that year, and tourism being down already due to 9/11 not long before. (The World Science Fiction convention was in Toronto that year, and had a lot lower turnout than usual.)

    I actually did a report back in 1991 for one of my grad school courses about this sort of situation, using as examples the 1990 MLK Jr. Day collapse of the AT&T phone system (caused by a glitch where a restarting node could crash the neighbouring nodes) and the 1980 ARPAnet IMP routing failure (caused by three routing messages circulating, each of which was ‘newer’ than the previous one). My point was that the systems that were supposed to make individual systems more reliable could actually cause failures in the larger network.

  9. militantagnostic says

    It looks like the shear (diagonal) bracing was only on the ends of the units. You can see that the end of the shelving assembly remains intact with very little deformation.

  10. ridana says

    Full story with pictures (sorry, Daily Mail source).

    Apparently it took 8 hours, 13 fire crews, and cutting a hole in the side of the building to rescue the forklift operator, who was unharmed despite being buried under 20 kg blocks of cheese.

    If you scroll down far enough, you can get a good look at the pre-destruction shelving, which might give knowledgeable folk clues as to exactly why it was so fragile.

    Knowing no one was injured allows me to gawk in amazement without much guilt at the life-scale domino run. I just hope he didn’t lose his job over it, since he was only at fault in the most tangential way. Sooner or later, high humidity adding a few extra grams, or maybe a fly landing on it, would’ve brought it all down anyway.

  11. says

    ridana@#10:
    Since the racks only had horizontal stiffeners at the end of the rows, they were assembled negligently. It’s scary to think that whoever assembled them had all those pieces left over and didn’t think about it. With any kind of high-load-bearing assembly, you simply do not improvise – it can kill. And steel is tough, but so is a whole lot of cheddar cheese.

  12. says

    Power distribution systems can be made more secure by adding strategic amounts of storage in certain places. 2017 saw one such power reserve installed in Australia after a catastrophic grid failure in 2016. Link. Much cheaper and easier now than 10-20 years ago.

    As for the shelving collapse, clearly no competent engineer was allowed into the building process. I remember it well: a callow lad halfway through his engineering degree, doing work experience in the holiday break. A similarly constructed but small-scale shelf was built in the back shed of the small company I worked for. Against my will, we built the shelf without additional bracing, that being deemed irrelevant by the storesperson in charge. Then it was filled with tractor parts. The next morning began the task of unloading the collapsed shelves and rebuilding them with additional bracing. A fun few days of wasted effort, all because a clod without engineering knowledge overruled some basic structural requirements.

  13. kestrel says

    Thanks to Ridana, #10, for finding that out. It may seem foolish but I feel better knowing the guy is OK.

    And hooray for Australia planning ahead like that. It would be great if others emulated that example.

  14. jrkrideau says

    @ Marcus
    I remember the “blackout of 2003” quite well. What impressed me was the speed at which various US politicians blamed Canadian power suppliers for the outage.

  15. voyager says

    That is an awesome video and I don’t use that word very often. Dominoes pale in comparison.

  16. says

    Lofty@#14:
    Power distribution systems can be made more secure by adding strategic amounts of storage in certain places.

    Power distribution systems seem to be similar to computer networks, in that they evolve in place and adapt to expected needs, without much opportunity for strategic advance planning. It seems to me that they are often playing catch-up, or reacting to the unexpected, because when you’re evolving a system you don’t have much opportunity to see problems coming. When the people who manage a power grid are able to add strategic storage, that’s a great thing.

    I don’t know what the situation is like right now, because the various local power plants keep going offline and coming back online as they upgrade their scrubber systems to make them cleaner. But someone once said that my area of the country hangs by a pretty thin thread: if one of the big plants unexpectedly goes offline, we’ll be out of power for a long time. So much of our lives comes to depend on a service like electricity or internet, but it’s surprisingly fragile.

  17. says

    kestrel@#15:
    It may seem foolish but I feel better knowing the guy is OK.

    It’s not foolish at all; I should have looked up the end result and posted it before I linked something so scary. My experience watching an accident in progress is different, depending on what I know about the outcome and I also try to avoid laughing at incidents where someone is injured or killed. I don’t like seeing people killed, at all – I think that as we age and understand how fragile and important lives are, it gets less funny because we are more sympathetic. On the flip side, I can watch nazis get punched, probably because I feel like they deserve it – but I don’t feel like anyone deserves to have a boat fall on them, or whatever. Also, some accidents are scary as hell, and I can imagine the terror of being in the driver’s seat during a train-wreck, and I’m sympathetically terror-stricken too.

  18. says

    voyager@#17:
    That is an awesome video and I don’t use that word very often. Dominoes pale in comparison.

    And knowing that it’s cheese somehow makes it even better.

  19. Jazzlet says

    Yes, looking at the video again I can see I was wrong about the forklift truck going over. I am still amazed the driver survived, although glad he did, because the cheese could have come in the sides of the forklift. Cheese is heavy stuff; he was very lucky.

  20. Owlmirror says

    The news article shows that the racking collapse incident took place in May of 2016. I am curious to know if there was some sort of official report or finding, but despite lots of searching, I can’t find anything about a report.

    There’s a news posting that the HSE (Health and Safety Executive) was to investigate the matter, but I can find nothing in the HSE’s archives about Edwards Transport (owner of the warehouse), or anything about a racking collapse in 2016.

    The only followup news item I could find is this (in case anyone’s curious about what happened to the cheese involved)(copypasted because the site uses an annoying modal overlay. IE, shoo!, edit class)

    Hinstock warehouse collapse: Investigations ongoing after man buried under cheese
    North Shropshire | News | Published: Nov 3, 2016

    The collapse happened at Edwards Transport, in Hinstock near Market Drayton, in May where racking inside the building fell, trapping Tomasz Wiszniewski.

    Mr Wiszniewski spent nine hours buried in his fork lift truck under about four metres of 20kg blocks of cheese.

    Shropshire Council have today said a full health and safety investigation is ongoing, and that they have disposed of the cheese.

    Work to get rid of the cheese began on September 12 via anaerobic digestion, which is a process that turns it into gas.

    Mal Price, Shropshire Council’s Cabinet member for planning, housing, regulatory services and environment, said: “Following the racking collapse, a lot of complex work has needed to be carried out by our regulatory services officers.

    “Most important in cases like this is protecting the health and safety of consumers, and the work our officers have done to ensure that is the case has been exemplary.

    “We are still in the process of conducting a full health and safety investigation.”

  21. Jazzlet says

    Owlmirror
    I wouldn’t be at all surprised if the investigation was still ‘ongoing’; both the HSE and councils have been cut to the bone on things like this. Well, on everything.

  22. Owlmirror says

    @Jazzlet: It occurs to me that precisely because there were no fatalities, or even serious injuries, the investigation may have lower priority. A lot of the HSE notices were about cases where injury or death had occurred in various industrial settings.

  23. says

    Owlmirror and Jazzlet@#23/4:
    The US NTSB has a pretty impressive online gallery of reenactments and analysis on youtube. I believe they are expected/required to present their results publicly, so that’s how they do it.

    Unfortunately, a lot of their stuff is about extremely unfunny events so I generally don’t watch it. But if you’ve ever wondered what a freight train-wreck looks like from the driver’s seat, now you can know.

  24. Owlmirror says

    I have this tab still left open after researching, so I might as well post the link here:

    What Causes a Pallet Rack Collapse? How Can You Prevent it?

    (“Pallet rack” vs “racking” appears to be a US vs UK terminology thing)

    I note that the post is from a month after the Edwards Transport warehouse incident. It doesn’t mention the incident specifically, but it does give prominence to forklift collisions. The image looks like it could be from the incident, but there are no source details, and TinEye finds no earlier matches.

  25. Curt Sampson says

    If you really want to geek out on it, A Case Study on the Collapse of Industrial Storage Racks, one of many papers available from Nelson Forensics, makes decent enough reading, though not nearly as good as an NTSB report.

    The TLDR I got from it was that flexural buckling due to verticals being out of plumb can be a huge problem, and forklift strikes and the like are a pretty easy way to make things go out of plumb, particularly if you don’t have your racks anchored to the floor.