Periodically, we
discuss the investigation or root cause analysis of a disaster in an industry other
than healthcare, most often an accident in aviation or another transportation
industry. We do this because the typical cascade of human and system errors and
the root causes that come to light are so similar to those in healthcare
incidents that result in patient harm. Hopefully, the lessons learned from
these accidents can be used to avert similar incidents in healthcare.
AirAsia Flight QZ8501 crashed into the Java Sea on December 28, 2014, killing all 162 people aboard (KNKT 2015). It was cruising at 32,000 feet on a flight from Indonesia to Singapore when the cascade of precipitating events took place. Weather and meteorological conditions did not appear to be factors. The pilot-in-command had over 20,000 hours of flying experience and the second-in-command over 2000 flying hours, most in the Airbus A320, the type of aircraft involved in the crash. Both had passed their last proficiency checks in the month prior to the crash. Though there was an issue regarding flight approval (specifically, what days this airline could fly), it did not appear to be a factor in the crash.
At the center of the
cascade: cracked soldering on a small circuit board in the Rudder Travel
Limiter Unit (RTLU). Cracking of a solder joint on two channels resulted in
loss of electrical continuity and led to RTLU failure. Maintenance records
showed 23 Rudder Travel Limiter problems between January 2014 and December 27,
2014, and that the interval between malfunctions became shorter over the last 3
months even though maintenance actions had been performed ever since the first
malfunction was identified in January 2014.
Just 3 days prior to
the accident, one of the pilots of the fatal flight was flying this same
aircraft on another flight and had problems with an alarm indicating RTLU
dysfunction. Airline mechanics addressed the problem while the pilot observed.
One of the corrective actions involved pulling and resetting a circuit breaker.
The pilot asked whether he could do that himself if the alarm went off again
and was told he could, provided the cockpit computer/dashboard message
indicated so.
On the fateful day,
the Airbus A320 was cruising at 32,000 feet when the RTLU alarm triggered. In
fact, it triggered multiple times. The first 3 times, the pilot-in-command
(PIC) and the second-in-command (SIC), who was flying the plane at the time
(under autopilot), responded appropriately to the alarm and performed the
specified actions. The fourth time, however, they did something different:
they pulled the circuit breaker.
These events led to
disengagement of the autopilot and auto-thrust and meant that the pilot flying
(PF) was now manually flying the aircraft. You’ll recall from some of our prior
columns the phenomenon of “automation surprise”, a concept describing what
happens when computers working in the background cease functioning and multiple
parameters the computer had been tracking are not readily apparent to the pilot
now flying manually. Those parameters may include the speed of the plane,
attitude (orientation relative to the horizon), rotation, etc.
One of the
parameters that changed was airspeed. Aerodynamic stalls occur when the wing’s
“angle of attack” becomes so steep that airflow over the wings no longer
provides the “lift” that keeps the plane airborne. That typically happens when
airspeed drops so low that an excessively high angle of attack would be needed
to maintain lift. The plane also began to change its attitude and rotation. The
“stall” alarm sounded.
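For readers who want to see the arithmetic behind that, here is a minimal sketch (our own illustration, not anything from the accident report) using the standard lift equation. It shows how, as airspeed falls in level flight, the lift coefficient (and hence angle of attack) needed to hold the aircraft up climbs until it exceeds what the wing can deliver. All of the numbers are purely illustrative, not actual A320 values.

```python
# Illustrative only: the generic lift equation, not the A320's actual aerodynamics.
# Lift = 0.5 * air_density * speed^2 * wing_area * lift_coefficient
# In level flight, lift must equal weight, so as speed falls the required lift
# coefficient (and hence angle of attack) rises. Beyond the wing's maximum lift
# coefficient, airflow separates from the wing: an aerodynamic stall.

RHO = 0.4              # air density at high altitude, kg/m^3 (approximate)
WING_AREA = 122.0      # m^2, roughly A320-class (illustrative)
WEIGHT = 60000 * 9.81  # N, a hypothetical 60-tonne aircraft
CL_MAX = 1.5           # illustrative maximum lift coefficient

def required_lift_coefficient(speed_ms):
    """Lift coefficient needed to support the aircraft's weight at this speed."""
    return WEIGHT / (0.5 * RHO * speed_ms**2 * WING_AREA)

for speed in (240, 200, 160, 120):  # true airspeed in m/s
    cl = required_lift_coefficient(speed)
    status = "STALL (exceeds CL_max)" if cl > CL_MAX else "flying"
    print(f"{speed:>4} m/s -> required CL = {cl:.2f}  {status}")
```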
Pilots, of course,
practice stall recognition and recovery as a core element of their training and
preparation for flying and do stall simulations as well. Most stalls, however,
occur when a plane is changing speed or angle of attack, such as on takeoff and
landing. Stalls during “level” flight at high-altitude cruise are much rarer.
The usual response
to an aerodynamic stall is to put the nose downward until lift is restored.
This is done via a “stick” or lever next to the pilot that is somewhat like the
joystick you might be familiar with on computer games. To go down you push the
lever forward. To go up you pull it back. But when the stall began, the PIC
(pilot-in-command) yelled “pull down…pull down” (repeated 4 times). This order
is ambiguous because if you pull the lever/stick down, the plane goes up and
the stall worsens.
In addition, the
plane has dual controls: there is a second stick/lever at the side of the
pilot not flying (PNF). If neither stick/lever system is deactivated, both
provide input and the net result is the algebraic sum of the actions of both
sticks/levers. It appears in this incident that the PIC did begin using his
stick/lever, but at the same time the SIC was using his, and they were likely
moving them in opposite directions, negating each other’s actions. The net
effect was that the desired downward angle did not occur. In fact, the nose
rose further and the plane began to rotate and spin.
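To make the “algebraic sum” point concrete, here is a toy sketch (our own illustration, not Airbus’s actual flight control logic) of how two simultaneous, opposite sidestick inputs can cancel each other out unless one pilot explicitly takes priority.

```python
# Toy illustration of dual sidestick input summing; NOT the actual Airbus
# flight control law, just a sketch of the "algebraic sum" idea.
# Stick input ranges from -1.0 (full forward, nose down) to +1.0 (full back, nose up).

def combined_pitch_command(pic_stick, sic_stick, priority=None):
    """Return the pitch command sent to the flight controls.

    With no priority declared, the two inputs are summed (then clipped),
    so opposite inputs largely cancel each other. If one pilot takes
    priority (e.g., after an explicit "I HAVE CONTROL" handover), only
    that pilot's input is used.
    """
    if priority == "PIC":
        command = pic_stick
    elif priority == "SIC":
        command = sic_stick
    else:
        command = pic_stick + sic_stick
    return max(-1.0, min(1.0, command))  # clip to the valid command range

# PIC pushes the nose down to recover; SIC simultaneously pulls the nose up.
print(f"{combined_pitch_command(-0.8, +0.7):+.2f}")         # -0.10: inputs nearly cancel
print(f"{combined_pitch_command(-0.8, +0.7, 'PIC'):+.2f}")  # -0.80: clear nose-down command
```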
In addition,
immediately increasing the thrust of the engines may result in the nose of the
plane lifting up, which further accentuates the stall. You may actually need to
reduce the thrust temporarily if the nose is not angled down.
Note that this is
all somewhat counterintuitive. The natural human response might be to point up
and increase the speed. That is why training for such emergencies is critical.
The pilots need to have immediate knowledge of what to do in such cases and
have practiced it at least in simulation exercises. These pilots had been
trained in, and had experience with, recovery from an approaching stall. But
recovery from a stall at zero pitch (as in this instance) had never been
trained, because stall training was always done at a high pitch attitude.
The plane was now spinning
out of control in what is known as an “upset”. Upset recovery training was
included in the aircraft operator’s training manual. But the aircraft operator
told investigators that the flight crew had not received upset recovery
training on the Airbus A320, a decision reflected in the Flight Crew Training
Manual “Operational Philosophy”: “The effectiveness of fly-by-wire architecture, and
the existence of control laws, eliminates the need for upset recovery maneuvers
to be trained on protected Airbus”. (And apparently the regulatory body in
Indonesia did not find that to be out of compliance.)
Below we summarize
some of the key factors that played a role in this tragic accident. In some
cases we provide analogies to healthcare incidents. But, in general, we leave
it to your imagination to envision how such human factors and system failures
seen in this case might occur every day in various venues in your healthcare
system.
Communication
In healthcare
sentinel events, communication breakdowns contribute to 75-85% or more of
events. The same obviously occurs in aviation and other transportation
accidents. There were several examples of such miscommunication in the current
case.
Prior to pulling
the circuit breaker, there should have been a discussion between the PIC and SIC
about what might happen when it was pulled. Their
training manual has a chapter on “Computer Reset” that says: “In flight, as a general rule, the crew must
restrict computer resets to those listed in the table. Before taking any action
on other computers, the flight crew must consider and fully understand the
consequences.” The investigators note this statement was potentially
ambiguous to readers and might be open to multiple interpretations. But
had it been followed, the crew might have realized the autopilot would
disengage and been prepared for that.
A series of serious
miscommunications occurred once the stall alarm triggered. The PIC shouted
“level…level…level” (repeated 4 times). But it was not clear whether he meant
to level the wings or to level the “attitude”, or orientation of the plane to the
ground. He then followed with the command to “pull down…pull down” (repeated 4
times). As above, this order is ambiguous because if you pull the
lever/stick down, the plane goes up and the stall worsens.
Another
miscommunication was one that did not take place but should have. When the PIC
began to manipulate his stick/lever, standard operating procedure would have
been for him to call out “I HAVE CONTROL” and for the other pilot to respond by
transferring control and calling out “YOU HAVE CONTROL”. Had that happened,
perhaps the cancelling effect of operating two sticks/levers simultaneously
would not have occurred.
“Hearback”
is an element of good communication that is critical in cockpit resource
management. In all likelihood the events in this case unfolded so
rapidly that there was not enough time for it. But that is one of the reasons
that training and simulation exercises are so important. They prepare you for
what to say, do, hear, and how to respond under the most emergent circumstances.
One factor not
commented upon in this case, but one that has contributed to other
transportation accidents, is language/cultural disparity. The PIC here was Indonesian
and the SIC was French. We don’t know whether this might have led to any
difficulties in communication. And we are not simply talking about language. In
some other accidents, hierarchical cultural factors have led to SICs
being afraid to challenge PICs.
Automation surprises
Several of our
previous columns have discussed automation surprises, whether a
computer changing things in the background or a system operating with several
“modes” controlled via a single switch. Here the obvious automation surprise
was the crew not seeing all the issues that arose when control of the plane was
abruptly turned over to the pilot flying (PF) as the autopilot disengaged.
Most of you have
already personally experienced automation surprises. Have you ever been driving
on a highway and started slowing down, only to find that the cruise control on
your car (you forgot you had set it!) speeds you up?
In healthcare, much
of our equipment in ICUs (and other areas) is high tech, and computers are
often doing things in the background that we are not immediately aware of.
Similarly, we often have equipment that runs several different “modes” off a
single switch. We gave an example of such a surprise back in 2007 when we
described a ventilator operating on battery power when everyone thought it was
operating on AC current from a wall socket.
Alarm fatigue
When we talk about
alarm fatigue we are usually discussing situations in which the overabundance of
clinically irrelevant alarms causes us to ignore alarms that are clinically
important. But in this accident one of the proximate causes was the inordinate
attention placed upon shutting off the alarm that had previously produced so
many false alarms. So it really was a form of alarm fatigue.
Loss of situational awareness
Loss of situational
awareness is a contributing factor in many
accidents. In this case, there was some such loss in that the SIC probably was
not immediately aware of multiple important flight parameters when the
autopilot abruptly disengaged.
Loss of spatial orientation
Loss of spatial
orientation occurs in a couple of ways. The investigators note “Mulder’s Law”, in
which there is a threshold (called Mulder’s Constant) below which accelerations
are not sensed by the human vestibular system. That certainly might have come
into play as the plane spiraled out of control. The other would be not being
able to see the horizon or other land features as the plane nosed upward.
Such disorientation has been raised in previous aircraft accidents like the Mt.
Erebus crash in the Antarctic or the crash in the Atlantic involving John F.
Kennedy, Jr.
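As a rough illustration of the threshold idea the investigators describe, the sketch below (our own toy model, with a purely illustrative threshold value rather than the actual Mulder’s Constant) shows how a prolonged, gentle acceleration that never exceeds the perception threshold can produce a substantial change in motion that the pilot never feels.

```python
# Toy illustration of sub-threshold accelerations going unnoticed.
# Accelerations below the perception threshold contribute nothing to the
# pilot's sensed velocity change, while all of them change the aircraft's
# actual velocity. The threshold value is purely illustrative.

PERCEPTION_THRESHOLD = 0.10  # m/s^2, illustrative only

def sensed_and_actual_speed_change(accelerations, dt=1.0):
    """Integrate a series of accelerations over time steps of dt seconds."""
    actual = 0.0
    sensed = 0.0
    for a in accelerations:
        actual += a * dt
        if abs(a) >= PERCEPTION_THRESHOLD:
            sensed += a * dt  # only supra-threshold accelerations are felt
    return sensed, actual

# Sixty seconds of gentle 0.05 m/s^2 acceleration: never felt, yet the
# aircraft's speed has changed by 3 m/s.
sensed, actual = sensed_and_actual_speed_change([0.05] * 60)
print(f"sensed change: {sensed:.1f} m/s, actual change: {actual:.1f} m/s")
```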
Startle reflex
The investigators note that
for pilots the main effects of the startle reflex are the interruption of the
ongoing process and distraction of attention towards the stimulus. These
happen almost immediately, and can be quickly dealt with if the cause is found
to be non-threatening. However, the distraction can potentially reduce a pilot’s concentration on flight-critical tasks.
Cognitive biases
When we discuss
diagnostic error we note there are numerous cognitive biases (see our many
columns on diagnostic error). Particularly when we are in the “pattern
recognition” or “fast thinking” mode we are more prone to such biases. In this
case “availability bias” was likely operative, in that the PIC resorted
to pulling the circuit breaker because he had recently seen the mechanic do
that to fix the faulty RTLU alarm.
Maintenance issues
In our August 7,
2007 Patient Safety Tip of the Week “Role
of Maintenance in Incidents” we noted that many of the most famous
disasters in industry history have followed equipment or facilities maintenance
activities, whether planned, routine, or problem-oriented. Well-known examples
include Chernobyl, Three-Mile Island, the Bhopal chemical release, and a
variety of airline incidents and oil/gas explosions. In this case the failure
to perform a permanent fix for the faulty circuit board was a critical factor.
In that column we noted that James Reason and Alan Hobbs, in their
2003 book “Managing Maintenance Error: A Practical Guide”, do an outstanding job
of describing the types of errors encountered in maintenance activities, where
and under what circumstances the various types of error are likely to occur,
and steps to minimize the risks.
Failure to prepare for emergencies
Emergencies,
particularly those involving events that rarely occur, are ripe for disastrous
outcomes if all parties are not prepared to respond appropriately. Pilots use
checklists to help them not only with routine procedures but also to help them
in many emergent situations. We’ve also discussed the use of checklists for
some OR emergencies (see our August 16, 2011 Patient Safety Tip of the Week “Crisis
Checklists for the OR” and our February 2013 What's
New in the Patient Safety World column “Checklists
for Surgical Crises”). In the OR you might have a checklist to help you respond
appropriately to an unexpected case of malignant hyperthermia.
But some
emergencies, such as the one in the current case, don’t allow
time to consult checklists. They require immediate, rehearsed responses that can
only be prepared for by simulation exercises or drills. While the pilots in the
current case had practiced stall recoveries, they had inadequate training for
“level attitude” stalls at high speed and no “upset recovery” training. In
healthcare, an example needing such training or drills would be a surgical fire.
Other training issues
One very interesting
item in the investigation report was that the airline does not practice full “role
reversal” in its operations training. In flight there is task sharing and
allocation of duties between pilot and copilot, the PIC (pilot-in-command)
and SIC (second-in-command), or between the PF (pilot flying) and PNF (pilot
not flying), with one sitting in the “right-hand seat” and the other in the
“left-hand seat”. The investigators imply that it would be very
useful during training to have these various actors switch places and roles and
gain a better understanding of the other’s perspective.
What a novel idea!
This is one we most definitely should adopt in some of our simulation exercises
in healthcare!
Overconfidence
What better example
of overconfidence than the Flight Crew Training Manual “Operational
Philosophy”: “The effectiveness of fly-by-wire architecture, and the existence
of control laws, eliminates the need for upset recovery maneuvers to be trained
on protected Airbus”. That’s like saying the Titanic is so well designed and
built that it will never sink, so there’s no need to train for evacuating it.
Other root causes
Most of the above
are more proximate causes or contributing factors. But if we dig deeper we
would undoubtedly find other root causes. For example, why wasn’t the faulty
circuit board simply replaced rather than repeatedly resoldered when such
repairs were clearly failing? Were there financial considerations behind that
decision?
The same questions
apply to the lack of training. Was that truly because the manual implied such
events could never happen on the Airbus A320 (an obviously erroneous
assumption)? Or were there financial or other operational barriers to providing
that training?
And what about the
regulatory oversight of the airline industry in Indonesia? Was it an oversight
that they did not require all airlines to provide “upset recovery” training for
all pilots? Should they not have seen that one particular part had required 23
maintenance interventions?
Failure to learn from previous accidents or incidents
The similarities to
the 2009 Air France Flight 447 crash and the 2009 crash of Continental
Connection Flight 3407 near Buffalo, New York are striking. In the former, ice
crystals obstructed the aircraft’s pitot tubes, leading to inconsistencies
between the displayed and actual airspeed measurements and causing the
autopilot to disconnect, after which the crew reacted incorrectly and
ultimately flew the aircraft into an aerodynamic stall from which they did not
recover (Wikipedia 2015).
In the latter accident (see our Patient Safety Tip of the Week “Learning
from Tragedies. Part II”) the pilot also did the opposite of what he should
have done when the plane went into an aerodynamic stall.
Lingo and abbreviations
In our Patient
Safety Tip of the Week “Learning
from Tragedies. Part II” we noted the dizzying array of abbreviations used
in the aviation industry. When one reads the investigation report of the
AirAsia Flight QZ8501 crash, one wonders how pilots ever manage to function in
cockpits without confusing such abbreviations and terms. Again, we think aviation
clearly needs a “do not use” abbreviation list similar to what we use
in healthcare (see our Patient Safety Tips of the Week for July 14, 2009 “Is
Your “Do Not Use” Abbreviations List Adequate?” and December 22, 2015 “The
Alberta Abbreviation Safety Toolkit”).
Design flaws
We don’t pretend to
be experts in design from a human factors perspective. But having a system with
sticks/levers that make it counterintuitive to operate in an aerodynamic stall
situation seems to us to be an obvious design flaw. And a system that allows
two pilots to simultaneously use controls that may counterbalance each other?
We’ll bet Don Norman, author of the classics “The Design of Everyday
Things” and “The Design of Future Things” (see our November 6, 2007 Patient
Safety Tip of the Week “Don
Norman Does It Again!”) would never
let that happen!
The second-in-command was flying
Perhaps it’s
coincidence but in this crash, in the Air France Flight 447 crash, in the
Asiana Flight 214 crash (see our January 7, 2014 Patient Safety Tip of the Week
“Lessons
from the Asiana Flight 214 Crash”) and others where there were aerodynamic
stalls from which recovery failed, the plane was being flown by copilots or
second-in-command pilots. That does not necessarily imply that these generally
less experienced pilots are more likely to err. Remember, they are often flying
those parts of a flight that are considered relatively safe (i.e. the “cruise”
part). And we do understand the importance of allowing the copilots to get the
hours of airtime experience.
But maybe we are
also deluding ourselves and letting our guard down once we get into the “safe”
part of a flight. Isn’t this a little like when the attending physician leaves
one OR to begin a second case in another OR because the first case is now in
the “safe” portion (see our November 10, 2015 Patient Safety Tip of the Week “Weighing
in on Double-Booked Surgery”)?
Scapegoats?
Yes, almost every
incident with serious harm, whether in healthcare or aviation, involves a
combination of human error and system error. Many of the press reports on this
accident seemed to vilify the pilots, using terms such as “unfathomable” to
describe some of their actions. But the bottom line is that their actions are
not really so “unfathomable” given that other pilots and copilots have made
similar mistakes in similar circumstances. It took us many years to realize
that giving fatal doses of concentrated potassium to patients was really a
system problem rather than “unfathomable” actions of a few nurses left at the
“sharp end”.
Other factors
Interestingly, some
of the other factors we have often seen contributing to such accidents (e.g., fatigue, violation of the sterile cockpit rule) do
not appear to have contributed to the current accident.
And the most
important lesson? Just as in serious healthcare incidents, there was a
cascade of errors and events with multiple contributing factors that led to the
disaster. Eliminating almost any one of those many factors might have prevented
this crash. What if the airline had provided training for “level” high-speed
stalls? What if the sticks/levers had been designed so that both could
not be used simultaneously and cancel each other out? What if the terminology
used had been standardized and unambiguous? What if the circuit board had simply
been replaced rather than repeatedly repaired by resoldering?
What if the pilots had discussed what would happen when the circuit breaker was
pulled and realized that the autopilot would disengage? What if the lessons from
the pilots’ actions in the prior Air France or Buffalo crashes had been
incorporated into training? What if the regulator had required the airline to
fix the recurrently faulty circuit board?
Hopefully, some of
the lessons learned from the investigation of this crash may be used to help
prevent other disasters, not only in aviation but also in healthcare or other
industries.
See some of our previous columns that use aviation analogies for healthcare:
May 15, 2007 “Communication, Hearback and Other Lessons from Aviation”
August 7, 2007 “Role of Maintenance in Incidents”
August 28, 2007 “Lessons Learned from Transportation Accidents”
October 2, 2007 “Taking Off From the Wrong Runway”
May 19, 2009 “Learning from Tragedies”
May 26, 2009 “Learning from Tragedies. Part II”
January 2010 “Crew Resource Management Training Produces Sustained Results”
May 18, 2010 “Real Time Random Safety Audits”
April 5, 2011 “More Aviation Principles”
April 26, 2011 “Sleeping Air Traffic Controllers: What About Healthcare?”
May 8, 2012 “Importance of Non-Technical Skills in Healthcare”
March 5, 2013 “Underutilized Safety Tools: The Observational Audit”
April 16, 2013 “Distracted While Texting”
May 2013 “BBC Horizon 2013: How to Avoid Mistakes in Surgery”
August 20, 2013 “Lessons from Canadian Analysis of Medical Air Transport Cases”
December 17, 2013 “The Second Victim”
January 7, 2014 “Lessons from the Asiana Flight 214 Crash”
References:
KNKT (KOMITE NASIONAL KESELAMATAN TRANSPORTASI). Aircraft Accident Investigation Report. PT. Indonesia Air Asia Airbus A320-216; PK-AXC. Karimata Strait. Coordinate 3°37’19”S - 109°42’41”E. Republic of Indonesia. 28 December 2014. KNKT 2015
http://kemhubri.dephub.go.id/knkt/ntsc_aviation/baru/Final%20Report%20PK-AXC.pdf
Wikipedia. Air France Flight 447. Wikipedia, accessed December 23, 2015
https://en.wikipedia.org/wiki/Air_France_Flight_447