ATM Network Reliability Group
San Antonio Rd., Mountain View, CA
April 1, 1992
For Immediate Release:
Possible Cell Virus Attacks ATM Network
The ATM Network Reliability Group today
announced the immediate formation of the
Cell Behaviour Task Force to address serious
problems in several recently installed networks
based on Asynchronous Transfer Mode (ATM).
Sometimes known as cell-switching, ATM
has been touted as the technology to break
the long-standing barriers between
packet- and circuit-based communications services.
Utilizing very small packets, called cells,
with simple headers that can be processed
and routed in hardware, the new networks
are able to interleave packet and circuit
data on the same transmission lines without
resorting to wasteful and expensive time-division
multiplexing techniques.
However, eleven similar incidents on four
separate networks (identities withheld by
request) have raised doubts about the
reliability of the new technology. In each
case, a large ATM switch
suffered congestive overload and failure,
apparently due to uncontrolled replication
of normally benign control messages known
as OAM cells.
These single-celled messages provide in-band
control functions for virtual circuits,
including hop-by-hop and end-to-end functions
such as path connectivity, delay measurement,
etc. There are two distinct varieties, type
4 and type 5, which normally make up only
a very small fraction of the population
of cell traffic in a typical ATM switch.
For reference, a relatively small ATM switch
with a gigabit capacity can handle roughly
two million cells per second of which no
more than 200 per second, or 0.01%, would
be OAM cells.
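
The figures above can be checked with a rough calculation. The sketch
below assumes that "gigabit capacity" means a raw rate of 1 Gbit/s and
ignores any transmission framing overhead; the function names are
illustrative only and do not come from any ATM specification.

```python
# Back-of-the-envelope check of the cell-rate figures quoted above.
CELL_BYTES = 53            # standard ATM cell: 5-octet header + 48-octet payload
LINK_BPS = 1_000_000_000   # assumption: "gigabit capacity" read as 1 Gbit/s


def cells_per_second(link_bps: float, cell_bytes: int = CELL_BYTES) -> float:
    """Raw cell rate of a link, ignoring framing overhead (an assumption)."""
    return link_bps / (cell_bytes * 8)


def oam_fraction(oam_cells_per_s: float, total_cells_per_s: float) -> float:
    """Share of the total cell stream made up of OAM cells."""
    return oam_cells_per_s / total_cells_per_s


raw_rate = cells_per_second(LINK_BPS)            # a little over two million
quoted_fraction = oam_fraction(200, 2_000_000)   # the release's own figures

print(f"raw cell rate: {raw_rate:,.0f} cells/s")
print(f"OAM share:     {quoted_fraction:.2%}")
```

Under these assumptions the raw rate comes out somewhat above two
million cells per second, consistent with the "roughly two million"
quoted, and 200 OAM cells out of two million is 0.01%.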
The failed switches suffered from an extreme
overload of OAM cells, which apparently
began replicating themselves using the multicast
capabilities of the switch through an as
yet undiscovered control mechanism. The
rapid influx of OAM cells quickly overwhelmed
the control processors that normally handle
such messages and blocked the outgoing control
paths so that no warning could be given.
Memory dumps later showed buffers and queues
full of mostly identical OAM-like cells,
with two interesting features. First, the
buffers did not contain standard type 4
or type 5 cells but a combination of parts
of each. Second, slight one- and two-bit
mutations of the most common message appeared
later in the queues, which were then further
replicated.
The intriguing behavioral possibilities
were first noted by Dr. R. Jones, a
former high-school biology teacher, who
overheard discussions during lunch at the
Straw Hat Pizza Parlor near the Reliability
Group's Mountain View offices.
Now a home-computer enthusiast, he helped
develop the sensitive masking technique
used to analyze the cell buffers, based
on his experiences with similar tests for
blood cell typing. He also suggested two
of the more important investigative areas
being explored. First, the original replicated
cells may only have been a symptom of an
earlier problem for which OAM cells were
being automatically generated in an attempt
to fix the problem.
The OAM cells may themselves have caused
more problems, with a cascading effect similar
to a biological auto-immune reaction. The
second area, which has caused more concern,
is the possibility of infection of neighboring
switches, though so far only individual
switches have failed.
The actual cause of the problem is still
unknown. The most commonly held opinion
is that a combination of line errors and
software or hardware bugs conspired to bring
about the initial duplication. A New Jersey-based
member of the investigating team has suggested
that variant encodings of the Application
Signalling Bit in the cell header might
also be responsible.
But some have suggested that the problem
seems more similar to computer virus attacks
and the Internet "worm" in particular,
and are particularly intrigued by its dual-inheritance
and mutation aspects. Experts discount the
possibility of a true, replicating, single-cell
virus in the ATM network, observing that
the 48-octet payload of an ATM cell is too
small to contain a sufficiently complex
program. An investigator who wished to remain
nameless maintained that the team's greatest
difficulty was interpreting the ATM standards
themselves, and hinted that the whole affair
might come down to a misunderstanding on
the part of the switch manufacturer.
Until the problem is solved, however, the
task force is recommending that communications
customers use caution when buying or using
ATM network services, especially for critical
functions such as remote banking, and cash
machine operations in particular.
Principal Investigator,
Tracy Mallory