Autorouting with the A* Algorithm


1. Introduction

A few years ago, a friend of mine designed an adapter board for the IBM PC.
The tools he used were blue and red strips of tape, a sharp knife, large
sheets of clear plastic, and a generous amount of patience.  It took him
several weeks, and after the first board was tested he discovered that some of
the traces were incorrect, and had to be cut with the knife and rerouted with
solder and wires.  This caused me to start thinking about ways to use the
power of the computer to simplify this tedious, error-prone job.

What you want to do when designing a printed circuit board is implement an
electronic circuit.  First you will create a schematic which represents the
circuit.  From this, you will derive a list of TTL chips and other components
that implement the needed functions, and a list of the pins that need to be
connected.  Together, these lists are referred to as the netlist.  As long as
the connections are made correctly, you usually don't care where the traces
(the wires that will be embedded in the board) are placed.  Autorouters are
computer programs that do this task for you.

As we will see, the autorouting problem is really a search problem, and there
are algorithms from the field of artificial intelligence we can use to solve
it.  We will look at two of these, the breadth-first and A* search algorithms.
The C source code for a freely-copyable implementation of an autorouter using
the A* search algorithm accompanies this article, and programs to view the
routed printed circuit board and print it on a laser printer are also
available.


2. Autorouters

Autorouting can be viewed as a global optimization problem, which are very
difficult problems to solve.  To produce a good circuit you want to minimize
some things (trace length, board size, number of routing holes (holes created
by the autorouter to transfer a trace from one side of the board to the other,
also called vias), signal crosstalk, number of layers, cost to manufacture)
while maximizing others (signal strength, reliability, ease of debugging).
The measure of the overall value of a circuit will be a function of all of
these often conflicting variables (assuming the circuit behaves correctly,
otherwise its value is zero).  Usually it is acceptable to find a solution
which satisfies a set of constraints, because finding the globally optimal
solution is infeasible for all but the most trivial problems.

Autorouting can also be viewed as a collection of search problems.  The
individual search problems deal with finding a route, and laying down a trace,
between two locations on a printed circuit board.  There are many algorithms
for solving search problems, each with different running time characteristics
and data space requirements.  The two search algorithms we will be discussing
are breadth-first and A*.


3. Autorouting by Searching

When search algorithms are used to solve the autorouting problem, they
typically operate in two phases (reference [1]), and treat the board as a
matrix of cells.  The first phase starts at the source cell and searches for
the target cell, usually by going in several directions at the same time.
While searching, an auxiliary data structure will be built which keeps track
of how each cell was reached (this is referred to as Pred in the algorithms in
figures 1 and 2).  Once the target cell has been found, the first phase is
finished, and the second phase is started.  However, if the first phase
exhausts all possibilities without reaching the target cell, then no route
exists between them, and there is no reason to do the second phase.

In the second phase, the auxiliary data structure is used to trace the route
from the target cell back to the source cell, actually laying down the
electrical connections.  The second phase is identical for the breadth-first
and A* search algorithms.  But the first phase is different, and it is what
gives these algorithms different behaviors.

The main data structures used in the first phase are the Open queue and the
Closed set, and are used to hold cell coordinates.  Since a cell's coordinates
uniquely identify that cell, we'll say that the Open queue and Closed set
contain cells.  Cell coordinates will be represented as r2c3s1 for the cell at
row 2, column 3, side 1, or as r2c3 when it is understood that all cells are
on the same side of the board.  To remind ourselves that Open is a queue and
Closed is a set, when we talk about adding cells to them, we will put the
cells "on" the queue, and "in" the set.  Initially, the Open queue contains
the source cell and the Closed set is empty.  The first phase is a loop which
removes an element from the head of the Open queue, puts it in the Closed set
(which indicates that it has been searched), and checks to see if it is the
target cell.  If it is, the first phase is done.  Otherwise, the neighbors of
the cell (the cells that are adjacent to it) are placed on the Open queue, and
the loop continues.  As we will see below, the essential difference in the
breadth-first and A* search algorithms is the order in which the neighbors are
placed on the Open queue.


4. Breadth-First Search

The breadth-first search algorithm (figure 1) works by processing a
first-in-first-out (FIFO) queue of open cells (cells that have been reached,
but not yet searched).  Initially, the open queue contains only the source
cell.  A cell is removed from the head of the open queue, placed in the set of
closed cells (cells that have been searched), checked to see if it is the
target cell, and if it is not, its neighboring cells are placed at the tail of
the open queue.  Neighboring cells which have already been reached are
ignored.  (If a cell's coordinates are on the open queue or in the closed set,
then it has been reached.  Otherwise, it has not been reached.)  This
continues until the target cell has been found, or the open queue is empty, in
which case the target cell cannot be reached from the source cell.

A version of breadth-first search known as Lee's algorithm (reference [2]) was
applied to the autorouting problem in the early 1960's, and is still the basis
of some autorouters.  (In the original algorithm, cells diagonally adjacent to
each other were not considered to be neighbors.  Consequently, the
backtracking phase could only create horizontal and vertical traces.  We will
enhance the algorithm so that diagonally adjacent cells are neighbors, thus
diagonal traces can also be produced.)  Unfortunately, Lee's algorithm suffers
from a behavior inherent in the breadth-first search technique which limits
its application to problems of relatively small size.  As the distance between
the source and target cells increases by a factor of N, the number of cells
processed by Lee's algorithm (and therefore the processing time) increases by
the square of N.

The behavior of Lee's algorithm while searching for a path between the source
cell S (r5c5) and the target cell T (r8c8) is shown in figure 2.  Lee's
algorithm does not specify the order in which neighboring cells are placed on
the open queue, but we will use the compass directions north, east, south, and
west, followed by northeast, southeast, southwest, and northwest.  This order
tends to produce traces with a minimal number of turns.

In figure 2a, the source cell (r5c5) has been searched, and its eight
neighbors have been placed on the open queue.  The arrows indicate the
direction from which each cell was reached, and correspond to the Pred data
structure.  After the first eight cells on the open queue have been reached
and moved to the closed set, the configuration in figure 2b is searched, where
there are 16 cells on the open queue.  Once these 16 cells have been searched,
the configuration in figure 2c is reached.  Now the target cell (r8c8) is
fourth from the end on the open queue, and a solution is imminent.  When r8c8
is searched, it will be recognized as the target cell, and the Pred data
structure will be used to construct a trace back to the source cell.

You can see that the search progresses outward from the source cell in all
directions, like the ripples in a pond when you throw a pebble into the water.
If we double the size of the problem (S and T are six cells apart), the number
of cells searched (and therefore the processing time) will be roughly four
times as large, and if we triple the size of the problem, the number of cells
searched will be roughly nine times as large.  Thus, the behavior of Lee's
algorithm is quadratic in the size of the problem, which makes it infeasible
for large problems.


5. A* Search

The A* search algorithm (figure 3) also works by processing a queue of open
cells, which initially contains only the source cell.  But this is a priority
queue, which means cells are inserted according to the estimated distance to
the target cell (reference [3]), not just at the end.  Cells that are on the
shortest estimated path from source to target are placed at the head of the
queue.  Cells are still removed from the head of the open queue, then they are
checked to see if they are the target cell, and if they are not, their
neighboring cells are put on the open queue at the proper position.
Neighboring cells which have already been searched are checked to see if the
new path between them and the source cell is better (shorter) than the
previous one.  If it is, they are repositioned on the open queue according to
the new estimated path length from source to target.  As in breadth-first
search, this continues until the target cell has been found or the open queue
is empty.

A* depends on being able to estimate the distance between a cell and the
target cell.  In the case of autorouting, a simple measure of this distance is
available, and this helps A* to concentrate the search in the direction most
likely to succeed.  The more accurate the estimate is, the faster the search
will be.

In practice, A* does not suffer from the quadratic behavior of Lee's
algorithm, so it solves the same problems faster, and can be applied to the
larger problems that Lee's algorithm performs so poorly on.  As the distance
between the source and target cells increases, the number of cells processed
by A* will increase, but not as dramatically as with Lee's algorithm.

The behavior of the A* search algorithm is shown in figure 4.  The A*
algorithm does not specify whether new cells are placed in front of or behind
cells already on the open queue which evaluate to identical estimated path
lengths.  We will use the convention that they are placed in front of cells of
the same distance.  This will minimize the amount of time to insert a cell on
the open queue.

In figure 4a, the source cell (r3c3) has been searched, and its eight
neighbors have been placed on the open queue.  Each cell on the open queue
also includes the estimated length of the shortest path from S to T that goes
through that cell.  After the first cell on the open queue (r4c4) has been
searched and moved to the closed set, the configuration in figure 4b is
reached, where there are 12 cells on the open queue.  Once the next cell
(r5c5) has been searched, the configuration in figure 4c is reached.  Now the
target cell (r6c6) is at the head of the open queue, and a solution will be
found on the next iteration of the loop.  When r6c6 is searched, it will be
recognized as the target cell, and the Pred data structure will be used to
construct a trace back to the source cell.

You can see that the search progresses more directly toward the target cell.
The search is drawn toward the target just as the earth's gravity pulls
objects toward the center of mass.  If we double the size of the problem, the
search will search roughly twice as many cells, and if we triple the size of
the problem, the search will search roughly three times as many cells.  This
linear behavior makes A* more attractive for autorouting than the quadratic
Lee's algorithm.  With the incorporation of the heuristic (the rule which
guides the search in the direction most likely to succeed), it is more
difficult to specify a worst case behavior.  However, A* will never take more
time than Lee's algorithm, and will never search any cells that Lee's
algorithm could avoid.


6. Optimizations and Generalizations

The algorithms in figures 1 and 3 solve the general search problem.  When we
implement these algorithms and customize them to a particular application,
there are a few things we can do to speed them up.

The A* algorithm as presented in figure 3 recomputes the heuristic H(y) when
it discovers a better way to reach a cell.  Depending on how difficult this
heuristic is to compute, we can probably save some work at the expense of
complicating the algorithm.  When lines 20 and 21 of figure 3 are executed,
the previous values of G[y] and F[y] are destroyed.  But F[y] = G[y] + H(y),
so we could save F[y] - G[y] (which is H(y)) in a temporary variable, and use
that variable instead of recomputing H(y) on line 21.  Also, the common
subexpression G[x] + Distance(x,y) should be placed in a temporary variable,
instead of being computed twice (lines 18 and 20).

Often, rather than searching for a path between two individual cells, what is
really desired is a path between one of a set of source cells and one of a set
of target cells (as when connecting power and ground pins).  Both algorithms
can be modified by adding the entire set of source cells to the initial open
queue, and checking for a member of the set of target cells on each iteration.
When this is done, the heuristic that the A* algorithm uses becomes more
complicated.  It must estimate the minimum distance from the current cell to
any one of the target cells.

For breadth-first search, once the target cell is placed on the open queue, it
is pointless to add any more cells to the open queue.  In fact, once this
happens the problem has been solved.  An appropriate shortcut would be to
insert a check before line 13 in figure 1 to see if y is the target cell.  If
it is, immediately use Pred[y] to construct the trace back to the source cell,
and return.


Distance Calculations    [sidebar topic, with figures 5, 6, and 7]

The A* search algorithm depends on a heuristic to estimate the distance
between the current cell and the target cell.  As implemented in the
accompanying program, the heuristic is a simple geometric distance
approximation.

Figure 5 illustrates all of the possible cell types used to construct a trace,
grouped by type.  For each group, the distance of that cell type is also
given.  These distances are calculated based on a cell size of 50 mils by 50
mils.  A mil is a thousandth of an inch, so the autorouter uses 20 cells per
inch.  A typical full-length adapter board for an IBM PC is 4 inches high and
13 inches long, or 80 cell rowss by 260 cell columns.

The traces of groups B and C can coexist in the same cell, so that a hole can
have up to 16 traces connecting it with other cells (8 on each side of the
board).  Also, the parallel traces of group F can coexist in the same cell (on
the same side of the board), as shown by group J.  This allows the routing of
two traces through the same cell, providing the higher density required by
some circuits (memory arrays, for example).  Aside from these exceptions,
cells can only contain one type of trace (on each side of the board).

In figure 6, we want to know the approximate distance of a trace that will
connect the two holes.  Viewing the board as a matrix, the differences in cell
coordinates are three rows and five columns.  The shortest path between them
will use a diagonal trace across three cells and a horizontal trace across two
cells, as shown.  Using the cell types in figure 5, the length of the trace
will be 23 + (2 * 71) + 60 + 50 + 12 = 287 mils.

A trace that uses a routing hole to go from one side of the board to the other
covers a greater distance than one that goes diagonally across a cell (group E
in figure 5) and stays on the same side of the board.  This is because part of
its path goes around the edge of a circle.

A hole is 25 mils in diameter, and is at the center of a cell.  To calculate
the distance of a trace through a routing hole, we measure the section of the
hole between the two connecting traces.  Figure 7 shows an entering trace
connecting to a hole at point A, and possible exiting traces on the opposite
side of the board at points B, C, D, and E.  The distances between A and each
of these points are 10, 20, 29, and 39 mils, respectively.  To calculate
these, we use the geometric formula Circumference = PI * Diameter
(approximately 78.5 mils) and divide by eight (a one-eighth section of a hole
is approximately 9.8 mils), then add one, two, three, and four of these
sections, and round off to an integer.

The heuristic in the accompanying program includes a penalty when a trace
takes a turn or switches to the other side of the board through a routing
hole.  This is because turns are often the weak points in a circuit, and
traces are more likely to break at a turn than in a straight part.  Including
a penalty encourages A* to use straight lines, and even allows a slightly
longer trace to be selected over one with too many turns.  The amount of
penalty depends on the kind of turn; sharper turns are assessed a larger
penalty.  Routing holes incur a significant penalty, since overusing them
early in the routing process can make later traces more difficult or even
impossible to construct.  This is because a routing hole dedicates a cell
exclusively to a single trace, for both sides of the board.  Such a cell is
not available to later routing, thus reducing the total number of cells that
can be used.


7. Memory Requirements

Both of the search algorithms require quite a bit of memory in order to solve
problems of non-trivial size.  The breadth-first search algorithm needs memory
to represent the board, the predecessor structure, and the closed set.  The A*
search algorithm needs these too, plus structures for F[x] and G[x].  In
addition, both algorithms need to dynamically allocate memory for the open
cell queue.

In this program, the board is represented as a pair of two-dimensional arrays
(one for the front side of the printed circuit board, and one for the back
side), where the dimensions are the number of rows and columns of cells.  Not
counting holes and traces relating to holes (figure 5, groups A, B, and C),
there are 30 possible cell contents, which can be represented with five bits
per cell.  The hole-related cells are more difficult to enumerate, since they
can be combined in many ways.  If we simply assign one bit to each of the
eight traces in groups B and C, and add one more bit to indicate a hole, 14
bits will be sufficient to represent any cell.  On a board of N rows and M
columns, we'll need N*M*28 bits total.

The predecessor structure will also be a pair of two-dimensional arrays, where
an entry must be able to represent one of the eight compass directions or an
indication for the opposite side of the board.  This will take four bits per
cell, or N*M*8 bits total.

The closed set can be represented by a pair of two-dimensional single-bit
arrays, where a bit is one if the corresponding cell has been searched, and
zero otherwise.  This will take N*M*2 bits total.

F[x] and G[x] will be similar to the board arrays, but they must contain a
16-bit integer for each cell.  This will take N*M*64 bits total.  Note that if
memory usage needs to be minimized at the cost of increased processing time,
we could omit the F[x] arrays, and calculate the F values as they are needed
from the G[x] arrays and the heuristic function, H(x).

Breadth-first search will require N*M*38 bits and A* will require N*M*102 bits
of static memory.  For a printed circuit board that is 4 inches by 13 inches
(80 cells by 260 cells), breadth-first search will need 98800 bytes and A*
will need 265200 bytes.  It is often the case that different algorithms to
solve the same problem trade off memory against processing time to achieve
different behaviors.  This is the case with breadth-first search and A*.


8. Locality of Reference

Despite the fact that A* requires more memory than breadth-first search, A*
exhibits better memory usage patterns.  This is because it shows better
locality of reference than breadth-first search.  Locality of reference deals
with the sequence in which memory locations are used, and consists of two
rules of thumb: (1) the memory location currently being referenced is likely
to be referenced again in the near future, and (2) memory locations near the
one currently being referenced are likely to be referenced in the near future.

When the first rule holds true for a given program, that program can probably
benefit from a memory cache.  When the second rule holds true, the program can
probably benefit from a virtual memory environment with a least-recently-used
page preemption policy.  Most computer systems with virtual memory and caches
apply them to both code and data, so programs that exhibit good locality of
reference should benefit from both rules.

This becomes a factor when solving large problems (say, routing a printed
circuit board that is 10 inches in both dimensions).  In a virtual memory
environment, improved locality of reference can minimize swapping.  In an
environment with cache memory, improved locality of reference can increase the
cache hit rate.  Both of these tend to decrease the total running time.

The memory references in the breadth-first search algorithm go around and
around in circles of constantly increasing size, and do not reflect a common
locality of reference.  Thus, the breadth-first search algorithm is not able
to take good advantage of virtual memory or a memory cache.  However, the
memory references of A* tend to be from the same area of the printed circuit
board for extended periods of time, taking better advantage of these
mechanisms.  On a large problem, this helps to offset the extra memory that A*
requires by adding speed beyond that provided by the basic algorithm.
Improved locality of reference by itself may not be a sufficient reason to
select A* over breadth-first search, but it is icing on the cake.


9. Input and Output Formats

Figure 8 contains an example input file; it defines a circuit to calculate a
parity bit for a 16-bit word using exclusive-or gates.  There are statements
for defining the size of the printed circuit board, the locations of holes and
components, and the connections between them.

The autorouter sorts the connections by approximate trace distance (see the
section on distance calculations, above); the shortest connections are
processed first, and the longest are processed last.  However, connections
which include the "priority" keyword are processed before any of the other
connections.  This allows the designer to have control over the order in which
traces are constructed.

In addition, there are statements for defining chip templates, and associating
position-independent labels to each component and pin.  This makes it easy to
move components from one location to another without having to edit each
connection.  Chips can be rotated in any of four directions.

Figure 9 shows the traces created by the autorouter while processing the
example input file in figure 8.  On an 8 Mhz IBM PC-AT compatible computer,
this board took 37 seconds of processing time to route.

The output from the autorouter is a copy of the printed circuit board as it is
represented in memory.  This consists of the dimensions of the board (number
of rows and columns), followed by a pair of two-dimensional arrays (one for
the front side, and one for the back).  The elements of the arrays are double
words (32 bits) containing bits which encode the contents of each cell.  For
example, if a particular bit is set, there is a horizontal line running across
the cell, and if a different bit is set, there is a hole in the cell.


10. Viewing and Printing the Board

A program to view the routed printed circuit board on an IBM PC equipped with
an enhanced graphics adapter (EGA) is provided with the autorouter.  The
viewing program provides four levels of detail (zooming), panning, and viewing
of the front and back sides of the printed circuit board separately.

Another program prints the routed printed circuit board on a laser printer.
This program allows the user to specify four levels of detail and resolution
ranging from 75 to 300 dots per inch (dpi).  Figure 9 was produced with this
program using the maximum zoom level and 300 dpi.


11. Future Enhancements

The autorouter presented here provides the basics of a personal computer CAD
system, but could be enhanced in many ways.  A few of these are:

* Use Lotus-Intel-Microsoft (LIM) 4.0 expanded memory to provide access to
  more memory.

* Incrementally save the board after each connection is routed, so that a
  large problem can be solved in several short sessions, rather than requiring
  a computer to be dedicated for an extended period of time.

* Automatically compress the output (which often consists of many repeated
  bytes) to save disk space.

* Wide traces for power and ground.

* Calculate how close the solution of each connection is to the optimal
  solution.

* Add symbolic support for components such as capacitors, resistors, and
  diodes.

* Support surface mount technology (a pad, rather than a hole).

* Display a "rat's nest" of direct routes (use a straight lines for each
  connection, without regard for whether traces cross).  This would help the
  designer decide how the components should be arranged.  It may also be
  possible to do an analysis, and suggest in which direction each component
  should be moved.  Reducing the total distance of the traces that must be
  routed can significantly reduce the processing time of any autorouter.

* Provide support for printed circuit boards with more than two routing
  layers.

* A program the designer can use to edit the routed printed circuit board.

* Allow multiple priority levels for connections.

* Visually distinguish holes from routing holes by displaying them in a
  different color.

* Increase the size of the TTL library (chip definitions).

* A program to translate the output into Gerber format (a standard language
  for describing trace and hole placement, used for board fabrication) and
  Excellon format (a standard language for describing hole placement, used to
  drill the holes).


12. Conclusion

When autorouting is viewed as a special kind of search problem, there are
several search algorithms from the field of artificial intelligence that can
be used to solve it.  An autorouting program can be easily implemented on a
microcomputer, such as the IBM PC family of computers, and can greatly reduce
the amount of work required to route a printed circuit board by hand.  When
combined with other tools, such as the viewing and printing programs,
high-resolution output devices (laser printers), and high performance
386-based microcomputers, a complete design system can be built which makes
tape-ups obsolete.  Just a few years ago, such a system would have required an
expensive workstation.


13. Author Biography

Randy Nevin received a BS in computer science from Oregon State University in
1983 and an MS in computer science from the University of Washington in 1988.
He has worked for Microsoft since 1983 on various programming language and
networking products.


14. References

[1] Stephen E. Belter, Computer-aided Routing of Printed Circuit Boards: an
    Examination of Lee's Algorithm and Possible Enhancements, BYTE, June 1987,
    pp. 199-208.

[2] C. Y. Lee, An Algorithm for Path Connections and Its Applications, IRE
    Transactions on Electronic Computers, September 1961, pp. 346-365.

[3] Steven L. Tanimoto, The Elements of Artificial Intelligence, 1987,
    Rockville, Maryland: Computer Science Press, pp. 148-164.  This covers the
    breadth-first and A* search algorithms.