Implementing a 256x256 Sixteen-State (4-bit) Real-Time Cellular Automaton
The simplest and most widely known Cellular Automaton {CA}, is the one used to implement John Conway’s “Game of Life” [Gardner 1970]. It includes a 2-D array of cells, say 256x256, and each sell is either ‘alive’ or ‘dead’ [1 or 0]. We then travel through this array in an orderly fashion visiting each cell in turn.- In order to decide the fate of each cell we look at it and its eight nearest neighbors:
- If the cell is alive it will stay alive just in case it has either two or three neighbors alive.
- If it has fewer than two alive neighbors it will “feel lonely” and die.
- If it has more than three alive neighbors it will ‘feel crowded’ and die.
- If it is dead and has exactly three alive neighbors it will come alive.
The core logic needed to implement a cellular automation is similar to that which is needed for a 3x3 convolver - two row buffers, nine latches, and the ability to recalculate the center of the 3x3 block based on its current value and the values of the surrounding eight data points {cells}.
- The system consists of:
- two cell buffers which are ping-ponged (each buffer in turn is either the source buffer or the destination buffer); each buffer is organized as 256x256x4 bits;
- video output logic (so that the evolution of the system can be followed in real-time);
- two Intel iFX780 FPGAs - one of which is used to implement the two rows buffers, the nine latches, and the recalculation of the center cell; the other iFX780 is used to control the cell buffers.
Increasingly, logic functions both within and without PLDs are being implemented by look-up table {LUT} logic since any function of n inputs and m outputs can be realized with 2**n*m bits of memory - and memory has the cheapest transistors. (The only drawback is the exponential cost of the input function.) Another consideration is that arithmetic operations do not map well into PLA logic while LUT logic is indifferent to the function being performed.
But as this strategy spreads to FPGAs it gives a designer one more place to find memory - inside the FPGA itself. This deign has two uses for intelligent memory: as row-buffer and as look-up table. The FPGAs are also used as counters, unary to binary converters (which are also called counters), magnitude comparators, programmable sequencers and as general logic function implementation.
Convolvers, cellular automata, and neutral nets
all share the similar feature that the value of the pixel, cell, or neuron is replaced by a value dependent on its current value and the values of its neighbors. In the 2-D case, both 3x3 sequential convolvers and 3x3 CAs have identical data flow structures that are best implemented with row buffers, pipeline registers, and two buffers (figure 1a).Cellular Automata
As mentioned above, common to all CAs, a new value for a cell is calculated based on the values of its neighbors and its own internal state. Consider a CA with a 2-D 256x256 cell space, a 3x3 neighborhood and four bits per cell. We would need two 64kx4 cell buffers {buffer A and buffer B} which we would ping-pong, alternately reading from one and writing the other. If the ping-pong pointer was pointing to B for reading and to A for writing, we would read each cell {location} in B in turn along with its eight neighbors. Those 36 bits {3x3x4} would be used to compute the next state for the current cell the value for which would be written to the same address, but into buffer A instead of back into buffer B.CA investigators proceed by varying the initial values in the cell buffers, by halting the CA and changing just some of the values in the cell buffers, but most importantly by changing the rules of the CA. While the potential rule state space is effectively infinite, for a large, important subset of the CAs one can characterize their
- rules as follows:
- a cell definition {which is the same for every cell} consisting of:
- the number of variables contained in a cell,
- the range of values for each of those variables.
- transformation rules for each variable
- a neighborhood definition
- a neighborhood function which looks at one or more of the variables in each of the cells in the neighborhood of the {current} cell {whose new value is being computed }; this value is them used in the computation of {1.3} for at least one of the variables {in the current cell}.
[{1=alive=excited}, {0=dead= discharged}];
and it is this variable that is used in the calculation of the neighborhood function. The ‘excited discharged’ terminology comes from neuron and excitable media research [Markus 1990] & [Gerhardt 1990], while ‘alive dead’ is from John Conway’s game of life [Gardner 1970]. The private variable is a timing variable that times out the excited, live, or 1 state and it also times the recovery or refractory period.
More formally, if the nine cells in a 3x3 neighborhood are labeled as follows: a | b | c --|---|-- d | e | f --|---|-- g | h | iand associated with each cell are two variables (u,v) s.t. u : [0,1] and v : [o..n] and if au is the value of u for cell a at time t and similarly for bu etc., and if eu’ an ev’ are the new values to be computed for cell e at t + 1 then:
ev’ = f(eu,ev)
eu’=g(eu,ev,sigma_u)
where sigma_u=(au+bu+cu+du+fu+gu+hu+iu) {note eu omitted}
and can be summarized with the following fragment of pseudo code:
case eu of {0,1 }
0 : {case=dead, discharged}
begin {0}
ev’= max(ev- v_down, 0)
if ( (sigma _u › u_ thresh) and
( ev’ ‹ v_thresh ) ) then eu’ = 1
else eu’= 0
end {0}
1 : {case = alive, excited}
begin {1}
ev’ = min(ev + v_up, n)
if (ev’ ³ n) then eu’ = 0 else eu’ = 1
end{1}
end ; { case eu of }.
It also happens that on occasion we want to make one or more of: u_thresh, v_thresh, v_up, and v_down functions of sigma_u. If all the cells in a neighborhood are discharged we could adjust v_thresh down - making it harder for the cell to become excited. Perfectly good CAs can be constructed without this capability, but in the particular implementation described below we get it for free.
Several further questions remain to be dealt with. The first one is the boundary/edge problem. Since the CA is based on neighborhood operations, where do we get the neighbors for the cells on the edge? The short answer is: we find them on the opposite edge. That is to say that we identify opposite edges so that the overall topology of the cell buffer is not a rectangle but a torus.
The second question is why do we need two cell buffers (or why can’t we simply write the new data back into the same cell buffer). The short answer is: excessive feedback. If we were proceeding through the cell buffers from the top to bottom and from left to right then, for a 3x3 neighborhood, cells a..d would contain new data and cells f..i would contain old data. Someone who is paying closer attention to this than they probably should might say ‘but what about the row buffers? Don’t they prevent that from occurring?’ To which we reply ‘Yes, but you forgot that when we are processing row 255, row 0 is part of its neighborhood and we need old data for that’.
One further problem remains. How do we change the rules? As we explain below, we implemented the ‘sigma_u > u_thresh’ function as a LUT with the address to the LUT equal to
FPGA Resources
The iFX780 contains eight configuration blocks each of which contains a 128x10 bit high speed static ram. Each block can be configured as either static ram or logic. For example, as we will demonstrate below, we use one block to implement a 256x5 row buffer and another block to implement the synchronous counters needed to control it.
FPGA Implementation
In an iFX780 the computation of eu’ and ev’ the function can be implemented as two, 256x5 blocks with raw_u forming an 8-bit address into the first block (by using one of the inputs as an output enable the block is effectively reconfigured as a 256x5 block - see below) and the four-bit output of that block plus eu and ev are the addresses into the second block. The first block simply acts as a unary to binary converter {counter} and takes as its inputs: au..du, fu..iu {eu omitted}. The second block takes the resulting four outputs {0..8}, and eu’ and ev’. Notice that in this implementation u_thresh, v_thresh, v_up, and v_down can all be functions of sigma_u.
We can generate values needed for the counter LUT {LUT0} with the logic shown in figure 1 a,b (note that iu is used to multiplex 10 output bits down to five). The code given simply counts the ones in the address (i.e., all neighbors are used and weighted equally):
[Toffoli 1985]
Tommaso TOFFOLI and Norman MARGOLUS, “Cellular Automata Machines,” MIT press (1985).
[May 1991]
MAY, Robert M., “Hypercycles Spring to Life,” Nature, 17 October 1991, 607-608
[Gerhardt 1990]
GERHARDT, Martin, Heike SCHUSTER, and John J. TYSON, “A Cellular Automaton Model of Excitable Media Including Curvature and Dispersion,” Science, 30 March 1990, 1563-1566.
[Markus 1990]
MARKUS, Mario and Benno HESS, “Isotropic Cellular Automaton for Modelling Excitable Media,” Nature, 6 September 1990, 56-58
Real-Time_Cellular_Automaton
LUT0 address: 0 1 2 3 4 ..63 64.. 127 128..254 255
LUT0 value: 0 1 1 2 1.. 6 1.. 7 1.. 6 8
but if ‘sigma _u’ is changed to:
sigma_u := bu + du + fu + hu ;
then only the very nearest neighbors are counted (we can even subtract the corners from very nearest neighbors):
sigma_u := bu + du + fu + hu - (au + cu + gu + iu) + 4 ;
then in LUT1:
sigma_u :=ah - 4 ;
or
sigma_u :=ah -4 + (2*eu) ;.
In fact we can use any encoding of ‘sigma_u’ that can be mapped into a 4-bit value. Generating the values for the second LUT is a little more complicated (figure 1a,c). The rows buffers will require one block for the memory plus an additional block for the 8-bit barrel pointer. The following figures describe its operation and the overall operation of the cell buffer / row buffer system
(figure 1 a,d)
Figure 1b - Counter look-up table (LUT0) definition
for au := 0 to 1 do
for bu := 0 to 1 do
for cu := 0 to 1 do
for du := 0 to 1 do
for fu := 0 to 1 do
for gu := 0 to 1 do
for hu := 0 to 1 do
begin
{iu :=0}
Lutadr := au*64 + bu*32 + cu*16 + du*8 + fu*4 + gu*2 + hu ;
sigma _uah := au + bu + cu + du + fu + gu + hu ;{iu :=1}
sigma _uai := sigma _uah + 1 ;
sigma _u :=sigma_uah + 32 * sigma_uai ; {iu == 0 = LSBs}
LUT0[Lutadr] :=sigma_u ;
end; {forforforforforforfor}
Figure 1c- LUT1 definition
for iu := 0 to 1 do
for ev := 0 to 7 do
for ai := 0 to 15 do {value from LUT0 = sigma_u}
begin
Lutadr := iu*64 + ev*16 + ah ;
sigma _u :=ah ; {or any of the functions mentioned above}
u_thresh :=Uthresh Fun(ah, eu); {normally
returns a constant}
v_thresh := VthreshFun(ah,eu); {normally
returns a constant}
{generate eu’ and ev’ according to
desired rules - but for example}
{ eu = 0 }
ev’ = max(ev - v_down, 0)
if ((sigma _u > u_thresh) and
(ev’ < v_thresh)) then
eu’ = 1 else eu’ = 0
euev0 :=8*eu’ = ev’ ; {form into 4-bit word}
{ eu = 1 }
ev’ = min(ev +v_up, 15)
if (ev’ >=15) then
eu’=0 else eu’ = 1
euevl := 8*eu’ + ev’ ; {form into a 4-bit word}
euev’ :=32*euev1 + euev0 ; {eu == 0 = LSBs} {form into a 10-bit word}
LUT1[Lutadr] := euev’ ;
end; {forforfor}
Figure 1d - Cell and Row buffer definition
for y:= 0 to 255 do {generated in iFX780 #2}
begin
row := (y + 255) mod 256 ;{FF 0 1...FE FF 0}{row read pointer}
{generated in iFX780 #2}
feflag :=false ;
for x := 0 to 255 do
begin
clm := (x + 255) mod 256 ; {column read pointer, generated in iFX780 #2}
rbp := clm mod 128 ; {row buffer(/barrel) pointer} {7F 0 1..7F 0.. 7F 0}
i_lat := memA[row,clm] ; {load fufv from rowbuffer}
f_lat := rb1b[rbp] ; {load cu from rowbuffer}
c_lat := rb2b[rbp] ; {load cu from rowbuffer}
al_lt := rb1a[rbp] ; {load intermediate latches}
a2_lt := rb2a[rbp] ; {load intermediate latches}
if (not feflag) then {test if clm <=254}
begin
rb1b[rbp] := al_lt ; {update row buffers with new data}
rb2b[rbp] := a2_lt ;
rb1a[rbp] :=i_lat ;
rb2a[rbp] :=f_lat ;
end; {if <= 254}
if ((y > 1) and (x > 1)) then {memory write counters}
memB[y-2,x-2] := euev” ; {0..255, 0..255 generated in iFX780 #2}
if (clm = 254) then feflag :=true ;
end; {for x}
end; {for y}
References
[Gardner 1970]
Martin GARDNER “The fantastic Combinations of John Conway’s New Solitaire Game ‘Life”,” Sc. Am. 223:4 (April 1970), 120-123