Notes Toward a Semi-Random Convolver
If the 908 can be considered a sequential convolver (because of its single data-in port and internal row buffers), and if the bi-linear interpolator from TRW can be considered a random convolver (because of its ability to load four data words and four coefficients in parallel), then what I am proposing is a semi-random convolver.
It might be easiest to think of it as having a column-slice architecture. Each of the N slices would be able to do a one-dimensional convolution of length N.
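As a rough illustration of what one slice computes, the sketch below forms a single output word as a multiply-accumulate over N taps. The function name slice_output and the list-based data layout are hypothetical, and the notes do not say whether the coefficients are applied flipped (true convolution) or in order (correlation); this sketch applies them in order.

```python
def slice_output(window, coeffs):
    """Multiply-accumulate one N-word window against N coefficients.

    Hypothetical model of a single slice producing one output word.
    """
    if len(window) != len(coeffs):
        raise ValueError("window and coefficient lengths must match")
    return sum(w * c for w, c in zip(window, coeffs))


# Example with N = 8 and unit coefficients (a simple running sum).
print(slice_output([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8))  # 36
```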
Each slice would have three frames (A, B, C) of RAM, any one of which could serve as the I/O frame, the current-data frame, or the result frame. First, frame B would be made the I/O frame and loaded with data. Once frame B was loaded, the convolution would start, with the results deposited in frame C. Meanwhile, frame A would be made the new I/O frame and would be loaded with the next batch of data. At this point frame A would contain input data, frame B would contain old data, and frame C would contain the results that need to be reported out. So the first row of results in C would be loaded into a row buffer for output, which would free that row to accept new input (frame C is now the I/O frame). Frame A is now the data frame and frame B is the result frame. And so on: the previous result frame becomes the I/O frame, the previous I/O frame becomes the data frame, and the previous data frame becomes the result frame.
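The rotation of frame roles can be sketched as a small state update. This is only a model of the bookkeeping described above; the role names "io", "data", and "result" and the function rotate_roles are hypothetical labels, not part of any existing design.

```python
def rotate_roles(roles):
    """Return the frame-to-role assignment for the next pass.

    roles maps a role name to a frame label ("A", "B", or "C").
    """
    return {
        "io": roles["result"],    # finished results are read out, new data comes in
        "data": roles["io"],      # the freshly loaded frame is convolved next
        "result": roles["data"],  # the old data frame is overwritten with results
    }


# State during the first convolution pass: B holds data, C collects
# results, and A is being loaded with the next batch.
roles = {"io": "A", "data": "B", "result": "C"}
for step in range(4):
    print(step, roles)
    roles = rotate_roles(roles)
```

Under this model the assignment repeats every three passes, so each frame takes each role once per cycle.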
To make the example more concrete, assume that eight slices are to be used on a collective frame size of 256 x 256 data words. Then each frame in a slice has to contain 8k data words (65,536 words divided among eight slices), for 24k words of internal RAM altogether per slice. Each slice is capable of doing a 1 x 8 convolution centered on any address in the 256 x 256 space. Internally, each frame would be organized as eight separate RAMs, with address-adjust logic for each RAM.
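The sizing arithmetic for this example can be checked with a few lines. The function name slice_memory is hypothetical, and the per-RAM figure assumes the eight RAMs within a frame are of equal size, which the notes do not state.

```python
def slice_memory(frame_side=256, n_slices=8, n_frames=3, rams_per_frame=8):
    """Compute per-slice storage, in data words, for the example above."""
    words_per_frame = frame_side * frame_side // n_slices
    return {
        "words_per_frame": words_per_frame,
        "words_per_slice": words_per_frame * n_frames,
        # Assumption: the RAMs inside a frame are equally sized.
        "words_per_ram": words_per_frame // rams_per_frame,
    }


print(slice_memory())
# {'words_per_frame': 8192, 'words_per_slice': 24576, 'words_per_ram': 1024}
```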