Flexible Real-Time Image Address Generation Combined with a Built-In Real-Time Image Filter

Despite the fact that Motorola describes the DSP96002 as a multi-media engine, the proposed Harris IMP100 takes the first real step in the direction of what is needed for a true real-time image DSP processor - flexible real-time image address generation combined with a built-in real-time image filter.

True, both the address generator and the image filter have the absolute minimum capability that still allows useful work.

Below I will suggest some ways to extend the IMP100’s capability - whether or not they would be cost effective on the first generation of what hopefully promises to be a family of parts is beyond the scope of this document. However, one consideration is that a somewhat more powerful system can readily be put together using TRW parts.

The IMP100 performs the semi-linear transform:

  x = a11uw + a10u + a01w + a00
  y = b11uw + b10u + b01w + b00

with u,w parametric over the display space {/output buffer} and x,y ranging over the image input buffer.

This is the highest-level transform that can be performed by a single adder with a single add cycle at column clock time for each of x and y. Higher-level transforms require either more add cycles for each xi,yi or more adders.
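The single-adder claim can be illustrated with a short sketch of incremental (forward-difference) address generation for x; y works the same way with the b coefficients. This is only an illustration of the arithmetic, not a description of the IMP100's internal datapath. The function names are hypothetical, and the assumption that the per-row updates happen at line time is mine.

```python
def generate_addresses(a, width, height):
    """Yield (u, w, x) using one addition per column step.

    a = (a11, a10, a01, a00). Integer coefficients keep the check exact.
    Assumption: the two per-row updates are done at line time, not column time.
    """
    a11, a10, a01, a00 = a
    row_start = a00   # x(0, w) = a01*w + a00, advanced by a01 each row
    step = a10        # x(u+1, w) - x(u, w) = a11*w + a10, advanced by a11 each row
    for w in range(height):
        x = row_start
        for u in range(width):
            yield u, w, x
            x += step          # the single add per column clock
        row_start += a01       # per-row updates
        step += a11


def direct(a, u, w):
    """Evaluate x = a11*u*w + a10*u + a01*w + a00 directly, for comparison."""
    a11, a10, a01, a00 = a
    return a11 * u * w + a10 * u + a01 * w + a00


coeffs = (3, 5, 7, 11)
matches = all(x == direct(coeffs, u, w)
              for u, w, x in generate_addresses(coeffs, 8, 6))
print("incremental form matches direct form:", matches)
```

Any term of higher degree in u (u**2, for example) makes the column step itself vary along the row, which is why it costs a second add per pixel or a second adder.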

If either the a11 or b11 coefficient is non-zero the resulting transform is hyperbolic - a special case of which is the false-perspective transform given in the app note. But the conditions under which the false-perspective transform works are extremely constrained - the parallel lines must be parallel to the x (or column) axis.

The addition of cascade inputs for x,y would allow the IMP100 to perform higher level transforms and would also facilitate its use as a buffer manager / image filter [BMIF] in systems in which the transform was generated externally to the IMP100. (This need for cascade inputs is even more acute in BMIF applications if a non-fixed radius filter, e.g., an elliptical weighted average filter, is used - see below).

The simplest way to add cascade inputs would be to have a final output adder each for x,y with the cascade input as one input to the adder, such that

  x = a11uw + a10u + a01w + a00 + x_cascade
  y = b11uw + b10u + b01w + b00 + y_cascade.

With the addition of the external hardware shown in fig. 1 the transform can be extended to:

  x = SUM(j=2..J)[aj0 u**j] + SUM(k=2..K)[a0k w**k] + a11uw + a10u + a01w + a00
  y = SUM(m=2..M)[bm0 u**m] + SUM(n=2..N)[b0n w**n] + b11uw + b10u + b01w + b00.

Some of that hardware could be folded into the IMP100 (as shown in fig. 2) with a great simplification in external hardware - only the RAM and some latches needed to form the LUTs would be needed.
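As a rough sketch of how LUTs feeding the cascade input could supply the higher-degree terms: one table indexed by u and one by w each hold the precomputed sum of the pure-power terms, and the final adder sums the two lookups with the semi-linear result. The figures are not reproduced here, so this arrangement is my assumption about the general idea rather than a transcription of fig. 1 or fig. 2. All names are hypothetical.

```python
def build_lut(coeffs, size):
    """Return table[t] = sum(c * t**deg) over the given {degree: coefficient} map.

    Assumption: only degrees >= 2 are stored, since the lower-degree terms
    are already produced by the address generator itself.
    """
    return [sum(c * t**deg for deg, c in coeffs.items()) for t in range(size)]


def x_address(a, lut_u, lut_w, u, w):
    """Semi-linear transform plus a cascade term looked up from two tables."""
    a11, a10, a01, a00 = a
    base = a11 * u * w + a10 * u + a01 * w + a00   # generated on chip
    x_cascade = lut_u[u] + lut_w[w]                # external RAM lookups
    return base + x_cascade                        # final output adder


lut_u = build_lut({2: 1, 3: 2}, size=16)   # u**2 + 2*u**3
lut_w = build_lut({2: 5}, size=16)         # 5*w**2
print(x_address((1, 2, 3, 4), lut_u, lut_w, u=3, w=2))   # prints 105
```

Note that this covers only terms in u alone or w alone. Cross terms beyond uw would need a table indexed by both variables or a further cascade stage.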

If, however, the cascade input logic is configured as shown in fig. 3, then higher-level transforms can also be formed by cascading IMP100s. The IMP100 that provides the cascade output would be put into the mode in which the full 32 bits of the x,y transform would be emitted. Note however that in this mode there are accuracy constraints not present in the schemes shown in figs 1 & 2.

If separate cascade input pins incur an unacceptably high cost penalty, then the scheme shown in fig. 4 can be used.

A rough but essentially accurate guide to the number of bits needed for address generators of the IMP100 sort is:

number of bits = final accuracy {16}
  + 1 guard bit + 1 sign bit
  + highest degree {2} {a11,b11}
  * size of buffer in adr. bits {11}
     = 18 + 22
     = 40 bits

but only 29 bits are required if the a11, b11 terms are both zero.

This equation also implies that TRW's claim that the TMC2302 executes a bi-cubic transform is misleading. Since the 2302 has only 48-bit internal accuracy it can accurately execute a bicubic over a neighborhood of at most 64 pixels x 64 pixels (with 10-bit final accuracy).

For the IMP100 to execute the full bi-quadratic transform would require 18 + 44 bits = 62 bits, and would probably be actually realized as 64 bits.
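The bit counts quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes that "highest degree" means the total degree of the highest term: 2 for uw, 4 for the u**2 w**2 term of a bi-quadratic, and 6 for the u**3 w**3 term of a bicubic. That reading is mine, but it is the one that matches the figures in the text. The function names are hypothetical.

```python
def address_bits(final_accuracy, highest_degree, buffer_addr_bits,
                 guard_bits=1, sign_bits=1):
    """Accumulator width needed by the rule of thumb given in the text."""
    return (final_accuracy + guard_bits + sign_bits
            + highest_degree * buffer_addr_bits)


def max_buffer_addr_bits(internal_bits, final_accuracy, highest_degree,
                         guard_bits=1, sign_bits=1):
    """Invert the rule: largest buffer (in address bits) a given width supports."""
    spare = internal_bits - final_accuracy - guard_bits - sign_bits
    return spare // highest_degree


print(address_bits(16, 2, 11))   # 40: IMP100 with the a11, b11 terms
print(address_bits(16, 1, 11))   # 29: a11 = b11 = 0
print(address_bits(16, 4, 11))   # 62: full bi-quadratic

bits = max_buffer_addr_bits(48, 10, 6)   # 48-bit part, bicubic, 10-bit accuracy
print(bits, 2 ** bits)                   # 6 address bits, i.e. 64 pixels per side
```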

The scheme shown in fig 5 probably does not save any silicon but does allow 64-bit accuracy with only a 40-bit propagation delay. It costs 16 bits of adder and 16 bits of multiplexer but saves 144 bits of coefficient storage.

While ‘warping’ usually means that the parametric variables are defined over the output space (i.e., for X(u,w),Y(u,w) their domain is the output space and their range is the input buffer), ‘drawing’ (parametric over the input buffer) has certain advantages. For instance, with the transforms supported by the IMP100 you can map a square into an arbitrary quadrilateral - which allows for a somewhat greater range of false perspective transforms. There are three problems:

  1. at the beginning of each frame the entire output buffer must be initialized to some background value or scene;
  2. the range of the transform, both globally and locally, must be smaller in area than the domain; this is done in order to be certain to have a value for each output pixel within the region that the input image is zoomed down to; and,
  3. filtering is much more difficult.

I mention this mode of operation not because I think that the IMP100 should be able to filter in this mode but because I think it would be desirable that none of the features of the IMP100 prevent it from being used in this fashion (with the filtering and output buffer initializing done by other hardware).
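To make the square-to-quadrilateral mapping concrete, here is a minimal sketch of how the four coefficients for one coordinate follow from the four corner positions of the target quadrilateral. It assumes u,w are normalized to run from 0 to 1 across the input square; with pixel counts the coefficients would be scaled accordingly. The names are hypothetical.

```python
def quad_coefficients(p00, p10, p01, p11):
    """Coefficients (a11, a10, a01, a00) for one output coordinate.

    p_ij is the target value of that coordinate at the corner (u, w) = (i, j),
    assuming u and w are normalized to the range 0..1.
    """
    a00 = p00
    a10 = p10 - p00
    a01 = p01 - p00
    a11 = p11 - p10 - p01 + p00
    return a11, a10, a01, a00


def evaluate(a, u, w):
    """Evaluate a11*u*w + a10*u + a01*w + a00."""
    a11, a10, a01, a00 = a
    return a11 * u * w + a10 * u + a01 * w + a00


# x coordinates of an arbitrary quadrilateral's corners
ax = quad_coefficients(10, 50, 20, 90)
corners = [evaluate(ax, u, w) for u, w in [(0, 0), (1, 0), (0, 1), (1, 1)]]
print(ax, corners)
```

The y coordinate uses the same construction with the corners' y values, giving b11 through b00. When the quadrilateral is a parallelogram, a11 and b11 come out zero and the map is affine.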

The built-in bilinear interpolation image filter should be fine for the construction of higher level fixed radius filters. Whether or not the filter section of the IMP100 could be used to support orientable rectangular filters is a different matter. We are currently developing a simulator for such filters. Once it is working to our satisfaction, and if it seems possible that the IMP100 could be tricked into supporting such filters, we will communicate that result to you.