Proposed Macros: {top-level design done plus adder topology}
1) A pipelined {/flash}, 32-bit integer {/floating point} multiply/divide unit; IEEE-compatible format but no support for NaNs or denormals; pipelined at 100MHz {~~15 gate delays between pipeline stages} with 35 stages, or flash at 10MHz. If the scan test registers are needed anyway, then a single design that can operate either flash or pipelined makes sense.
ditto, but with a programmable number of pipeline delays.
ditto, but 64-bit.
2) A polynomial generator / mixer; in general, given {a0..aN, x}, generate
aN*x^N + a(N-1)*x^(N-1) + .. + a1*x + a0;
but each design would be for some fixed N {= 2,3..}. At 16 bits with N = 2 {~~12k gates} the generator would be useful for real-time morphing; two such circuits can be used to generate:
X = a2*u^2 + a1*u + a0
Y = b2*w^2 + b1*w + b0
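As an illustration of what one such circuit computes, here is a minimal Python sketch of the two quadratics above, evaluated with Horner's rule so that each degree costs one multiply and one add. The function names horner and morph_point are hypothetical, and ordinary Python integers stand in for the 16-bit datapath, so overflow and rounding behaviour are not modelled.

```python
def horner(coeffs, x):
    """Evaluate a polynomial; coeffs are ordered from the highest power down.

    For coeffs = (a2, a1, a0) this computes (a2*x + a1)*x + a0.
    """
    acc = 0
    for c in coeffs:
        acc = acc * x + c
    return acc


def morph_point(a, b, u, w):
    """Return (X, Y) with X = a2*u^2 + a1*u + a0 and Y = b2*w^2 + b1*w + b0.

    a and b are (c2, c1, c0) tuples; each call to horner models one circuit.
    """
    return horner(a, u), horner(b, w)


if __name__ == "__main__":
    print(morph_point((1, 0, 0), (0, 1, 5), 3, 4))
```

The nested form needs N multiplies and N adds for degree N, which is one reason it is a natural fit for a fixed-N design.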
In mixer mode one such circuit could generate:
Vidout = vidin_1 * j/n + vidin_2 * (n-j)/n, for 0 <= j <= n;
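The mixer expression is a first-degree polynomial in x = j/n, namely (vidin_1 - vidin_2)*x + vidin_2, which is presumably why the same circuit can serve both purposes. The Python sketch below shows both forms. The names mix and mix_as_polynomial are hypothetical, and the floor division in the integer version is an assumption, since the rounding behaviour is not stated here.

```python
from fractions import Fraction


def mix(vidin_1, vidin_2, j, n):
    """Blend two samples: vidin_1 * j/n + vidin_2 * (n-j)/n, with 0 <= j <= n.

    Integer arithmetic with a single floor division at the end (assumed).
    """
    if n <= 0 or not 0 <= j <= n:
        raise ValueError("need n > 0 and 0 <= j <= n")
    return (vidin_1 * j + vidin_2 * (n - j)) // n


def mix_as_polynomial(vidin_1, vidin_2, j, n):
    """Same blend written as a1*x + a0 with x = j/n, using exact fractions."""
    x = Fraction(j, n)
    a1, a0 = vidin_1 - vidin_2, vidin_2
    return a1 * x + a0


if __name__ == "__main__":
    for j in range(5):
        print(j, mix(200, 100, j, 4), mix_as_polynomial(200, 100, j, 4))
```

With j = 0 the output is vidin_2 alone and with j = n it is vidin_1 alone, so stepping j from 0 to n cross-fades from the second input to the first.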
Further, eight such circuits could be used to generate:
X = a22*u^2*w^2 + a21*u^2*w + a20*u^2
  + a12*u*w^2 + a11*u*w + a10*u
  + a02*w^2 + a01*w + a00
Y = b22*u^2*w^2 + b21*u^2*w + b20*u^2
  + b12*u*w^2 + b11*u*w + b10*u
  + b02*w^2 + b01*w + b00
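One plausible reading of the count of eight is a nested evaluation: three quadratics in w produce the coefficients of a fourth quadratic in u, giving four quadratic evaluations per output and eight for X and Y together. The Python sketch below illustrates that decomposition. It is an assumption about how the circuits would be wired, and the names quad and biquadratic are hypothetical.

```python
def quad(c2, c1, c0, t):
    """One N = 2 circuit: c2*t^2 + c1*t + c0, in Horner form."""
    return (c2 * t + c1) * t + c0


def biquadratic(c, u, w):
    """Evaluate sum of c[i][j] * u^i * w^j for i, j in 0..2.

    Uses four quad() evaluations: three in w, then one in u.
    """
    k2 = quad(c[2][2], c[2][1], c[2][0], w)  # coefficient of u^2
    k1 = quad(c[1][2], c[1][1], c[1][0], w)  # coefficient of u
    k0 = quad(c[0][2], c[0][1], c[0][0], w)  # constant term in u
    return quad(k2, k1, k0, u)


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # a[i][j] multiplies u^i * w^j
    print(biquadratic(a, 2, 3))
```

Here a[2][1] plays the role of a21 in the expansion above, so the same routine with the b coefficients yields Y.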
At N = 2 and 32 bits {~~48k gates}, 12 such devices could be used to generate the bi-quadratic rational polynomials used in CAD packages such as GMSolid (@ 100MHz per point).
3) An HSI <-> RGB color space converter {~~25k gates}. Data Translation is the only company making HSI <-> RGB converters, and their converters only go one way (and suffer from other inadequacies).
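For reference, here is a floating-point Python sketch of a two-way conversion. The exact HSI definition is not given here, so the sketch assumes a common textbook formulation (intensity as the mean of R, G and B, saturation from the minimum component, hue as an angle in radians). It says nothing about how the proposed hardware would be organised, and the function names are hypothetical.

```python
import math


def rgb_to_hsi(r, g, b):
    """r, g, b in [0, 1] -> (h in radians in [0, 2*pi), s, i)."""
    i = (r + g + b) / 3.0
    s = 0.0 if i == 0 else 1.0 - min(r, g, b) / i
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        h = 0.0  # gray: hue is undefined, 0 is used by convention
    else:
        cos_h = 0.5 * ((r - g) + (r - b)) / den
        h = math.acos(max(-1.0, min(1.0, cos_h)))  # clamp rounding error
        if b > g:
            h = 2 * math.pi - h
    return h, s, i


def hsi_to_rgb(h, s, i):
    """Inverse of rgb_to_hsi under the same assumed HSI definition."""
    third = 2 * math.pi / 3
    h = h % (2 * math.pi)
    sector = min(int(h // third), 2)  # 0: red-green, 1: green-blue, 2: blue-red
    hh = h - sector * third
    c_min = i * (1 - s)
    c_main = i * (1 + s * math.cos(hh) / math.cos(math.pi / 3 - hh))
    c_rest = 3 * i - (c_min + c_main)
    if sector == 0:
        return c_main, c_rest, c_min  # B is the minimum component
    if sector == 1:
        return c_min, c_main, c_rest  # R is the minimum component
    return c_rest, c_min, c_main      # G is the minimum component


if __name__ == "__main__":
    print(hsi_to_rgb(*rgb_to_hsi(0.2, 0.5, 0.7)))
```

A round trip through both functions should reproduce the input up to floating-point error, which is a convenient check for any fixed-point version derived from it.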
b3*x^3 + b2*x^2 + b1*x + b0