From: Eric Blossom <eb@comsec.com>
To: norm@netcom.com
Message Hash: 624a05a0a4025fcc463a693c63f87ebd86be2e0233ee1bc6aa834792d951ce83
Message ID: <199408122359.QAA04429@modmult.comsec.com>
Reply To: <199408120554.WAA21416@netcom.netcom.com>
UTC Datetime: 1994-08-13 01:00:10 UTC
Raw Date: Fri, 12 Aug 94 18:00:10 PDT
From: Eric Blossom <eb@comsec.com>
Date: Fri, 12 Aug 94 18:00:10 PDT
To: norm@netcom.com
Subject: Multiprecision integer mult using FPU
In-Reply-To: <199408120554.WAA21416@netcom.netcom.com>
Message-ID: <199408122359.QAA04429@modmult.comsec.com>
MIME-Version: 1.0
Content-Type: text/plain
Norm Hardy writes:
> The PowerPC floating point is even more impressive. The fmadd instruction
> can do "a <- b*c+d" every other clock or 30 per microsecond on the low end
> Power Mac. If we store 24 bits of a multiple precision number in successive
> elements of an array, then the inner loop of a multiply is a routine such
> as:
>
> void m8(float * a, float * b, double * p)
> {p[0] = a[0]*b[0];
> p[1] = a[0]*b[1] + a[1]*b[0];
> p[2] = a[0]*b[2] + a[1]*b[1] + a[2]*b[0];
> p[3] = a[0]*b[3] + a[1]*b[2] + a[2]*b[1] + a[3]*b[0];
> p[4] = a[0]*b[4] + a[1]*b[3] + a[2]*b[2] + a[3]*b[1] + a[4]*b[0];
> p[5] = a[0]*b[5] + a[1]*b[4] + a[2]*b[3] + a[3]*b[2] + a[4]*b[1] + a[5]*b[0];
> ....
> p[13] = a[6]*b[7] + a[7]*b[6];
> p[14] = a[7]*b[7];}
Nice hack, Norm.
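For completeness, the double precision column sums still have to be
folded back into 24 bit limbs.  Here is a rough sketch of that step (my
own illustration, not part of Norm's routine; the name carry24 is made
up).  It leans on each column sum being at most 8 * (2^24 - 1)^2 < 2^51,
so it sits exactly in a double and converts to a 64 bit integer without
loss:

#include <stdint.h>

/* Illustrative only: propagate carries through the 15 column sums from
   m8() and emit 16 limbs of 24 bits each, least significant first. */
void carry24(const double *p, uint32_t *r)
{
    uint64_t carry = 0;
    for (int k = 0; k < 15; k++) {
        uint64_t t = (uint64_t)p[k] + carry;   /* exact: p[k] < 2^51 */
        r[k] = (uint32_t)(t & 0xFFFFFF);       /* low 24 bits -> limb k */
        carry = t >> 24;                       /* the rest carries up */
    }
    r[15] = (uint32_t)carry;                   /* top limb of the product */
}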

This would appear to apply to any processor where the floating point
performance is substantially greater than the integer performance.
This is true of the Pentium too.
Floating point (latency/throughput, in clocks):
  FADD   3/1
  FMUL   3/1
  FLD    1/1
  FST    2/2   (1/1 if storing to FPU stack)

Integer (clocks):
  ADD    1
  MUL    10
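
To put rough numbers on it (ignoring loads, stores, and pairing): the
8x8 routine above takes 64 multiplies and 49 adds.  At one FMUL or FADD
result per clock that is on the order of 113 clocks of arithmetic,
versus about 640 clocks for the 64 integer MULs alone at 10 clocks each.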