From: jdd@aiki.demon.co.uk (Jim Dixon)
To: qualcomm!karn
Message Hash: 351c123b521d0b987aab50d10bfd5d737dc4baf0c42174c0f6fac11418e40eae
Message ID: <8145@aiki.demon.co.uk>
Reply To: N/A
UTC Datetime: 1994-08-27 00:19:16 UTC
Raw Date: Fri, 26 Aug 94 17:19:16 PDT
From: jdd@aiki.demon.co.uk (Jim Dixon)
Date: Fri, 26 Aug 94 17:19:16 PDT
To: qualcomm!karn
Subject: Re: DSPs
Message-ID: <8145@aiki.demon.co.uk>
MIME-Version: 1.0
Content-Type: text/plain
In message <199408262009.NAA17046@unix.ka9q.ampr.org> Phil Karn writes:
>
> Yes, but even scalar multiplication is so much faster on a DSP than on
> most general purpose CPUs that it seems like a definite win. The 486
> takes from 13-42 clock cycles to perform a multiply, depending on the
> operand sizes and number of significant bits in the multiplier.
The Motorola DSP96002 does an integer multiply in 2 or 3 clocks, so a
33 MHz device does 11 million multiplies (and moves) a second. The
chip costs about $50.
The newer TI C40 does a 32-bit integer multipy in 1 clock, so a 50 MHz
device can output 200 MB/s of results. It can read in a single clock
cycle but writes take two cycles (sometimes more). So although it can
theoretically read 200 MB/s, it can only write 100 MB/s. However, it
has six serial links, each one of which has a 20 MB/s bandwidth, so
in theory it can pump out 100+120 = 220 MB/s. However, in practice you
would expect the chip to be I/O bound. It costs something like $200
a chip.
The real advantage of the C40 is that C40s can be connected together
using their serial links. This allows them to be arranged in
interesting 3D topologies. In this respect the C40 is intended to be
an upgrade on the transputer, which has only four links, and tends to
die when connected into large 2D meshes, because the transputers
spend too much of their time passing messages.
If C40s are connected in pipelines, with three links used as input
from the preceding stage and three links used to drive the next
stage, you can run them comfortably at 60MB/s. You might choose to
do three multiplies on each 32-bit operand at this rate, giving
you effectively multiplications at 45 MHz at each stage of the
pipeline.
> Even
> if you couldn't keep the pipeline full on a chip like the PowerPC, you'd
> still be well ahead.
Ahead of the 486 maybe, but the C40 makes the PowerPC a dog.
> But then I hear people say that it's not the multiplication that slows
> down modular exponentiation, it's the modular reduction.
Can you elaborate?
--
Jim Dixon
Return to August 1994
Return to “jdd@aiki.demon.co.uk (Jim Dixon)”