Convex Computer Corporation
Status
Now a Hewlett-Packard company.
Overview of Organization
Formed in 1982, the company designs and manufactures computers aiming
to satisfy scientific and technical users with an increasing demand
for fast, affordable supercomputer performance. Convex see themselves
as Cray substitutes and aim to provide compatibility with Cray and IBM
systems. Convex computers are widely used for visualisation in
computational fluid dynamics, medicine, computer-aided engineering,
petroleum and seismic exploration, and finance.
It used to have approximately 1000 employees worldwide.
Their first computer series, the `C1', appeared in 1985 and included
the `C120' vector uniprocessor. This machine is no longer marketed or
produced, having been superseded in 1988 by the first multiprocessor
offering, the `C2' series. The `C3' series, announced in 1991,
consists of expanded C2 models with faster nodes. Full
binary compatibility is claimed throughout the range.
Platforms Documented
Contact Address:
HP High Performance Systems Division
(formerly Convex Computer Corporation)
3000 Waterview Parkway
PO Box 833851
Richardson
Texas 75083-3851
USA
Tel 214 497 4000
701 N. Plano Road
Richardson, Texas 75081
Tel 214-952-0200
FAX 214-952-0550
Division's homepage.
See Also:
- C3800 Series Product Specification flysheet (1991);
- Product Overviews: C3200 \& C3400 (1991);
- C3400-ES ``Compact Super Computer'' pocket guide (1992);
- HP 3000 Server 997.
Convex C1 Series
Overview of Platform:
Essentially a uniprocessor with the potential to be configured as a dual processor machine.
Compute Hardware:
The C120 has a high-performance, custom-designed 64-bit CPU with
integrated scalar/vector functions built from 8k CMOS gate arrays.
Interconnect / Communications System:
Nothing to interconnect.
Memory System:
4 GB of virtual memory and 1 GB of physical memory. Two-level cache
system consisting of a 1 kB data cache with a 64 kB P-cache and a 4 kB
instruction cache.
Benchmarks / Compute and data transfer performance:
40 MFLOPS per PE (peak); 80 MB/s memory path for memory cards.
Operating System Software and Environment:
Enhanced version of Berkeley UNIX.
Networkability/ I/O System / Integrability / Reliability / Scalability:
VAX/VMS command language compatibility; DECnet compatibility; VAX/EDT
compatible text editor; VMS compatible job batching system; Fortran
language extensions compatible with VAX Fortran.
Basic C120 system: two 19-in. racks and 32 Mbytes memory, 1 I/O
processor, service processor, 434 Mbyte Winchester, 6250 bpi tape
drive.
Size: 25 x 62 x 40 inches for each cabinet. Base system requires two
cabinets, each about 500 lb. Forced air cooling. Power consumption
3200-4500 Watts
Notable Applications / Customers / Market Sectors:
Aeronautics.
Overall Comments:
It was pretty clear who Convex were trying to steal a market share from.
Convex C2 Series
Overview of Platform:
The C2 series includes models C200, C210, C220, C230, C240.
The C2, which was available from January 1988, is a multiple-processor,
bus-connected, shared-memory computer. Each CPU is a new design, but is
similar to the single CPU of the C1 computers.
Compute Hardware:
One to a maximum of four custom vector processors.
The CPUs consist of a scalar and address unit (based on ECL 7K and 10K
density chips) and a vector processor (using CMOS VLSI 20K
gates/chip). The vector architecture is register-to-register with
three asynchronous pipelined functions (load, store, and edit; add,
subtract; multiply, add, divide, and square root). Each CPU has 8
vector registers, each with 128 elements (64-bit elements). VL and VS
registers are also present. The scalar unit performs integer
arithmetic and floating-point multiplies, adds, divisions, and square
roots in hardware. There is a 64 Kbyte cache for the scalar unit with
cache bypass for the vector unit. The cycle time is 40 nsec for the C2
(100 nsec for the C1). Scalar and vector units (fixed and float) can
operate concurrently.
Interconnect / Communications System:
Shared memory accessed via a crossbar switch.
Memory System:
A maximum of 2GB of shared memory.
The C2 has new microcode instructions for vector square root, mask
operations, type conversions, intrinsic functions, and random memory
access.
Real memory is up to 4 Gbytes (1 Gbyte for the C1) of DRAM. The early
C1 memories were in 256 Kbit DRAMs, but the later memories and those
of the C2 use 1 Mbit DRAM. Virtual address space is 4 Gbytes (page
size 4 Kbytes) with 2 Gbytes available per user. Memory is 64-way
interleaved (32 bit) or 32-way interleaved (64 bit).
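As a rough illustration of the interleaving quoted above, the sketch below maps word addresses to memory banks. It assumes simple low-order interleaving (consecutive words go to consecutive banks). The source gives only the number of ways, so the mapping itself and the names `bank_for_address` and `banks_touched` are illustrative assumptions, not a description of the actual Convex memory controller.

```python
def bank_for_address(byte_addr, word_bytes, n_banks):
    """Bank index under low-order interleaving (an assumed scheme)."""
    return (byte_addr // word_bytes) % n_banks


def banks_touched(stride_words, n_accesses, word_bytes, n_banks):
    """Distinct banks hit by a strided sweep starting at address 0."""
    return {
        bank_for_address(i * stride_words * word_bytes, word_bytes, n_banks)
        for i in range(n_accesses)
    }


if __name__ == "__main__":
    # 64-bit words with 32-way interleave (figures from the text).
    print(len(banks_touched(1, 32, 8, 32)))   # unit stride: spread over the banks
    print(len(banks_touched(32, 32, 8, 32)))  # stride equal to the bank count
```

Under this assumed mapping, a unit-stride vector load spreads its accesses over all banks, while a stride equal to the number of banks keeps returning to the same one, which is the usual reason interleaved memories are sensitive to access stride.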
Transfer rates between memory and CPU on the C1 are rated at 80
Mbytes/sec. There is a single memory pipe between memory and
registers.
On the C2, the access between each CPU and the memory is via a
non-contentious, non-blocking 5-bus crossbar using ECL chips, with
each bus rated at 200 Mbytes/sec.
The arithmetic is in floating-point IEEE standard format.
Byte-addressable with integer*1, integer*2, integer*4, integer*8,
complex*8, and complex*16 supported.
There is a 1/2 Mbyte IOP buffer. The IOP is 68000 based with
event-driven monitor and I/O transfer rates of 80 Mbytes/sec on custom
application boards, or standard Multibus at 8 Mbytes/sec, or VME bus
at 16 Mbytes/sec.
Benchmarks / Compute and data transfer performance:
Performance: 36 Mflops claimed for single CPUs;
8.24 Mflops obtained on SLALOM benchmark.
Data Transfer: 800 MB/s maximum CPU/memory bandwidth.
Peak performance for the C120 is 20 Mflops in double precision (64-bit
arithmetic) and 40 Mflops in single precision (32-bit arithmetic).
LINPACK timings are 3.7 Mflops (100 x 100 matrix with unmodified
code).
Peak scalar performance of C210 is 22 Whetstone mips at 32 bit and 14
Whetstone mips at 64 bit (with in-line subroutine expansion). Peak
vector performance is 50 Mflops. LINPACK benchmark runs at 10.0
Mflops (again for unmodified code on 100 x 100 case).
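The cycle times and peak rates quoted for the two machines can be cross-checked with a little arithmetic. The sketch below converts a cycle time to a clock rate and derives the implied number of floating-point results per cycle. That derived figure is an inference from the quoted numbers rather than something the vendor literature states, and the function names are illustrative.

```python
def clock_mhz(cycle_ns):
    """Clock rate in MHz for a cycle time given in nanoseconds."""
    return 1000.0 / cycle_ns


def results_per_cycle(peak_mflops, cycle_ns):
    """Floating-point results per clock implied by a peak Mflops figure."""
    return peak_mflops / clock_mhz(cycle_ns)


if __name__ == "__main__":
    # (machine, cycle time in ns, quoted peak Mflops), all from the text.
    quoted = [
        ("C120, 64-bit", 100, 20),
        ("C120, 32-bit", 100, 40),
        ("C210 vector", 40, 50),
    ]
    for name, cycle_ns, peak in quoted:
        print(name, clock_mhz(cycle_ns), "MHz,",
              results_per_cycle(peak, cycle_ns), "results/cycle")
```

On these figures the C120 in 64-bit mode and the C210 both work out to two results per cycle, which suggests the C2's higher peak comes mainly from its shorter cycle time.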
The following two tables compare the C210 performance in Mflops of a
single processor with the C120. The first table compares
performance for the algorithm Ai = Bi * k:
            64 bit    32 bit
    C120      6.6      13.3
    C210     16.3      25.0
The second table shows a comparison for an indirect vector addressing
algorithm of the form A(Xi) = A(Xi) * B(Xi) * k:
            64 bit    32 bit
    C120      3.6       3.5
    C210     12.5      16.7
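To make the two benchmark kernels concrete, here is a minimal Python sketch of each, together with the C210-over-C120 speedups implied by the tabulated Mflops. The function names and the list-based representation are illustrative assumptions; the original benchmarks were presumably Fortran loops, and only the Mflops values are taken from the tables.

```python
def scaled_copy(b, k):
    """Unit-stride kernel from the first table: A(i) = B(i) * k."""
    return [bi * k for bi in b]


def indirect_update(a, b, x, k):
    """Indirect kernel from the second table: A(X(i)) = A(X(i)) * B(X(i)) * k.

    Updates `a` in place through the index list `x` and returns it.
    """
    for xi in x:
        a[xi] = a[xi] * b[xi] * k
    return a


# Mflops copied from the two tables, as (64-bit, 32-bit) pairs.
DIRECT = {"C120": (6.6, 13.3), "C210": (16.3, 25.0)}
INDIRECT = {"C120": (3.6, 3.5), "C210": (12.5, 16.7)}


def speedup(table):
    """C210 / C120 ratio for each precision, rounded to two decimals."""
    return tuple(
        round(new / old, 2) for new, old in zip(table["C210"], table["C120"])
    )


if __name__ == "__main__":
    print("direct:", speedup(DIRECT))
    print("indirect:", speedup(INDIRECT))
```

The ratios come out larger for the indirect kernel than for the unit-stride one, consistent with the new microcode support for random memory access mentioned for the C2.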
The C210, used in these benchmarks, is the single processor version of
the C2 computer. The multiple processor C220, C230, and C240 versions
are available and are all field upgradable from the C210.
Operating System Software and Environment:
UNIX 4.2 BSD, with the COVUE shell offering emulation of most common VMS
commands.
Languages: parallel Fortran, C, vectorized Ada, Common Lisp, Prolog.
Fortran characteristics: Fortran 77 with VAX extensions and an excellent
Fortran vectorizing compiler. The C compiler (VC) automatically
vectorizes scalar code. HCR/PASCAL and HCR/UX-BASIC are available as
third-party compilers.
A source level debugger and a range of editors, including VAX EDT
emulation, are available.
There is a very extensive range of application software.
General-purpose packages include NAG, IMSL, ABAQUS, MSC, NASTRAN,
ANSYS, DI-3000, DISPLA, GKSGral, UNIRAS, TELEGRAF, Q-Calc, Sir, and
Oracle.
Networkability/ I/O System / Integrability / Reliability / Scalability:
I/O: Ethernet, UltraNet, TCP/IP, NFS and a `firm
commitment' to HiPPI and FDDI. Proprietary IDC (Integrated Disk Channel)
and TPI (Tape Library Interface).
All machines are stand-alone, multi-user, interactive machines. They
can be interfaced to most standard communication channels including
Ethernet (TCP/IP), DECnet, and Hyperchannel. Pink book and color
books over LAN and NFS are also available. X25 color book will be
available shortly. Batch job submission from VAX to C2 possible with
output files and results returned to VAX.
A 2-CPU C220 system consumes 12 kW.
Notable Applications / Customers / Market Sectors:
Aeronautics, protein and DNA sequencing analysis and molecular modelling.
CONVEX has sold 380 systems (280 C1, 100 C2) worldwide since 1985.
Overall Comments:
Convex C3 Series
Overview of Platform:
The C3 series includes the C3200, C3400, C3800 models.
Vector Register, Parallel Processor, Bus-Based Architecture
Compute Hardware:
GaAs vector processors, based on C2 series architecture
Interconnect / Communications System:
Processors share up to 2GB of memory (4GB on C3800 models), accessed through a crossbar switch.
C3200 models: one to a maximum of four processors, sharing a maximum of
2GB of physical memory;
C3400 models: one to a maximum of four dual-CPUs, sharing a maximum of
2GB of physical memory;
C3800 models: one to a maximum of four dual-CPUs, sharing a maximum of
4GB of physical memory.
Memory System:
2 or 4 GB of shared memory.
Benchmarks / Compute and data transfer performance:
Performance (claims): 50 Mflops per C3200 CPU, 100 Mflops per
C3400 CPU, 240 Mflops per C3800 CPU. 1920 Mflops claimed for the
top-of-the-range C3880 (8 CPU) system, which is a (dubious) perfectly
scaled single-CPU figure.
Data Transfer: 200 MB/s maximum node bandwidth to 800 MB/s
total memory bandwidth for C3200 and C3400 systems; 480 MB/s node
bandwidth for C3800 systems.
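The 1920 Mflops claim for the C3880 is simply the per-CPU peak multiplied by the CPU count. The sketch below reproduces that arithmetic and, for contrast, applies Amdahl's law with a hypothetical parallel fraction. Amdahl's law and the 95% value are brought in here purely for illustration and do not come from the vendor material; the function names are likewise assumptions.

```python
def aggregate_peak(per_cpu_mflops, n_cpus):
    """Perfectly scaled peak: per-CPU peak times number of CPUs."""
    return per_cpu_mflops * n_cpus


def amdahl_speedup(parallel_fraction, n_cpus):
    """Speedup over one CPU when only part of the work runs in parallel."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n_cpus)


if __name__ == "__main__":
    per_cpu = 240  # Mflops per C3800 CPU (claimed, from the text)
    n_cpus = 8     # C3880 configuration (from the text)
    print("perfectly scaled:", aggregate_peak(per_cpu, n_cpus), "Mflops")

    # Hypothetical workload in which 95% of the run time parallelises.
    scaled = per_cpu * amdahl_speedup(0.95, n_cpus)
    print("with 95% parallel work:", round(scaled), "Mflops")
```

Even a small serial fraction pulls the multi-CPU rate well below the perfectly scaled figure, which is why the quoted aggregate should be read as an upper bound.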
Operating System Software and Environment:
Operating System: ConvexOS provides Unix 4.2 BSD and POSIX
compliance. ASAP (Automatic Self-Allocating Processor) hardware
for operating system management, and lots of caching technology.
Hierarchical file management and advanced storage capabilities.
Extensive file-migration software and robotic tape libraries.
Languages: Compilers available from the vendor for
Fortran-77, C, Ada and popular Cray/IBM variants.
Programming Environment: Compiler support for vectorisation
and parallelisation. Support for standard X Window System (including
PEX), Application Visualisation System (AVS) and VMS interfaces.
Comprehensive DBMS support, including ANSI SQL.
Networkability/ I/O System / Integrability / Reliability / Scalability:
I/O: Ethernet, UltraNet, TCP/IP, NFS and a `firm
commitment' to HiPPI and FDDI. Proprietary IDC (Integrated Disk Channel)
and TPI (Tape Library Interface).
Notable Applications / Customers / Market Sectors:
Generally highly commended by Dornier (but dated '88) and used
extensively by them (and others) for CFD/visualisation, specifically on
the European Space Agency `Hermes' space-shuttle project.
Overall Comments:
Scalability: Currently ranges from 1 or 2 nodes up to a maximum of 8,
is unlikely to scale much beyond 16 nodes, and has pretty serious
performance (or lack thereof) problems with current
crossbar/shared-memory technology.
On-line storage: Unlikely to achieve 30 Gbytes on-line, but
seem to be well-placed within DBMS market, so I'd guess that their
database software and robotic tape libraries were certainly
interesting. Seem to have a pretty reasonable (and fast) IO system.
Environment: Probably pretty effective for Database Management
applications, although how well the compilers parallelise code
remains to be seen.
HP-Convex SPP-2000
Overview of Platform:
Crossbar-based symmetric multiprocessor (SMP).
Compute Hardware:
A distributed shared memory (DSM) machine based on HP PA-8000
processors (180 MHz clock speed), the SPP-2000 system is built
around the "hypernode" configuration. Each hypernode can have up
to 16 HP PA-RISC 8000 processors and 16 Gbytes of memory.
The PA-8000 has 10 pipelined functional units and is 4-way superscalar.
Interconnect / Communications System:
- Within a single hypernode, the 16 processors are connected through a
  non-blocking crossbar switch with a 64-bit-wide datapath and
  960 MB/second bandwidth.
- Multiple hypernodes are interconnected through the CTI (Coherent
  Toroidal Interconnect) in a torus-like (donut) configuration as
  shown in the figure below.
- Each ring is unidirectional, so remote memory access does not depend
  on how far a hypernode is from the hypernode requesting the data. A
  router supports alternate routing paths between nodes. The following
  figure shows the CTI Rings:
Memory System:
Each hypernode has 4 GB of memory (256 MB for the CTI Network Cache,
410 MB for the Buffer Cache, and 80 MB for the OS), and up to 16 GB in
the scaled-up version. Maximum cabinet storage is 612 GB, with
support for a maximum of 12.96 TB.
The processor has no on-chip cache; it uses an external cache
with a latency of 3 clock cycles. Bandwidth from cache to CPU is
2.88 Gbytes/sec, bandwidth from memory to cache is 960
Mbytes/sec, and cache coherency is directory based.
Benchmarks / Compute and data transfer performance:
Below are some representative benchmarks performed on NCSA platforms
using GAUSSIAN-94 (G94).
They show that the SGI Origin2000 and HP/Convex Exemplar SPP-2000 run
virtually neck-and-neck in performance for this test case, in both
scaling and per-processor performance.
Gaussian 94 - Molecule Alpha Pinene Results:
Peak performance: the PA-8000 can execute up to 4 FP ops per cycle,
giving 720 MFLOPS peak at its 180 MHz clock.
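The peak figure follows directly from the clock rate and the number of floating-point operations per cycle. The sketch below reproduces it and extends the same arithmetic to a full 16-processor hypernode and to the quoted cache and memory bandwidths. The per-hypernode total and the bytes-per-cycle values are derived here, not quoted by the vendor, and the bandwidth conversion assumes decimal units (1 Gbyte = 1000 Mbytes).

```python
CLOCK_MHZ = 180           # PA-8000 clock speed (from the text)
FP_OPS_PER_CYCLE = 4      # peak FP operations per cycle (from the text)
CPUS_PER_HYPERNODE = 16   # maximum processors per hypernode (from the text)


def peak_mflops(clock_mhz, ops_per_cycle):
    """Peak rate in MFLOPS for one processor."""
    return clock_mhz * ops_per_cycle


def bytes_per_cycle(bandwidth_mb_s, clock_mhz):
    """Bytes moved per clock cycle at a given bandwidth (decimal MB/s)."""
    return bandwidth_mb_s / clock_mhz


if __name__ == "__main__":
    per_cpu = peak_mflops(CLOCK_MHZ, FP_OPS_PER_CYCLE)
    print("per CPU:", per_cpu, "MFLOPS")
    print("per hypernode:", per_cpu * CPUS_PER_HYPERNODE, "MFLOPS")
    # 2.88 Gbytes/sec cache-to-CPU and 960 Mbytes/sec memory-to-cache.
    print("cache to CPU:", bytes_per_cycle(2880, CLOCK_MHZ), "bytes/cycle")
    print("memory to cache:", round(bytes_per_cycle(960, CLOCK_MHZ), 2),
          "bytes/cycle")
```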
Operating System Software and Environment:
SPP-UX, a POSIX.1-conformant extension of HP-UX (HP-UX is based
on Novell's UNIX).
Process scheduling is non-preemptive (only the kernel preempts);
POSIX threads are also supported.
Networkability/ I/O System / Integrability / Reliability / Scalability:
- Each of the agent chips has an I/O channel.
- Each I/O channel supports a 120 MB/sec PCI bus.
- I/O path is 32 bits wide.
- Each PCI controller supports 10 Ultra-SCSI disks.
- Each disk has 9 Gbytes capacity.
- Estimated bandwidth is 30-35 MB/sec per controller.
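The I/O figures listed above can be combined into a back-of-envelope capacity and scan-time estimate per PCI controller. This is a sketch under stated assumptions: decimal units (1 Gbyte = 1000 Mbytes), a controller fully populated with disks, and sustained transfer at the estimated rate. None of the derived numbers appear in the source.

```python
DISKS_PER_CONTROLLER = 10   # Ultra-SCSI disks per PCI controller (from the text)
DISK_GBYTES = 9             # capacity per disk (from the text)
EST_MB_PER_S = (30, 35)     # estimated bandwidth range per controller (from the text)


def controller_capacity_gbytes(disks=DISKS_PER_CONTROLLER, disk_gb=DISK_GBYTES):
    """Total capacity behind one fully populated controller."""
    return disks * disk_gb


def full_scan_seconds(capacity_gb, mb_per_s):
    """Time to stream the whole capacity once, assuming 1 Gbyte = 1000 Mbytes."""
    return capacity_gb * 1000 / mb_per_s


if __name__ == "__main__":
    capacity = controller_capacity_gbytes()
    print("capacity per controller:", capacity, "Gbytes")
    for rate in EST_MB_PER_S:
        minutes = full_scan_seconds(capacity, rate) / 60
        print("full scan at", rate, "MB/sec:", round(minutes, 1), "minutes")
```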
Notable Applications / Customers / Market Sectors:
CFD as well as computational physics are two of many
applications suitable for the SPP-2000.
hawick@npac.syr.edu
saleh@npac.syr.edu