Convex Computer Corporation

Status

Convex is now a Hewlett-Packard company.

Overview of Organization

Formed in 1982, the company designs and manufactures computers aiming to satisfy scientific and technical users with an increasing demand for fast, affordable supercomputer performance. Convex see themselves as Cray substitutes and aim to provide compatibility with Cray and IBM systems. Convex computers are widely used for visualisation in computational fluid dynamics, medicine, computer-aided engineering, petroleum and seismic exploration, and finance.

The company formerly had approximately 1000 employees worldwide.

Their first computer series, the `C1', appeared in 1985 and included the `C120' vector uniprocessor. This machine is no longer marketed or produced, having been superseded in 1988 by the first multiprocessor offering, the `C2' series. The `C3' series, consisting of expanded C2 models with faster nodes, was announced in 1991. Full binary compatibility is claimed throughout the range.

Platforms Documented

Contact Address:

HP High Performance Systems Division
(formerly Convex Computer Corporation)
3000 Waterview Parkway
PO Box 833851
Richardson
Texas 75083-3851
USA
Tel 214 497 4000

701 N. Plano Road
Richardson, Texas 75081
Tel 214-952-0200
FAX 214-952-0550

Division's homepage.

See Also:


Convex C1 Series

Overview of Platform:

Essentially a uniprocessor with the potential to be configured as a dual processor machine.

Compute Hardware:

The C120 has a high-performance, custom-designed 64-bit CPU with integrated scalar/vector functions, built from 8k CMOS gate arrays.

Interconnect / Communications System:

Nothing to interconnect.

Memory System:

4 GB of virtual memory and 1 GB of physical memory. Two-level cache system consisting of a 1 kB data cache with a 64 kB P-cache and a 4 kB instruction cache.

Benchmarks / Compute and data transfer performance:

40 MFLOPS per PE (peak); 80 MB/s memory path for memory cards.

Operating System Software and Environment:

Enhanced version of Berkeley UNIX.

Networkability/ I/O System / Integrability / Reliability / Scalability:

VAX/VMS command language compatibility; DECnet compatibility; VAX/EDT-compatible text editor; VMS-compatible job batching system; Fortran language extensions compatible with VAX Fortran.

Basic C120 system: two 19-in. racks and 32 Mbytes memory, 1 I/O processor, service processor, 434 Mbyte Winchester, 6250 bpi tape drive.

Size: 25 x 62 x 40 inches for each cabinet. The base system requires two cabinets, each weighing about 500 lb. Forced air cooling. Power consumption 3200-4500 Watts.

Notable Applications / Customers / Market Sectors:

Aeronautics.

Overall Comments:

It was pretty clear who Convex were trying to take market share from.


Convex C2 Series

Overview of Platform:

The C2 series includes models C200, C210, C220, C230, and C240. The C2, which was available from January 1988, is a multiple-processor, bus-connected, shared-memory computer. Each CPU is a new design, but is similar to the single CPU of the C1 computers.

Compute Hardware:

One to a maximum of four custom vector processors.

The CPUs consist of a scalar and address unit (based on ECL 7K and 10K density chips) and a vector processor (using CMOS VLSI 20K gates/chip). The vector architecture is register-to-register with three asynchronous pipelined functions (load, store, and edit; add, subtract; multiply, add, divide, and square root). Each CPU has 8 vector registers, each with 128 elements (64-bit elements). VL and VS registers are also present. The scalar unit performs integer arithmetic and floating-point multiplies, adds, divisions, and square roots in hardware. There is a 64 Kbyte cache for the scalar unit with cache bypass for the vector unit. The cycle time is 40 nsec for the C2 (100 nsec for the C1). Scalar and vector units (fixed and float) can operate concurrently.

Interconnect / Communications System:

Shared memory accessed via a crossbar switch.

Memory System:

A maximum of 2GB of shared memory.

The C2 has new microcode instructions for vector square root, mask operations, type conversions, intrinsic functions, and random memory access.

Real memory is up to 4 Gbytes (1 Gbyte for the C1) of DRAM. The early C1 memories were in 256 Kbit DRAMs, but the later memories and those of the C2 use 1 Mbit DRAM. Virtual address space is 4 Gbytes (page size 4 Kbytes) with 2 Gbytes available per user. Memory is 64-way interleaved (32 bit) or 32 way (64 bit).
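The address-space figures above imply a fixed split of each virtual address into a page number and a page offset. The following is a minimal sketch of that arithmetic only, assuming binary units (1 Gbyte = 2^30 bytes, 1 Kbyte = 2^10 bytes); the names are illustrative and are not Convex terminology.

```python
# Sketch: split a virtual address into (page number, offset) using the
# figures quoted in the text: 4 Gbyte virtual space, 4 Kbyte pages.
# Assumption: binary units. All names here are hypothetical.

VIRTUAL_SPACE_BYTES = 4 * 2**30   # 4 Gbytes of virtual address space
PAGE_SIZE_BYTES = 4 * 2**10       # 4 Kbyte pages

# Number of low-order address bits that select a byte within a page.
OFFSET_BITS = PAGE_SIZE_BYTES.bit_length() - 1
# Number of pages in the full virtual address space.
NUM_PAGES = VIRTUAL_SPACE_BYTES // PAGE_SIZE_BYTES


def split_address(addr: int) -> tuple[int, int]:
    """Return (page_number, offset) for a virtual byte address."""
    if not 0 <= addr < VIRTUAL_SPACE_BYTES:
        raise ValueError("address outside the 4 Gbyte virtual space")
    return addr >> OFFSET_BITS, addr & (PAGE_SIZE_BYTES - 1)
```

Under these assumptions the space holds 1,048,576 pages with a 12-bit offset, and the 2 Gbytes available per user corresponds to half of those pages.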

Transfer rates between memory and CPU on the C1 are rated at 80 Mbytes/sec. There is a single memory pipe between memory and registers.

On the C2, the access between each CPU and the memory is via a non-contentious, non-blocking 5-bus crossbar using ECL chips, with each bus rated at 200 Mbytes/sec.

The arithmetic is in floating-point IEEE standard format. Byte-addressable with integer*1, integer*2, integer*4, integer*8, complex*8, and complex*16 supported.

There is a 1/2 Mbyte IOP buffer. The IOP is 68000 based with event-driven monitor and I/O transfer rates of 80 Mbytes/sec on custom application boards, or standard Multibus at 8 Mbytes/sec, or VME bus at 16 Mbytes/sec.

Benchmarks / Compute and data transfer performance:

Performance: 36 Mflops claimed for single CPUs; 8.24 Mflops obtained on the SLALOM benchmark.

Data transfer: 800 MB/s maximum CPU/memory bandwidth.
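The 800 MB/s figure is consistent with the 200 Mbytes/sec per-bus rating quoted in the Memory System section multiplied by the maximum of four CPUs. The sketch below shows only that arithmetic. It assumes (the text does not say so) that each CPU drives one crossbar bus of its own; the role of the fifth bus is not stated and is left out.

```python
# Sketch: aggregate CPU/memory bandwidth of a C2 configuration.
# Figures from the text: 200 Mbytes/sec per crossbar bus, up to four CPUs.
# Assumption (not stated in the text): one bus per CPU.

BUS_RATE_MB_S = 200   # rating of a single crossbar bus
MAX_CPUS = 4          # largest C2 configuration


def aggregate_bandwidth(cpus: int, bus_rate_mb_s: int = BUS_RATE_MB_S) -> int:
    """Return total CPU/memory bandwidth in Mbytes/sec for `cpus` processors."""
    if not 1 <= cpus <= MAX_CPUS:
        raise ValueError("a C2 system has between one and four CPUs")
    return cpus * bus_rate_mb_s
```

With four CPUs this gives 800 Mbytes/sec, matching the quoted maximum.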

Peak performance for the C120 is 20 Mflops in double precision (64-bit arithmetic) and 40 Mflops in single precision (32-bit arithmetic). LINPACK timings are 3.7 Mflops (100 x 100 matrix with unmodified code).

Peak scalar performance of C210 is 22 Whetstone mips at 32 bit and 14 Whetstone mips at 64 bit (with in-line subroutine expansion). Peak vector performance is 50 Mflops. LINPACK benchmark runs at 10.0 Mflops (again for unmodified code on 100 x 100 case).
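One way to read the peak and LINPACK figures together is as a fraction of peak achieved on unmodified code. The sketch below computes that ratio from the numbers quoted above. It pairs the C120 64-bit peak with its LINPACK result; the precision of the C210 peak is not stated in the text, so that ratio is indicative only. The dictionary and function names are illustrative.

```python
# Sketch: LINPACK result as a fraction of quoted peak performance.
# All figures are copied from the text; names are hypothetical.

MACHINES = {
    # C120: 64-bit peak, 100 x 100 LINPACK with unmodified code
    "C120": {"peak_mflops": 20.0, "linpack_mflops": 3.7},
    # C210: quoted peak vector performance (precision not stated)
    "C210": {"peak_mflops": 50.0, "linpack_mflops": 10.0},
}


def linpack_fraction_of_peak(machine: str) -> float:
    """Return LINPACK Mflops divided by peak Mflops for the named machine."""
    figures = MACHINES[machine]
    return figures["linpack_mflops"] / figures["peak_mflops"]
```

On these figures both machines reach roughly a fifth of their quoted peak on the 100 x 100 case.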

The following two tables compare the single-processor performance, in Mflops, of the C210 with that of the C120. The first table compares performance for the algorithm Ai = Bi * k:

       64 bit   32 bit
C120    6.6     13.3
C210   16.3     25.0

The second table shows a comparison for an indirect vector addressing algorithm of the form A(Xi) = A(Xi) * B(Xi) * k:

       64 bit   32 bit
C120    3.6      3.5
C210   12.5     16.7

The C210 used in these benchmarks is the single-processor version of the C2 computer. The multiple-processor C220, C230, and C240 versions are available, and all are field upgradable from the C210.
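The two benchmark kernels and the tabulated figures above can be sketched as follows. This is an illustration in plain Python, not the benchmark code that was actually run (which is not given in the text); the function names and the `MFLOPS` table layout are assumptions made for the example.

```python
# Sketch of the two benchmark kernels and the C210/C120 speedups.
# Mflops values are copied from the tables; everything else is hypothetical.


def scaled_copy(b, k):
    """First kernel: A[i] = B[i] * k."""
    return [bi * k for bi in b]


def indirect_scaled_update(a, b, x, k):
    """Second kernel: A[X[i]] = A[X[i]] * B[X[i]] * k, applied in index order."""
    out = list(a)                      # leave the caller's array untouched
    for idx in x:                      # x is the index (indirection) vector
        out[idx] = out[idx] * b[idx] * k
    return out


# (64-bit Mflops, 32-bit Mflops) for each machine and kernel.
MFLOPS = {
    "scaled_copy": {"C120": (6.6, 13.3), "C210": (16.3, 25.0)},
    "indirect": {"C120": (3.6, 3.5), "C210": (12.5, 16.7)},
}


def speedup(kernel):
    """Return (64-bit, 32-bit) C210-over-C120 speedup, rounded to 2 places."""
    old = MFLOPS[kernel]["C120"]
    new = MFLOPS[kernel]["C210"]
    return tuple(round(n / o, 2) for n, o in zip(new, old))
```

On these figures the C210 gains more on the indirect addressing kernel than on the simple scaled copy.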

Operating System Software and Environment:

The operating system is UNIX 4.2 BSD; the COVUE shell offers emulation of most common VMS commands.

Languages: parallel Fortran, C, vectorized Ada, Common Lisp, and Prolog.

Fortran characteristics: Fortran 77 with VAX extensions and an excellent vectorizing compiler. The C compiler (VC) automatically vectorizes scalar code. HCR/PASCAL and HCR/UX-BASIC are available as third-party compilers.

A source level debugger and a range of editors, including VAX EDT emulation, are available.

There is a very extensive range of application software. General-purpose packages include NAG, IMSL, ABAQUS, MSC, NASTRAN, ANSYS, DI-3000, DISPLA, GKSGral, UNIRAS, TELEGRAF, Q-Calc, Sir, and Oracle.

Networkability/ I/O System / Integrability / Reliability / Scalability:

I/O: Ethernet, UltraNet, TCP/IP, NFS, and a `firm commitment' to HiPPI and FDDI. Proprietary IDC (Integrated Disk Channel) and TPI (Tape Library Interface).

All machines are stand-alone, multi-user, interactive machines. They can be interfaced to most standard communication channels including Ethernet (TCP/IP), DECnet, and Hyperchannel. Pink book and color books over LAN and NFS are also available. X25 color book will be available shortly. Batch job submission from VAX to C2 possible with output files and results returned to VAX.

A two-CPU C220 system consumes 12 kW.

Notable Applications / Customers / Market Sectors:

Aeronautics, protein and DNA sequencing analysis and molecular modelling. CONVEX has sold 380 systems (280 C1, 100 C2) worldwide since 1985.

Overall Comments:


Convex C3 Series

Overview of Platform:

The C3 series includes the C3200, C3400, C3800 models.

Vector Register, Parallel Processor, Bus-Based Architecture

Compute Hardware:

GaAs vector processors, based on the C2 series architecture.

Interconnect / Communications System:

Processors share memory that is accessed through a crossbar switch. C3200 models: one to a maximum of four processors, sharing a maximum of 2 GB of physical memory. C3400 models: one to a maximum of four dual-CPUs, sharing a maximum of 2 GB of physical memory. C3800 models: one to a maximum of four dual-CPUs, sharing a maximum of 4 GB of physical memory.

Memory System:

2 or 4 GB of shared memory.

Benchmarks / Compute and data transfer performance:

Performance (claims): 50 Mflops per C3200 CPU, 100 Mflops per C3400 CPU, 240 Mflops per C3800 CPU. 1920 Mflops is claimed for the top-of-the-range C3880 (8 CPU) system; this is simply the single-CPU figure scaled perfectly by the number of CPUs, which is dubious.
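The 1920 Mflops claim can be reproduced by multiplying the per-CPU figure by the CPU count, as the sketch below shows. The `efficiency` parameter is a hypothetical knob added for illustration: the text gives no measured parallel efficiency, so any value other than 1.0 is an assumption and not a Convex figure.

```python
# Sketch: system peak as a linearly scaled single-CPU figure.
# Per-CPU claims are copied from the text; names are hypothetical.

PER_CPU_MFLOPS = {"C3200": 50, "C3400": 100, "C3800": 240}


def scaled_mflops(model: str, cpus: int, efficiency: float = 1.0) -> float:
    """Return per-CPU Mflops * cpus * efficiency.

    efficiency=1.0 reproduces the vendor-style perfect scaling;
    any other value is a purely illustrative assumption.
    """
    if cpus < 1:
        raise ValueError("cpus must be at least 1")
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    return PER_CPU_MFLOPS[model] * cpus * efficiency
```

For example, `scaled_mflops("C3800", 8)` gives the claimed 1920 Mflops, while a hypothetical efficiency of 0.75 would give 1440 Mflops.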

Data transfer: 200 MB/s maximum node bandwidth, up to 800 MB/s total memory bandwidth, for C3200 and C3400 systems; 480 MB/s node bandwidth for C3800 systems.

Operating System Software and Environment:

Operating System: ConvexOS provides Unix 4.2 BSD and POSIX compliance. ASAP (Automatic Self-Allocating Processor) hardware for operating system management, and lots of caching technology. Hierarchical file management and advanced storage capabilities. Extensive file-migration software and robotic tape libraries.

Languages: Compilers available from the vendor for Fortran-77, C, Ada and popular Cray/IBM variants.

Programming Environment: Compiler support for vectorisation and parallelisation. Support for standard X Window System (including PEX), Application Visualisation System (AVS) and VMS interfaces. Comprehensive DBMS support, including ANSI SQL.

Networkability/ I/O System / Integrability / Reliability / Scalability:

I/O: Ethernet, UltraNet, TCP/IP, NFS, and a `firm commitment' to HiPPI and FDDI. Proprietary IDC (Integrated Disk Channel) and TPI (Tape Library Interface).

Notable Applications / Customers / Market Sectors:

Generally highly commended by Dornier (but dated '88) and used extensively by them (and others) for CFD/visualisation, specifically on the European Space Agency `Hermes' space-shuttle project.

Overall Comments:

Scalability: Currently limited from 1 or 2 to 8 nodes maximum, unlikely to scale much beyond 16 nodes, and pretty serious performance (or lack thereof) problems with current crossbar/shared-memory technology.

On-line storage: Unlikely to achieve 30 Gbytes on-line, but seem to be well-placed within DBMS market, so I'd guess that their database software and robotic tape libraries were certainly interesting. Seem to have a pretty reasonable (and fast) IO system.

Environment: Probably pretty effective for Database Management applications, although compiler parallelisations remain to be seen.


HP-Convex SPP-2000

Overview of Platform:

Crossbar-based symmetric multiprocessor (SMP).

Compute Hardware:

A distributed shared memory (DSM) machine based on HP PA-8000 processors (180 MHz clock speed), the SPP-2000 system is built around the "hypernode" configuration. Each hypernode can have up to 16 HP PA-RISC 8000 processors and 16 Gbytes of memory. The PA-8000 is 4-way superscalar with 10 pipelined functional units.

Interconnect / Communications System:

Memory System:

Each hypernode has 4 GB of memory (256 MB for the CTI network cache, 410 MB for the buffer cache, and 80 MB for the OS), and up to 16 GB in the scaled-up version. Maximum cabinet storage is 612 GB, with support for a maximum of 12.96 TB.

The processor has no on-chip cache; it uses an external cache with a latency of 3 clock cycles. Bandwidth from cache to CPU is 2.88 Gbytes/sec. Bandwidth from memory to cache is 960 Mbytes/sec, and cache coherency is directory based.
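The cache figures can be related to the 180 MHz clock quoted under Compute Hardware. The sketch below does that conversion, assuming decimal units (1 Gbyte/sec = 10^9 bytes/sec, 1 Mbyte/sec = 10^6 bytes/sec); the constant and function names are illustrative.

```python
# Sketch: express the quoted SPP-2000 cache figures per clock cycle.
# Figures are from the text; decimal units are an assumption.

CLOCK_HZ = 180_000_000               # PA-8000 clock speed (180 MHz)
CACHE_TO_CPU_B_S = 2_880_000_000     # 2.88 Gbytes/sec
MEM_TO_CACHE_B_S = 960_000_000       # 960 Mbytes/sec
CACHE_LATENCY_CYCLES = 3


def bytes_per_cycle(bandwidth_b_s: int, clock_hz: int = CLOCK_HZ) -> float:
    """Return how many bytes the given bandwidth delivers per clock cycle."""
    return bandwidth_b_s / clock_hz


def cycles_to_ns(cycles: int, clock_hz: int = CLOCK_HZ) -> float:
    """Convert a latency in clock cycles to nanoseconds."""
    return cycles / clock_hz * 1e9
```

Under these assumptions the cache delivers 16 bytes per cycle to the CPU, three times the memory-to-cache rate, and the 3-cycle latency is about 16.7 ns.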

Benchmarks / Compute and data transfer performance:

Below are some representative benchmarks performed on NCSA platforms using GAUSSIAN-94 (G94). They show that the SGI Origin2000 and HP/Convex Exemplar SPP-2000 run virtually neck-and-neck in performance for this test case, in both scaling and per-processor performance.

Gaussian 94 - Molecule Alpha Pinene Results:

PA-8000 peak performance: the processor can execute up to 4 FP ops per cycle, resulting in 720 MFLOPS peak at 180 MHz.
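The peak figure follows directly from the clock rate and the number of floating-point operations per cycle. A minimal sketch of the arithmetic, extended to a full 16-processor hypernode as a theoretical upper bound only (the text quotes no hypernode peak, so that extension is an assumption):

```python
# Sketch: theoretical peak from clock rate and FP ops per cycle.
# Figures are from the text; names are hypothetical.

CLOCK_MHZ = 180               # PA-8000 clock speed
FP_OPS_PER_CYCLE = 4          # maximum FP operations issued per cycle
MAX_CPUS_PER_HYPERNODE = 16   # largest hypernode configuration


def cpu_peak_mflops() -> int:
    """Peak MFLOPS of one PA-8000: clock (MHz) * FP ops per cycle."""
    return CLOCK_MHZ * FP_OPS_PER_CYCLE


def hypernode_peak_mflops(cpus: int = MAX_CPUS_PER_HYPERNODE) -> int:
    """Upper bound for a hypernode, assuming perfect linear scaling."""
    if not 1 <= cpus <= MAX_CPUS_PER_HYPERNODE:
        raise ValueError("a hypernode holds between 1 and 16 processors")
    return cpus * cpu_peak_mflops()
```

This reproduces the 720 MFLOPS per-processor figure; a fully populated hypernode would be bounded by 11,520 MFLOPS under the same idealised scaling.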

Operating System Software and Environment:

SPP-UX, a POSIX.1-conformant extension of HP-UX (HP-UX is based on Novell's UNIX). Process scheduling is non-preemptive (only the kernel preempts). POSIX threads are also supported.

Networkability/ I/O System / Integrability / Reliability / Scalability:

Notable Applications / Customers / Market Sectors:

CFD as well as computational physics are two of many applications suitable for the SPP-2000.


hawick@npac.syr.edu
saleh@npac.syr.edu