Hitachi
Status
One of the world's largest manufacturers of high-performance computers.
Overview of Organization
Hitachi, Ltd., founded in 1910, is a general manufacturer of electrical
and electronic products with a worldwide workforce of some 330,000
employees. It manufactures computers and communications systems,
consumer products, and power and industrial equipment.
High Performance Computers
Documented Systems
Contact Address
Hitachi, Ltd.
4-6 Kanda Surugadai,
Chiyoda-ku,
Tokyo 101-10,
Japan
The Company's
HPC page.
SR 2201
Overview of the Platform
The SR2201 series, built specifically for large-scale numerical
calculations, employs pseudo-vector processing, a pipelining
technique that fetches data (operands) directly from main memory
(bypassing the cache) into floating-point registers without
holding up the execution of subsequent instructions.
Compute Hardware
The Hitachi SR2201 High-end model is capable of performing over 600
billion floating-point operations per second
(600 GFLOPS theoretical performance), and the Compact model is capable
of performing over 19 billion floating-point operations per second
(19 GFLOPS theoretical performance). The High-end model ranges from
32 up to a maximum of 2,048 processors, and the Compact model ranges
from 8 up to a maximum of 64 processors.
- The PE-to-PE transfer rate is 300 MB/s.
- Each PE is based on a 64-bit RISC architecture.
- Total storage (maximum) is 64 GB for the Compact model and
2 TB for the High-end model.
- The theoretical performance of a PE is 0.3 GFLOPS.
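The system-level figures follow from the per-PE figures by simple multiplication. The sketch below reproduces that arithmetic using the 0.3 GFLOPS per-PE peak and the 1024 MB per-processor memory maximum quoted in this document; the function and constant names are illustrative and do not come from Hitachi.

```python
# Per-PE figures quoted in this document; names are illustrative only.
PE_PEAK_GFLOPS = 0.3          # theoretical peak of one PE
MAX_MEMORY_PER_PE_MB = 1024   # largest per-processor memory option


def system_peak_gflops(num_pes: int) -> float:
    """Theoretical peak of a system with num_pes processing elements."""
    return num_pes * PE_PEAK_GFLOPS


def max_total_memory_gb(num_pes: int) -> float:
    """Largest total memory when every PE carries the maximum option."""
    return num_pes * MAX_MEMORY_PER_PE_MB / 1024


if __name__ == "__main__":
    for model, pes in [("Compact (max)", 64), ("High-end (max)", 2048)]:
        print(f"{model}: {pes} PEs, "
              f"{system_peak_gflops(pes):.1f} GFLOPS peak, "
              f"{max_total_memory_gb(pes):.0f} GB memory")
```

With 64 PEs this gives just over 19 GFLOPS and 64 GB, and with 2,048 PEs just over 600 GFLOPS and 2,048 GB (2 TB), which matches the figures stated above.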
Memory System
Addressable memory per processor is 64, 128, 256, 512, or 1024 MB.
The primary cache is 16 kB (instruction) and 16 kB (data), and
the secondary cache is 512 kB (instruction) and 512 kB (data).
Interconnect / Communications System
The High-end SR2201 line employs a three-dimensional crossbar switch
to provide high-speed connections among individual processing
elements (PEs). With this switch, there are only three output lines
from any PE, one for each of the crossbars. The architecture of
the switch is shown below.
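One way to picture why three output lines suffice is to model the PEs as points on a three-dimensional grid, where each crossbar connects all PEs that differ in a single coordinate. The sketch below rests on that assumption: the grid model, the X-then-Y-then-Z ordering, and the name crossbar_route are illustrative and are not taken from Hitachi's documentation of the routing hardware.

```python
from typing import List, Tuple

Coord = Tuple[int, int, int]


def crossbar_route(src: Coord, dst: Coord) -> List[Coord]:
    """Return the PEs visited on the way from src to dst.

    Assumed model: one crossbar traversal can change exactly one
    coordinate to any value, so each axis is corrected in turn
    (X, then Y, then Z) and unchanged axes are skipped.
    """
    path = [src]
    current = list(src)
    for axis in range(3):
        if current[axis] != dst[axis]:
            current[axis] = dst[axis]
            path.append((current[0], current[1], current[2]))
    return path


if __name__ == "__main__":
    route = crossbar_route((0, 0, 0), (3, 5, 7))
    print(route, "hops:", len(route) - 1)
```

Under this model any pair of PEs is at most three crossbar traversals apart, regardless of how many PEs the machine has.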
Operating System Software and Environment
The operating system is HI-UX/MPP, a UNIX-based microkernel system
built on Mach 3.0.
It supports a range of third-party software, including parallel tools
(MPI, PVM), structural analysis, crash analysis, and quantum chemistry
analysis packages, and other libraries.
Networkability / I/O System / Integrability / Reliability / Scalability
The Compact models have an internal hard disk capacity of 4.3 GB, and
the High-end models have an internal hard disk capacity of 2.1 GB.
Disk arrays provide 4.6 to 66.6 GB per array (RAID 5). The internal
DAT capacity is 2 to 8 GB (compressed), and the external DAT capacity
is the same. The console display is 80 characters x 25 lines. The
number of processors can be expanded from several tens to a thousand
or more.
Benchmarks
Nothing is available at this time... Add LINPACK benchmark if available.
SR 8000
Overview of the Platform
The SR8000 is a scalable MIMD machine. Each processor (node)
of the SR8000 is built from multiple 64-bit RISC
microprocessors. Each microprocessor also has a pseudo-vector
processing facility to help with large-scale processing. In regard
to scalability, it starts at 4 nodes (32 GFLOPS), with growth
possible to 128 nodes (1 TFLOPS).
Compute Hardware
The base configuration is a 4-processor machine, and
the high-end version has up to
128 processors. In between, 8-processor, 16-processor,
32-processor, and 64-processor machines are available.
The inter-node transfer rate is 1 GB/s in each direction
(1 GB/s x 2). The peak performance of each node is 8 GFLOPS.
Memory System
Node memory capacity is 2 GB, 4 GB, or 8 GB. The maximum total
memory for the entire system scales with the number of
nodes: 32 GB for a 4-node machine, 64 GB for an 8-node machine,
128 GB for a 16-node machine, 256 GB for a 32-node machine,
512 GB for a 64-node machine, and 1,024 GB for a 128-node
machine.
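The memory totals above are the node count multiplied by the largest per-node option, and the quoted system peaks follow the same pattern with the 8 GFLOPS per-node figure. The sketch below reproduces both columns; the constant and function names are illustrative, not Hitachi's.

```python
# Per-node figures quoted in this document; names are illustrative only.
NODE_PEAK_GFLOPS = 8
NODE_MEMORY_OPTIONS_GB = (2, 4, 8)
NODE_COUNTS = (4, 8, 16, 32, 64, 128)


def max_total_memory_gb(nodes: int) -> int:
    """Total memory when every node carries the largest option."""
    return nodes * max(NODE_MEMORY_OPTIONS_GB)


def peak_gflops(nodes: int) -> int:
    """Theoretical system peak for the given number of nodes."""
    return nodes * NODE_PEAK_GFLOPS


if __name__ == "__main__":
    print(f"{'nodes':>5} {'max memory (GB)':>16} {'peak (GFLOPS)':>14}")
    for n in NODE_COUNTS:
        print(f"{n:>5} {max_total_memory_gb(n):>16} {peak_gflops(n):>14}")
```

The 128-node row comes out at 1,024 GB and 1,024 GFLOPS, which is the "one TFLOPS" figure the manufacturer quotes.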
Interconnect / Communications System
Communication between nodes uses a multidimensional crossbar
network, which supports scalability.
For the 4- and 8-processor versions, the crossbar network is
one-dimensional; for the 16-, 32-, and 64-processor versions it is
two-dimensional; and for the 128-processor version it is
three-dimensional.
External interfaces are Ultra SCSI, Ethernet/Fast Ethernet, ATM,
and HIPPI, all with the TCP/IP protocol.
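The relationship between machine size and crossbar dimensionality described above can be written down as a small lookup. This is only a restatement of the listed configurations: the function name is hypothetical, and sizes other than the six documented ones are rejected rather than guessed.

```python
def crossbar_dimensions(nodes: int) -> int:
    """Crossbar dimensionality for a documented SR8000 configuration."""
    if nodes in (4, 8):
        return 1
    if nodes in (16, 32, 64):
        return 2
    if nodes == 128:
        return 3
    # Only the six configurations listed in the text are covered.
    raise ValueError(f"undocumented configuration: {nodes} nodes")


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64, 128):
        print(n, "nodes ->", crossbar_dimensions(n), "dimension(s)")
```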
Operating System Software and Environment
The SR8000 operating system is HI-UX/MPP, a microkernel-based UNIX
operating system optimized for parallel distributed
processing.
The software environment supports a range of programming languages,
parallel programming tools, and libraries: languages such as
FORTRAN 77, FORTRAN 90, C, C++, and HPF; parallel tools such as
MPI and PVM; and scientific libraries such as MATRIX/MPP.
Networkability / I/O System / Integrability / Reliability / Scalability
In regard to scalability, the base configuration is a 4-node system
rated at 32 GFLOPS (according to the manufacturer), with growth
possible to 128 nodes rated at 1 TFLOPS (again, according to the
manufacturer).
In regard to reliability, the RISC microprocessors have
parity checking on the data and address lines as well as in the
arithmetic units. The memory has Error Correction Code (ECC) capable
of correcting burst errors (errors in multiple consecutive
bits). The inter-node network can retransmit
messages when errors are detected.
The I/O system and networkability are identical to the SR2201.
Benchmarks
LINPACK benchmark results for the Hitachi SR2201, S-3800, S-820,
S-810, and M680H (due to Jack Dongarra):
- S-3800/480 (4 processors, 2 ns), problem size n=1000, has a TPP
(best effort) of 20640 Mflop/sec and a theoretical peak of 32000 Mflop/sec.
- S-3800/380 (3 processors, 2 ns), problem size n=1000, has a TPP
(best effort) of 16880 Mflop/sec and a theoretical peak of 24000 Mflop/sec.
- S-3800/280 (2 processors, 2 ns), problem size n=1000, has a TPP
(best effort) of 12190 Mflop/sec and a theoretical peak of 16000 Mflop/sec.
- S-3800/180 (1 processor, 2 ns), problem size n=1000, has a TPP
(best effort) of 6431 Mflop/sec and a theoretical peak of 8000 Mflop/sec.
- S-820/80 (4 ns), problem size n=100, using the FORT77/HAP
V23-0C compiler, runs at 107 Mflop/sec. Theoretical peak is 3000 Mflop/sec.
- SR2201 (1 processor running at 150 MHz) gives about 30 Mflop/sec
for problem size n=100 and 248 Mflop/sec for problem size n=1000.
Theoretical peak is 300 Mflop/sec.
- M680H/vector, for problem size n=100, gives about 16 Mflop/sec.
- S-810/10 using HAP v21.00 runs at 16 Mflop/sec. Theoretical
peak is 315 Mflop/sec.
- S-810/20 using FORT77/HAP, for n=100, runs at 17 Mflop/sec, with a
theoretical peak of 840 Mflop/sec.
- M680H using Fortran77 E2 and V04-0I runs at 15 Mflop/sec.
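A common way to read figures like these is as a fraction of theoretical peak. The sketch below computes that ratio for the n=1000 entries listed above that report both an achieved rate and a peak; the dictionary layout and the name efficiency are illustrative choices, not part of the LINPACK report.

```python
# (achieved Mflop/sec at n=1000, theoretical peak Mflop/sec),
# copied from the list above.
RESULTS = {
    "S-3800/480": (20640, 32000),
    "S-3800/380": (16880, 24000),
    "S-3800/280": (12190, 16000),
    "S-3800/180": (6431, 8000),
    "SR2201 (1 processor)": (248, 300),
}


def efficiency(achieved: float, peak: float) -> float:
    """Fraction of theoretical peak actually achieved."""
    return achieved / peak


if __name__ == "__main__":
    for system, (achieved, peak) in RESULTS.items():
        print(f"{system:<22} {efficiency(achieved, peak):6.1%} of peak")
```

For the S-3800 family the ratio falls as processors are added: the single-processor S-3800/180 reaches about 80% of peak, while the four-processor S-3800/480 reaches 64.5%.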
saleh@npac.syr.edu