Tera

Tera Computer Company

Status:

Active manufacturer of parallel computers.

Overview of Organization:

Tera Computer Company was founded by Burton Smith in 199? and has been, and continues to be, funded by DARPA and NSF. In 1998 the company introduced its first general-purpose parallel computer, the Tera MTA (Multi-Threaded Architecture). It was designed specifically for large-scale applications such as reservoir simulation, seismic exploration, 3-D computer-aided design, and molecular modeling.

The first MTA has been installed at the San Diego Supercomputer Center. According to the company, the system will be used to implement, optimize, and evaluate a wide range of applications.

Overview of the MTA:

The Tera computer system is a shared-memory multiprocessor. According to its specification, it implements a true shared-memory programming model, in which the performance of the system does not depend on the placement of data in memory.

Interconnect / Communications System:

The system is a multiprocessor accommodating up to 256 processors. It runs stand-alone and requires no front end. Network connections to workstations and other computer systems are made via 32- or 64-bit HIPPI channels. All data paths are 64 bits wide, including the processor-network interface.

Processors:

64-processor performance estimates (by the manufacturer):

  Kernel                                      Estimated Time
  ------------------------------------------  --------------
  Matrix multiply, 1K x 1K                    50 ms
  3D FFT, 256 x 256 x 256                     63 ms
  Sparse matrix times vector, 400M nonzeros   50 ms
  Integer sort, 100M keys                     36 ms
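
To put the estimated times in perspective, the following minimal sketch converts them into implied sustained rates. The operation counts are assumptions, not figures from the manufacturer: 2n^3 flops for the matrix multiply with "1K" read as 1024, 5 N log2 N flops for the FFT, two flops per nonzero for the sparse matrix-vector product, and "M" read as 10^6. The function name `implied_rates` is hypothetical.

```python
import math

# Manufacturer's estimated times for a 64-processor system, in seconds
# (taken from the table above).
ESTIMATED_SECONDS = {
    "matmul": 0.050,  # matrix multiply, 1K x 1K
    "fft3d": 0.063,   # 3D FFT, 256 x 256 x 256
    "spmv": 0.050,    # sparse matrix times vector, 400M nonzeros
    "sort": 0.036,    # integer sort, 100M keys
}


def implied_rates():
    """Return implied sustained rates under assumed operation counts."""
    n = 1024                                     # assumption: "1K" means 1024
    matmul_flops = 2 * n ** 3                    # assumption: 2n^3 flops
    points = 256 ** 3
    fft_flops = 5 * points * math.log2(points)   # assumption: 5 N log2 N flops
    spmv_flops = 2 * 400e6                       # assumption: multiply + add per nonzero
    keys = 100e6                                 # assumption: "M" means 10^6

    return {
        "matmul_gflops": matmul_flops / ESTIMATED_SECONDS["matmul"] / 1e9,
        "fft3d_gflops": fft_flops / ESTIMATED_SECONDS["fft3d"] / 1e9,
        "spmv_gflops": spmv_flops / ESTIMATED_SECONDS["spmv"] / 1e9,
        "sort_gkeys_per_s": keys / ESTIMATED_SECONDS["sort"] / 1e9,
    }


if __name__ == "__main__":
    for name, value in implied_rates().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions the matrix multiply works out to roughly 43 Gflop/s. A different operation count for any kernel would change the corresponding rate proportionally.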

Interconnect / Communications System:

  • The interconnection network is a three-dimensional packet-switched network containing p^(3/2) nodes, where p is the number of processors.

  • These nodes are toroidally connected in three dimensions to form a p^(1/2)-ary three-cube, and processor and memory resources are attached to some of the nodes.

  • The latency of a node is three cycles: a message spends two cycles in the node logic proper and one on the wire that connects the node to its neighbors.

  • A p-processor system has worst-case one-way latency of 4.5p^(1/2) cycles.

  • A node has four ports (five if a resource is attached).

  • Each port simultaneously transmits and receives an entire 164-bit packet every 3 ns clock cycle.

  • Of the 164 bits, 64 are data, so the data bandwidth per port is 2.67 GB/s in each direction.

  • The network bisection bandwidth is 2.67p GB/s.

  • The network routing nodes contain no buffers other than those required for the pipeline. Instead, all messages are immediately routed to an output port.

  • Messages are assigned random priorities and then routed in priority order. Under heavy load, some messages are derouted by this process. The randomization at each node ensures that each packet eventually reaches its destination.
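
The network figures in the list above can be reproduced with a short calculation. The sketch below is illustrative only: the function name `network_figures` is hypothetical, and the worst-case path is assumed to be half-way around the torus in each of the three dimensions (k/2 hops per dimension with k = p^(1/2)), which is one way to arrive at the stated 4.5p^(1/2) cycles. The bisection bandwidth is taken directly from the stated 2.67p GB/s formula rather than derived.

```python
import math

CYCLE_NS = 3.0        # clock cycle, from the text
PACKET_BITS = 164     # packet width per port per cycle, from the text
DATA_BITS = 64        # data bits within each packet, from the text
CYCLES_PER_HOP = 3    # two cycles in node logic plus one on the wire


def network_figures(p):
    """Return network figures for a p-processor system (p a perfect square)."""
    k = math.isqrt(p)
    if k * k != p:
        raise ValueError("p must be a perfect square so that p^(1/2) is an integer")

    # Assumption: worst case is k/2 hops in each of the three torus dimensions.
    max_hops = 3 * k / 2
    latency_cycles = CYCLES_PER_HOP * max_hops   # equals 4.5 * p^(1/2)

    # Bytes per nanosecond is numerically the same as GB/s.
    port_data_gb_s = (DATA_BITS / 8) / CYCLE_NS

    return {
        "radix": k,                                  # p^(1/2)
        "nodes": k ** 3,                             # p^(3/2)
        "worst_case_latency_cycles": latency_cycles,
        "worst_case_latency_ns": latency_cycles * CYCLE_NS,
        "port_data_gb_s": port_data_gb_s,            # about 2.67 GB/s
        "port_raw_gb_s": (PACKET_BITS / 8) / CYCLE_NS,
        "bisection_gb_s": port_data_gb_s * p,        # stated formula: 2.67p GB/s
    }


if __name__ == "__main__":
    for p in (16, 64, 256):
        print(p, network_figures(p))
```

For p = 256 this gives a 16-ary three-cube with 4096 nodes and a worst-case one-way latency of 72 cycles, in agreement with the 4.5p^(1/2) figure.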

    The overall hardware configuration of the system:

    Peak Gflops           16           64          256
    
    Memory, Gbytes       16-32        64-128      256-51
    
    HIPPI channels        32           128         512
    
    Processors            16           64          256
    
    I/O, Gbytes/s         6.2          25          102
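
As a quick consistency check on the configuration table, the following sketch divides the peak Gflops, HIPPI channel, and I/O rows by the processor count. The values are copied from the table; the names `CONFIGS` and `per_processor` are hypothetical and used only for illustration.

```python
# Rows copied from the hardware configuration table above.
CONFIGS = [
    {"processors": 16, "peak_gflops": 16, "hippi_channels": 32, "io_gbytes_s": 6.2},
    {"processors": 64, "peak_gflops": 64, "hippi_channels": 128, "io_gbytes_s": 25},
    {"processors": 256, "peak_gflops": 256, "hippi_channels": 512, "io_gbytes_s": 102},
]


def per_processor(config):
    """Return per-processor ratios for one configuration."""
    p = config["processors"]
    return {
        "processors": p,
        "gflops_per_proc": config["peak_gflops"] / p,
        "hippi_per_proc": config["hippi_channels"] / p,
        "io_gbytes_s_per_proc": config["io_gbytes_s"] / p,
    }


if __name__ == "__main__":
    for config in CONFIGS:
        print(per_processor(config))
```

Each listed configuration works out to 1 peak Gflops and 2 HIPPI channels per processor, with roughly 0.39 to 0.40 Gbytes/s of I/O per processor, so the table scales linearly with the processor count.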
    

    Memory of the System:

    I/O System:

    Contact Address:

    Tera Computer Company
    2815 Eastlake Avenue East 
    Seattle, Washington 98102
    USA
    
    For more information on the Tera platforms, see the company's homepage.


    Saleh Elmohamed, saleh@npac.syr.edu