CDC STAR-100

From Wikipedia, the free encyclopedia
CDC STAR-100
Two CDC STAR-100 systems: an 8 MB version (foreground) and a 4 MB version (background)
Design
Manufacturer: Control Data Corporation
Designer: Jim Thornton
Release date: 1974[1]
Casing
Dimensions (full computer, approx.):
Height: 212 cm (83 in)
Length: 745 cm (293 in)
Internal sections:[2]
Height: 76 in (190 cm)
Width: 28.5 in (72 cm)
Depth: 30 in (76 cm)
Weight: 2,200 pounds (1,000 kg)
Power: 250 kW @ 208 V 400 Hz[2]
System
Operating system: HELIOS[2]
CPU: 64-bit processor @ 25 MHz[1]
Memory: Up to 8 megabytes (4 × 4 × 64K × 64-bit)[3]
Storage: —
MIPS: 1 MIPS (scalar)[2][4]
FLOPS: 100 MFLOPS (vector)[1]
Predecessor: —
Successor: CDC Cyber 200

The CDC STAR-100 is a vector supercomputer that was designed, manufactured, and marketed by Control Data Corporation (CDC). It was one of the first machines to use a vector processor to improve performance on appropriate scientific applications. It was also the first supercomputer to use integrated circuits and the first to be equipped with one million words of computer memory.[5]

The name STAR was a construct of the words STrings of binary digits that made up ARrays,[6] referring to the vector concept. The 100 came from 100 million floating point operations per second (MFLOPS), the speed at which the machine was designed to operate.[5] This compares to their earlier CDC 7600 which provided peak performance of 36 MFLOPS but more typically ran at around 10 MFLOPS.

The design was part of a bid made to Lawrence Livermore National Laboratory in the mid-1960s.[5] Livermore was looking for a partner who would build a much faster machine on their own budget and then lease the resulting design to the lab. The design was announced publicly in the early 1970s, and on 17 August 1971, CDC announced that General Motors had placed the first commercial order for a STAR-100.

A number of basic design features of the machine meant that its real-world performance was much lower than expected when it entered commercial use in 1974; this shortfall was one of the primary reasons CDC was pushed from its former dominance in the supercomputer market when the Cray-1 was announced in 1975. Only three STAR-100 systems were delivered: two to Livermore Laboratory and another to NASA Langley Research Center.

Description

The STAR had a 64-bit architecture with an instruction set of 195 instructions.[7] Its main innovation was the inclusion of 65 vector instructions for vector processing. These new instructions approximated what was available to users of the APL programming language and operated on long vectors stored in consecutive locations in main memory, which was virtualized for ease of programming. The CPU was designed to use these instructions to set up additional hardware that fed in data from main memory as quickly as possible. For instance, a program could use a single instruction with a few parameters to add all the elements of two vectors up to 65,535 elements long.

To understand why vector instructions improve performance, consider the simple task of adding two 10,000-element arrays. In a traditional design, each element would require the computer to fetch the ADD instruction from memory, decode it, fetch the two operands from memory, perform the addition, and write the result back to memory. In a vector machine, the ADD instruction is read only once, immediately saving 10,000 memory accesses. Additionally, the memory location of the "next" operand is known: it is one word higher in memory than the last. This allows the computer to fetch the next operands while the adder circuitry is still adding the previous two values, as it does not have to wait for the instruction to be decoded. As soon as an addition completes, the adder can hand off the result to be written out and immediately begin work on the next two values. As with instruction pipelines in general, the time needed to complete any one instruction is no better than before, but because the CPU works on a number of data points at once, overall performance improves dramatically due to the assembly-line nature of the task.
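The contrast can be sketched in Python, with NumPy's array addition standing in for a single hardware vector instruction (an illustration of the concept only, not STAR code):

```python
import numpy as np

N = 10_000
a = np.arange(N, dtype=np.float64)
b = np.arange(N, dtype=np.float64)

# Scalar style: every element implies a separately fetched and decoded
# ADD, plus two operand reads and one result write per iteration.
result_scalar = np.empty(N)
for i in range(N):
    result_scalar[i] = a[i] + b[i]

# Vector style: one "instruction" describes the whole operation; the
# hardware streams consecutive operands through the adder pipeline.
result_vector = a + b

assert np.array_equal(result_scalar, result_vector)
```

Both forms compute the same result; the difference lies entirely in how much per-element instruction handling is eliminated.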

Many of the STAR's instructions were complex, especially the vector macro instructions, which performed operations that would normally have required long sequences of instructions. These instructions, along with the STAR's generally complex architecture, were implemented with microcode.[8]

The main memory had a capacity of 65,536 superwords (SWORDs), each a 512-bit word.[9] It was 32-way interleaved to pipeline memory accesses, and was constructed from core memory with an access time of 1.28 μs. The main memory was accessed via a 512-bit bus controlled by the storage access controller (SAC), which handled requests from the stream unit. The stream unit accessed the main memory through the SAC via three 128-bit data buses: two for reads and one for writes. An additional 128-bit data bus served instruction fetch, I/O, and control vector access. The stream unit acted as the control unit, fetching and decoding instructions, initiating memory accesses on behalf of the pipelined functional units, and controlling instruction execution, among other tasks. It also contained two read buffers and one write buffer for streaming data to the execution units.[9]
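The effect of interleaving can be sketched with a toy bank-mapping function; low-order address interleaving is assumed here for illustration, as the cited sources do not spell out the STAR's exact mapping:

```python
NUM_BANKS = 32  # the STAR-100's main memory was 32-way interleaved

def bank_of(sword_address):
    """Low-order interleaving: consecutive superword addresses map to
    consecutive banks, so a streaming access returns to any given bank
    only once every NUM_BANKS references."""
    return sword_address % NUM_BANKS

# A stream of 8 consecutive superwords spreads across 8 different banks,
# letting each bank's 1.28 us core cycle overlap with the others'.
banks = [bank_of(addr) for addr in range(8)]
print(banks)  # [0, 1, 2, 3, 4, 5, 6, 7]
```

Because a vector sweep touches the banks round-robin, the effective memory bandwidth approaches 32 times that of a single core bank.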

The STAR-100 has two pipelines where arithmetic is performed. The first pipeline contains a floating point adder and multiplier, whereas the second pipeline is multifunctional, capable of executing all scalar instructions. It also contains a floating point adder, multiplier, and divider. Both pipelines are 64-bit for floating point operations and are controlled by microcode. The STAR-100 can split its floating point pipelines into four 32-bit pipelines, doubling the peak performance of the system to 100 MFLOPS at the expense of half the precision.[9]
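Under the idealized assumption that each pipeline delivers one result per 40 ns clock cycle, the peak figures above can be reproduced with simple arithmetic (a sketch, not a timing model of the real hardware):

```python
clock_hz = 25e6  # 25 MHz, i.e. a 40 ns cycle

# 64-bit mode: two pipelines, one result per cycle each (idealized).
pipes_64 = 2
peak_64 = pipes_64 * clock_hz       # 50 MFLOPS

# 32-bit mode: each 64-bit pipeline splits into two 32-bit pipelines,
# doubling throughput at half the precision.
pipes_32 = pipes_64 * 2
peak_32 = pipes_32 * clock_hz       # 100 MFLOPS

print(peak_64 / 1e6, peak_32 / 1e6)  # 50.0 100.0
```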

The STAR-100 uses I/O processors to offload I/O from the CPU. Each I/O processor is a 16-bit minicomputer with its own main memory of 65,536 words of 16 bits each, which is implemented with core memory. The I/O processors all share a 128-bit data bus to the SAC.

Real-world performance, users and impact

The STAR-100's real-world performance was a fraction of its theoretical performance, for several reasons. First, the vector instructions, being memory-to-memory, had a relatively long startup time, since the pipeline from memory to the functional units was very long. Unlike the register-based pipelined functional units of the 7600, the STAR's pipelines were much deeper. The problem was compounded by the STAR's slower cycle time (40 ns versus 27.5 ns for the 7600). As a result, the vector length at which the STAR became faster than the 7600 was about 50 elements; for loops working on data sets with fewer elements, the time cost of setting up the vector pipeline outweighed the time saved by the vector instructions.
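This break-even behaviour can be modeled with a simple cost equation; the startup and per-element times below are illustrative stand-ins chosen to produce a crossover near 50 elements, not measured STAR-100 timings:

```python
def scalar_time(n, t_elem=1.0):
    # Scalar loop: cost grows linearly from the very first element.
    return n * t_elem

def vector_time(n, t_startup=45.0, t_elem=0.1):
    # Vector instruction: large fixed pipeline-fill cost, then a much
    # cheaper per-element streaming cost.
    return t_startup + n * t_elem

# Find the first vector length at which the vector form wins.
crossover = next(n for n in range(1, 1000)
                 if vector_time(n) < scalar_time(n))
print(crossover)  # 51 with these illustrative parameters
```

Below the crossover the fixed startup cost dominates, which is exactly the regime in which most real STAR workloads ran.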

When the machine was released in 1974, it quickly became apparent that its general performance was nowhere near what people expected. Very few programs could be effectively vectorized into a series of single instructions; nearly all calculations rely on the results of earlier instructions, yet on the STAR those results had to clear the pipelines before they could be fed back in. This forced most programs to pay the high setup cost of the vector units, and the programs that did "work" well were generally extreme examples. Making matters worse, basic scalar performance had been sacrificed to improve vector performance: any time a program had to run scalar instructions, the overall performance of the machine dropped dramatically. (See Amdahl's law.)
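The scalar bottleneck follows directly from Amdahl's law, sketched here with illustrative fractions and speedups rather than STAR measurements:

```python
def amdahl_speedup(vector_fraction, vector_speedup):
    """Overall speedup when only vector_fraction of the runtime is
    accelerated by vector_speedup; the scalar remainder runs as-is."""
    return 1.0 / ((1.0 - vector_fraction) + vector_fraction / vector_speedup)

# Even with a 100x faster vector unit, a program that is only half
# vectorizable gains less than 2x overall.
print(round(amdahl_speedup(0.5, 100.0), 2))  # 1.98
```

The scalar fraction bounds the achievable speedup no matter how fast the vector unit is, which is why the STAR's weak scalar performance was so damaging.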

Two STAR-100 systems were eventually delivered to the Lawrence Livermore National Laboratory and one to NASA Langley Research Center.[10] In preparation for the STAR deliveries, LLNL programmers developed a library of subroutines on the 7600, called STACKLIB, to emulate the vector operations of the STAR. In the process of developing STACKLIB, they found that programs converted to use it ran faster than before, even on the 7600. This placed further pressure on the STAR's performance.

The STAR-100 was a disappointment to everyone involved. Jim Thornton, formerly Seymour Cray's close assistant on the CDC 1604 and 6600 projects and the chief designer of the STAR, left CDC to form Network Systems Corporation. An updated version of the basic architecture was released in 1979 as the Cyber 203,[10] followed by the Cyber 205 in 1980, but by this point systems from Cray Research with considerably higher performance were on the market. The failure of the STAR pushed CDC from its former dominance in the supercomputer market, something it tried to address with the formation of ETA Systems in September 1983.[10]

Installations

Five CDC STAR-100s were built. Deliveries began in 1974:[1]

  • Control Data Corporation, Arden Hills, MN (2)
  • Lawrence Livermore Lab. (2)
  • NASA Langley

References

  1. T. Bloch, "Large Computer Systems and New Architectures", CERN, Geneva, Switzerland, November 1978.
  2. Michael Baylis, A Proposal to the Atlas Computer Laboratory for a STAR Computer System, Control Data, April 1972.
  3. STAR-100 Hardware Reference Manual.
  4. Whetstone Benchmark History and Results.
  5. MacKenzie, Donald (1998). Knowing Machines: Essays on Technical Change. MIT Press. ISBN 9780262631884.
  6. C.J. Purcell, "The Control Data STAR-100". S2CID 43509695.
  7. Hwang, Kai; Briggs, Fayé Alayé (1984). Computer Architecture and Parallel Processing. McGraw-Hill. pp. 234–249.
  8. Schneck, P.B. (1987). Supercomputer Architecture. Kluwer Academic. pp. 99–118.
  9. P.M. Kogge, The Architecture of Pipelined Computers, Taylor & Francis, 1981, pp. 162–164.
  10. R.W. Hockney and C.R. Jesshope, Parallel Computers 2: Architecture, Programming and Algorithms, Adam Hilger, 1988, p. 21.

Further reading

  • R.G. Hintz and D.P. Tate, "Control Data STAR-100 processor design," Proc. Compcon, 1972, pp. 1–4.
