Computer Architecture & Related Topics
Ben Schrooten, Shawn Borchardt, Eddie Willett, Vandana...
TRANSCRIPT
Presentation Topics
Computer Architecture History
Single CPU Design
GPU Design (Brief)
Memory Architecture
Communications Architecture
Dual Processor Design
Parallel & Supercomputing Design
The ENIAC: 1946
• Completed: 1946
• Programmed: plug board and switches
• Speed: 5,000 operations per second
• Input/output: cards, lights, switches, plugs
• Floor space: 1,000 square feet
The EDSAC (1949) and the UNIVAC I (1951)
EDSAC
Technology:vacuum tubes
Memory:1K words
Speed:714 operations per second
First practical stored-program computer
UNIVAC
Speed:1,905 operations per second
Input/output:magnetic tape, unityper, printer
Memory size:1,000 12-digit words in delay lines
Memory type:delay lines, magnetic tape
Technology:serial vacuum tubes, delay lines, magnetic tape
Floor space:943 cubic feet
Cost:F.O.B. factory $750,000 plus $185,000 for a high speed printer
Progression of The Architecture
Vacuum tubes -- 1940 – 1950
Transistors -- 1950 – 1964
Integrated circuits -- 1964 – 1971
Microprocessor chips -- 1971 – present
Intel 4004 1971
Motherboards / Chipsets / Sockets: Oh My!
•Chipset
In charge of:
•Memory Controller
•EIDE Controller
•PCI Bridge
•Real Time Clock
•DMA Controller
•IrDA Controller
•Keyboard
•Mouse
•Secondary Cache
•Low-Power CMOS SRAM
GPUs
•Allow for real-time rendering of graphics on a small PC
•GPUs are true processing units
•The GeForce3 contains 57 million transistors on a 0.15-micron manufacturing process
•The Pentium 4 contains 42 million transistors on a 0.18-micron process
Sources
DX4100 picture: Oneironaut, http://oneironaut.tripod.com/dx4100.jpg
Computer architecture overview picture: http://www.eecs.tulane.edu/courses/cpen201/slides/201Intro.pdf
Pictures of CPU overview, single-bus architecture, triple-bus architecture: Roy M. Wnek, Virginia Tech CS5515 Lecture 5, http://www.nvc.cs.vt.edu/~wnek/cs5515/slide/Grad_Arch_5.PDF
Historical data and pictures: The Computer Museum History Center, http://www.computerhistory.org/
Intel motherboard diagram / Pentium 4 picture: Intel Corporation, http://www.intel.com
The abacus: Abacus-Online-Museum, http://www.hh.schule.de/metalltechnik-didaktik/users/luetjens/abakus/china/china.htm
Information also from: Clint Fleri, http://www.geocities.com/cfleri/
Memory functionality: Dana Angluin, http://zoo.cs.yale.edu/classes/cs201/Fall_2001/handouts/lecture-13/node4.html
Benchmark graphics: Digital Life, http://www.digit-life.com/articles/pentium4/index3.html
Chipset and socket information: Motherboards.org, http://www.motherboards.org/articlesd/tech-planations/17_2.html
AMD processor pictures: Tom's Hardware, http://www6.tomshardware.com/search/search.html?category=all&words=Athlon
GPU info: 4th Wave Inc., http://www.wave-report.com/tutorials/gpu.htm
NV20 design pictures: Digital Life, http://www.digit-life.com/articles/nv20/
DRAM vs. SRAM
•DRAM is short for Dynamic Random Access Memory
•SRAM is short for Static Random Access Memory
DRAM is dynamic in that, unlike SRAM, its storage cells must be refreshed (given a new electronic charge) every few milliseconds. SRAM needs no refreshing because it stores each bit by switching current in one of two directions rather than holding a charge in a storage cell.
Parity vs. Non-Parity
Parity is error detection that was developed to notify the user of data errors. A single bit is added to each byte of data; this bit checks the integrity of the byte's 8 bits while the byte is moved or stored.
Since memory errors are rare, much of today's memory is non-parity.
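The parity check described above can be sketched in a few lines of Python. This assumes even parity (the extra bit makes the total count of 1s even), which the slide does not specify; odd parity works the same way with the test inverted.

```python
def parity_bit(byte):
    """Even parity: the stored bit makes the total number of 1s even."""
    return bin(byte).count("1") % 2

def check(byte, stored_parity):
    """Re-derive the parity and compare it with the stored bit."""
    return parity_bit(byte) == stored_parity

b = 0b10110010                 # four 1-bits, so the parity bit is 0
p = parity_bit(b)              # stored alongside the byte in memory
assert check(b, p)             # byte arrives intact
corrupted = b ^ 0b00000100     # a single flipped bit
assert not check(corrupted, p) # error detected
```

Note that parity detects any single-bit error but cannot say which bit flipped, and a two-bit error slips through unnoticed.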
SIMM vs. DIMM vs. RIMM?
SIMM - Single In-line Memory Module
DIMM - Dual In-line Memory Module
RIMM - Rambus In-line Memory Module
SIMMs offer a 32-bit data path, while DIMMs offer a 64-bit data path. SIMMs have to be used in pairs on Pentiums and more recent processors.
RIMM is one of the latest designs. Because of the fast data transfer rate of these modules, a heat spreader (an aluminum plate covering) is used on each module.
Evolution of Memory
1970       RAM / DRAM    4.77 MHz
1987       FPM           20 MHz
1995       EDO           20 MHz
1997       PC66 SDRAM    66 MHz
1998       PC100 SDRAM   100 MHz
1999       RDRAM         800 MHz
1999/2000  PC133 SDRAM   133 MHz
2000       DDR SDRAM     266 MHz
2001       EDRAM         450 MHz
•FPM - Fast Page Mode DRAM: traditional DRAM
•EDO - Extended Data Output: increases the read cycle between memory and the CPU
•SDRAM - Synchronous DRAM: synchronizes itself with the CPU bus and runs at higher clock speeds
•RDRAM - Rambus DRAM: DRAM with a very high bandwidth (1.6 GB/s)
•EDRAM - Enhanced DRAM: dynamic (power-refreshed) RAM that includes a small amount of static RAM (SRAM) inside a larger amount of DRAM, so that many memory accesses go to the faster SRAM. EDRAM is sometimes used as L1 and L2 memory and, together with Enhanced Synchronous DRAM, is known as cached DRAM.
Read Operation
•On a read, the CPU first tries to find the data in the cache; if it is not there, the cache is updated from main memory and the data is then returned to the CPU.
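The read path above can be sketched as a tiny Python model. The class name, capacity, and eviction policy are illustrative only; real caches are organized into fixed-size lines and sets.

```python
class Cache:
    def __init__(self, main_memory, size=4):
        self.main = main_memory
        self.size = size
        self.lines = {}                   # address -> cached value

    def read(self, addr):
        if addr in self.lines:            # hit: serve from the cache
            return self.lines[addr]
        value = self.main[addr]           # miss: fetch from main memory
        if len(self.lines) >= self.size:  # make room (evict oldest entry)
            self.lines.pop(next(iter(self.lines)))
        self.lines[addr] = value          # update the cache, then return
        return value

ram = {0: 10, 1: 20, 2: 30}
c = Cache(ram)
print(c.read(1))   # miss: loaded from main memory, prints 20
print(c.read(1))   # hit: served from the cache, prints 20
```

The second read never touches `ram`, which is the whole point: the common case is served at cache speed.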
References
http://www-ece.ucsd.edu/~weathers/ece30/downloads/Ch7_memory(4x).pdf
http://home.cfl.rr.com/bjp/eric/ComputerMemory.html
http://aggregate.org/EE380/JEL/ch1.pdf
Defining a Bus
A parallel circuit that connects the major components of a computer, allowing the transfer of electric impulses from one connected component to any other.
VESA - Video Electronics Standards Association
32-bit bus
Found mostly on 486 machines
Relied on the 486 processor to function; because of this, people started to switch to the PCI bus
Otherwise known as VLB
ISA - Industry Standard Architecture
Very old technology
Bus speed of 8 MHz
Maximum speed of 42.4 Mb/s
Very few ISA ports are found in modern machines
MCA - Micro Channel Architecture
IBM's attempt to compete with the ISA bus
32-bit bus
Automatically configured cards (like Plug and Play)
Not compatible with ISA
EISA - Extended Industry Standard Architecture
Attempt to compete with IBM's MCA bus
Ran on an 8.33 MHz cycle rate
32-bit slots
Backward compatible with ISA
Went the way of MCA
PCI - Peripheral Component Interconnect
Speeds up to 960 Mb/s
Bus speed of 33 MHz
32-bit architecture
Developed by Intel in 1993
Synchronous or asynchronous
PCI popularized Plug and Play
Runs at half of the system bus speed
PCI-X
Up to 133 MHz bus speed
64-bit bus width
1 GB/s throughput
Backward compatible with all PCI
Primarily developed for the increased I/O demands of technologies such as Fibre Channel, Gigabit Ethernet, and Ultra3 SCSI
AGP - Accelerated Graphics Port
Essentially a high-speed PCI port
Capable of running at 4 times the PCI bus speed (133 MHz)
Used for high-speed 3D graphics cards
Considered a port, not a bus: only two devices are involved, and it is not expandable
Bus         Width (bits)   Bus Speed (MHz)   Bandwidth (MBytes/sec)
8-bit ISA   8              8.3               7.9
16-bit ISA  16             8.3               15.9
EISA        32             8.3               31.8
VLB         32             33                127.2
PCI         32             33                127.2
AGP         32             66                254.3
AGP (2x)    32             66 x 2            508.6
AGP (4x)    32             66 x 4            1017.3
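The bandwidth column above follows directly from width and clock: bytes per transfer times transfers per second. A small Python check, assuming the table's "MBytes/sec" are binary megabytes (2^20 bytes) and the nominal clocks are 33 1/3 and 66 2/3 MHz:

```python
def bandwidth_mib(width_bits, clock_mhz, transfers_per_clock=1):
    """Peak bandwidth = width in bytes x clock rate x transfers per clock."""
    bytes_per_sec = (width_bits / 8) * clock_mhz * 1e6 * transfers_per_clock
    return bytes_per_sec / 2**20     # convert to binary megabytes/sec

print(round(bandwidth_mib(8, 8.3), 1))         # 8-bit ISA -> 7.9
print(round(bandwidth_mib(32, 100/3), 1))      # PCI       -> 127.2
print(round(bandwidth_mib(32, 200/3, 2), 1))   # AGP 2x    -> 508.6
```

The `transfers_per_clock` parameter captures why AGP 2x and 4x multiply bandwidth without raising the clock: they move data two or four times per cycle.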
IDE - Integrated Drive Electronics
Tons of other names: ATA, ATA/ATAPI, EIDE, ATA-2, Fast ATA, ATA-3, Ultra ATA, Ultra DMA
Good performance at a cheap cost
Most widely used interface for hard disks
SCSI - Small Computer System Interface “skuzzy”
Capable of handling internal/external peripherals
Speeds anywhere from 80 to 640 Mb/s
Many types of SCSI
Type              Max. Bus Speed (MBytes/sec)   Bus Width (bits)   Max. Device Support
SCSI-1            5                             8                  8
Fast SCSI         10                            8                  8
Fast Wide SCSI    20                            16                 16
Ultra SCSI        20                            8                  8
Ultra Wide SCSI   40                            16                 16
Ultra2 SCSI       40                            8                  8
Wide Ultra2 SCSI  80                            16                 16
Ultra3 SCSI       160                           16                 16
Ultra320 SCSI     320                           16                 16
USB 1.0
Hot plug-and-play
Full-speed USB devices signal at 12 Mb/s; low-speed devices use a 1.5 Mb/s subchannel
Up to 127 devices chained together
USB 2.0: data rate of 480 megabits per second
USB On-The-Go
For portable devices
Limited host capability to communicate with selected other USB peripherals
A small USB connector to fit the mobile form factor
FireWire (i.e. IEEE 1394 and i.LINK)
High-speed serial port
400 Mb/s transfer rate, 30 times faster than USB 1.0
Hot plug-and-play
Parallel port (i.e. "printer port")
One old type and two "new" types: ECP (extended capabilities port) and EPP (enhanced parallel port)
The new types are ten times faster than the old parallel port
Capable of bi-directional communication
Need for High Performance Computing
There is a need for tremendous computational capabilities in science, engineering, and business.
There are applications that require gigabytes of memory and gigaflops of performance.
What Is a High Performance Computer?
Definition: an HPC computer can solve large problems in a reasonable amount of time.
Characteristics:
Fast computation
Large memory
High-speed interconnect
High-speed input/output
How Is an HPC Computer Made to Go Fast?
Make the sequential computation faster
Do more things in parallel
Applications
1. Weather prediction
2. Aircraft and automobile design
3. Artificial intelligence
4. Entertainment industry
5. Military applications
6. Financial analysis
7. Seismic exploration
8. Automobile crash testing
Who Makes High Performance Computers?
* SGI/Cray: Power Challenge Array, Origin-2000, T3D/T3E
* HP/Convex: SPP-1200, SPP-2000
* IBM: SP2
* Tandem
Trends in Computer Design
The performance of the fastest computer has grown exponentially from 1945 to the present, averaging a factor of 10 every five years.
The growth flattened somewhat in the 1980s but is accelerating again as massively parallel computers become available.
Real World Sequential Processes
Sequential processes we find in the world: the passage of time is a classic example of a sequential process.
Day breaks as the sun rises in the morning.
Daytime has its sunlight and bright sky.
Dusk sees the sun setting on the horizon.
Nighttime descends with its moonlight, dark sky, and stars.
Music
An orchestra performance, where every instrument plays its own part, and playing together they make beautiful music.
Parallel Processes
Parallel Features of Computers
Various methods available on computers for doing work in parallel are:
Computing environment
Operating system
Memory
Disk
Arithmetic
Computing Environment - Parallel Features
Using a timesharing environment
The computer's resources are shared among many users who are logged in simultaneously.
Your process uses the CPU for a time slice, and is then rolled out while another user's process is allowed to compute.
The opposite of this is to use dedicated mode, where yours is the only job running.
The computer overlaps computation and I/O
While one process is writing to disk, the computer lets another process do some computation.
Operating System - Parallel Features
Using the UNIX background processing facility:
a.out > results &
man etime
Using the UNIX cron jobs feature
You submit a job that will run at a later time.
Then you can play tennis while the computer continues to work.
This overlaps your computer work with your personal time.
Memory - Parallel Features
Memory Interleaving
Memory is divided into multiple banks, and consecutive data elements are interleaved among them.
There are multiple ports to memory. When the data elements that are spread across the banks are needed, they can be accessed and fetched in parallel.
The memory interleaving increases the memory bandwidth.
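The interleaving scheme above can be sketched in Python. This assumes the simplest low-order mapping (address modulo the number of banks); real machines may interleave on cache-line rather than element granularity.

```python
def bank_of(address, num_banks=4):
    """Low-order interleaving: consecutive addresses map to consecutive banks."""
    return address % num_banks

# A stride-1 access stream touches every bank in turn,
# so the banks can service the requests in parallel.
print([bank_of(a) for a in range(8)])   # [0, 1, 2, 3, 0, 1, 2, 3]
```

A stride equal to the bank count (here, every 4th element) would hit the same bank repeatedly and lose the parallelism, which is why stride matters for bandwidth on interleaved memories.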
Memory - Parallel Features (cont.)
Multiple levels of the memory hierarchy
Global memory which any processor can access.
Memory local to a partition of the processors.
Memory local to a single processor:
cache memory
memory elements held in registers
Disk - Parallel Features
RAID disk
Redundant Array of Inexpensive Disks
Striped disk
When a dataset is written to disk, it is broken into pieces which are written simultaneously to different disks in a RAID disk system.
When the same dataset is read back in, the pieces of the dataset are read in parallel, and the original dataset is reassembled in memory.
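The write-then-reassemble cycle described above can be modeled in Python. The chunk size and round-robin placement below are illustrative; real RAID striping works on fixed stripe units and, on real hardware, the per-disk reads and writes happen in parallel rather than in this sequential loop.

```python
def stripe(data, num_disks, chunk=4):
    """Break a dataset into chunks and distribute them round-robin across disks."""
    disks = [[] for _ in range(num_disks)]
    for i in range(0, len(data), chunk):
        disks[(i // chunk) % num_disks].append(data[i:i + chunk])
    return disks

def reassemble(disks):
    """Read the chunks back in round-robin order and rebuild the dataset."""
    chunks, i = [], 0
    while True:
        disk, idx = disks[i % len(disks)], i // len(disks)
        if idx >= len(disk):              # that disk has no more chunks
            break
        chunks.append(disk[idx])
        i += 1
    return b"".join(chunks)

data = b"ABCDEFGHIJKLMNOP"
disks = stripe(data, 3)                   # chunks spread over 3 disks
assert reassemble(disks) == data          # original dataset recovered
```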
Arithmetic - Parallel Features
We will examine the following features that lend themselves to parallel arithmetic:
Multiple functional units
Superscalar arithmetic
Instruction Pipelining
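The payoff of instruction pipelining can be put in numbers. In the simple model below (the 5-stage depth and 1000-instruction count are illustrative, not from the slides), the first instruction takes the full pipeline depth to complete, and each subsequent instruction finishes one cycle later:

```python
def sequential_time(n_instructions, n_stages, cycle=1):
    """No overlap: every instruction pays for all stages."""
    return n_instructions * n_stages * cycle

def pipelined_time(n_instructions, n_stages, cycle=1):
    """First result after n_stages cycles, then one more result per cycle."""
    return (n_stages + n_instructions - 1) * cycle

n, s = 1000, 5
print(sequential_time(n, s))   # 5000 cycles
print(pipelined_time(n, s))    # 1004 cycles
```

For large instruction counts the speedup approaches the number of stages (here about 5x), which is why deeper pipelines raise throughput even though each individual instruction takes just as long.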
MultiComputer
A multicomputer comprises a number of von Neumann computers, or nodes, linked by an interconnection network.
In an idealized network, the cost of sending a message between two nodes is independent of both node location and other network traffic, but does depend on message length.
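The idealized cost model just described is commonly written as a startup latency plus a per-byte term. A sketch in Python, with hypothetical latency and bandwidth numbers chosen purely for illustration:

```python
def message_cost(length_bytes, startup_us=50.0, per_byte_us=0.01):
    """Idealized network model: cost depends only on message length,
    never on node location or on other traffic in the network."""
    return startup_us + per_byte_us * length_bytes

# The fixed startup term dominates short messages,
# while the per-byte term dominates long ones.
print(message_cost(1_000))     # short message: mostly startup cost
print(message_cost(100_000))   # long message: mostly per-byte cost
```

This is why parallel programs try to send a few large messages rather than many small ones: each message pays the startup cost once.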
Distributed Memory (MIMD)
MIMD means that each processor can execute a separate stream of instructions on its own local data; distributed memory means that memory is distributed among the processors rather than placed in a central location.
Difference between a multicomputer and distributed-memory MIMD
In a real distributed-memory machine, the cost of sending a message is not independent of node location and other network traffic, as the idealized multicomputer model assumes.
MultiProcessor or Shared Memory MIMD
All processors share access to a common memory via a bus or a hierarchy of buses.
Use of Cache
Why is cache used on parallel computers?
The advances in memory technology aren't keeping up with processor innovations; memory isn't speeding up as fast as the processors.
One way to alleviate the performance gap between main memory and the processors is to have local cache.
Cache memory can be accessed faster than main memory.
Cache keeps up with the fast processors and keeps them busy with data.
[Diagram: processors 1-3, each with its own local cache (Cache Memory 1-3), connected through a network to a shared memory.]
Cache Coherence
What is cache coherence?
It keeps a data element found in several caches current with the other copies and with the value in main memory.
Various cache coherence protocols are used:
snoopy protocol
directory-based protocol
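The snoopy protocol idea can be sketched as a write-invalidate scheme: every cache watches (snoops on) the shared bus, and a write to an address invalidates the copies held by the other caches. The class below is a heavily simplified illustration, not a faithful model of any real protocol (it has no MESI-style states, and it writes through to memory).

```python
class SnoopyCache:
    """Toy write-invalidate snoopy cache: writes broadcast on the
    bus and invalidate every other cache's copy of that address."""
    def __init__(self, bus):
        self.data = {}            # address -> locally cached value
        self.bus = bus            # shared list of all caches on the bus
        bus.append(self)

    def read(self, addr, memory):
        if addr not in self.data:          # miss: fetch from main memory
            self.data[addr] = memory[addr]
        return self.data[addr]

    def write(self, addr, value, memory):
        memory[addr] = value               # write through to main memory
        self.data[addr] = value
        for other in self.bus:             # snoop: invalidate other copies
            if other is not self:
                other.data.pop(addr, None)

bus, memory = [], {0: 5}
c1, c2 = SnoopyCache(bus), SnoopyCache(bus)
c1.read(0, memory); c2.read(0, memory)   # both caches now hold address 0
c1.write(0, 7, memory)                   # c2's stale copy is invalidated
assert c2.read(0, memory) == 7           # c2 misses and re-fetches the new value
```

A directory-based protocol achieves the same coherence without broadcasting: a directory records which caches hold each line and sends invalidations only to those holders, which scales better to many processors.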