# PROFESSIONAL ENGINEERS OF ONTARIO NATIONAL EXAMS December 2008 98-Comp-A3 (COMPUTER ARCHITECTURE) 3 hours duration #### **NOTES:** - 1. If doubt exists as to the interpretation of any question, the candidate is advised to submit with the answer paper a clear statement of any assumptions made; - 2. Candidates may use one of the two calculators: the Casio or Sharp approved models. This is a Closed-Book exam - 3. You must answer Q1 and any four (4) of the remaining questions. | NAME (Please pri | nt): | <del></del> | |--------------------|-----------------------|-------------| | SIGNATURE: | | w. | | DO not write anyth | hing below this line: | | | | Question[1] 20 pts | | | | Question[2] 20 pts | - | | | Question[3] 20 pts | • | | | Question[4] 20 pts | | | | Question[5] 20 pts | | | | Question[6] 20 pts | | | | TOTAL (out of 100) | | ## Q1-General concepts - a) Explain the basic instruction cycle. What are the key actions performed by the processor during an instruction cycle? - b) What is the main purpose of interrupt processing? List the most common classes of interrupts. How does interrupt processing take place? - c) Identify two key architectural features that distinguish RISC and CISC machines. What are the typical characteristics of a RISC instruction set architecture? - d) The performance of a 100Mhz processor P is measured by executing 10,000,000 instructions of benchmark code, which is found to take 0.25 ns. What are the values of CPI and MIPS for this performance experiment? Is this processor likely to be superscalar? - e) Identify and briefly describe three distinct ways in which parallelism can be introduced into the microarchitecture of a computer in order to increase its overall execution speed. - f) Discuss the advantages and disadvantages of storing programs and data in the same memory (the stored program concept). Under what circumstances is it desirable to store programs and data in separate memories? #### Q2- Memory organization - a) List the main physical differences between the following memory technologies: SRAMs, flash memories, magnetic floppy disks, optical hard disks, and CD-ROMs. - b) Discuss in qualitative terms the impact of the following design decisions on cache performance: i) selection of a cache block (line) size p that is too small; ii) selection of a cache block size that is too big; iii) selection of an associativity level k that is too small - c) A certain memory configuration has four levels $M_1, M_2, M_3$ and $M_4$ with hit ratios of 0.8, 0.95, 0.99, and 1.0, respectively. A program Q makes 3000 references to this memory system. Calculate the number of references $R_i$ , made by Q that are satisfied by an access to level $M_i$ . #### Q3- Datapath and control design - a) List the advantages and disadvantages of designing a floating-point processor in the form of a k-stage pipeline. A floating-point pipeline has five stages $S_1, S_2, S_3, S_4$ and $S_5$ whose delays are 120, 90, 100, 85, and 100 ns, respectively. What is the pipeline's maximum throughput in millions of floating-point operations per second (MFLOPS)? - b) Explain why one (1) is the lower bound on the CPI of conventional, nonsuperscalar microprocessors. Name and briefly describe two techniques that superscalar microprocessors use to make CPI less than one. - c) Explain what is meant by a hardwired implementation of a control unit. What is the difference between a hardwired and a microprogrammed implementation of a control unit? Discuss the general attributes of horizontal and vertical microinstructions. #### **Q4** Processor basics a) Suppose that the contents (hexadecimal) of two CPU registers in a 32-bit processor are as follows: R0 = 01237654 R1=7654EDCB The following store-word instructions are executed to transfer the contents of these registers to main memory M. STORE R0, ADR STORE R1, ADR+4 Assuming that M is byte-addressable, give the contents of all memory locations affected by the above code (i) if the computer is big-endian, (ii) if the computer is little-endian. b) Consider a set of four processors $P_0$ , $P_1$ , $P_2$ and $P_3$ , where $P_i$ is an *i*-address machine. $P_0$ is a zero-address stack machine, while $P_1$ , $P_2$ and $P_3$ are conventional computers each with 16 general-purpose registers R0:R15 for data and address storage. All four processors have instructions with the (assembly language) opcodes ADD, SUB, MUL, and DIV to implement the operations +, -, x, and /, respectively. (i) Using as few instructions as you can, write a program for each of the four machines to evaluate the following arithmetic expression: $$X = (A/B+CxD)/(DxE-F+C/A)+G$$ Use standard names for any additional instructions the you need, for example, LOAD, PUSH or POP (result of the expression is stored in memory location X) (ii) Calculate the total object-program size in bits for each of your four programs assuming the following data on machine-language instruction formats: opcodes (which contain no addressing information) are 8 bits long; memory-address length is 16 bits; and register-address length is 4 bits. #### Q5- System design - a) A magnitude-comparator circuit compares two unsigned numbers X and Y and produces three outputs, $z_1, z_2$ , and $z_3$ , which indicate X<Y, X>Y and X=Y, respectively: (i) show how to implement a magnitude comparator for 2-bit numbers using a single 16-input, 3-bit multiplexer of appropriate size, (ii) show how to implement the same comparator using a single 8-input 2-bit multiplexer and a few two-input NOR gates. - b) Certain very small-scale ICs contain a single two-input gate. The ICs are manufactured in three varieties-NAND, OR and EXCLUSIVE-OR- as indicated by a printed label on the IC's package. By mistake, a batch of all three varieties is manufactured without their labels (i) Devise an efficient test that a technician can apply to any IC from this batch to determine which gate type it contains. (ii) Suppose the batch of unlabeled ICs contains NOR gates, as well as, NAND, OR and EXCLUSIVE-OR. Devise an efficient testing procedure to determine each IC's gate type. - c) Show how to connect n half adders to form an n-bit combinational incrementer whose function is to add one (modulo $2^n$ ) to an *n*-bit number X. For example, if X = 10010111, the incrementer should output Z = 10011000; if X = 111111111, it should output Z = 10011000. ### Q6-System organization - a) A CISC computer consists of a CPU and an I/O device D connected to main memory M via a one-word shared bus. The CPU can execute a maximum of $10^6$ instructions per second. An average instruction requires five machine cycles, three of which use the memory bus. A memory read or write operation uses one machine cycle. Suppose that the CPU is continuously executing "background" programs that require 90 percent of its instruction execution rate but no I/O instructions. Now D is to be used to transfer very large blocks of data to and from M. (i) If programmed I/O is used and each one-word I/O transfer requires the CPU to execute two instructions, estimate the maximum I/O data-transfer rate $r_{MAX}$ possible through D. (ii) Estimate $r_{MAX}$ if DMA (Direct Memory Access) is used. - b) Let s denote the fraction of time that must be spent on the serial parts of a program Q, and let p denote the fraction of time spent on the parallelizable parts of Q. Assuming that s+p=1, show that Amdahl's law for the speedup S(n) achievable by an n-processor computer executing Q can be reformulated as follows: $$S(n) = \frac{1}{s + p/n}$$ c) Amdahl's law makes the implicit assumption that p is independent of n. In practice, the problem size tends to increase with n; that is problems expand to use the additional processors. This situation suggests that the time for a serial processor to execute Q should be represented by s+pn, given that it runs in time s+p=1 on the parallel processor. With this assumption, derive an alternative expression for S(n). Comment on its implications concerning the performance of massively parallel computers.