CPU Architecture and Instruction Fetch & Execute Cycle: A Comprehensive Guide, Slides of Computer Architecture and Organization

Vishwakarma University Computer Architecture and Organization

Different ppt on the subject. MCQs on the subject

Typology: Slides

2019/2020

Uploaded on 04/18/2020

khanjan-varma 🇮🇳


Lecture 2

The CPU, Instruction Fetch & Execute

In Lecture 1 we learnt that the separation of data from control helped simplify the definition and design of sequential circuits, particularly when there were many registers involved only with storage.

We saw that the main memory, as well as holding plain data, could hold the program data — the ordered list of instructions that specify what you want the machine to do.

We speculated that when an instruction is read from memory, it could be passed immediately to the control part of the machine in order to change the effective transfer function of the data section.

In this lecture we develop the detailed organization of the CPU to support that idea.

Rather than introduce individual components separately and then stick them together, we will dive in at the deep end by revealing a “Bog Standard Architecture” for the CPU.

2.1 A Bog Standard Architecture

The CPU contains

• a number of registers, some of which fall on the address side, others on the data side;
• an arithmetic logic unit;
• the control section or control unit;
• connections to the memory (a large unit of storage) by two buses, the unidirectional address bus and the bidirectional data bus; and
• internal buses or data pathways which allow the output of one register to connect to the input of another.

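[Diagram: inside the CPU are the PC, SP and MAR on the address side, the MBR and AC on the data side, the IR with its IR(opcode) and IR(address) parts, the ALU with its Status register, and the CU. Outside the CPU is the Memory, reached over the Address Bus and the Data Bus. Control Lines such as INCpc/LOADpc, SETalu and CLKmem run from the CU to the registers, ALU, memory, etc.]

Figure 2.1: Our Bog Standard Architecture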

2.1.1 CPU Registers

• MAR The Memory Address Register holds the address used to access memory.

• MBR The Memory Buffer Register stores information that is being sent to, or received from, the memory along the bidirectional data bus.

• AC The Accumulator is used to store data that is being worked on by the ALU, and is the key register in the data section of the CPU. Notice that the memory can’t access the AC directly. The MBR is an intermediary.

• PC Connected to the internal address bus, the Program Counter holds the address in memory of the next program instruction. Notice that it doesn’t connect directly to the memory, but must go via the MAR. (The PC is special in that it is both a register and a counter. More later.)

• IR When memory is read, the data first goes to the MBR. If the data is an instruction it gets moved to the Instruction Register. The IR has two parts:

IR (opcode) The most significant bits of the instruction make up the opcode. This is the genuine instruction part of the instruction, which tells the CPU what to do.


• ALU The Arithmetic Logic Unit is responsible for bit operations on data held in the AC and MBR and for storing the results. It contains arithmetic adders, logical AND-ers and OR-ers, and so on. A special requirement in our architecture is a “null operation” or “no-op” which simply allows the output of the AC to appear at the output of the ALU. Again we will detail its hardware later.

• SR Closely associated with the ALU is the Status Register or Condition Control Word or Status Word. It is not quite the same as the other registers in that it is really just a collection of 1-bit flags that indicate the outcome of operations that the ALU has just carried out. The flags (which you met in P2) are the carry flag C, overflow flag V, negative flag N, and zero flag Z. They are monitored by the CU.
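To make the four flags concrete, here is a minimal Python sketch of how they could be derived for a 16-bit addition. The function name `add_with_flags` and the dictionary layout are our own assumptions for illustration, and the flag definitions are the usual two's complement ones rather than anything specified in these notes.

```python
WIDTH = 16                      # data-side width assumed in these notes
MASK = (1 << WIDTH) - 1         # 0xFFFF
SIGN = 1 << (WIDTH - 1)         # 0x8000, the most significant bit


def add_with_flags(a, b):
    """Add two 16-bit words and return (result, flags) for C, V, N and Z."""
    a &= MASK
    b &= MASK
    total = a + b               # may need 17 bits
    result = total & MASK       # what would fit back into the AC
    flags = {
        "C": int(total > MASK),                               # carry out of bit 15
        # signed overflow: both operands differ in sign from the result
        "V": int(((a ^ result) & (b ^ result) & SIGN) != 0),
        "N": int((result & SIGN) != 0),                       # top bit of result set
        "Z": int(result == 0),                                # result is all zeros
    }
    return result, flags
```

For instance, adding 1 to 0xFFFF wraps round to zero and sets C and Z, while adding 1 to 0x7FFF sets V and N because the signed result has overflowed.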

2.1.3 Buses, registers, and their widths

The buses carry words of information which are many bits wide, and on diagrams a bus is indicated either by a wide line, or by a single line with a dash through it often accompanied by the bus width in bits.

Data: Microcontrollers have data bus widths of 4, 8, 16 and 32 bits, while the most advanced PCs use 64 bits. In these lectures we will assume that the “memory width” is 16 bits or 2 Bytes. This means that each location can store 2 Bytes. We will also assume that the data bus is 16 bits wide, and the MBR and AC registers on the data side of the CPU are therefore also 16 bits wide. The ALU is also 16 bits wide.

[Diagram: (a) the Memory, MBR and AC joined by 16-bit data paths; (b) detail of the connections into the MBR.]

Figure 2.3: (a) The data side is 2 Bytes or 16 bits wide. The ALU has been omitted here, but is also 16 bits wide. (b) You should not think that the MBR register (for example) has grown multiple electrical inputs. The actual wiring involves tri-state buffers, as becomes clear in Lecture 3.

Address: The address bus does not have to be the same width as the data bus. The width on CPUs has increased over time in step with contemporary memory technology, from the Intel 8086 (from 1978) with n = 20 address lines to current processors with n = 36 to 40.

Having n address lines means that there are 2^n addresses or locations in the address space. A convenient method of figuring out 2^n is to remember that 2^10 = 1024, so n = 10 lines address 1K locations, n = 20 lines address 1M locations, and n = 30 lines address 1G locations. Of course microcontrollers tend to have a smaller amount of memory, because they are not designed to multitask (i.e., run multiple programs), and 256K locations is the largest number spotted (in 2010).

However, for lecturing purposes it is useful (i) to have different numbers on the address and data side, and (ii) to keep things in multiples of 8, so here we will assume a 24-bit address bus, able to access 16M locations. (Note this is not necessarily 16 MByte of memory. Why not?) The PC, SP, and MAR in our CPU will therefore be 24 bits wide.
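The arithmetic above is easy to check with a few lines of Python. This is only a sketch: the helper names `locations` and `capacity_bytes` are our own, and the 2 Bytes per location is the memory width assumed earlier in these notes.

```python
def locations(n_address_lines):
    """Number of distinct addresses reachable with n address lines (2^n)."""
    return 1 << n_address_lines


def capacity_bytes(n_address_lines, bytes_per_location):
    """Total storage when every location holds bytes_per_location Bytes."""
    return locations(n_address_lines) * bytes_per_location


K = 1024
print(locations(10) // K)                   # locations in units of K for n = 10
print(locations(24) // (K * K))             # locations in units of M for n = 24
print(capacity_bytes(24, 2) // (K * K))     # MBytes with 2-Byte locations
```

The last line is the point of the question in the text: the number of locations and the number of Bytes only coincide when each location is one Byte wide.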

[Diagram: the PC, SP and MAR joined by 24-bit paths, with the MAR addressing a Memory of 2^24 locations.]

Figure 2.4: The address side is 24 bits or 3 Bytes wide. The address space has 2^24 locations.

The IR is special. The IR (opcode) part should be wide enough to take the largest opcode. We assume the opcode is a fixed 8 bits wide, allowing 256 different instructions — which is plenty enough. The IR (address) part has to have the same width as the address bus, 24 bits. So the whole IR is 32 bits wide.

It is however fed from the internal data bus which is only 16 bits wide in our architecture. We will return to solve this conundrum in §2.6.1.

[Diagram: the MBR feeding the IR, which is split into IR(opcode) and IR(address).]

Figure 2.5: The IR must be 8 + 24 = 32 bits wide.
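As an illustration of the 8 + 24 split, the following Python sketch packs an opcode and an address into a single 32-bit value and pulls them apart again. The function names `pack_ir` and `unpack_ir` are hypothetical; the only facts taken from the notes are the two field widths and that the opcode occupies the most significant bits.

```python
OPCODE_BITS = 8
ADDRESS_BITS = 24
OPCODE_MASK = (1 << OPCODE_BITS) - 1      # 0xFF
ADDRESS_MASK = (1 << ADDRESS_BITS) - 1    # 0xFFFFFF


def pack_ir(opcode, address):
    """Place the opcode in bits 31..24 and the address in bits 23..0."""
    return ((opcode & OPCODE_MASK) << ADDRESS_BITS) | (address & ADDRESS_MASK)


def unpack_ir(ir):
    """Return (opcode, address) from a 32-bit IR value."""
    return (ir >> ADDRESS_BITS) & OPCODE_MASK, ir & ADDRESS_MASK


ir = pack_ir(0b00000001, 5)
print(f"{ir:032b}")        # 8 opcode bits followed by 24 address bits
print(unpack_ir(ir))
```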


Fetching and executing an instruction simply require the CPU’s Control Section to issue Levels and Pulses which set up pathways and fire register transfers so that

• Data is moved from memory to registers, and between registers
• Data is passed (sometimes) through the ALU, and
• Data is stuffed back into the memory

If you are in need of an analogy, we are doing little more than “playing trains” with data. The Control Section uses Levels to “set the points” and create the route between A and B, and uses a Pulse to send the train from A to B.

2.3 Fetching and decoding an instruction

To start processing, the CPU needs to fetch the first instruction in the program from the main memory. The Program Counter is the key register here. The PC always holds the address of the next program instruction in the main memory. It is said to point to the next instruction^1. But remember that the memory address register acts as a gatekeeper to the memory, so the first thing to happen is that the program counter gets copied into the memory address register. The register transfer is

MAR←PC

Because it is the MAR that is clocked, this leaves the PC unaltered. Now read the memory into the MBR.

MBR←〈MAR〉

The next step is to copy the instruction from the MBR to the instruction register.

IR←MBR

In our standard architecture the IR is split into two parts, IR (opcode) and IR (address). As far as the instruction fetch is concerned it is the IR (opcode) that is important. The opcode is decoded by the control unit, as described later.

Last comes a touch of housekeeping. Usually the next instruction in the program is located in the next memory location, so the program counter is incremented.

PC←PC + 1

^1 Any memory address points to the memory contents at that address.

So to summarize, the instruction fetch requires the following in RTL, where you should note that the program counter can be incremented at the same clock tick as loading the instruction register.

Instruction fetch

1. MAR←PC
2. MBR←〈MAR〉
3. IR←MBR; PC←PC + 1
(Then decode the opcode)

NB: these line numbers will soon turn into RTL Control Steps!
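The instruction fetch can be mimicked in a few lines of Python. This is a sketch under our own assumptions: the registers are held in a dictionary, the memory is a dictionary from address to 16-bit word, and the function name `fetch` is hypothetical. It models the order of the register transfers only, not the clocking hardware.

```python
def fetch(cpu, memory):
    """Apply the three control steps of the instruction fetch to cpu."""
    cpu["MAR"] = cpu["PC"]              # step 1: MAR <- PC (the PC is unaltered)
    cpu["MBR"] = memory[cpu["MAR"]]     # step 2: MBR <- <MAR>
    # step 3: the two transfers are independent, so they share one clock tick
    cpu["IR"] = cpu["MBR"]              # IR <- MBR
    cpu["PC"] = cpu["PC"] + 1           # PC <- PC + 1
    return cpu


memory = {2: 0b00000001_00000101}       # one example word stored at address 2
cpu = fetch({"PC": 2, "MAR": 0, "MBR": 0, "IR": 0}, memory)
print(cpu)
```

After the call the IR holds the word that was stored at address 2 and the PC has moved on to address 3, ready for the next fetch.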


2.4 A few instructions

Our CPU uses 8-bit opcodes, so it could distinguish 256 different instructions. For the purpose of explanation we give just nine from our instruction set. Column 1 contains the assembler language mnemonic, which is shorthand for several lines of RTL. Column 2 gives an overall “RTL-like” description. Column 3 is the binary opcode, and column 4 describes what the instruction does.

Inst    Overall RT             Opcode     Meaning
HALT                           00000000   Stop the clock
LDA x   AC←〈x〉                00000001   Load AC with contents of mem address x
STA x   〈x〉←AC                00000010   Store AC in memory at address x
ADD x   AC←AC + 〈x〉           00000011   Add mem contents at x to AC
AND x   AC←AC ∧ 〈x〉           00000100   Logical and ...
JMP x   PC←x                   00000101   Jump to instruction at address x
BZ x    if Z=1 then PC←x       00000110   if Z-flag is set then jump
NOT     AC←¬AC                 00000111   Invert every bit of the AC (one’s complement)
SHR     AC←RightShift(AC)      00001000   Shift the AC 1 bit to right
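The table maps directly onto a lookup structure. The sketch below, in Python, stores the nine mnemonics and opcodes from the table and converts a line of assembler into an (opcode, operand) pair. The function name `assemble` and the choice to return `None` for a missing operand are our own assumptions, and no real assembler is being described.

```python
OPCODES = {
    "HALT": 0b00000000,
    "LDA":  0b00000001,
    "STA":  0b00000010,
    "ADD":  0b00000011,
    "AND":  0b00000100,
    "JMP":  0b00000101,
    "BZ":   0b00000110,
    "NOT":  0b00000111,
    "SHR":  0b00001000,
}

# Reverse lookup, as a disassembler or the decoder would need.
MNEMONICS = {code: name for name, code in OPCODES.items()}


def assemble(line):
    """Turn text such as 'LDA 5' into (opcode, operand); operand may be None."""
    parts = line.split()
    operand = int(parts[1]) if len(parts) > 1 else None
    return OPCODES[parts[0].upper()], operand


print(assemble("LDA 5"))
print(MNEMONICS[0b00000110])
```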

An assembler language is designed around a particular cpu, and there is no standard set of mnemonics. However, once you understand the purpose of the instructions, it is trivial to convert between languages.

2.5 Executing an instruction

During the instruction fetch, an opcode is put into the IR (opcode), and is decoded by the control unit (exactly how we will see later). The CU now “knows” which instruction it should execute, and can therefore output a sequence of Levels and Pulses to set up paths and effect the desired register transfers.


2.5.2 Example of LDA x fetch and execute.

Figure 2.7 gives an example of the information flows in the entire fetch and execute of LDA x. In the example, the instruction is found in memory location 2, and the actual instruction is LDA 5.

[Diagram: the architecture annotated with the steps a to i for LDA x, whose execute phase is MAR←IR(address), MBR←〈MAR〉, AC←MBR. The PC starts at 2. After the fetch, IR(opcode) = 00000001 and IR(address) = 00000101. Memory location 5 holds 00000010 00001111, so the AC finishes holding 527 decimal.]

Figure 2.7: Example of the fetch and execute of the LDA x instruction with x = 5 and starting with PC = 2. The steps follow a, b, c, ..., i.

(a) During the fetch, MAR←PC

(b) Addressing location 2

(c) Reading the memory MBR←〈MAR〉

(d) Now the MBR is transferred to the IR.

(e) The last part of the fetch is to increment the PC.

(f) Decode, then first step of execute is MAR←IR (address)

(g) Now addressing location 5.

(h) Reading the memory MBR←〈MAR〉 again.

(i) Now transfer to the Accumulator AC←MBR.
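Steps (a) to (i) can be traced in Python. The sketch below uses the values of the worked example (PC = 2, the instruction LDA 5, and the word 00000010 00001111 at location 5). As a simplifying assumption of ours, the 16-bit instruction word carries the opcode in its high Byte and a short operand in its low Byte; the function name `run_lda_example` and the register dictionary are hypothetical.

```python
LDA = 0b00000001


def run_lda_example():
    """Trace the fetch and execute of LDA 5, returning the final registers."""
    memory = {
        2: 0b00000001_00000101,    # LDA 5: opcode 00000001, operand 00000101
        5: 0b00000010_00001111,    # the data word to be loaded
    }
    r = {"PC": 2, "MAR": 0, "MBR": 0, "IR_opcode": 0, "IR_address": 0, "AC": 0}

    # Fetch
    r["MAR"] = r["PC"]                   # (a) MAR <- PC
    r["MBR"] = memory[r["MAR"]]          # (b), (c) address location 2 and read it
    r["IR_opcode"] = r["MBR"] >> 8       # (d) IR <- MBR, high Byte is the opcode
    r["IR_address"] = r["MBR"] & 0xFF    #     low Byte is the operand
    r["PC"] += 1                         # (e) PC <- PC + 1

    # Decode
    if r["IR_opcode"] != LDA:
        raise ValueError("this sketch only executes LDA")

    # Execute LDA x
    r["MAR"] = r["IR_address"]           # (f) MAR <- IR(address)
    r["MBR"] = memory[r["MAR"]]          # (g), (h) address location 5 and read it
    r["AC"] = r["MBR"]                   # (i) AC <- MBR
    return r


print(run_lda_example())
```

The returned AC value corresponds to the bit pattern stored at location 5, and the PC is left pointing at location 3 for the next fetch.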


2.5.3 Execute STA x

Store the contents of the AC in the memory location x, which is held in the IR (address).

STA x

MAR←IR (address); MBR←AC
〈MAR〉←MBR; →(fetch)
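The two control steps of STA x can be sketched in Python as follows. The dictionary-based registers and memory, and the function name `execute_sta`, are assumptions made for illustration. Note how the first step performs two independent transfers together, as in the RTL.

```python
def execute_sta(r, memory):
    """Execute phase of STA x: store the AC at the address held in the IR."""
    # Control step 1: two independent transfers on the same clock tick
    r["MAR"] = r["IR_address"]       # MAR <- IR(address)
    r["MBR"] = r["AC"]               # MBR <- AC
    # Control step 2: write the MBR into the addressed location
    memory[r["MAR"]] = r["MBR"]      # <MAR> <- MBR
    return memory                    # control then returns to the fetch


memory = {5: 0}
registers = {"AC": 527, "IR_address": 5, "MAR": 0, "MBR": 0}
print(execute_sta(registers, memory))
```

The AC itself is unchanged by the store; only the MAR, the MBR and the addressed memory location are written.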


2.5.4 Execute ADD x

Add the memory contents of x to the AC. This involves the ALU. Looking back at the BSA diagram we see that the ALU has two inputs, one from the AC and the other from the MBR. To perform this line of RTL we must, before firing the transfer, set up the ALU to perform the binary addition of its inputs, rather than say the logical AND or logical OR. As you might guess, and as we will see in Lecture 4, this requires the configuration of combinational circuitry using a level signal. Obviously we are going to need a CSL in addition to a CSP for this line of RTL.

ADD x

MAR←IR (address)
MBR←〈MAR〉
AC←AC + MBR; →(fetch)


2.5.5 AND x

This again involves the ALU, but now we are performing a logical AND.

AND x

MAR←IR (address)
MBR←〈MAR〉
AC←AC ∧ MBR; →(fetch)
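The ADD and AND execute phases differ only in the level applied to the ALU before the AC is clocked. The Python sketch below separates the two ideas: a table of ALU configurations stands in for the level signal, and the final assignment stands in for the pulse. The names `ALU_FUNCTIONS` and `execute_alu` are hypothetical, the 16-bit wrap-around on addition is our assumption based on the stated ALU width, and the real hardware is the subject of a later lecture.

```python
MASK16 = 0xFFFF    # the ALU and AC are 16 bits wide in these notes

# One entry per ALU configuration; the key plays the role of the level signal.
ALU_FUNCTIONS = {
    "ADD":  lambda ac, mbr: (ac + mbr) & MASK16,   # binary addition, 16-bit result
    "AND":  lambda ac, mbr: ac & mbr,              # bitwise logical AND
    "NOOP": lambda ac, mbr: ac,                    # AC passes straight through
}


def execute_alu(level, r, memory):
    """Execute phase shared by ADD x and AND x, selected by the ALU level."""
    r["MAR"] = r["IR_address"]                     # MAR <- IR(address)
    r["MBR"] = memory[r["MAR"]]                    # MBR <- <MAR>
    # Level: configure the combinational ALU; its output is now valid.
    alu_output = ALU_FUNCTIONS[level](r["AC"], r["MBR"])
    # Pulse: clock the AC so that it latches the ALU output.
    r["AC"] = alu_output
    return r


memory = {5: 527}
registers = {"AC": 3, "IR_address": 5, "MAR": 0, "MBR": 0}
print(execute_alu("ADD", registers, memory)["AC"])
print(execute_alu("AND", registers, memory)["AC"])
```

Keeping the operation in a table makes the point of the text visible: the register transfers are identical, and only the level changes.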


2.6 Decoding the opcode

Earlier on, when discussing the Instruction Fetch, we wrote “then decode the opcode”. Although it is common to talk of the fetch-execute cycle, it is useful to think about decoding as a separate distinct phase in a fetch-decode-execute cycle.

Now suppose that

• we have a mechanism of producing a set of level signals [LDA=1, STA=0, ADD=0, etc] when the opcode is LDA; and similarly for other opcodes; and
• we have written the RTL for all the execute phases, and know that the execute phases start at control steps 10,13,15,18, and so on;

then we could write a decoding step using RTL’s conditional goto (see lecture 1):

Decoding (NB! this is RTL)

→(LDA,STA,ADD,AND, ..., SHR,HALT)/(10,13,15,18,...,25,99)

where 10, etc, are the first lines of the RTL which execute the respective instruction.
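The conditional goto behaves like a lookup from the one active level signal to a control step. A minimal Python sketch follows. The step numbers are those quoted in these notes; JMP and BZ are left out because their step numbers are elided in the text. The names `CONTROL_STEP` and `decode` are our own.

```python
# First control step of each execute phase, as quoted in the notes.
CONTROL_STEP = {
    "LDA": 10, "STA": 13, "ADD": 15, "AND": 18,
    "NOT": 24, "SHR": 25, "HALT": 99,
}


def decode(levels):
    """Given level signals such as {'LDA': 1, 'STA': 0}, return the next step."""
    active = [name for name, level in levels.items() if level == 1]
    if len(active) != 1:
        raise ValueError("exactly one decode line should be high")
    return CONTROL_STEP[active[0]]


print(decode({"LDA": 1, "STA": 0, "ADD": 0}))
```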

2.6.1 Decoding detail — engineering to the rescue

So far, so good. However, in our BSA, the MBR and data bus are 16 bits wide. Hence, IR←MBR can only supply the 8-bit opcode with an 8-bit operand. If operands are only 8 bits long, we can only access 256 of our 2^24 locations. How can we fill the operand up to its full 24 bits?

The engineered solution is to make the LDA instruction (and any other instruction that needs a full operand) read the next 16-bit word of memory into the MBR, and then into the IR (address), building up a 24-bit address for transfer to the MAR.

In our instruction set the six instructions LDA, STA, ADD, AND, JMP, BZ require this extra read of memory. We could write the decoding stage in RTL as follows:

Longer Decoding

→(NOT,SHR,...)/(24,25,...) // i.e., all that don’t need the extra read
MAR←PC
MBR←〈MAR〉
IR [23:8]←MBR; PC←PC + 1
→(LDA,STA,...)/(10,13,...) // i.e., all that do need the extra read

This solution effectively divides up the opcodes into two sets: those that need full operands and those that don’t.
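The longer decoding can be sketched in Python. We assume, for illustration only, that the first 16-bit word carries the opcode in its high Byte and the low Byte of the operand in its low Byte, and that the second word supplies IR[23:8]. The function name `fetch_and_decode` and the set `NEEDS_FULL_OPERAND` are hypothetical.

```python
# Opcodes of LDA, STA, ADD, AND, JMP and BZ: the instructions needing 24 bits.
NEEDS_FULL_OPERAND = {0b001, 0b010, 0b011, 0b100, 0b101, 0b110}


def fetch_and_decode(pc, memory):
    """Return (opcode, address, new_pc) after the fetch and the longer decode."""
    word = memory[pc]              # ordinary fetch of the first word
    pc += 1
    opcode = word >> 8             # IR(opcode)
    address = word & 0xFF          # IR[7:0], the low Byte of the operand
    if opcode in NEEDS_FULL_OPERAND:
        high = memory[pc]          # extra read: MAR <- PC; MBR <- <MAR>
        pc += 1                    # PC <- PC + 1
        address |= high << 8       # IR[23:8] <- MBR
    return opcode, address, pc


memory = {2: 0x0134, 3: 0xABCD}    # LDA with operand split over two words
opcode, address, pc = fetch_and_decode(2, memory)
print(opcode, hex(address), pc)
```

Instructions such as NOT and SHR skip the extra read, so the PC advances by one word for them and by two words for the instructions that take a full operand.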


[Diagram, two panels: the Data Bus, MBR, IR (with IR(opcode) and IR(address), the address shown as bits 23-16, 15-8 and 7-0), MAR and two consecutive Memory locations, before and after the extra read.]

Figure 2.8: The fetch delivers the opcode and the low Byte of the operand. After decoding, if the opcode requires a full operand with 3 Bytes another read of the memory can take place.

The detail of this step is often missed out in text books, and it is assumed that the fetch (lines 1-3) provides an operand of full length. We too will neglect the problem — unless explicitly asked to worry about it!

(If we want to worry about it, there are changes of detail required to the RTL of line 3 of the instruction fetch. What are they, and once you’ve written the RTL could you draw the hardware involved?)

2.7 Summary

In this lecture we have

• laid out an architecture for a simple CPU, introduced its components, and described how the CPU connects to memory;
• noted that the data section of our CPU divides into two halves, one more concerned with addresses, the other with data, with the instruction register at the junction;
• described using RTL the fetch of an instruction from memory into the instruction register, and learned that the instruction is made up of opcode and operand;
• described, again using RTL, the execute phase of several common instructions; and
• discussed “simple” decoding, and how it may be made more elaborate to overcome constraints imposed by the architecture.
