Increasing Machine Throughput
Superscalar Processing
Multiprocessor Systems
Multiprocessing
• There are 3 generic ways to do multiple things “in parallel” (see the sketch after this list)
  - Instruction-level Parallelism (ILP)
    - Superscalar: doing multiple instructions (from a single program) simultaneously
  - Data-level Parallelism (DLP)
    - Doing a single operation over a larger chunk of data
    - Vector processing
    - “SIMD extensions” like MMX
  - Thread-level Parallelism (TLP)
    - Multiple processes
    - Can be separate programs…
    - …or a single program broken into separate threads
    - Usually used on multiple processors, but not required
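A minimal C sketch of all three, using the same element-wise add (the function names and the fixed size N are illustrative, not from the slides): the scalar loop's independent iterations are what a superscalar core overlaps on its own (ILP), the SSE version applies one instruction to 4 floats at a time (DLP), and the POSIX-threads version splits the loop across two threads of one program (TLP).

#include <pthread.h>
#include <xmmintrin.h>   /* SSE intrinsics: 4 floats per register */

#define N 1024
float a[N], b[N], c[N];

/* ILP: nothing special in the source; a superscalar core can issue
   several of these independent adds per cycle by itself. */
void add_scalar(void) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* DLP: one SIMD instruction operates on 4 floats at once. */
void add_simd(void) {
    for (int i = 0; i < N; i += 4)
        _mm_storeu_ps(&c[i],
            _mm_add_ps(_mm_loadu_ps(&a[i]), _mm_loadu_ps(&b[i])));
}

/* TLP: one program broken into two threads, each doing half. */
static void *add_half(void *arg) {
    int lo = *(int *)arg;
    for (int i = lo; i < lo + N / 2; i++)
        c[i] = a[i] + b[i];
    return 0;
}

void add_threaded(void) {
    pthread_t t;
    int lo0 = 0, lo1 = N / 2;
    pthread_create(&t, 0, add_half, &lo1);  /* second half in a new thread */
    add_half(&lo0);                         /* first half in this thread   */
    pthread_join(t, 0);
}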
How to Fill Pipeline Slots
- We’ve got lots of room to execute – now how do we fill the slots?
- This process is called Scheduling
  - A schedule is created, telling instructions when they can execute
- 2 (very different) ways to do this:
  - Static Scheduling
    - The compiler (or coder) arranges instructions into an order that can be executed correctly (sketched below)
  - Dynamic Scheduling
    - Hardware in the processor reorders instructions at runtime to maximize the number executing in parallel
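A source-level sketch of static scheduling (the dot product and the two-way interleave are illustrative): the second version interleaves two independent accumulation chains so that work from one chain fills the load-use latency of the other, which is exactly what a static scheduler does at instruction granularity.

/* Naive order: each multiply-add must wait on the loads just above it. */
float dot_naive(const float *a, const float *b, int n) {
    float s = 0.0f;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];           /* one dependent chain: little overlap */
    return s;
}

/* Statically scheduled: two independent chains interleaved, so the
   pipeline always has ready work while a load is still in flight.
   (Assumes n is even; a real compiler would emit cleanup code.) */
float dot_scheduled(const float *a, const float *b, int n) {
    float s0 = 0.0f, s1 = 0.0f;
    for (int i = 0; i < n; i += 2) {
        s0 += a[i]     * b[i];      /* chain 0 */
        s1 += a[i + 1] * b[i + 1];  /* chain 1, independent of chain 0 */
    }
    return s0 + s1;
}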
Dynamic Pipeline Scheduling
- Allow the hardware to make scheduling decisions
  - In-order issue of instructions
  - Out-of-order execution of instructions
- When execution resources sit empty:
  - The hardware will look ahead in the instruction stream to see if there are any instructions that are OK to execute
- As they are fetched, instructions get placed in reservation stations – where they wait until their inputs are ready (sketched below)
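A rough C sketch of what one reservation-station entry has to track (the struct and field names are invented for illustration, loosely following Tomasulo's scheme, not any real design): each source operand holds either its value or a tag naming the unit that will produce it, and the instruction becomes ready only once both tags have been resolved.

#include <stdbool.h>
#include <stdint.h>

/* One reservation-station entry (Tomasulo-style sketch). */
struct rs_entry {
    bool     busy;       /* is this entry in use?                     */
    uint8_t  op;         /* operation to perform                      */
    uint32_t vj, vk;     /* source operand values, once known         */
    int      qj, qk;     /* producer tags; 0 means the value is known */
    int      dest_tag;   /* tag broadcast when this result completes  */
};

/* An instruction can issue into a free entry before its inputs exist;
   it may begin executing only once every producer has broadcast. */
static bool ready_to_execute(const struct rs_entry *e) {
    return e->busy && e->qj == 0 && e->qk == 0;
}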
Dynamic Scheduling
- 4 reservation stations for 4 separate pipelines
- Each pipeline may have a different depth
[Figure: a dynamically scheduled pipeline. An instruction fetch and decode unit issues in order into reservation stations; the functional units (two integer units, floating point, load/store, …) execute out of order; a commit unit retires results in order. © 1998 Morgan Kaufmann Publishers, Inc.]
Dynamic Scheduling Case Study
- Intel’s Pentium 4
  - Possible for 126 instructions to be “in-flight” at one time!
  - Intel has been moving “backwards” on this since 2003
Thread-level Parallelism (TLP)
– If you have multiple threads…
  • by having multiple programs running, or
  • by writing a multithreaded application
– …you can get higher performance by running these threads (see the sketch below):
  • On multiple processors, or
  • On a machine that has multithreading support
    – SMT (Simultaneous Multithreading, AKA “Hyperthreading”)
• Conceptually these are very similar
  – The hardware is very different
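A minimal POSIX-threads sketch of the second case above, a single program broken into separate threads (the two worker functions are invented for illustration); whether the threads land on separate processors or on two SMT contexts of one core is the operating system's decision, not the program's.

#include <pthread.h>
#include <stdio.h>

/* Two concurrent tasks from one program; the OS may place them on
   two different processors, or on two hardware threads of one core. */
static void *decode_audio(void *arg) { puts("decoding audio..."); return 0; }
static void *render_video(void *arg) { puts("rendering video..."); return 0; }

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, 0, decode_audio, 0);
    pthread_create(&t2, 0, render_video, 0);
    pthread_join(t1, 0);
    pthread_join(t2, 0);
    return 0;
}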
The Jigsaw Puzzle Analogy
Serial Computing
Suppose you want to do a jigsaw puzzle that has, say, a thousand pieces. We can imagine that it’ll take you a certain amount of time. Let’s say that you can put the puzzle together in an hour.
The More the Merrier?
Now suppose Alice sits across the table from you, and we put Bob and Charlie on the other two sides. Each of you can work on a part of the puzzle, but there’ll be a lot more contention for the shared resource (the pile of puzzle pieces) and a lot more communication at the interfaces. So the four of you will get noticeably less than a 4-to-1 speedup, but you’ll still have an improvement, maybe something like 3-to-1: you can get it done in 20 minutes instead of an hour.
Diminishing Returns
If we now put Dave and Ed and Frank and George on the corners of the table, there’s going to be a whole lot of contention for the shared resource, and a lot of communication at the many interfaces. So the speedup you’ll get will be much less than we’d like; you’ll be lucky to get 5-to-1. We can see that adding more and more workers onto a shared resource eventually brings diminishing returns.
More Distributed Processors
It’s a lot easier to add more processors in distributed parallelism. But you always have to be aware of the need to decompose the problem and to communicate among the processors. Also, as you add more processors, it may be harder to load balance the amount of work that each processor gets.
Load Balancing
Load balancing means ensuring that everyone completes their workload at roughly the same time.
For example, if the jigsaw puzzle is half grass and half sky, then you can do the grass and Alice can do the sky, and then you only have to communicate at the horizon – and the amount of work that each of you does on your own is roughly equal. So you’ll get pretty good speedup.
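The same idea as a tiny code sketch (the function and its parameters are invented for illustration): split n items among p workers so that no two shares differ by more than one item, which is the usual way to get everyone finishing at roughly the same time when items cost about the same. For a puzzle that is half grass and half sky, the analogous split is by region rather than by count, so the communication happens only along the horizon.

/* Give worker w (0 <= w < p) its half-open range [*lo, *hi) of n items,
   spreading the remainder so that shares differ by at most one. */
void balanced_range(int n, int p, int w, int *lo, int *hi) {
    int base  = n / p;   /* every worker gets at least this many   */
    int extra = n % p;   /* the first `extra` workers get one more */
    *lo = w * base + (w < extra ? w : extra);
    *hi = *lo + base + (w < extra ? 1 : 0);
}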