- Introduction
- Chapter 1 Solutions
- Chapter 2 Solutions
- Chapter 3 Solutions
- Chapter 4 Solutions
- Chapter 5 Solutions
- Chapter 6 Solutions
- Chapter 7 Solutions
- Chapter 8 Solutions
- Appendix A Solutions
Solutions to All Exercises for Instructors
Captain Kirk: "You ought to sell an instruction and maintenance manual with this thing."

Cyrano Jones: "If I did, what would happen to man's search for knowledge?"

Star Trek, "The Trouble with Tribbles" (Dec. 29, 1967)
Chapter 1 Solutions
c. Assume that the relative disk volume scales linearly with disk capacity. Then

Projected value = 1990 value × (30 GB / 100 MB) × (1 − 0.3)^years

where years is the number of years forward from 1990. Then,

Mass_2002 = 1000 g × (30 GB / 100 MB) × (1 − 0.3)^12 = 4152 g

Height_2002 = Volume_2002 / Drive bay area = (1000 cm^3 × (30 GB / 100 MB) × (1 − 0.3)^12) / Drive bay area = 29.7 cm

d. Actual component cost of the $1000 PC is $1000 × 46.6% = $466. Cost of the components other than the hard disk is $466 × 91% = $424.

e. Cost of the hard disk is $466 × 9% = $42. Assume disk density did improve 60% per year from 1990 through 1996 and at 100% per year since 1997. Then by 2001 an improvement of only 30% per year would have led to a higher hard disk cost of

$42 × (1 + 60%)^6 × (1 + 100%)^5 / (1 + 30%)^11 = $1258

Adding this to the cost of the other components and scaling component cost up to list price gives

PC cost = ($424 + $1258) / 46.6% = $3609

At this higher price desktop digital video editing would be much less widely accessible.

1.2 Let PV stand for percent vectorization divided by 100.

a. Plot

Net speedup = 1 / ((1 − PV) + PV/10), for 0 ≤ PV ≤ 1

b. From the equation in (a), if Net speedup = 2 then the percent vectorization is PV = 5/9, or 56%.

c. Time in vector mode = (PV/10) / ((1 − PV) + PV/10); for PV = 5/9 this is 1/9, or 11%.

d. From the equation in (a), if Net speedup = 10/2 = 5, then PV = 8/9, or 89%.

e. The increased percent vectorization needed to match a hardware speedup of 10 × 2 = 20 applied to the original 70% vectorization is found by solving

1 / ((1 − PV) + PV/10) = 1 / ((1 − 0.7) + 0.7/20)
Solving shows that the vectorization must increase to 74%, not a large
increase. Improving the compiler to increase vectorization another 4% may
be easier and cheaper than improving the hardware by a factor of 2.
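These results are easy to check numerically. The sketch below is a minimal Python check, assuming only the exercise's 10× vector-unit speedup; it reproduces the 56%, 89%, and 74% figures.

```python
# Minimal check of the Exercise 1.2 results (assumes the 10x vector-unit speedup).
def net_speedup(pv, vector_speedup=10.0):
    """Overall speedup when a fraction pv of the work runs vector_speedup times faster."""
    return 1.0 / ((1.0 - pv) + pv / vector_speedup)

def pv_for_speedup(target, vector_speedup=10.0):
    """Invert net_speedup: vectorized fraction needed to reach a target overall speedup."""
    return (1.0 - 1.0 / target) / (1.0 - 1.0 / vector_speedup)

print(pv_for_speedup(2))                     # ~0.556 -> 56% vectorization for a speedup of 2
print(pv_for_speedup(5))                     # ~0.889 -> 89% for half the maximum speedup of 10
print(net_speedup(0.7, 20))                  # ~2.99, a 20x vector unit at 70% vectorization
print(pv_for_speedup(net_speedup(0.7, 20)))  # ~0.739 -> 74% vectorization matches it
```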
1.3 This question further explores the effects of Amdahl’s Law, but the data given in
the question is in a form that cannot be directly applied to the general speedup
formula.
a. Because the information given does not allow direct application of Amdahl's Law, we start from the definition of speedup:

Speedup_overall = Time_unenhanced / Time_enhanced

The unenhanced time is the sum of the time that does not benefit from the 10 times faster speedup plus the time that does benefit, but before its reduction by the factor of 10. Thus,

Time_unenhanced = 50% × Time_enhanced + 10 × 50% × Time_enhanced = 5.5 × Time_enhanced

Substituting into the equation for Speedup yields

Speedup_overall = 5.5 × Time_enhanced / Time_enhanced = 5.5

b. Using Amdahl's Law, the given value of 10 for the enhancement factor, and the value for Speedup_overall from part (a), we have

5.5 = 1 / ((1 − Fraction_enhanced) + Fraction_enhanced / 10)

Solving shows that the enhancement can be applied 91% of the original time.

1.4 a.

Speedup = Number of floating-point instructions_DFT / Number of floating-point instructions_FFT = n^2 / (n log2 n) = n / log2 n

Thus,

| n | 8 | 16 | 32 | 64 | 128 | 256 | 512 | 1024 |
|---|---|---|---|---|---|---|---|---|
| Speedup | 2.7 | 4.0 | 6.4 | 10.7 | 18.2 | 32.0 | 56.9 | 102.4 |

Also,

lim (n → ∞) Speedup = lim (n → ∞) n / log2 n = ∞
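Both calculations can be reproduced with a short script. The sketch below assumes only the values stated above.

```python
import math

# Exercise 1.3: half of the *enhanced* time benefits from a 10x enhancement.
time_enhanced = 1.0
time_unenhanced = 0.5 * time_enhanced + 10 * 0.5 * time_enhanced
print(time_unenhanced / time_enhanced)      # 5.5 overall speedup

# Exercise 1.3(b): fraction of the original time the enhancement applies to,
# from Amdahl's Law: 5.5 = 1 / ((1 - f) + f/10).
f = (1 - 1 / 5.5) / (1 - 1 / 10)
print(round(f, 2))                          # 0.91 -> about 91%

# Exercise 1.4(a): DFT vs. FFT operation counts, speedup = n^2 / (n log2 n).
for n in [8, 16, 32, 64, 128, 256, 512, 1024]:
    print(n, round(n**2 / (n * math.log2(n)), 1))
```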
Relatively, there are 21 times more instructions executed by the embedded
processor.
b. The MIPS ratings are

MIPS_RISC = CC_RISC / (CPI_RISC × 10^6) = CC / (10 × 10^6)

MIPS_emb = CC_emb / (CPI_emb × 10^6) = CC / (6 × 10^6)

The MIPS rating of the embedded processor will be a factor of 10/6 = 1.67 times higher than the rating of the RISC version.

c. The RISC processor performs the non-FP instructions plus 195,578 FP instructions. The embedded processor performs the same number of non-FP instructions as the RISC processor, but performs some larger number of instructions than 195,578 to compute the FP results using non-FP instructions only. The number of non-FP instructions is

Number of non-FP instructions = IC_RISC − 195,578 = 0.108 CC − 195,578

Thus,

Number of instructions for FP_emb = IC_emb − Number of non-FP instructions = 2.27 CC − (0.108 CC − 195,578) = 2.16 CC + 195,578

Finally,

Average number of instructions for FP in software_emb = Number of instructions for FP_emb / Number of FP instructions = (2.16 CC + 195,578) / 195,578
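This bookkeeping can be checked numerically. In the sketch below the cycle count CC is given an arbitrary placeholder value, since the solution leaves it symbolic, so only the printed ratios are meaningful.

```python
# Instruction-count bookkeeping for the RISC vs. software-FP embedded comparison.
CC = 1e9                        # assumed cycle count (hypothetical placeholder)
FP_INSTRUCTIONS = 195_578       # FP instructions executed by the RISC version

ic_risc = 0.108 * CC            # instruction count of the RISC version (from the solution)
ic_emb = 2.27 * CC              # instruction count of the embedded (software-FP) version
print(round(ic_emb / ic_risc))  # ~21: relative instruction count

cpi_risc, cpi_emb = 10, 6
print(round(cpi_risc / cpi_emb, 2))   # ~1.67: ratio of MIPS ratings, embedded over RISC

non_fp = ic_risc - FP_INSTRUCTIONS          # non-FP instructions, the same on both versions
fp_in_software = ic_emb - non_fp            # instructions spent emulating FP operations
print(fp_in_software / FP_INSTRUCTIONS)     # average instructions per emulated FP operation
```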
1.8 Care in using consistent units and in expressing dies/wafer and good dies/wafer as integer values is important for this exercise.

a. The number of good dies must be an integer and is less than or equal to the number of dies per wafer, which must also be an integer. The result presented here assumes that the integer dies per wafer is modified by wafer and die yield to obtain the integer number of good dies.

| Microprocessor | Dies/wafer | Good dies/wafer |
|---|---|---|
| Alpha 21264C | 231 | 128 |
| Power3-II | 157 | 71 |
| Itanium | 79 | 20 |
| MIPS R14000 | 122 | 46 |
| UltraSPARC III | 118 | 44 |
b. The cost per good die is

| Microprocessor | $/good die |
|---|---|
| Alpha 21264C | $36 |
| Power3-II | $56 |
| Itanium | $245 |
| MIPS R14000 | $80 |
| UltraSPARC III | $118 |

c. The cost per good, tested, and packaged part is

| Microprocessor | $/good, tested, packaged die |
|---|---|
| Alpha 21264C | $64 |
| Power3-II | $78 |
| Itanium | $268 |
| MIPS R14000 | $108 |
| UltraSPARC III | $152 |

d. The largest processor die is the Itanium at 300 mm^2. Defect density has a substantial effect on cost, pointing out the value of carefully managing the wafer manufacturing process to maximize the number of defect-free dies. The table below restates die cost assuming the baseline defect density from parts (a)–(c) and then the lower and higher densities considered in this part.

| Itanium | $/good, tested, packaged die |
|---|---|
| defect density = 0.5 per cm^2 | $268 |
| defect density = 0.3 per cm^2 | $171 |
| defect density = 1.0 per cm^2 | $635 |

e. For the Alpha 21264C, tested, packaged die costs for an assumed defect density of 0.8 per cm^2 and variation in parameter α from α = 4 to α = 6 are $77.53 and $78.59, respectively.

1.9 a. Various answers are possible. Assume a wafer cost of $5000 and α = 4 in all cases. For a defect density of 0.6 per cm^2 and die area ranging from 0.5 to 4 cm^2, die cost ranges from $4.93 to $118.56. Fitting a polynomial curve to the (die area, die cost) pairs shows that a quadratic model has an acceptable norm-of-the-residuals value of 0.669. Fitting to a third-degree polynomial yields a very small cubic term coefficient and a better norm of the residuals of 0.017, but the quadratic fit is good and the polynomial is simpler, so that would be the preferred choice.
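For reference, the sketch below implements the dies-per-wafer and die-yield model these exercises rely on. The wafer cost, wafer diameter, defect density, and die areas are assumed example values rather than the exercises' data, so the printed costs will not match the figures quoted above exactly.

```python
import math

def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    # Wafer area divided by die area, minus the partial dies lost around the edge.
    return int(math.pi * (wafer_diameter_cm / 2) ** 2 / die_area_cm2
               - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))

def die_yield(die_area_cm2, defects_per_cm2, alpha=4.0, wafer_yield=1.0):
    return wafer_yield * (1 + defects_per_cm2 * die_area_cm2 / alpha) ** (-alpha)

def cost_per_good_die(wafer_cost, wafer_diameter_cm, die_area_cm2,
                      defects_per_cm2, alpha=4.0):
    dies = dies_per_wafer(wafer_diameter_cm, die_area_cm2)
    good_dies = int(dies * die_yield(die_area_cm2, defects_per_cm2, alpha))
    return wafer_cost / good_dies

# Exercise 1.9(a)-style sweep: die cost grows faster than linearly with die area.
for area in [0.5, 1.0, 2.0, 3.0, 4.0]:
    print(area, round(cost_per_good_die(5000.0, 30.0, area, 0.6), 2))
```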
Now

AM − GM = (a + b)/2 − √(ab) = (a − 2√(ab) + b)/2 = (√a − √b)^2 / 2 ≥ 0

because the quotient of a nonnegative real number and a positive real number is nonnegative. Thus, AM ≥ GM.

Now assume that AM = GM. Then,

(a + b)/2 = √(ab)

Algebraic manipulation yields a − b = 0, which for positive integers implies a = b. So AM = GM when a = b.

1.12 For positive integers r and s,

Arithmetic mean = AM = (r + s)/2

and

Harmonic mean = HM = 2 / (1/r + 1/s) = 2rs / (r + s)

Now

AM − HM = (r + s)/2 − 2rs/(r + s) = ((r + s)^2 − 4rs) / (2(r + s)) = (r − s)^2 / (2(r + s)) ≥ 0

because the quotient of a nonnegative real number and a positive real number is nonnegative. Thus, AM ≥ HM.

Now assume that AM = HM. Then,

(r + s)/2 = 2rs / (r + s)

Algebraic manipulation yields (r − s)^2 = 0, which for positive integers implies r = s. So AM = HM when r = s.

1.13 a. Let the data value sets be

A = {10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 1}

and

B = {1, 1, 1, 1, 1, 1, 1, 1, 1, 10^7}

Arithmetic mean (A) = 9 × 10^6

Median (A) = 10 × 10^6

Arithmetic mean (B) = 1 × 10^6

Median (B) = 1
Set A mean and median are within 10% in value, but set B mean and median
are far apart. A large outlying value seriously distorts the arithmetic mean,
while a small outlying value has a lesser effect.
b. Harmonic mean (A) = 10.0

Harmonic mean (B) = 1.1
In this case the set B harmonic mean is very close to the median, but set A
harmonic mean is much smaller than the set A median. The harmonic mean is
more affected by a small outlying value than a large one.
c. Which is closest depends on the nature of the outlying data point. Neither
mean produces a statistic that is representative of the data values under all cir-
cumstances.
d. Let the new data sets be

C = {1, 1, 1, 1, 1, 1, 1, 1, 1, 2}

and

D = {10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 10^7, 5 × 10^6}

Then

Arithmetic mean (C) = 1.1

Harmonic mean (C) = 1.05

Median (C) = 1

and

Arithmetic mean (D) = 9.5 × 10^6

Harmonic mean (D) = 9.1 × 10^6

Median (D) = 10 × 10^6
In both cases, the means and medians are close. Summarizing a set of data
values that has less disparity among the values by stating a statistic, such as
mean or median, is intrinsically more meaningful.
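These comparisons are easy to reproduce. The sketch below uses Python's statistics module on the four sets defined above.

```python
import statistics

# The four data sets from Exercise 1.13.
A = [10**7] * 9 + [1]
B = [1] * 9 + [10**7]
C = [1] * 9 + [2]
D = [10**7] * 9 + [5 * 10**6]

for name, data in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name,
          statistics.mean(data),                     # arithmetic mean
          round(statistics.harmonic_mean(data), 2),  # harmonic mean
          statistics.median(data))                   # median
```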
1.14 a. For a set of n programs each taking Time_i on one machine, the equal-time weightings on that machine are

w_i = 1 / (Time_i × Σ_{j=1..n} (1/Time_j))

Applying this formula to the Reference Time data for the 14 benchmarks yields the weights shown in Figure S.3.

| Benchmark | Reference computer | Compaq | IBM | Intel |
|---|---|---|---|---|
| 168.wupwise | 134.4 | 29.3 | 43.8 | 34 |
| 171.swim | 134.4 | 12.5 | 59.2 | 33 |
| 172.mgrid | 134.4 | 25.6 | 47.3 | 54 |
| 173.applu | 134.4 | 34.8 | 43.2 | 55 |
| 177.mesa | 134.4 | 26.8 | 49.2 | 25 |
| 178.galgel | 134.4 | 30.2 | 35.4 | 45 |
| 179.art | 134.4 | 10.9 | 14.5 | 35 |
| 183.equake | 134.4 | 61.1 | 25.4 | 57 |
| 187.facerec | 134.4 | 19.8 | 62.5 | 45 |
| 188.ammp | 134.4 | 33.2 | 49.4 | 47 |
| 189.lucas | 134.4 | 21.0 | 51.5 | 43 |
| 191.fma3d | 134.4 | 28.5 | 44.1 | 47 |
| 200.sixtrack | 134.4 | 49.2 | 65.5 | 79 |
| 301.apsi | 134.4 | 30.2 | 46.0 | 38 |
| Weighted arithmetic mean (seconds) | 1881 | 413 | 637 | 643 |
| SPECfp_base (geometric mean as percent) | 100 | 500 | 313 | 304 |

Figure S.4 Weighted runtimes. The table entries show the weighted time in seconds for each benchmark on a given computer. The summation of benchmark times gives the weighted arithmetic mean execution time of the benchmark suite. Note that with equal weighting of the benchmarks the three computers studied are ranked Compaq, IBM, Intel from fastest (lowest time) to slowest, which is the same ranking seen in the SPECfp_base_2000 numbers, where the highest corresponds to fastest.
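A small sketch of the equal-time weighting follows: weights are proportional to the reciprocal of each benchmark's time on the chosen (reference) machine, so every benchmark contributes the same weighted time on that machine. The three reference times below are hypothetical, not the SPEC data.

```python
def equal_time_weights(times):
    """Weights w_i = (1/T_i) / sum_j (1/T_j); they sum to 1."""
    inv_total = sum(1.0 / t for t in times)
    return [(1.0 / t) / inv_total for t in times]

ref_times = [1600.0, 3100.0, 1800.0]              # hypothetical reference times
weights = equal_time_weights(ref_times)
print(weights)                                     # weights sum to 1
print([w * t for w, t in zip(weights, ref_times)]) # identical weighted times on the reference
```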
1.15 a. The first condition is that the measured time be accurate, precise, and exclu-
sively for the program of interest. Execution time is measured, typically,
using a clock that ignores what the computer is running. This might be a
clock on the wall or a free-running timer chip in the computer with an output
that can be read using a system call. If the computer can work on computa-
tional tasks other than the program of interest during the measurement inter-
val, then it is important to remove this other time from the run duration of the
program of interest. If we cannot account for this other time, then the perfor-
mance result derived from the measurement will be inaccurate, and may be of
little meaning.
If the program completes execution in an interval that is short compared to
the resolution of the timer, then the run time may be over- or under-stated
enough due to rounding to affect our understanding. This is a problem of
insufficient measurement precision, also known as having too few significant
digits in a measurement. When a more precise timer (for example, microseconds instead of milliseconds) is not available, the traditional solution is to
change the benchmark program input to yield a longer run time so that the
available timer resolution is then sufficiently precise. The goal is for the run
time to become long enough to require the desired number of significant dig-
its to express so that rounding will have an insignificant effect.
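The timer-resolution point can be illustrated with a simple measurement loop. In the sketch below the workload function is just a stand-in; the work is repeated until the measured interval is long relative to the clock's resolution, and the per-run time is then derived from the total.

```python
import time

def workload():
    # Stand-in for the program of interest.
    return sum(i * i for i in range(100_000))

reps = 1
while True:
    start = time.perf_counter()
    for _ in range(reps):
        workload()
    elapsed = time.perf_counter() - start
    if elapsed > 0.1:        # long enough for ~3 significant digits on most clocks
        break
    reps *= 2                # otherwise lengthen the measurement and try again

print(f"{elapsed / reps:.6f} seconds per run over {reps} repetitions")
```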
The condition that has to do with the program itself is this: what if the program does not terminate, or does not terminate within the patience of the measurer? How long then is the execution time? How should run time be defined?
b. Throughput is a consistent and reliable measure of performance if a consis-
tent, meaningful unit of work can be defined. Consider a web server that
sends a single, fixed page in response to requests. Each request then triggers a
computational task, transferring identical web page description language to each
new requesting computer, that is essentially identical each time. Throughput
in terms of pages served per unit time would then be inversely proportional to
the time to perform what is essentially a fixed benchmark task: serving this
page. This is the same concept involved in measuring the time to run a fixed
SPEC benchmark with its given code and given input data set. So throughput
of fixed tasks is directly comparable to running fixed benchmarks.
When the task performed changes each time, for example very different
pages served for each new request, then the use of throughput becomes more
difficult. If an aggregate of tasks with consistent character exists, then
throughput measured over a time interval that encompasses the collection of
tasks may be sufficiently consistent and reliable. It may be difficult to identify
such a task collection or to restrict the processing performed to just that col-
lection.
c. With overlapped work, single transaction time will understate the amount of
work, measured in units of number of transactions completed, that the com-
puter can perform per unit time. Throughput will not understate performance
in this way.
1.16 a. Amdahl’s Law can be generalized to handle multiple enhancements. If only
one enhancement can be used at a time during program execution, then

Speedup = 1 / ((1 − Σ_i FE_i) + Σ_i (FE_i / SE_i))

where FE_i is the fraction of time that enhancement i can be used and SE_i is the speedup of enhancement i. For a single enhancement the equation reduces to the familiar form of Amdahl's Law.

With three enhancements we have

Speedup = 1 / ((1 − (FE_1 + FE_2 + FE_3)) + FE_1/SE_1 + FE_2/SE_2 + FE_3/SE_3)
Thus, if only one enhancement can be implemented, enhancement 3 offers
much greater speedup.
Speedup_12 = 1 / ((1 − (0.15 + 0.15)) + 0.15/30 + 0.15/20)

Speedup_13 = 1 / ((1 − (0.15 + 0.7)) + 0.15/30 + 0.7/SE_3)

Speedup_23 = 1 / ((1 − (0.15 + 0.7)) + 0.15/20 + 0.7/SE_3)

Thus, if only a pair of enhancements can be implemented, enhancements 1 and 3 offer the greatest speedup.
Selecting the fastest enhancement(s) may not yield the highest speedup. As
Amdahl’s Law states, an enhancement contributes to speedup only for the
fraction of time that it can be used.
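The generalized formula is easy to evaluate programmatically. In the sketch below the usable fractions 0.15, 0.15, and 0.70 and the speedups of 30 and 20 for enhancements 1 and 2 come from the equations above; the speedup assumed for enhancement 3 is a placeholder, since its value is not restated here.

```python
# Generalized Amdahl's Law: at most one enhancement applies at any moment.
def overall_speedup(FE, SE):
    unenhanced = 1.0 - sum(FE)
    return 1.0 / (unenhanced + sum(f / s for f, s in zip(FE, SE)))

FE = [0.15, 0.15, 0.70]        # usable fractions of the original time
SE = [30.0, 20.0, 15.0]        # SE[2] is an assumed placeholder value

print(overall_speedup([FE[0], FE[1]], [SE[0], SE[1]]))  # enhancements 1 and 2
print(overall_speedup([FE[0], FE[2]], [SE[0], SE[2]]))  # enhancements 1 and 3
print(overall_speedup([FE[1], FE[2]], [SE[1], SE[2]]))  # enhancements 2 and 3
```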
1.17 a. Let I be the number of integer instructions in the benchmark, F the number of floating-point operations, Y the number of integer instructions needed to emulate one floating-point operation in software, W the run time without the co-processor, and B the run time with it. The two MIPS ratings are then

MIPS_proc = 120 × 10^6 = (I + Y × F) / W

MIPS_proc/co = 80 × 10^6 = (I + F) / B

b. Solving the first equation for the number of integer instructions,

I = 120 × 10^6 × W − F × Y = (120 × 10^6)(4) − (8 × 10^6)(50) = 80 × 10^6 instructions

c. Solving the second equation for the run time with the co-processor,

B = (I + F) / (80 × 10^6) = (80 × 10^6 + 8 × 10^6) / (80 × 10^6) = 1.1 sec

d. The integer instructions take I / MIPS_proc/co = (80 × 10^6) / (80 × 10^6) = 1 sec of the 1.1 sec run time, leaving 0.1 sec for the floating-point work. Thus

MFLOPS_proc/co = F / ((B − I / MIPS_proc/co) × 10^6) = 8 × 10^6 / (0.1 × 10^6) = 80 MFLOPS

e. The time for the processor alone is W = 4 sec. The time for the processor/co-processor configuration is B = 1.1 sec. While its MIPS rating is lower, the faster execution time belongs to the processor/co-processor combination. Your colleague's evaluation is correct.
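The arithmetic for parts (a) through (d) can be checked with a few lines; the sketch below uses only the values recovered above.

```python
# Exercise 1.17 check: W = 4 s at 120 MIPS alone, 80 MIPS with the co-processor,
# F = 8 million FP operations, Y = 50 integer instructions per emulated FP operation.
W, F, Y = 4.0, 8e6, 50
rate_proc, rate_proc_co = 120e6, 80e6                 # instructions per second

integer_instructions = rate_proc * W - F * Y          # 80 million integer instructions
B = (integer_instructions + F) / rate_proc_co         # 1.1 seconds with the co-processor
integer_time = integer_instructions / rate_proc_co    # 1.0 of those seconds is integer work
mflops_proc_co = F / ((B - integer_time) * 1e6)       # 80 MFLOPS

print(integer_instructions, B, mflops_proc_co)
```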
1.18 a.

MFLOPS_native = Number of floating-point operations / (Execution time in seconds × 10^6)

Because one of the two measured values (time) is reported with only three significant digits, the answer should be stated to three significant digits of precision.
b. There are four 171.swim operations that are not explicitly given normalized
values: load, store, copy, and convert. Let’s think through what normalized
values to use for these instructions.
First, convert comprises only 0.006% of the FP operations. Thus, convert
would have to correspond to about 1000 normalized FP operations to have
any effect on MFLOPS reported with three significant digits. It seems
unlikely that convert would be this much more time-consuming than expo-
nentiation or a trig function. Any less and there is no effect. So let’s apply an
important principle—keep models simple—and model convert as one nor-
malized FP operation.
Next, copy replicates a value, making it available at a second location. This
same behavior can be produced by adding zero to a value and saving the
result in a new location. So, reasonably, copy should have the same normal-
ized FP count as add.
Finally, load and store interact with computer memory. They can be quick to
the extent that the memory responds quickly to an access request, unlike
divide, square root, exponentiation, and sin, which are computed using a
series of approximation steps to reach an answer. Because load and store are
very common, Amdahl’s Law suggests making them fast. So assume a nor-
malized FP value of 1 for load and store. Note that any increase would signif-
icantly affect the result.
With the above normalized FP operations model, we have

MFLOPS_normalized = Normalized number of floating-point operations / (Execution time in seconds × 10^6)
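A sketch of how such a normalization is applied in practice follows. The 1/4/8 weights and the operation mix below are illustrative assumptions, not the 171.swim data or the exercise's normalization table.

```python
# Normalized-MFLOPS sketch: weight each operation class, then divide by time.
NORMALIZED_OPS = {
    "add": 1, "subtract": 1, "multiply": 1, "compare": 1,
    "load": 1, "store": 1, "copy": 1, "convert": 1,   # per the model argued above
    "divide": 4, "sqrt": 4,
    "exp": 8, "sin": 8,
}

def mflops(op_count, time_seconds):
    return op_count / (time_seconds * 1e6)

def normalized_mflops(op_counts, time_seconds):
    weighted = sum(NORMALIZED_OPS[op] * n for op, n in op_counts.items())
    return mflops(weighted, time_seconds)

# Hypothetical operation counts and run time.
counts = {"add": 50e6, "multiply": 40e6, "load": 60e6, "store": 30e6, "divide": 5e6}
print(mflops(sum(counts.values()), 10.0))   # native MFLOPS: raw operation count
print(normalized_mflops(counts, 10.0))      # normalized MFLOPS: weighted count
```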
1.19 No solution provided.
1.20 No solution provided.
1.21 a. No solution provided.
b. The steps of the word-processing workload and their nature are as follows.
1. Load the word-processing program and the document file. [Disk and
memory system.]
Figure S.5 (figure not reproduced): graph showing relative performance of three processors (normalized to the Pentium); the speedup and speedup per watt are quite different.

1.24 Figure 1.10 shows the addition of three costs to that of the circuitry components,
which determines system list price. System power and volume increases that are
the unavoidable consequence of a CPU component power consumption increase
can be identified in an analogous way. First, consider the effect of CPU power.
An additional watt of CPU power consumption requires an additional watt of
power supply capacity. Because a power supply is not 100% efficient, the power
input to the supply circuitry must increase by more than 1 watt and the waste
energy of conversion, appearing as heat in the components of the supply, will
increase. This input power increase of greater than 1 watt is modeled much as the
direct costs increase shown in Figure 1.10.
At some level of power delivery to the system, the power supply components will
become hotter than their rated maximum operating temperatures if only convec-
tion cooling is available. While several active cooling technologies are available,
the least expensive is forced air. This requires addition of a power supply fan and
the power to run it, with the typical small, rotating fan using about 1 watt of
power. This additional power requirement can be modeled analogously to the
gross margin of Figure 1.10.
Finally, with increasing CPU power consumption the chip will eventually
become too hot without a substantial heat sink and, perhaps, a dedicated fan to
assure high airflow for the CPU. (The fan may be designed to run only when the
CPU temperature is particularly high, as is the case for many laptop computers).
This final “power tax” of a CPU fan for the more power-hungry CPU is modeled
as the average discount component of list price (see Figure 1.10).
System volume is affected by the need to house all system components and, for
air cooling, to provide adequate paths for airflow. The volume model for CPU
power consumption increases follows the system cost (Figure 1.10) and system
power consumption models. The starting point is the basic system electronics
volume (motherboard) and the basic power supply.
To provide an additional watt to the CPU requires a higher capacity power sup-
ply, and all other things being equal, higher capacity supplies will use physically
larger components, increasing volume. If power supply capacity increases suffi-
ciently, then a cooling fan must be added to the supply along with (possibly)
internal airflow paths to provide cooling. This increases power supply volume.
Finally, a hot CPU chip may require a dedicated large heat sink and fan, further
increasing system volume. A CPU heat sink can easily occupy 100 or more times
the volume of the packaged CPU chip alone.
The bottom line is that each additional watt of CPU power consumption has a
definite impact on system size due to larger and more numerous components and
on system noise due to new and/or increased flows for forced air cooling. Gener-
ally these effects are viewed negatively in the marketplace. Volume is an impor-
tant characteristic, as evidenced by the rapid replacement of CRT monitors with
LCD displays in business environments where personnel workstations (cubicles)
can be reduced in area, saving office rent. Reduced computer noise is generally
viewed favorably by users.
1.25 If the collection of performance values is viewed as a set of orthogonal vectors
in hyperspace, then those vectors define a right hyperprism with one vertex at the
origin. The geometric mean of those values is the length of the side of a hyper-
cube having the same volume as the hyperprism.
The advantage of using total execution time is ease of computing the summary
metric. The advantage of using weighted arithmetic mean of the execution times
with weights relative to a comparison platform is that this takes into account the
relative importance of the individual benchmarks in making up the aggregate
workload. The advantage of using the geometric mean of the speed ratios is that
any machine from a collection is equally valid as the normative platform and the
result appears to allow for simple scaling to predict performance of new pro-
grams on the collection of machines.
The disadvantage of using total execution time is that the actual workload may
not be well modeled. The disadvantage of using weighted arithmetic mean is that
the results are affected by machine design details and program input size. The
disadvantage of using geometric mean is that it does not track execution time,
our gold standard for performance measurement.
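The claim that any machine can serve as the normative platform is easy to demonstrate: the geometric mean of speed ratios yields the same ranking regardless of which machine supplies the reference times. The sketch below uses hypothetical times for three machines and three programs.

```python
import math

# Hypothetical execution times (seconds) for three programs on three machines.
times = {"M1": [10.0, 100.0, 50.0],
         "M2": [20.0, 40.0, 60.0],
         "M3": [15.0, 80.0, 30.0]}

def geomean(xs):
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

for ref in times:
    # Geometric mean of per-program speed ratios relative to the chosen reference.
    ratios = {m: geomean([r / t for r, t in zip(times[ref], times[m])])
              for m in times}
    order = sorted(ratios, key=ratios.get, reverse=True)
    print(ref, order)      # the ordering is identical for every choice of reference
```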
1.26 Whether performance changes and what might be concluded will depend on the
specific hardware and software platform chosen for the experiment.
What can be concluded about SPEC2000 is the following. Suppose that there are
SPEC2000 results for a platform of interest to you. Your computational needs
likely depend on programs, your own and/or those from a vendor, that are not
compiled as aggressively as the benchmarks. Thus, the performance you enjoy
from this computer is likely to be less than that reported in the SPEC2000 results.
Further, the performance ranking of a variety of computers on your workload