Relational Algebra -Introduction to Database Systems - Exams, Exams of Introduction to Database Management Systems

Main points of this past exam are: Relational Algebra, Describing, Buildings, Employees Work, Queries, Paid Employee, Computing, Join Algorithms, Ignore Output

Typology: Exams

2012/2013

Uploaded on 04/02/2013

UNIVERSITY OF CALIFORNIA
Department of EECS, Computer Science Division
CS186 Final Exam
May 16, 2000
Final Exam: Introduction to Database Systems
This exam has seven sections, each with one or more problems. Each problem may be made up of multiple
questions. You should read through the exam quickly and plan your time-management accordingly. Before
beginning to answer a question, be sure to read it carefully and to answer all parts of every question!
REFERENCE DATABASE. This is the Reference Database referred to in some of the questions.
Six tables describe a company: its employees, departments, and buildings; which
department(s) an employee works in (and the percentage of time spent in each); department managers
(possibly more than one per department); and in which building(s) an employee works (an employee may
have more than one office). The primary key of each table is the attribute(s) in capitals. Other
attributes are not necessarily unique.
EMP – 100,000 tuples, 1,000 pages
EID EName Salary Start_Date End_Date
001 Jane $124,000 3/1/93 null
002 Jim $32,000 2/29/96 null
003 John $99,000 12/12/98 null
004 Joe $55,000 2/2/92 null
005 Jenny $51,000 5/5/95 null
EID values range from 1 to 100,000
BUILDING – 2,000 tuples, 10 pages
BID BName Address
201 ATC 1600 Ampitheatre
202 CCC 500 Crittenden
203 MFB 123 Shoreline
BID values range from 1 to 2,000
DEPT – 1,000 tuples, 5 pages
DID DName Annual_Budget
101 Research $1,001,000
102 Development $500,000
103 Sales $2,000,000
DID values range from 1 to 1000
IN_DEPT – 110,000 tuples, 550 pages
EID DID Percent_Time
001 101 100
002 102 100
003 101 60
003 102 40
004 103 100
005 103 100
IN_BUILDING – 110,000 tuples, 550 pages
EID BID
001 201
002 201
003 202
003 203
004 202
005 203
MANAGES_DEPT – 800 tuples, 4 pages
EID DID
003 101
003 102
001 103
You must write your answers on these stapled pages. You also must write your name at the top of every
page except this one, and you must turn in all the pages of the exam. You may remove this page from the
stapled exam, to serve as a reference, but do not remove any other pages from the stapled exam! Two
pages of extra answer space have been provided at the back in case you run out of space while answering.
If you run out of space, be sure to make a “forward reference” to the page number where your answer
continues.

CS186 March 6 Midterm Page 2

I. SQL – 15 points

All queries are based on the sample schema shown on the first page. Assume that the tables have many more rows than are shown there.

  1. Which of the following queries finds the names of buildings where more than 50 employees work? (Circle as many as are correct.) (5 points)

a. SELECT Bname FROM IN_BUILDING GROUP_BY BID WHERE Count(*) > 50

b. SELECT Bname FROM BUILDING WHERE BID IN (SELECT BID FROM In_Building GROUP BY BID HAVING Count(*) > 50)

c. SELECT Bname FROM Building B, In_Building I WHERE B.BID = I.BID GROUP BY B.BID HAVING Count(*) > 50

d. SELECT Bname FROM Building B WHERE 50 < (SELECT Count(*) FROM In_Building I WHERE I.BID = B.BID)

e. None of the above

  2. Which of the following queries finds the names of departments where no employees work? (Circle as many as are correct.) (5 points)

a. SELECT Dname FROM Dept WHERE DID IN (SELECT I.DID FROM In_Dept I GROUP BY I.DID HAVING COUNT(*) = 0)

b. SELECT Dname FROM Dept D, In_Dept I, Emp E WHERE I.EID = E.EID and D.DID = I.DID and Count(E.EID) = 0

c. SELECT Dname FROM Dept WHERE DID NOT IN (SELECT DISTINCT DID FROM In_Dept I)

d. SELECT Dname FROM Dept D Where Not Exists (SELECT * FROM In_Dept I, EMP WHERE I.EID = EMP.EID and I.DID = D.DID)

e. None of the above
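
The candidate queries in this section can be tried out against a small copy of the reference database. The following is a minimal sketch, not an answer key: it loads the sample rows from the first page into an in-memory SQLite database, lowers the "more than 50 employees" threshold to 1 so the tiny sample produces output (an assumption made only for illustration), and adds a hypothetical department `Legal` with no employees. The helper `names` and all variable names are invented here. Note that SQLite is more permissive than standard SQL, so a query that runs here is not automatically valid on every DBMS.

```python
import sqlite3

# In-memory copy of four reference tables, using the sample rows from page 1.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Building (BID INTEGER PRIMARY KEY, BName TEXT, Address TEXT);
CREATE TABLE Dept (DID INTEGER PRIMARY KEY, DName TEXT, Annual_Budget INTEGER);
CREATE TABLE In_Building (EID INTEGER, BID INTEGER, PRIMARY KEY (EID, BID));
CREATE TABLE In_Dept (EID INTEGER, DID INTEGER, Percent_Time INTEGER,
                      PRIMARY KEY (EID, DID));

INSERT INTO Building VALUES
  (201, 'ATC', '1600 Ampitheatre'),
  (202, 'CCC', '500 Crittenden'),
  (203, 'MFB', '123 Shoreline');
INSERT INTO Dept VALUES
  (101, 'Research', 1001000),
  (102, 'Development', 500000),
  (103, 'Sales', 2000000),
  (104, 'Legal', 0);  -- hypothetical row: a department with no employees
INSERT INTO In_Building VALUES
  (1, 201), (2, 201), (3, 202), (3, 203), (4, 202), (5, 203);
INSERT INTO In_Dept VALUES
  (1, 101, 100), (2, 102, 100), (3, 101, 60), (3, 102, 40),
  (4, 103, 100), (5, 103, 100);
""")

THRESHOLD = 1  # assumption: the exam uses 50; the sample data is too small for that


def names(sql, params=()):
    """Run a single-column query and return its values as a sorted list."""
    return sorted(row[0] for row in conn.execute(sql, params))


# Question 1, option b: uncorrelated subquery with GROUP BY ... HAVING.
crowded_in = names(
    "SELECT BName FROM Building WHERE BID IN "
    "(SELECT BID FROM In_Building GROUP BY BID HAVING COUNT(*) > ?)",
    (THRESHOLD,),
)

# Question 1, option d: correlated scalar subquery.
crowded_correlated = names(
    "SELECT BName FROM Building B WHERE ? < "
    "(SELECT COUNT(*) FROM In_Building I WHERE I.BID = B.BID)",
    (THRESHOLD,),
)

# Question 2, option a: groups are built only from existing In_Dept rows.
empty_having = names(
    "SELECT DName FROM Dept WHERE DID IN "
    "(SELECT I.DID FROM In_Dept I GROUP BY I.DID HAVING COUNT(*) = 0)"
)

# Question 2, option c: departments whose DID never appears in In_Dept.
empty_not_in = names(
    "SELECT DName FROM Dept WHERE DID NOT IN (SELECT DISTINCT DID FROM In_Dept I)"
)

print(crowded_in, crowded_correlated, empty_having, empty_not_in)
```

On this sample, the `HAVING COUNT(*) = 0` form returns no rows because a group exists only if at least one `In_Dept` row belongs to it, while the `NOT IN` form finds the hypothetical empty department.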

II. Implementation of Relational Operators – 18 points

Consider the schema on the first page, and the number of tuples and pages for each relation shown there. Let “×” be the join operator, and let “A×B” mean a join with A as the outer relation and B as the inner.

As we did in class, when computing the cost for join algorithms, you may ignore output cost (since this is the same for all algorithms).

Note: you have 9 pages of main memory to work with in these problems.

  1. Consider the operation: σ_{EID < 5000}(EMP) (2 points)

a) What is the I/O cost of this operation? _________

b) What is the reduction factor? ________

2. Consider the join: In_Dept × Dept (4 points)

a) What is the I/O cost of this using Blocked Nested Loops? __________

b) What is the I/O cost of this using Index Nested Loops, with a Hash index on Dept.DID?


3. Consider the join: Dept × In_Dept (4 points)

a) What is the I/O cost of this using Blocked Nested Loops? ____________

b) What is the I/O cost of this using Index Nested Loops, with a Hash index on In_Dept.DID?

__________________________

4. Consider the join: EMP × In_Building (8 points)

a) What is the I/O cost of this using Blocked Nested Loops? ___________

b) What is the I/O cost to sort EMP? _________

c) What is the I/O cost to sort In_Building? _________

d) What is the total I/O cost to do this using Sort/Merge join? __________
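
The cost questions above can be explored with the standard textbook I/O formulas. The sketch below is illustrative only and is not the official answer key: it assumes block nested loops reserves one buffer page for the inner relation and one for output, that external merge sort builds initial runs of B pages and then merges B-1 runs per pass, and that sort-merge join sorts both inputs fully before a single merging scan. The function names are invented here, and the course's own conventions (for example, folding the final merge pass into the join) may give different numbers.

```python
import math

BUFFER_PAGES = 9  # "9 pages of main memory" from the exam note


def block_nested_loops_cost(outer_pages, inner_pages, buffers=BUFFER_PAGES):
    """Read the outer once; scan the inner once per block of outer pages.

    Assumes one buffer page for the inner input and one for output,
    leaving buffers - 2 pages for each outer block.
    """
    block_size = buffers - 2
    outer_blocks = math.ceil(outer_pages / block_size)
    return outer_pages + outer_blocks * inner_pages


def external_sort_cost(pages, buffers=BUFFER_PAGES):
    """Each pass reads and writes every page: 2 * pages * number_of_passes."""
    runs = math.ceil(pages / buffers)  # pass 0: sorted runs of `buffers` pages
    passes = 1
    while runs > 1:                    # each merge pass combines buffers - 1 runs
        runs = math.ceil(runs / (buffers - 1))
        passes += 1
    return 2 * pages * passes


def sort_merge_join_cost(outer_pages, inner_pages, buffers=BUFFER_PAGES):
    """Sort both inputs, then read each sorted input once to merge."""
    return (external_sort_cost(outer_pages, buffers)
            + external_sort_cost(inner_pages, buffers)
            + outer_pages + inner_pages)


# Page counts taken from the reference database on the first page.
EMP, IN_BUILDING, IN_DEPT, DEPT = 1000, 550, 550, 5

print("BNL In_Dept x Dept:", block_nested_loops_cost(IN_DEPT, DEPT))
print("BNL Dept x In_Dept:", block_nested_loops_cost(DEPT, IN_DEPT))
print("BNL EMP x In_Building:", block_nested_loops_cost(EMP, IN_BUILDING))
print("Sort EMP:", external_sort_cost(EMP))
print("Sort In_Building:", external_sort_cost(IN_BUILDING))
print("Sort-merge EMP x In_Building:", sort_merge_join_cost(EMP, IN_BUILDING))
```

Swapping the arguments to `block_nested_loops_cost` shows why the choice of outer relation matters: with the smaller relation as the outer, the larger one is scanned far fewer times.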

III. Query Optimization – 13 points

Consider the schema shown on the first page and especially the number of tuples and pages for each relation.

Consider the following query:

Select Bname
From EMP E, Building B, In_Building I
Where E.EID < 500 and E.EID = I.EID and B.BID = I.BID

  1. Write this query in relational algebra. (3 points)
  2. If the database has an unclustered B-Tree index on EMP.EID, what is the best plan you can find to execute this query? Do your work on the additional pages at the back of the exam, and show the query plan here, including the costs for each step and the total cost. (10 points)
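
For reference, one possible relational algebra form of the query above is sketched here. It is a single formulation among several equivalent ones, not the official answer; applying the selection to EMP before the joins is a choice made for this sketch, and the subscripts on the join symbols simply name the equated attribute.

```latex
\[
\pi_{\mathit{Bname}}\Bigl(
  \bigl(\sigma_{\mathit{EID} < 500}(\mathit{EMP})
        \bowtie_{\mathit{EID}} \mathit{In\_Building}\bigr)
  \bowtie_{\mathit{BID}} \mathit{Building}
\Bigr)
\]
```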

V. Concurrency Control and Crash Recovery: LOCKING – 15 Points

Locking is the most popular concurrency control technique implemented by commercial database management systems.

  1. Consider a database that is read-only (i.e., no transactions change any data in the database, data may be loaded into the database when the database is off-line). Suppose serializability needs to be supported. Please circle all correct statements: (5 points)

a. No locking is necessary.

b. Only read locks are necessary and they need to be held until end of transaction.

c. Only read locks are necessary but they can be released as soon as the read is complete.

d. Both read and write locks are necessary and locking must be done in two phases.

e. None of the above.

Consider the following database schema:

STUDENT(name, sid, gpa, level, dept)

Suppose the following two transactions are executed concurrently:

T1:
begin tran
update STUDENT set gpa = 4.0 where dept = 'CS'
commit tran

T2:
begin tran
insert into STUDENT values ('Mihut', 101, 3.9, 4, 'CS')
insert into STUDENT values ('Sirish', 102, 3.9, 3, 'CS')
commit tran

  2. Assume Mihut and Sirish were not in the STUDENT table before the start of T1 or T2. Suppose read locks are released immediately after the read is done and write locks are held until end of transaction. Can it ever happen that after both T1 and T2 have committed, Mihut and Sirish have different gpa values? Please state your reasoning in support of your conclusion. If your answer depends on locking granularity, access methods or indexing, please analyze the possibilities. (10 points)

VI. Concurrency Control and Crash Recovery: WRITE-AHEAD LOGGING – 12 points

Write-ahead logging is the most popular recovery technique.

  1. Checkpoint is a technique that can reduce recovery time after a crash. Please circle the correct statements: (4 points)

a. After a soft crash (which does not affect data on hard drives), the log only needs to be scanned back until the last checkpoint is found. The log beyond the last checkpoint will not be read during the recovery process.

b. Once a checkpoint is done, the log can be truncated.

c. Checkpoint is automatically performed after every transaction commit.

d. Checkpoints should be done after every update to the database.

e. None of the above.

  2. This question deals with when updated data pages (dirty pages) must be written to disk. Please circle the correct statements: (4 points)

a. Updated pages must be written to disk immediately after the update.

b. Dirty pages must be written to disk at transaction commit time but before the transaction log is written to disk.

c. Dirty pages must be written to disk at transaction commit time but after the transaction log is written to disk.

d. A dirty page must be written to disk when it is replaced from the buffer pool.

e. None of the above.

  3. Since a database log can grow without limits, the log should be truncated at some point. Where can the log be truncated? (4 points)

VIII. Extra Credit – 8 points

  • Broadbase: Data Marts & OLAP (2 points) - The presenter from BroadBase described how their database uses “Cubes”, pre-computed indexes of aggregate information. In 15 words or less, what aspect of the workload allows them to use “Cubes”?
  • Evite: Managing data at a Web Site (5 points) - The presenter from E-Vite expressed opinions on the following topics w.r.t. their DBMS. In 10 words or less, what were her opinions of:

i. Primary Keys

ii. Integrity Constraints

iii. Blobs

iv. Indexes

E-vite uses the log for an interesting purpose not discussed in the book. In 15 words or less, what unusual thing do they do with the log?

  • MineSet: Data Mining (1 point) - Name a data mining algorithm mentioned in the guest lecture on Data Mining:

Score: Section I ____________/

Section II ____________/

Section III ____________/

Section IV ____________/

Section V ____________/

Section VI ____________/

Section VII ____________/

Section VIII ____________/8 (extra credit)

Total ____________/100 (108 With extra credit)

Additional Space