Chapter 11 Database Performance Tuning and Query Optimization Comprehensive Exam Study Guide Latest Updated 2024/2025


Chapter 11

Database Performance Tuning and Query Optimization

Discussion Focus

This chapter focuses on the factors that directly affect database performance. Because performance-tuning techniques can be DBMS-specific, the material in this chapter may not be applicable under all circumstances, nor will it necessarily pertain to all DBMS types.

This chapter is designed to build a foundation for the general understanding of database performance-tuning issues and to help you choose appropriate performance-tuning strategies. (For the most current information about tuning your database, consult the vendor’s documentation.)

• Start by covering the basic database performance-tuning concepts. Encourage students to use the web to search for information about the internal architecture (internal processes and database storage formats) of various database systems. Focus on the similarities to lay a common foundation.
• Explain how a DBMS processes SQL queries in general terms, and stress the importance of indexes in query processing. Emphasize the generation of database statistics for optimum query processing.
• Step through the query-processing example in Section 11-4, Optimizer Choices.
• Discuss the common practices used to write more efficient SQL code, and emphasize that some practices are DBMS-specific. As technology advances, query optimization logic becomes increasingly sophisticated and effective, so some of the SQL practices illustrated in this chapter may not improve query performance as dramatically as they did in older systems.
• Finally, illustrate the chapter material using the query optimization example in Section 11-8.


Chapter 11

Database Performance Tuning and Query Optimization

Answers to Review Questions

1. What is SQL performance tuning?

SQL performance tuning describes a process – on the client side – that will generate an SQL query to return the correct answer in the least amount of time, using the minimum amount of resources at the server end.

2. What is database performance tuning?

DBMS performance tuning describes a process – on the server side – that will properly configure the DBMS environment to respond to clients’ requests in the fastest way possible, while making optimum use of existing resources.

3. What is the focus of most performance tuning activities, and why does that focus exist?

Most performance-tuning activities focus on minimizing the number of I/O operations, because the I/O operations are much slower than reading data from the data cache.

4. What are database statistics, and why are they important?

The term database statistics refers to a number of measurements gathered by the DBMS to describe a snapshot of the database objects’ characteristics. The DBMS gathers statistics about objects such as tables and indexes, and about available resources such as the number of processors used, processor speed, and temporary space available. Such statistics are used to make critical decisions about improving query-processing efficiency.

5. How are database statistics obtained?


Database statistics can be gathered manually by the DBA or automatically by the DBMS. For example, many DBMS vendors support the SQL ANALYZE command to gather statistics, and many vendors also have their own routines for this purpose. IBM’s DB2 uses the RUNSTATS procedure, while Microsoft’s SQL Server uses the UPDATE STATISTICS procedure and provides the Auto-Update and Auto-Create Statistics options in its initialization parameters.
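As an illustration of how these routines are invoked, the commands below sketch one way to gather statistics in each of the three systems mentioned above; the table name PRODUCT and the schema name DBADMIN are placeholders, and the available options vary by product and version.

-- Oracle: gather statistics for one table with the ANALYZE command
ANALYZE TABLE PRODUCT COMPUTE STATISTICS;

-- IBM DB2: gather table and index statistics with the RUNSTATS utility
RUNSTATS ON TABLE DBADMIN.PRODUCT AND INDEXES ALL;

-- Microsoft SQL Server: refresh the statistics for one table
UPDATE STATISTICS PRODUCT;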

6. What database statistics measurements are typical of tables, indexes, and resources?

For tables, typical measurements include the number of rows, the number of disk blocks used, row length, the number of columns in each row, the number of distinct values in each column, the maximum value in each column, the minimum value in each column, what columns have indexes, and so on.


database. These factors include the network, the client’s computer resources, and even other queries running concurrently in the same database.)

After the parsing and execution phases are completed, all rows that match the specified condition(s) have been retrieved, sorted, grouped, and/or – if required – aggregated. During the fetching phase, the rows of the query result set are returned to the client. During this phase, the DBMS may use temporary table space to store temporary data.

9. If indexes are so important, why not index every column in every table? (Include a brief discussion of the role played by data sparsity.)

Indexing every column in every table will tax the DBMS too much in terms of index-maintenance processing, especially if the table has many attributes, many rows, and/or requires many inserts, updates, and/or deletes.


One measure to determine the need for an index is the data sparsity of the column you want to index. Data sparsity refers to the number of different values a column could possibly have. For example, a STU_SEX column in a STUDENT table can have only two possible values, “M” or “F”; therefore, this column is said to have low sparsity. In contrast, the STU_DOB column, which stores the student’s date of birth, can have many different date values; therefore, this column is said to have high sparsity. Knowing the sparsity helps you decide whether the use of an index is appropriate. For example, when you perform a search in a column with low sparsity, you are very likely to read a high percentage of the table rows anyway; therefore, index processing may be unnecessary work.
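One informal way to gauge a column’s sparsity before deciding on an index is to compare the number of distinct values in the column with the total number of rows; the sketch below assumes the STUDENT table and the column names used in the example above.

SELECT COUNT(DISTINCT STU_SEX) AS DISTINCT_SEX_VALUES,
       COUNT(DISTINCT STU_DOB) AS DISTINCT_DOB_VALUES,
       COUNT(*) AS TOTAL_ROWS
FROM STUDENT;

A distinct-value count that is very small relative to the total row count indicates low sparsity, which argues against creating an index on that column.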

10. What is the difference between a rule-based optimizer and a cost-based optimizer?

A rule-based optimizer uses a set of preset rules and points to determine the best approach to execute a query. The rules assign a “cost” to each SQL operation; the costs are then added to yield the cost of the execution plan.

A cost-based optimizer uses sophisticated algorithms based on the statistics about the objects being accessed to determine the best approach to execute a query. In this case, the optimizer process adds up the processing cost, the I/O costs and the resource costs (RAM and temporary space) to come up with the total cost of a given execution plan.

11. What are optimizer hints and how are they used?

Hints are special instructions for the optimizer that are embedded inside the SQL command text. Although the optimizer generally performs very well under most circumstances, there are some circumstances in which the optimizer may not choose the best execution plan. Remember, the optimizer makes decisions based on the existing statistics. If the statistics are old, the optimizer may not do a good job in selecting the best execution plan. Even with the current statistics, the optimizer choice may not be the most efficient one. There are some occasions when the end-user would like to


table scan a viable option. Using the same logic, do not create indexes for tables with few rows and few attributes— unless you must ensure the existence of unique values in a column.

Declare primary and foreign keys so the optimizer can use the indexes in join operations. All natural joins and old-style joins will benefit if you declare primary keys and foreign keys, because the optimizer will use the available indexes at join time. (The declaration of a PK or FK will automatically create an index for the declared column.) For the same reason, it is better to write joins using the SQL JOIN syntax. (See Chapter 8, “Advanced SQL.”)

Declare indexes on join columns other than PK/FK. If you do join operations on columns other than the primary and foreign keys, you may be better off declaring indexes on such columns.
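As a sketch of these two recommendations, the statements below use hypothetical VENDOR and PRODUCT tables; the column names and data types are assumptions chosen only for illustration.

CREATE TABLE VENDOR (
    V_CODE  INTEGER     PRIMARY KEY,  -- the PK declaration creates an index on V_CODE
    V_NAME  VARCHAR(35) NOT NULL,
    V_STATE CHAR(2));

CREATE TABLE PRODUCT (
    P_CODE     VARCHAR(10) PRIMARY KEY,
    P_DESCRIPT VARCHAR(35) NOT NULL,
    V_CODE     INTEGER,
    FOREIGN KEY (V_CODE) REFERENCES VENDOR (V_CODE));

-- Writing the join with explicit SQL JOIN syntax lets the optimizer use the key indexes at join time
SELECT P.P_CODE, P.P_DESCRIPT, V.V_NAME
FROM PRODUCT P JOIN VENDOR V ON P.V_CODE = V.V_CODE;

-- If queries also join or search on a non-key column (assumed here to be V_STATE), index it explicitly
CREATE INDEX VEND_STATE_NDX ON VENDOR (V_STATE);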

13. Most query optimization techniques are designed to make the optimizer’s work easier. What factors should you keep in mind if you intend to write conditional expressions in SQL code?

Use simple columns or literals as operands in a conditional expression—avoid the use of conditional expressions with functions whenever possible. Comparing the contents of a single column to a literal is faster than comparing to expressions.

Numeric field comparisons are faster than character, date, and NULL comparisons. In search conditions, comparing a numeric attribute to a numeric literal is faster than comparing a character attribute to a character literal. In general, numeric comparisons (integer, decimal) are handled faster by the CPU than character and date comparisons. Because indexes do not store references to null values, NULL conditions involve additional processing and therefore tend to be the slowest of all conditional operands.

Equality comparisons are faster than inequality comparisons. As a general rule, equality comparisons are processed faster than inequality comparisons. For example, P_PRICE = 10.00 is


processed faster because the DBMS can do a direct search using the index on the column. If there are no exact matches, the condition is evaluated as false. However, if you use an inequality symbol (>, >=, <, <=), the DBMS must perform additional processing to complete the request, because there will almost always be many more “greater than” or “less than” values than exactly “equal” values in the index. The slowest of all comparison operators (with the exception of NULL) is LIKE with wildcard symbols, as in V_CONTACT LIKE '%glo%'. Also, using the “not equal” symbol (<>) yields slower searches, especially if the sparsity of the data is high; that is, if there are many more different values than equal values.

Whenever possible, transform conditional expressions to use literals. For example, if your condition is P_PRICE - 10 = 7, change it to read P_PRICE = 17. Also, if you have a composite condition such as:

P_QOH < P_MIN AND P_MIN = P_REORDER AND P_QOH = 10

change it to read:

P_QOH = 10 AND P_MIN = P_REORDER AND P_MIN > 10

(Since P_QOH = 10, the condition P_QOH < P_MIN is equivalent to P_MIN > 10, so the column-to-column inequality is replaced by a comparison against a literal.)
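Expressed as complete queries, the same transformation looks like the sketch below; the PRODUCT table and the P_CODE column are assumed names, while P_PRICE comes from the example above.

-- Before: the expression on the left side must be evaluated for every row
SELECT P_CODE, P_PRICE
FROM PRODUCT
WHERE P_PRICE - 10 = 7;

-- After: the literal comparison can use an index on P_PRICE directly
SELECT P_CODE, P_PRICE
FROM PRODUCT
WHERE P_PRICE = 17;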


Third, evaluate the creation of indexes based on access patterns. Identify the most common search criteria and isolate the columns used most frequently in search conditions. Create indexes on high-usage columns with high sparsity.

Fourth, evaluate the usage of aggregate queries in your database. Identify columns used in aggregate functions and determine if the creation of indexes on such columns will improve response time.

Finally, identify columns used in ORDER BY statements and make sure there are indexes on such columns.
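As a brief sketch of the last two steps, the commands below assume a hypothetical INVOICE table whose CUS_CODE column is used in aggregate (GROUP BY) queries and whose INV_DATE column is used in ORDER BY clauses.

-- Column frequently used when aggregating invoice data by customer
CREATE INDEX INV_CUS_NDX ON INVOICE (CUS_CODE);

-- Column frequently used to order query output
CREATE INDEX INV_DATE_NDX ON INVOICE (INV_DATE);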

15. What does RAID stand for, and what are some commonly used RAID levels?

RAID is the acronym for Redundant Array of Independent Disks. RAID is used to provide a balance between performance and fault tolerance. RAID systems combine multiple physical disks into virtual disks (storage volumes). RAID systems provide performance


improvement and fault tolerance. Table 11.7 in the text shows the commonly used RAID levels. (We have reproduced the table for your convenience.)


TABLE 11.7 Common RAID Configurations

RAID Level 0: The data blocks are spread over separate drives; also known as a striped array. Provides increased performance but no fault tolerance. (Fault tolerance means that, in case of failure, data can be reconstructed and retrieved.) Requires a minimum of two drives.

RAID Level 1: The same data blocks are written (duplicated) to separate drives; also referred to as mirroring or duplexing. Provides increased read performance and fault tolerance via data redundancy. Requires a minimum of two drives.

RAID Level 3: The data are striped across separate drives, and parity data are computed and stored on a dedicated drive. (Parity data are specially generated data that permit the reconstruction of corrupted or missing data.) Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

RAID Level 5: The data and the parity are striped across separate drives. Provides good read performance and fault tolerance via parity data. Requires a minimum of three drives.

Problem Solutions

Problems 1 and 2 are based on the following query:

SELECT EMP_LNAME, EMP_FNAME, EMP_AREACODE, EMP_SEX
FROM EMPLOYEE
WHERE EMP_SEX = 'F' AND EMP_AREACODE = '615'
ORDER BY EMP_LNAME, EMP_FNAME;

1. What is the likely data sparsity of the EMP_SEX column?


Because this column has only two possible values (“M” and “F”), the EMP_SEX column has low sparsity.

2. What indexes should you create? Write the required SQL commands.

You should create an index on EMP_AREACODE and a composite index on (EMP_LNAME, EMP_FNAME). In the following solution, we have named the two indexes EMP_NDX1 and EMP_NDX2, respectively. The required SQL commands are:

CREATE INDEX EMP_NDX1 ON EMPLOYEE(EMP_AREACODE);
CREATE INDEX EMP_NDX2 ON EMPLOYEE(EMP_LNAME, EMP_FNAME);


After the EMPLOYEE RowIDs have been retrieved, the DBMS uses those RowIDs to get the EMPLOYEE rows. Next, the DBMS selects only those rows with EMP_SEX = 'F'. Finally, the DBMS sorts the result set by employee last name and first name.

Problems 4-6 are based on the following query:

SELECT EMP_LNAME, EMP_FNAME, EMP_DOB, YEAR(EMP_DOB) AS YEAR
FROM EMPLOYEE
WHERE YEAR(EMP_DOB) = 1966;

4. What is the likely data sparsity of the EMP_DOB column?

Because the EMP_DOB column stores employees’ dates of birth, this column is very likely to have high data sparsity.


5. Should you create an index on EMP_DOB? Why or why not?

Creating an index on the EMP_DOB column would not help this query, because the query applies the YEAR function to the column. However, if the same column is used in other queries, you may want to re-evaluate the decision not to create the index.
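If the query itself can be rewritten, one way to make an index on EMP_DOB useful is to move the function off the indexed column and search a date range instead; this is only a sketch, and the date-literal syntax varies by DBMS (ANSI DATE literals are shown).

-- Equivalent range condition on the bare column; an index on EMP_DOB can now be used
SELECT EMP_LNAME, EMP_FNAME, EMP_DOB
FROM EMPLOYEE
WHERE EMP_DOB >= DATE '1966-01-01'
  AND EMP_DOB < DATE '1967-01-01';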

6. What type of database I/O operations will likely be used by the query? (See Table 11.3.)

This query more than likely uses a full table scan to read all rows of the EMPLOYEE table and generate the required output. We have reproduced Table 11.3 here to facilitate your discussion:

TABLE 11.3 Sample DBMS Access Plan I/O Operations

Table Scan (Full): Reads the entire table sequentially, from the first row to the last row, one row at a time (slowest).
Table Access (Row ID): Reads a table row directly, using the row ID value (fastest).
Index Scan (Range): Reads the index first to obtain the row IDs and then accesses the table rows directly (faster than a full table scan).
Index Access (Unique): Used when a table has a unique index on a column.
Nested Loop: Reads and compares a set of values to another set of values, using a nested-loop style (slow).
Merge: Merges two data sets (slow).
Sort: Sorts a data set (slow).
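To see which of these operations the DBMS actually chooses for a query, most systems can display the access plan. The commands below show one way to do this in Oracle (other DBMSs use different facilities, such as EXPLAIN or SET SHOWPLAN_ALL); the query reuses the EMPLOYEE table from the earlier problems.

-- Generate the access plan for a query and then display it
EXPLAIN PLAN FOR
SELECT EMP_LNAME, EMP_FNAME
FROM EMPLOYEE
WHERE EMP_AREACODE = '615';

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);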