Query Processing
Query Processing
Query processing is defined as the group of phases associated with the extraction of data from the database. It includes conversion of queries written in a high-level language such as SQL into a form that can be understood by the physical level implementation of the database, SQL query optimization techniques, and the original evaluation of the query.
There are three major steps involved in query processing:
Parser and translator: The first step in query processing is parsing and translation. Parser just like a parser in compilers checks the syntax of the query whether the relations mentioned are present in the database or not. A high-level query language such as SQL is suitable for human use. But, it is totally unsuitable to system internal representation. Therefore, translation is required. The internal representation can be extended form of relational algebra.
Optimization: A SQL query can be written in many different ways. An optimized query also depends on how the data is stored in the file organization. A Query can also have different corresponding relational algebra expressions.
So, the above query can be written in the two forms of relational algebra. So it totally depends on the implementation of the file system which one is better.
Execution plan: A systematic step-by-step execution of primitive operations for fetching data from the database is termed a query evaluation plan. Different evaluation plans for a particular query have different query costs. The cost may include the number of disk accesses, CPU time for execution of the query, time of communication in the case of distributed databases.
Order of a SQL Query
The way to make a query run faster is to reduce the number of calculations that the software must perform. To do this, you’ll need some understanding of how SQL executes a query.
Let’s take a look at a sample SQL query :
SELECT DISTINCT column, AGGREGATE(column)
FROM table1
JOIN table2
ON table1.column = table2.column
WHERE constraint_expression
GROUP BY column
HAVING constraint_expression
ORDER BY column ASC/DESC
LIMIT count;
Each part of the query is executed sequentially, so it’s important to understand the order of execution :
FROM
andJOIN
: The FR`OM clause, and subsequent JOINs are first executed to determine the total working set of data that is being queriedWHERE
: Once we have the total working set of data, theWHERE
constraints are applied to the individual rows, and rows that do not satisfy the constraint are discarded.GROUP BY
: The remaining rows after theWHERE
constraints are applied are then grouped based on common values in the column specified in theGROUP BY
clause.HAVING
: If the query has aGROUP BY
clause, then the constraints in theHAVING
clause are applied to the grouped rows, and the grouped rows that don’t satisfy the constraint are discarded.SELECT
: Any expressions in theSELECT
part of the query are finally computed.DISTINCT
: Of the remaining rows, rows with duplicate values in the column marked asDISTINCT
will be discarded.ORDER BY
: If an order is specified by theORDER BY
clause, the rows are then sorted by the specified data in either ascending or descending order.LIMIT
: Finally, the rows that fall outside the range specified by theLIMIT
are discarded, leaving the final set of rows to be returned from the query.
Query Optimizer
SQL is a declarative language. This means that SQL queries describe what the user wants and then the query is transformed into executable commands by the Query Optimizer. Those executable commands are known as Query Plans.
The Query Optimizer generates multiple Query Plans for a single query and determines the most efficient plan to run.
There are often many different ways to search a database. Take for example the following database of tools that has five entries. Each entry has a unique ID number and a non-unique name.
In order to find a particular tool, there are several possible queries that could be run. For example, the query:
SELECT * FROM tools WHERE name='Screwdriver';
Will return the same thing as the query:
SELECT * FROM tools WHERE id=3;
Purpose of SQL Query Optimization
The major purposes of SQL Query optimization are:
- Reduce Response Time: The major goal is to enhance performance by reducing the response time. The time difference between users requesting data and getting responses should be minimized for a better user experience.
- Reduced CPU execution time: The CPU execution time of a query must be reduced so that faster results can be obtained.
- Improved Throughput: The number of resources to be accessed to fetch all necessary data should be minimized. The number of rows to be fetched in a particular query should be in the most efficient manner such that the least number of resources are used.
- Query plans are a set of instructions generated by the Query Optimizer. They generate estimates of query efficiency.
- The Query Optimizer generates multiple query plans and determines which plan is most efficient for a given query.
Scan Vs. Seek
These queries will return the same results, but may have different final query plans. The first query will have a query plan that uses a sequential scan. This means that all five rows of the database will be checked to see if the name is screwdriver and, when run, would look like the following table:
(GREEN = MATCH) (RED = MISS) (WHITE = NOT CHECKED)
The second query will use a query plan which implements a sequential seek since the second query handles unique values. Like a scan, a seek will go through each entry and check to see if the condition is met. However unlike a scan, a seek will stop once a matching entry has been found. A seek for ID = 3 would look like the following figure:
(GREEN = MATCH) (RED = MISS) (WHITE = NOT CHECKED)
This seek only needs to check three rows in order to return the result unlike a scan which must check the entire database.
For more complicated queries there may be situations in which one query plan implements a seek while the other implements a scan. In this case, the query optimizer will choose the query plan that implements a seek, since seeks are more efficient than scans. There are also different types of scans that have different efficiencies in different situations.
- Scans search the whole database for matches.
- Seeks search the database for a single match and stop once they have found it.