There is an excellent page on Allen Browne's website devoted to various methods of improving query performance: http://allenbrowne.com/QueryPerfIssue.html
I thought it would be helpful to others to illustrate his suggestions by performing a series of speed tests showing the effect of each suggested change.
To reduce file size, the queries are based on cut-down versions of 3 tables from the DEMO version of my School Data Analyser application.
All data is for fictitious students in a fictitious school.
The aim of the query is to get the count of each type of pastoral incident recorded for each student in the year 2018.
The query is also filtered to those students whose date of birth was in the year 2005.
Click any image to view a larger version ...
There are 11 versions of the query with varying amounts of optimisation starting with a (deliberately) badly designed query and ending with the most optimised.
All queries return the same records (total = 882) but the times should get progressively faster (except for the final stacked queries test).
Each test is run several times to reduce natural variations caused by other processes that may be running in the background.
The total time recorded is for the set number of loops. By default, the number of loops = 3
The fields used in each table to filter and sort the data are indexed to speed up searches:
The indexed fields are Surname, Forename, DateOfBirth, DateOfIncident
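For reference, such indexes can be created in table design view or with DDL. A minimal sketch (the index names here are my own invention):

```sql
-- Single-field indexes on the columns used for filtering and sorting
-- (index names are illustrative)
CREATE INDEX idxSurname ON PupilData (Surname);
CREATE INDEX idxForename ON PupilData (Forename);
CREATE INDEX idxDateOfBirth ON PupilData (DateOfBirth);
CREATE INDEX idxDateOfIncident ON PRecords (DateOfIncident);
```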
The average times recorded after running each set of tests 20 times were as follows:
8. Optimise Queries Updated 05/03/2019
The times taken to loop through each query 3 times improved significantly, from over 27 s originally down to about 0.49 s.
This means it is now running over 50 times faster.
The first query uses an outer join between 2 tables (PupilData / PRecords) and a DLookup value from the third table (PRCodes).
It took over 27 s to do 3 loops – VERY SLOW
Running domain functions such as DLookup in a query is VERY SLOW as the operation must be performed in turn on each row in the query.
It also wastes resources as additional connections have to be made to the data file.
The query execution plan involves a huge number of steps as each record is checked in turn.
In this case, the domain function is totally unnecessary as the same result can be obtained using a second join.
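To illustrate the difference, here is a hedged sketch of the two approaches. The PRCodes field names used below (PRCodeID, IncidentType) are placeholders, not necessarily those in the sample database:

```sql
-- Slow: DLookup runs once for every row in the result set
SELECT PupilData.Surname,
       DLookUp("IncidentType", "PRCodes",
               "PRCodeID=" & [PRecords].[PRCodeID]) AS IncidentType
FROM PupilData LEFT JOIN PRecords
     ON PupilData.PupilID = PRecords.PupilID;

-- Better: join PRCodes once and let the engine do the lookup
SELECT PupilData.Surname, PRCodes.IncidentType
FROM (PupilData LEFT JOIN PRecords
      ON PupilData.PupilID = PRecords.PupilID)
     LEFT JOIN PRCodes
      ON PRecords.PRCodeID = PRCodes.PRCodeID;
```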
In this second query, the outer join goes from the many side of the main PRecords table: PRCodes -> PRecords -> PupilData.
Although the join direction is not the best choice, the time taken is dramatically reduced to about 1.55 s.
In the third query, the direction of the joins is reversed (one to many): PupilData -> PRecords -> PRCodes.
This is a more efficient process for Access to manage and the time drops again to about 1.27 s.
This example has been deliberately designed so that using inner joins will get exactly the same records.
It always makes sense to use inner joins wherever possible as the constraints limit the searching required.
Doing so further reduces the work required from the database engine and the time drops to just under 0.9 s.
All the remaining queries are based on inner joins.
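A sketch of the inner-join structure used from this point on (again, the PRCodes join field and description field are assumed names):

```sql
-- Access requires parentheses when chaining multiple joins
SELECT PupilData.Surname, PupilData.Forename, PRCodes.IncidentType,
       Count(PRecords.PastoralRecordID) AS NumIncidents
FROM (PupilData INNER JOIN PRecords
      ON PupilData.PupilID = PRecords.PupilID)
     INNER JOIN PRCodes
      ON PRecords.PRCodeID = PRCodes.PRCodeID
GROUP BY PupilData.Surname, PupilData.Forename, PRCodes.IncidentType;
```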
Until now, all the aggregate totals have been based on the VBA Nz function: Count(Nz([PastoralRecordID],0)).
The Nz() function replaces Null with another value (usually a zero for numbers, or a zero-length string for text).
The new value is a Variant data type, and VBA tags it with a subtype: String, Long, Double, Date, or whatever.
This will affect the sort order and can lead to incorrect results in some situations.
The fifth query replaces the VBA Nz function with the use of the JET IIf function: IIf(Count([PastoralRecordID]) Is Null,0,Count([PastoralRecordID]))
This has several advantages including avoiding an unnecessary VBA function call.
In addition, the correct data type is retained (in this case, integer) so the column sorts correctly.
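For comparison, the three forms of the aggregate column discussed in this and the next test:

```sql
Count(Nz([PastoralRecordID],0))           -- VBA call per row; returns a Variant
IIf(Count([PastoralRecordID]) Is Null, 0,
    Count([PastoralRecordID]))            -- JET/ACE expression; keeps the data type
Count([PastoralRecordID])                 -- simplest form; valid with inner joins
```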
This shaves another 0.03 s off the time which has now become about 0.87 s.
However, by using inner joins as in this example, a simple count will achieve the same results.
Although the expression is simpler, the overall time is only slightly less than before – approximately 0.86 s.
All the above queries were sorted by a concatenated expression: [Surname] & ", " & [Forename]
Doing so prevents the database engine from making use of the indexes to perform the sort.
The next query fixes that, sorting by the two indexed fields: Surname and Forename.
Doing so further reduces the time required to about 0.77 s – almost 0.1 s faster.
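The two ORDER BY forms side by side:

```sql
-- Slower: the computed expression cannot use the indexes
ORDER BY [Surname] & ", " & [Forename];

-- Faster: each indexed field is sorted directly
ORDER BY Surname, Forename;
```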
Whilst the query is now running well, further improvements can still be made.
Aggregate queries (those with a GROUP BY clause) can have both a WHERE clause and a HAVING clause.
The WHERE is executed first - before aggregation; the HAVING is executed afterwards - when the counts have been calculated.
Therefore, in some cases (though not always), it can be faster to use WHERE.
The next query changes the HAVING clause to WHERE and the time drops to 0.65 s (another 0.12 s faster).
See this separate article for detailed speed tests comparing HAVING vs WHERE.
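A schematic sketch of the change (not the article's exact SQL; the Year() filter shown here is an assumption based on the next test):

```sql
-- Before: criteria in HAVING, applied after the counts are formed
SELECT Surname, Count(PastoralRecordID) AS NumIncidents
FROM ...
GROUP BY Surname, Year(DateOfIncident)
HAVING Year(DateOfIncident) = 2018;

-- After: criteria in WHERE, applied before aggregation
SELECT Surname, Count(PastoralRecordID) AS NumIncidents
FROM ...
WHERE Year(DateOfIncident) = 2018
GROUP BY Surname;
```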
However, although the WHERE clause looks simple, it does not use the indexes on the two date fields.
A better result is obtained by specifying a range of values for each date field so the indexes can be used:
WHERE (((PupilData.DateOfBirth) Between #1/1/2005# And #12/31/2005#)
AND ((PRecords.DateOfIncident) Between #1/1/2018# And #12/30/2018#))
This further reduces the time by another 0.1 s down to about 0.55 s for 3 loops. Every little helps!
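The key contrast, shown for one of the date fields (assuming the earlier filter used Year(), which defeats the index):

```sql
-- Function applied to the column: the index cannot be used
WHERE Year(PupilData.DateOfBirth) = 2005

-- Range test on the bare column: the index can be used
WHERE PupilData.DateOfBirth Between #1/1/2005# And #12/31/2005#
```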
All the above queries have used the default arrangement, grouping all fields from each table that are not being used for aggregation.
However the PupilID field is the unique primary key field in the PupilData table.
There is no need to group by other fields in that table. Instead, optimise the query by choosing First instead of Group By in the Total row under the other fields.
Similarly for the other fields not required for the grouping in the other 2 tables.
This results in a further significant reduction in time to less than 0.5 s. The end result is now more than 50 times faster than the original 27.4 s!
Using First allows the database engine to return the value from the first matching record, without needing to group by the field.
In the query above I have used aliases for the fields now based on First.
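A sketch of the grouped query after this change (the aliases and the PRCodes names are illustrative, not the article's exact SQL):

```sql
SELECT First(PupilData.Surname) AS Srn,
       First(PupilData.Forename) AS Frn,
       First(PRCodes.IncidentType) AS IncType,
       Count(PRecords.PastoralRecordID) AS NumIncidents
FROM (PupilData INNER JOIN PRecords
      ON PupilData.PupilID = PRecords.PupilID)
     INNER JOIN PRCodes
      ON PRecords.PRCodeID = PRCodes.PRCodeID
WHERE PupilData.DateOfBirth Between #1/1/2005# And #12/31/2005#
  AND PRecords.DateOfIncident Between #1/1/2018# And #12/30/2018#
GROUP BY PupilData.PupilID, PRecords.PRCodeID
ORDER BY First(PupilData.Surname), First(PupilData.Forename);
```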
Allen Browne also points out another benefit if you are grouping by Memo / Long Text fields:
If you GROUP BY a memo (Notes in the example), Access compares only the first 255 characters, and the rest are truncated!
By choosing First instead of Group By, JET is free to return the entire memo field from the first match.
So not only is it more efficient; it actually solves the problem of memo fields being chopped off.
Both stacked queries and subqueries are often useful in Access though both can be slower than using a single query where that is achievable.
As a final test, I also created a stacked query version of test J.
The first query qryStacked1 filters the records in PupilData and PRecords for the required date ranges
The second query qryStacked2 is an aggregate query based on that
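A sketch of the two stacked queries (the saved-query names are those given in the text; the field names follow the same assumptions as earlier sketches):

```sql
-- qryStacked1: filter the records first
SELECT PupilData.PupilID, PupilData.Surname, PupilData.Forename,
       PRecords.PRCodeID, PRecords.PastoralRecordID
FROM PupilData INNER JOIN PRecords
     ON PupilData.PupilID = PRecords.PupilID
WHERE PupilData.DateOfBirth Between #1/1/2005# And #12/31/2005#
  AND PRecords.DateOfIncident Between #1/1/2018# And #12/30/2018#;

-- qryStacked2: aggregate the saved query
SELECT First(Surname) AS Srn, First(Forename) AS Frn,
       Count(PastoralRecordID) AS NumIncidents
FROM qryStacked1
GROUP BY PupilID, PRCodeID;
```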
The average time for 20 tests was 0.492 s – about 0.004 s FASTER than the single query equivalent in test J.
However, the fastest times in individual tests were recorded in test J.
I repeated tests J and K on separate workstations and there was no clear winner between the two methods, though the time difference was always very small.
If anyone can see ways in which the above query can be further optimised, please do let me know!
View Query Execution Plans
You can use the JET ShowPlan feature to view the query execution plans for your queries.
By doing so, you can often obtain useful information to assist with the process of optimising your queries.
Using this feature creates a text file, ShowPlan.out, which can be viewed in Notepad.
For further information, see this article ShowPlan – Go Faster
I have attached three ShowPlan files for the above tests:
• ShowPlanA.out – this lengthy file just covers Test A which uses a DLookup.
It should help explain why using domain functions in a query will ALWAYS be SLOW
• ShowPlanB2J.out – this covers all the other main tests: Tests B => Test J
• ShowPlanStacked.out – this just covers the stacked query version used in Test K
As an example, this is the query execution plan for test C. It is the shortest of the 11 query execution plans by a long way
Click to download:
Example database: OptimiseQueries.accdb Approx 3.5 MB (zipped)
Query execution plans: ShowPlan.out Approx 1.5 MB (zipped)
PDF version of this article: OptimiseQueries.pdf Approx 1.0 MB (zipped)
To provide feedback on this article, please enter your name, e-mail address and company (optional) below, add your comment and press the submit button. Thanks