AKA How the new (12.4) aggregate statement makes your life easier
Summary
With the new (12.4) aggregate statement we get a new tool we can use to make some of our queries more efficient. While that is the intended purpose, we can also twist things a bit to get some more insight into the data we keep on disk, and the efficiency of some of our indexes.
To help with this we’ve created a new tool (onetwothree) that will output the number of records you can expect an index to return. That data can be used to identify possible improvements, and most importantly, to identify outliers that may cause performance issues.
Now depending on how many records you have, and how that data is distributed across different field values, you might end up in situations where an index works fine for 99.9% of clients, but there will always be that one client whose incredibly weird setup causes issues. If normal clients get back 10 – 100 records for a query, but your outlier gets back 10,000 records, this will be visible using onetwothree.
Data
While we could start with a nice hour long talk on what data is, it’s already been hard enough for me to wake up in the morning, so I’d rather not go straight back to sleep. For the purposes of this article, we’ll be looking at 3 things:
- Disk Structure – how your data is stored on disk, and how the OE RDBMS works with it
- DB Structure – how your data structure is defined – your tables, fields, and indexes
- Data Values – the actual values you save inside of your database.
The aforementioned tool is only concerned with the data values, and to a lesser extent the DB structure, but we will also go into the disk structure side of things briefly as knowing about it can be useful.
Disk Structure
We will be skipping any Type I storage areas as they frankly don’t matter anymore. If you are still using Type I storage areas, the first thing you should do is move to Type II storage areas as soon as possible. One of the easiest ways to do this is to use the proutil tablemove command.
Here are a couple of things that you should be aware of as you work with your database.
Your DB hates your HDD/SSD
As you can tell from the title, there are some differences between how much your DB hates HDDs vs. SATA SSDs vs. NVMe SSDs, and the reason for this “hate” is access time.
The numbers below aren’t all that accurate as they differ based on various factors like the manufacturer, the specifications of the disk, the controller used, RAM latency, frequency, etc., but they are good enough to show the difference between storage types.
For an HDD you are looking at 5 – 10 ms for a random read. If your data is stored contiguously, the next read will be faster because the HDD won’t have to seek the next location, but it’s still slow.
A SATA SSD on the other hand has an access time of 50 – 100 microseconds. That’s a 100x difference compared to HDDs.
An NVMe SSD has an access time of 5 – 10 microseconds. That’s a 10x difference compared to a SATA SSD.
Accessing data in RAM has a latency of 10 – 50 nanoseconds. That’s roughly another 100 – 500x difference compared to an NVMe SSD.
These huge differences in access times are why the database will do its best to minimize disk operations, and when not possible it will offload disk operations to different processes.
The way data is stored on disk matters
While the database tries to minimize disk access, it’s not possible to do away with it entirely. At some point data will need to be persistently stored somewhere, and disks are the best option currently available.
Whenever the database must work with the disk (whether to read or write something) it will do it in chunks the size of the database block size. This should match the OS disk block size or be a multiple of it. For example, on Windows, the default block size for an NTFS formatted disk will be 4KB. That means that your database should have a block size of either 4KB or 8KB.
While it’s possible to use a smaller block size, e.g. 1KB on a system using a 4KB disk block size, this will lead to serious performance issues. When the DB wants to read a block, it will ask for a 1KB block. The system will then go to the disk and read the full 4KB block, and from that 4KB block it will send the database the 1KB chunk it asked for. When reading the next 1KB block, the process repeats.
Instead of 1 read operation we have now done 4, and the result is the same – we read 4KB from the disk. The only difference is that it took a lot longer than it should have. Unfortunately, there is no easy way to move to a different block size. You’ll need to dump the database, create a new database with the new block size, and load everything into the new database.
Just as an aside, 8KB is the preferred block size even if the disk block size is 4KB. While reading a single block will now require 2 read operations, there are benefits for index blocks, and how compactly the records are stored. The one downside is that operations that involve 2 OS blocks aren’t atomic 100% of the time. If power cuts off at just the right time, you’ll end up with some corruption that won’t be fixed by crash recovery. Thankfully this is so unlikely that the benefits far outweigh the risks (at least in my opinion and for my use cases). If you ever plan on changing the DB block size, make sure that you change the records per block for all areas, or you’ll end up with 8KB blocks storing 4KB of data.
Now here comes the fun part – blocks won’t be 100% filled with the data you are asking for. That means that limiting the number of blocks you have to read or write WILL have a big impact on performance.
Always start your asynchronous processes
We already know that the database doesn’t want to wait for disk access, but how can we get around that? The answer is quite simple – make sure that your APWs (Asynchronous Page Writer), BIW (Before Image Writer), and AIW (After Image Writer) processes are running.
The BIW and AIW will write filled BI/AI buffers to disk. Because of this you should always have a supply of empty BI/AI buffers to write to, so the database won’t have to wait around for a disk write.
As you make more and more changes, the BI buffers will get written to a BI cluster by the BIW. When that cluster is full, you have reached the end of a checkpoint. A checkpoint is the process through which the in-memory and on-disk state of the database are reconciled: the modified buffers associated with a cluster must be written to disk. Any modified buffers that weren’t written during the checkpoint must be written at its end, and while that write happens, database write activity will be stalled.
This is where APWs come in. They take modified buffers from the database buffer pool (-B) and write them to disk in the background. If you have enough APWs and checkpoints with a reasonable duration (2+ minutes), at the end of a checkpoint there won’t be any buffers that must be written to disk.
Fragmentation and Scattering
All tables belong to a storage area, and every storage area has a setting that decides how many records can be stored per database block. This is important, as a block will be assigned that number of ROWIDs when it is created. If not set correctly, you will either end up with fragmented records, or with blocks that won’t be filled, leading to a higher scatter factor.
As records are stored in blocks, it’s possible to end up with a record bigger than the remaining space in a block. That record will then be split into multiple chunks, thus spanning multiple blocks. It’s also possible for a record to be written to a block, and for that block to be filled later. If that record is then updated to contain more data, e.g. by writing to a comments field, that extra data will go to a different block.
The result is the same – to access 1 record, you will need to access more than 1 database block. Thankfully the number of fragments is limited, a maximum of 9 for a database with a 4KB block size, and 5 for a database with an 8KB block size.
The scatter factor is a simpler metric, and it can be calculated as the # of blocks used / minimum # of blocks. A high scatter factor means that you have blocks that aren’t filled with records, either because your records per block was set too low, or because a lot of records were deleted across many blocks, so you now have a lot of empty space to deal with.
While for fragmentation the issue was that you had to read multiple database blocks for a single record, the issue when dealing with a high scatter factor is that you have to read more blocks than would be strictly necessary for a given number of records.
For a very simple example, let’s imagine some blocks that can fit 100 records, but we have a high scatter factor, so they only hold 50 records each. If we run a query that will return 500 records, that means we’d read 10 database blocks. If the blocks all stored 100 records, we’d only need to read 5 blocks. This is a very simplistic example, and not at all how things work in the real world. At the same time, it provides a good enough explanation for the issue.
In addition to causing issues by increasing IO operations, a high scatter factor will also lead to more memory being used because your buffer pool will use more buffers for a smaller number of records.
To get information on scatter and fragmentation, you can use proutil dbanalys. Sadly, there’s no easy method of dealing with these 2 issues. You can move tables to a better configured area using proutil tablemove or you could dump and load the database.
Data Structures & Data Values
There’s not much to say about data structure, outside of making sure that you are making reasonable choices. Use multiple tables where appropriate instead of storing everything inside a single massive table, create indexes based on the queries you will be using, don’t create indexes that won’t be used, and use proper names for your tables & fields (not 5 letter abbreviations or acronyms). As far as values are concerned, just be consistent with the way in which things are stored.
The Aggregate Statement
Before we get to the new tool, we need to go over the aggregate statement. This new statement got added with 12.4 and the documentation for it is available at https://docs.progress.com/bundle/abl-reference/page/AGGREGATE-statement.html.
There is one very important reason why you should care about this new statement – it offloads everything to the database server.
Previously, if you wanted to sum up some values, e.g. Order.Amount for all orders made in the last 3 months, you would write the query, fetch all the records from the database, and do the sum locally. If you were feeling particularly great that day, you might have used the accumulate statement.
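Something like this, to make the comparison concrete (a rough sketch reusing the hypothetical Order table from the AGGREGATE example below; your table and field names will differ):

DEFINE VARIABLE dTotal AS DECIMAL NO-UNDO.

FOR EACH Order NO-LOCK WHERE Order.OrderDate >= TODAY - 90:
    ACCUMULATE Order.Amount (TOTAL). /* every record is fetched, then summed on the client */
END.
dTotal = (ACCUM TOTAL Order.Amount).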
Regardless of how you did it, the only thing that was constant was the fact that the database had to send some records back to the client. While this is less of a penalty if you are using a shared memory connection compared to a TCP connection, it was still unnecessary.
The aggregate statement does the same (almost) thing, except no records are sent from the database to the client. The only thing you get back from the database server will be the aggregate value. Everything happens on the database side, so performance is maximized. The syntax is easy to use as well:
AGGREGATE AllOrderTotal = TOTAL(Order.Amount) FOR Order WHERE Order.OrderDate >= TODAY - 90.
As it’s a relatively new statement, there are some limits that will hopefully be removed in future releases:
- You can only calculate a single aggregate per statement (see the sketch after this list)
- Doesn’t support the TENANT-WHERE clause for multi-tenant tables
- No dynamic version
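As a quick illustration of that first limitation, getting both a total and a count for the same set of orders currently takes two separate statements (again just a sketch over the hypothetical Order table):

DEFINE VARIABLE dTotal  AS DECIMAL NO-UNDO.
DEFINE VARIABLE iOrders AS INT64   NO-UNDO.

/* one aggregate per statement, so two passes over the same records */
AGGREGATE dTotal  = TOTAL(Order.Amount) FOR Order WHERE Order.OrderDate >= TODAY - 90.
AGGREGATE iOrders = COUNT(Order.Amount) FOR Order WHERE Order.OrderDate >= TODAY - 90.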
Thankfully, none of these limitations get in the way of our tool.
Onetwothree
This is a newly created tool and is available at https://github.com/claudiu-spac-wf/onetwothree. The associated README provides some information on how to set up and use the tool, so the steps provided here won’t be particularly detailed. The tool itself has some command line help, so if you would like more details on what parameters are available you can simply run something like this: python ./onetwothree.py cfg -h
How it works
Overall things are quite simple. You generate a config file based on your database structure (tables, indexes, and fields). That config file is then used to generate some code that will go over your data and provide some results. Finally, you run the generated code and look at the results.
Creating a config file
Generating a config file is quite simple. A single command will generate a config file based on your database structure: python ./onetwothree.py cfg
While you can use the generated configuration file to get information on existing indexes, the real fun starts when you begin to edit the configuration file. The format is simple, so you will be able to quickly make changes. Do you want to know how a new index would do? Simply add it to the configuration file. What about adding a new field to an existing index?
Other than knowing how well indexes would do, you can also use this tool to find some records that might interest you. We’ll go over this a bit later, when we get to the data this tool outputs.
Generating the code
Generating the code is again a matter of running a single command: python ./onetwothree.py gen
This will create an output folder containing some base classes and include files, and finally some classes that will run the reporting. A class will be created for every single index specified in the config file. You shouldn’t need to make any manual changes to this generated code, but if you want to see how it works, or otherwise improve it so it better fits your use case, the code itself is simple.
Running the code
Running the code is again a matter of running a single command: python ./onetwothree.py run
This will then create instances of all the classes we generated and start the reporting. By default, each class can only take 20 seconds to report. This is done using a DO STOP-AFTER 20: block so that things stop in a reasonable amount of time.
If you want to allow for more runtime, simply include a MaxRunTime field in the config file. The value of this field will then replace the default 20 seconds.
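If you’re curious what that looks like in ABL, here’s a minimal sketch of a time-boxed block (the generated classes are a bit more involved, and the names here are illustrative):

DEFINE VARIABLE iMaxRunTime AS INTEGER NO-UNDO INITIAL 20.

DO STOP-AFTER iMaxRunTime ON STOP UNDO, LEAVE:
    /* the reporting FOR EACH / AGGREGATE work goes here */
END.

When the time limit is reached, a STOP condition is raised and the block is left, so a single slow index can’t hold up the rest of the run.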
What the code does
The short explanation is that we go over table records, aggregate by unique values at every index level, and keep track of the number of records that would be returned. And now for the explanation & example combo so that the short explanation makes sense.
For an example we’ll be looking at a Customer table, and an index over the Country, State, City fields.
Going over table records
To know what our aggregate query should look like, we need to go over a lot of table records. The first index level is the Country, so we are interested in all values of the Country field. We don’t gather these values up front because we don’t want to spend our allotted 20 seconds just scanning an entire table. Instead, we use something like this:
FOR EACH Customer NO-LOCK BY Customer.Country DESCENDING: LEAVE. END.
This fetches the first Customer record, and thankfully the buffer is available after the FOR EACH statement.
Aggregating by unique values
While a Customer buffer is available, we run an aggregate statement counting Customer records with the same Country as the currently available Customer.
AGGREGATE Count = COUNT(bufCustomer.Country) FOR bufCustomer WHERE bufCustomer.Country = Customer.Country.
The result we store reads something like this: At the first level of the index, a query with Country = USA returns 1609 records.
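Put together, the first-level pass looks roughly like this (bufCustomer is simply a second buffer for Customer; the names are illustrative and the generated code is a bit more involved, but this is the gist):

DEFINE BUFFER bufCustomer FOR Customer.
DEFINE VARIABLE iRecords AS INT64 NO-UNDO.

FOR EACH Customer NO-LOCK BY Customer.Country DESCENDING:
    LEAVE. /* we only need one record to get a Country value */
END.

IF AVAILABLE Customer THEN DO:
    AGGREGATE iRecords = COUNT(bufCustomer.Country) FOR bufCustomer
        WHERE bufCustomer.Country = Customer.Country.
    /* store the result: level 1, this Country value, iRecords records returned */
END.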
At all index levels
This is an index with 3 levels, so after we store the previous result, we need to look at the second level of the index. To do this we run a similar process to what we did until now, the only difference being that the queries we use will include a Country = USA clause. The first state we find is Vermont.
FOR EACH Customer NO-LOCK WHERE Customer.Country = "USA" BY Customer.State DESCENDING: LEAVE. END.
AGGREGATE Count = COUNT(bufCustomer.State) FOR bufCustomer WHERE bufCustomer.Country = "USA" AND bufCustomer.State = Customer.State.
Now we hit the biggest issue with the resulting output – it can’t be as precise as it would ideally be, because the number of possible combinations balloons incredibly quickly (# Countries * # States * # Cities at most; generally less, but still a significant number).
The result we store reads something like this: At the second level of the index, a query with State = Vermont returns 156 records. Note that there is no mention of the Country here.
Interpreting the results
This is the part that needs some more work to get right. The results are a JSON file containing data for every index specified in the configuration file. For every index you’ll get some summary results: the average, minimum, and maximum number of records that index would return.
In addition to that summary, you will get multiple data points, one per level and number of records returned. This is where we start running into issues. Using the previous example, things are still fine at the second level as the USA doesn’t have multiple states with the same name. It does have cities with the same name, however. Franklin is the most common, with 31 cities sharing the name.
At the third level the output we get will be a bit confusing:
City = Franklin with 5 records returned
City = Franklin with 10 records returned
City = Franklin with 3 records returned
All of the above are valid outputs; they just refer to different states. In the end these records will be combined, so the final output will be something like this:
Cities of Franklin, Arlington,…,Clinton with 5 records returned.
Because of this, the resulting graph has a good chance of being confusing – as you hover over different graph bars, you’ll start seeing the same value returning different numbers of records. There’s not much you can do here; if you require a complete set of values, you’ll need to do a bit more work to figure out the values for all fields.
Because we have data for every level of an index, you can also show data for a given level. If you are only looking for outliers, you can move to the previous level of the index and pick the outlying value for that level.
Usage examples
Finding outliers
Yesterday I needed to find some receipts that had a lot of associated order lines to investigate receipt printing performance. I could create new receipts, or I could blindly go through recent receipts to find some data that matched my purpose, but where’s the fun in that?
Instead, I created a new configuration file and input the data I was interested in – go through OrderLine records and look at the CurrentReceipt index. I didn’t want to get data for all OrderLines, so I added a QueryString that limited things to sales that happened this last month and only for a particular item type.
The result was a JSON file that, when graphed, showed me the number of OrderLines per receipt I would get. Since I wanted receipts with a lot of order lines, I simply hovered over the bar showing me that I would get back 1190 OrderLines, and in the tooltip I could see the receipt number I should use – 46016.
This applies to essentially everything: receipts with a large number of order lines, records with a lot of associated AuditLogs, households with a lot of receipts, etc.
Making index changes
As I was going through a performance issue, I wanted to see how much a new index would help. There was a query that happened whenever something new got added to the cart (going over some general ledger records). The only problem was that it ran over a table that, outside of this one use case, was only looked at for entire receipts.
Turns out that the only thing I had to do was add an existing order line field to the index. I changed the config file so instead of the index having just the receipt number, it now also had the order line. Generate the code, run the reports, and find out that I’m getting kicked out at the 20s mark because I had too much data.
Now I could have increased the MaxRunTime, but in my case it made more sense to make the actual index change since I was going to need it for testing as well. I brought everything offline, made the index change, did an index rebuild of everything (because I was lazy), and I was all set. I ran the report again, it finished in 3 seconds, and now I could look at the results.
Because it was going over everything in the receipt, the issue only showed up for big receipts. Looking at the outliers for the first level of the index (just the receipt number) I can see a large number of records for the same big receipts we found in the last example.
The number of records depends on how the fees were set up, but as an example, for a receipt with 1000 items we were getting 5000 general ledger records – 5 per order line. Because the query ran whenever something was added to the cart, over the lifetime of the cart we were reading ~5 * N * (N + 1) / 2 records, and most of them were discarded by the database server because they weren’t for the right item. By the time we added the 500th item to the cart, we would have read a total of roughly 620,000 records.
Now that we included the order line, the second level of the index was returning just the 5 records for that specific order line, the ones we cared about. The result was a new index in our database, and a better experience for customers working with large receipts.
Ignoring the tool but using the aggregate
We moved to 12.8 somewhat recently, so we still have a lot of queries that do simple math – totaling up the fees for items, counting the number of records, etc. But with the move to 12.8, we can fix some of the more egregious uses of the good ol’ for-each-with-a-single-assign-inside pattern.
For us that was fees. Records could have their fees changed across different receipts, they could be partially paid, etc. We wanted to know the amount any given household had to pay for items. While we tagged order lines as fully paid, we still had to go through the fees for those items.
While the index we were using was 100% correct, things still took long enough to be an issue, especially for customers that have a lot of items that aren’t fully paid. That translates directly to all our big customers, so it was an ongoing annoyance for them.
Now, instead of that nice sum-inside-a-for-each, we could just use the aggregate statement. We went from ~5s to ~0.5s for that particular query simply by switching it to an AGGREGATE statement.
Some last thoughts
Performance is a huge part of application development and maintenance. While the OE RDBMS provides a lot of tuning options (almost all of them not even mentioned in this article) having more tools to work with is always welcome.
For this article, the star of the show is the new AGGREGATE statement. It can have a huge impact because instead of fetching however many records from the database, we can have the database return a single COUNT/TOTAL/AVERAGE value, making things much faster.
I hope that onetwothree will see more use, and will help more people find weird outliers, or improve their indexes. Personally, I plan to continue using it, and improving it as I need new functionality for my use case.
Finally, there is one piece of advice that will always be relevant – make sure your -B is big enough, and that your hard disks are fast enough.
Author: Claudiu Spac, Tech Lead
He is a seasoned tech lead with a decade of experience at Wayfare, specializing in Progress OpenEdge. He has tackled diverse projects, from TV scheduling to warehouse logistics, with a strong emphasis on performance optimization and resolving intricate database challenges.