Application Performance

Useful Tools and Practices

 

 

What Is Performance?

 

In an ideal world, your application would do only the strict minimum amount of work needed to accomplish a task. In the real world, we run into various inefficiencies – CPU cache misses, CPU branch prediction taking the wrong turn, inefficient code, bad indexes, and so on.

This means that instead of an ideal amount of work X, we end up doing X * N work to accomplish the same thing. Because we don’t live in an ideal world, the amount of work will always be more than strictly necessary. The database reads data in blocks, not individual records. Blocks are ordered according to one index (which makes reads using that index faster), but we use many indexes. The branch prediction on our CPU isn’t 100% accurate. The code we write isn’t the best it could be, because it also needs to be readable, easy to maintain, and so on.

Performance is an elusive metric that everyone cares about. To some, it’s the difference between how well things work now and how well they could work if we moved as close as possible to that ideal amount of work. To others, it’s the simple check of whether someone complains about something taking too long. To people using cloud computing, it’s the amount listed at the bottom of last month’s bill.

We need to strike a balance between working on performance and working on new features or maintaining our application. This is why the way we approach performance is so important – if everything is built while keeping a close eye on performance (from the standpoint of “is this as good as it can be?”), life is much easier, because potential performance issues are fixed as soon as they come up. If we are not proactive, we end up in situations where improving performance requires a lot of separate performance-focused work, a lot of refactoring, and a lot of architecture changes.

In all these situations, we need various tools to help identify performance issues and make fixing them much easier than if we made performance changes blind. In fact, one of the leading causes of performance issues is making performance changes blind – the code looks like it should be an improvement, it feels like an improvement, but in the real world it’s twice as slow as the code it replaced.

 

 

Identifying Performance Issues

 

The most common ways of identifying performance issues (at least on the projects I have worked on so far) can be split into two broad categories – someone feels that something is too slow, or someone who has your boss’s number calls him at 2 AM yelling, so he calls you at 2 AM yelling. Depending on whether you are a manager, you might then call someone else at 2 AM yelling.

As fun as this process might be, it has various issues. The fact that something feels slow doesn’t mean it wasn’t slow before it felt that way. To feel that something is slow, you either need to use that functionality often enough to get annoyed with it, or the performance issue must appear all at once, so that you noticeably go from a sub-second to a 2 second screen loading time.

As for waiting for customers to call you about performance issues, that’s a bad time for everyone involved. And customers might not necessarily want to call you – they might just call a competitor whose application doesn’t have the same performance issues.

There is only one way of reliably identifying performance issues – using tools to measure performance, and storing enough historical data that you can easily trend how well your code has been doing.

 

 

Tools

 

There are a lot of tools that can be used for performance monitoring – some more expensive than others, some with more functionality than others, and so on. We won’t be going over APM tools (or OpenTelemetry) in detail here, but they deserve a brief mention.

 

APM Tools

 

APM stands for Application Performance Monitoring, and there are a lot of options available. Personally, I would recommend choosing one that supports OpenTelemetry, as that is the standard we seem to be moving towards.

An APM tool gives you a centralized platform where you can easily go over historical data and view various metrics. It ingests various pieces of data (logs, OpenTelemetry traces, assorted metrics, etc.) and makes it easy to work with all of them at the same time. Say the CPU usage of a server spikes at 2 AM every Saturday: you can go through the associated logs and see what was running at the time, then go through the database metrics and see what kind of activity was happening – all in the same place, and quite easily.

On the Progress OpenEdge side of things, there is one outlier as far as APM tools go – ProTop. This tool is 100% focused on database monitoring and provides a lot of in-depth information. That focus means that it will not deal with things like log ingestion, but the database information it can provide is incredibly useful.

 

CPU Profiler

 

OpenEdge currently provides a CPU profiler, and it is incredibly useful for identifying performance issues. It tracks the execution time of the code that runs and provides various statistics about it.

The main use is figuring out where the time is being spent. Is one procedure taking 60% of the total time? Is there one query that takes 10 seconds instead of the expected ~0.2 seconds? How many times is a particular function called? Depending on the setup, you can get even more detailed information – instead of the average across N executions of a query, you can get timing information for every one of those N executions.
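
As a minimal sketch (the file name and checkout.p are made up), enabling the profiler from ABL around a suspect piece of code looks roughly like this:

    /* Enable the CPU profiler around the code under investigation. */
    ASSIGN
        PROFILER:ENABLED     = TRUE
        PROFILER:DESCRIPTION = "Checkout investigation"
        PROFILER:FILE-NAME   = SESSION:TEMP-DIRECTORY + "checkout.prof"
        PROFILER:PROFILING   = TRUE.

    RUN checkout.p. /* the code being profiled */

    ASSIGN
        PROFILER:PROFILING = FALSE
        PROFILER:ENABLED   = FALSE.
    PROFILER:WRITE-DATA(). /* write the gathered statistics to the profiler file */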

 

Memory Profiler

 

Alongside the CPU profiler, we have the memory profiler. Instead of tracking the execution time of our various pieces of code, it tracks memory usage, including which objects are created and when they are deleted. You’ll be able to tell what your memory was used for at any given time.

Sadly, this isn’t 100% done. While you can enable the backend starting with 12.8, there is no frontend with which to easily go over this data. In some future release the frontend will be available as well, but until then we can only rely on community-developed frontends. Proginov does have their own frontend that works with 12.8, but I am not sure how/if it is available for download.

If you want to develop your own, the output file format isn’t supposed to be a secret. While I don’t think it’s available right now, it should be easy enough to obtain by asking someone from Progress.

 

Load Testing

 

While going over profiler results in your local development environment will provide a lot of useful information, the volume of data most likely differs from a production customer’s. Then there are also various configuration differences to consider – database parameters, agent startup parameters, whether you connect to the database using shared memory or a TCP connection, what you are doing vs. what a customer would be doing, behavior differences under heavy load, etc.

Because of this, you should have a load testing solution, and you should be using it as much as possible. Create various scenarios based on what an actual customer would do. If possible, obtain a UAT database from one of your customers for whom performance is a big issue. Run your set of load tests at least once per release and keep the resulting historical data. Being able to trend results over the last 12 releases is a great way to showcase how much performance has improved.

 

Proutil Dbanalys & Promon

 

Progress provides these two utilities, and they are invaluable when it comes to figuring out performance issues. Using dbanalys you can easily find out how many records your tables contain, along with some information on how fragmented/scattered your records are.
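
For example, a full analysis can be redirected to a file that you keep for trending (database and file names are illustrative):

    proutil mydb -C dbanalys > dbanalys-2025-01.out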

Using promon you can obtain a lot of in-depth information on what your database is doing: displaying the current values of various database parameters, monitoring the buffer hit rate, seeing how many buffers were flushed at the end of the last checkpoint, checking the number of timeouts for the various latches used by the database, and more.

 

XREF Files

 

While not strictly a tool in and of itself, it’s possible to generate XREF files when compiling Progress ABL code. One of the most important use cases for these files is being able to check which indexes a particular query uses. This doesn’t work for dynamic queries, but for those you can use the INDEX-INFORMATION attribute on the query handle at runtime.
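
As an illustration (the program, table, and query are made up), the compile-time and runtime approaches look like this:

    /* Compile time: generate an XREF file; its SEARCH lines show index usage. */
    COMPILE orders/calc-totals.p SAVE XREF "calc-totals.xref".

    /* Runtime equivalent for a dynamic query. */
    DEFINE VARIABLE hQuery AS HANDLE NO-UNDO.
    CREATE QUERY hQuery.
    hQuery:SET-BUFFERS(BUFFER Customer:HANDLE).
    hQuery:QUERY-PREPARE("FOR EACH Customer WHERE Customer.Name BEGINS 'Sm'").
    hQuery:QUERY-OPEN().
    MESSAGE hQuery:INDEX-INFORMATION(1) VIEW-AS ALERT-BOX. /* indexes used by buffer 1 */
    hQuery:QUERY-CLOSE().
    DELETE OBJECT hQuery.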

The format of the XREF files is simple, so with a bit of code you can go over all your queries and see the indexes used by each of them. Throw in some extra checks and you can enforce various rules.

The same XREF output is also available in SonarLint by Riverside Software, so if you are already using SonarLint and SonarQube you should be able to create and enforce similar index usage rules.

 

Logging

 

Logging is a very powerful tool for identifying performance issues. A simple timer can log how long a procedure took to run, or you can enable various log entry types that provide more information.
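
A minimal timing sketch (the procedure and the "TIMING" subsystem name are made up, and a log file is assumed to be configured) could look like this:

    DEFINE VARIABLE iStart AS INTEGER NO-UNDO.

    iStart = ETIME.
    RUN process-orders.p.
    LOG-MANAGER:WRITE-MESSAGE(SUBSTITUTE("process-orders.p took &1 ms",
                                         STRING(ETIME - iStart)), "TIMING").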

One log entry type that is particularly helpful when it comes to query performance is QryInfo. It can be used at two levels – level 2 for dynamic queries, and level 3 for all queries. The idea is that it can output various information about a query, including the file and line number, the indexes used, the time taken, the number of records read by the DB, the number of records sent back to the client, whether the sorting is done on the client side, etc.

When using level 2, the gathering of information can be enabled using the BASIC-LOGGING attribute on the query handle, and the information can be dumped to the log file using the DUMP-LOGGING-NOW() method.
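
Put together, a hedged sketch of that level-2 flow for a dynamic query handle hQuery might be:

    LOG-MANAGER:LOG-ENTRY-TYPES = "QryInfo:2".
    hQuery:BASIC-LOGGING = TRUE.   /* start gathering query statistics */
    hQuery:QUERY-OPEN().
    /* ... iterate over the results ... */
    hQuery:DUMP-LOGGING-NOW().     /* write the gathered information to the log file */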

 

Virtual System Tables

 

We mentioned promon earlier. The information promon outputs can also be obtained from the various VSTs that are available. These can be accessed like normal tables, so we can easily write some code that gathers various statistics.

There are various examples online, but some simple ones would be counting the CRUD operations performed against all our database tables, monitoring the buffer hit rate, checking how many buffers had to be flushed during the last checkpoint, and more.
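
As a minimal sketch (assuming your statistics ranges, e.g. -tablerangesize, cover the tables you care about), per-table CRUD counts and the buffer hit rate can be read like this:

    /* Per-table CRUD statistics from the _TableStat VST. */
    FOR EACH _TableStat NO-LOCK:
        FIND _File NO-LOCK
             WHERE _File._File-Number = _TableStat._TableStat-Id NO-ERROR.
        IF AVAILABLE _File THEN
            DISPLAY _File._File-Name FORMAT "x(20)"
                    _TableStat._TableStat-Read
                    _TableStat._TableStat-Update
                    _TableStat._TableStat-Create
                    _TableStat._TableStat-Delete.
    END.

    /* Buffer hit rate from the _ActBuffer VST. */
    FIND FIRST _ActBuffer NO-LOCK.
    DISPLAY 100 * (1 - _ActBuffer._Buffer-OSRds
                       / MAXIMUM(_ActBuffer._Buffer-LogicRds, 1))
            LABEL "Hit rate %" FORMAT ">>9.99".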

 

Using the Available Tools

 

Now that we know what tools are available, we need to start using them. Below is an example of what can be implemented with them, but first a caveat: most of these tools come with some sort of performance penalty. The impact will vary depending on how much load your application is under, but tools like the CPU and memory profilers will affect both CPU and memory usage, while something like QryInfo:3 logging simply creates too many logs and causes a big increase in the traffic between the database and the client.

There are other nice-to-haves that can be implemented quite easily, and all of this can be integrated with your application with some very minor code changes. The idea is quite simple – we want to control what we profile, what we log, and when we do both of those, all from a single place. This can be done using the PROFILER and LOG-MANAGER handles.

Overall, we only need a couple of things:

  1. Some configuration – this can fit in with all your other configuration
  2. Starting the profiler as soon as possible within your application code
  3. Stopping the profiler as late into your application code as possible
  4. Allowing for runtime changes of the log-entry-types used

 

If all of that is done, you can start doing some interesting things to gather data. Using your configuration, you can enable profiling for 1% of all requests, or enable it for all requests that go to a particular procedure, or ignore the profiler output for requests that took less than 0.1 seconds.
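
A minimal sketch of the sampling part (the 1% threshold, the file naming, and the assumption that skipping WRITE-DATA() discards the run are all illustrative):

    /* At the start of a request: profile roughly 1% of requests. */
    IF RANDOM(1, 100) = 1 THEN
        ASSIGN
            PROFILER:ENABLED     = TRUE
            PROFILER:DESCRIPTION = "Sampled request"
            PROFILER:FILE-NAME   = SESSION:TEMP-DIRECTORY + "req-" + STRING(MTIME) + ".prof"
            PROFILER:PROFILING   = TRUE.

    /* ... handle the request ... */

    /* At the end of the request: persist the data (or skip WRITE-DATA()
       to discard it, e.g. for requests that took less than 0.1 seconds). */
    IF PROFILER:PROFILING THEN DO:
        PROFILER:PROFILING = FALSE.
        PROFILER:WRITE-DATA().
        PROFILER:ENABLED = FALSE.
    END.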

Because of the issues associated with QryInfo:3 logging, you’ll need to be more careful with it, but it should be possible to enable it only under more stringent circumstances, e.g. when user abc clicks on the checkout button we want to investigate.

All the profiler output can be stored in the database. We can also use a word index to keep various search criteria alongside the profiler output, e.g. a duration bucket, the related action (checkout, add to cart), the user, the datetime, etc.

Even without going into the contents of the profiler files, the data we now have about how long certain actions took can easily be graphed to see how our application behaves from day to day, or from release to release. We can also find outliers and investigate what caused the difference.

Similarly, we don’t necessarily want to go through a million profiler output files by hand. The profiler output format is simple enough to import (for a quick starting point, check how the old Progress Profiler.w imports data from the file), so we can merge profiler files sharing the same action and duration bucket. Or, if we are using a framework for our UI work, we can simply ignore every line that isn’t part of the framework to get some insight into how the framework itself is doing.

Now that we can change what we monitor on the fly, there is another thing we really should add. If a particular customer is running into issues, we could try to reproduce the issue locally. Better, though, we can enable all our monitoring for that customer and then find just the information we are interested in by filtering for the user having the issue. To make this practical, it should be easy to enable and disable our monitoring for a given user/session.

Regardless of the frontend the application uses, it should be possible to add a keyboard shortcut that enables profiling and another that enables QryInfo:3 logging for the current user (provided the user has the necessary permission in the config). The resulting output should also be flagged so we can find it more easily. This also gives customers an interesting way to help with intermittent performance issues – when they hit an uncommon issue that can’t easily be reproduced, they enable profiling and QryInfo:3 logging, go through the process that has issues, and then disable the extra profiling and logging. At some later point, they can create a ticket that has all the information necessary to find and fix the issue.

Custom metrics can be created using all this information, and they can be imported into an APM of your choice as well. Things like Receipts Created, Carts Left, User Logins, etc. can be added quite easily. As mentioned previously, this is a starting point of sorts: as you need more functionality, you can add to the existing code and get better and better results.

 

Database Parameters

 

While we have a lot of tools we can use to monitor performance, there are also a lot of potentially high-impact changes we can make to the database itself. There are a lot of parameters, and I honestly recommend grabbing the entire Progress 12.8 documentation and just reading through all the mentioned parameters. It’s a nice exercise, and chances are you will learn something new about how the database works.

Below are some sets of parameters that can be tuned to obtain better results. Just keep in mind that changing these parameters without a production-like level of load isn’t particularly useful or measurable.

 

Disk Bottlenecks

 

The database particularly hates working with your disks. Because of that, it is architected to prefer working with memory, and it tries to offload as many disk-related operations as possible to processes running in the background. Here we will go over some of the parameters you can change to minimize disk bottlenecks.

 

Enable the BIW, AIW, and APWs

 

You should always enable the Before Image Writer (BIW), the After Image Writer (AIW) – assuming you are running after-imaging, which you should be – and enough Asynchronous Page Writers (APWs). How many APWs are enough depends on the amount of activity you see, but it’s easy enough to figure out.

If you don’t have a BIW, the BI buffer writes to disk will be done by the database servers, and in most cases that is a performance penalty. There is no reason not to have a BIW always running.

An AIW does a similar job to the BIW, but for AI buffers. There is no reason not to have an AIW always running.

APWs have multiple duties – they ensure a constant supply of empty buffers in your -B so the database doesn’t have to wait for a disk write, they reduce the number of buffers the database must examine before writing a modified buffer to disk, and, maybe most importantly, they reduce the overhead associated with checkpointing.

At the end of a checkpoint, all modified buffers must be written to disk. Throughout the lifetime of the checkpoint, the APWs write modified buffers to disk, ensuring that the end of the checkpoint takes as little time as possible. This matters because while the database is writing all modified buffers to disk, database write activity is stopped.

Again, the number of APWs depends heavily on your application and the environment the database runs in. Start with 2 APWs and then monitor whether any buffers are being flushed at the end of a checkpoint. The easiest way to monitor checkpoints is promon -> R&D -> 3 -> 4, checking the Flushes column. If you are seeing flushes, increase the number of APWs by 1 and check again.

 

Changing your BI block and cluster size

 

Depending on how fast your disks are and how much write activity your database sees, it may not be necessary to make any changes here. But if we assume a checkpoint lasts only 5 seconds, your APWs have only 5 seconds to asynchronously write modified buffers to disk before the end of the checkpoint. Once the 5 seconds are up, the database writes all remaining modified buffers to disk, stopping all write activity until that is done.

Basically, we want to ensure that checkpoints last long enough for our APWs to ideally finish all the writes before the end of the checkpoint. To check your checkpoint duration, use promon -> R&D -> 3 -> 4 and look at the Len column – that is the duration of the checkpoint in seconds.

We want our checkpoints to last upwards of 2 minutes under load. To increase the checkpoint duration, you’ll need to change your BI block and cluster size. This can only be done offline, but the change itself is quite easy: proutil <dbname> -C truncate bi -biblocksize 16 -bi <cluster size in KB>

To get a working guesstimate, divide the desired checkpoint duration by the current checkpoint duration, then multiply the current cluster size by the result. This should get you close to the right value. As far as the BI block size goes, there’s really no reason to set it to anything lower than the maximum of 16 KB.
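
For example, with illustrative numbers: if checkpoints currently last 5 seconds with a 4096 KB cluster size and you are targeting 120 seconds, then 120 / 5 = 24, and 24 * 4096 KB = 98304 KB, so you would run proutil mydb -C truncate bi -biblocksize 16 -bi 98304 and re-measure under load.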

 

Monitoring your buffer hit rate

 

The buffer hit rate has a big impact on performance. There are a couple of ways to ensure a good buffer hit rate, but first, how do you monitor it? Again, using promon: promon -> R&D -> 3 -> 1. You’ll notice that there are 3 buffer hit rate values displayed:

  1. Overall buffer hit rate
  2. Primary pool hit rate
  3. Alternate pool hit rate

 

Alternate pool

 

This leads us quite nicely into one way of increasing our hit rate – moving some tables to the alternate buffer pool. Most applications have some small tables that are constantly read but not updated all that often. Prime examples would be tables holding configuration or screen definitions, and these should be moved to the alternate pool. The schema area should also be moved, as it has a fixed size (until you change your schema), is small, and is constantly used when you execute dynamic queries.

To enable the alternate pool, you’ll need to run proutil <db> -C enableb2 with the database offline. Then you’ll need to assign a proper value to the -B2 startup parameter – big enough to hold all the tables being moved to it in memory, plus some room for future growth, but not much more than that.
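
As an illustration (the database and area names are made up, and assigning a whole area is just one way of doing this):

    proutil mydb -C enableB2 "Config Area"
    proserve mydb -B 500000 -B2 25000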

This needs to be monitored: if -B2 isn’t big enough, you won’t really achieve anything. The idea is to end up with a buffer pool that has a 100% hit rate for all the small, commonly used tables.

 

Increasing -B

 

Your primary buffer pool may need to be increased as well. If you are seeing buffer hit rates below 95%, you need to increase -B. Memory is cheap enough nowadays that the performance improvement almost always warrants the expense.

 

Changing the -lruskips value

 

The database mostly uses latches (or spin locks) when it needs to modify in-memory data. A latch ensures that only one process can make changes to a resource at a time, thus avoiding corruption. One such latch is the LRU latch, used when making changes to the Least Recently Used (LRU) chain.

When a buffer is accessed, it needs to be moved to the Most Recently Used end of the LRU chain. If this happened every time a buffer was accessed, a lot of time would be spent spinning or napping while waiting for the latch to become available. Instead, we can use the -lruskips parameter to say that a buffer should only be moved after it has been accessed -lruskips times.

For 99.9% of databases, a value of 100 will be more than enough, and it is the default in OpenEdge 12. If you have it set to a different value, try setting it to 100 and check how much of an impact this has.
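
For example (illustrative values): proserve mydb -B 500000 -lruskips 100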

 

Network parameters

 

If you are using a TCP connection, there are some parameters you will most likely need to change to get better performance. With a TCP connection, the database fetches a number of records and packages them up inside a single message. The size of this message (-Mm) and the number of records fetched have a big impact on the number of messages that need to be sent.

 

Message Buffer Size (-Mm)

 

This defaults to 8192 bytes and can be set as high as 32600. The idea is that by increasing this parameter, we see less traffic going back and forth between the client and the database. Try increasing it to 16384 or even 32600 and check how that affects performance for you, as the effect depends heavily on the size of your records.

 

Prefetch Delay (-prefetchDelay)

 

This is enabled by default, and it should stay enabled. Network communication used to be slow enough that it made sense to send the first message back to the client ASAP, so that the first record could be displayed/used quickly.

Now that’s no longer a concern, so we can quite comfortably delay the first message until we have filled it as much as possible.

 

Prefetch factor (-prefetchFactor)

 

This parameter takes a value from 1 to 100 and represents the percentage to which a message needs to be filled before being sent to the client. The default value is 100, and it is the correct one – as far as I know, there is no reason to set it to anything else.

 

Prefetch Num Recs (-prefetchNumRecs)

 

This parameter takes a value from 0 to 32766 and represents the number of records that should be placed into a message before it can be sent to the client. This defaults to 64.

This is a value we really need to change, as the low default can cause issues, especially for smaller records (or queries asking for a single field). Like -prefetchFactor, there is no real reason to have it set to something so low. We want the messages being sent to be as full as possible, so this can be set directly to the maximum value.

 

Prefetch Priority (-prefetchPriority)

 

This parameter represents the number of records added to a message before the database makes another poll() call. When set appropriately, this can lower the CPU usage a bit and make the entire process go slightly faster.

By default this is set to 1, which means that after every record added to the message, a new poll() call happens. This should be increased and the CPU usage monitored. If set to a value of 5, the process instead works like this:

  1. Add a single record
  2. Poll()
    1. If the poll() returns that the DB needs to do something else, do that and go back to 1
    2. If the DB has nothing else to do, add 5 records and go back to 2
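
Putting the network parameters together, an illustrative startup line (the values should be validated against your own load tests) might be:

    proserve mydb -S 20000 -Mm 32600 -prefetchNumRecs 32766 -prefetchFactor 100 -prefetchPriority 5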

 

 

Common performance mistakes

 

Because we have performance tools we can use, we know whether something needs to be changed because it is performing badly. Without those tools, we could run into all sorts of issues – here are some of them.

 

 

Caching Helps, Right?

 

In broad terms, caching helps. Trying to cache something blindly, however, doesn’t. This is something I’ve noticed a lot in the last couple of months – records are fetched from the database, copied to a temp-table, the temp-table is written to a longchar, and the longchar is saved in the database as a cache record. When we need the same records, instead of grabbing them from the proper database tables, we get the cache record and use READ-JSON() to load them back into the same temp-table.

This process in and of itself isn’t particularly problematic, but there are some small things worth mentioning that can help with performance:

  • When using READ-JSON(), specify the “empty” read mode if possible (see the sketch after this list). This has a big impact on performance, even if you are loading into an already empty temp-table.
  • Avoid using the XML versions (READ-XML/WRITE-XML). They are significantly slower than their JSON counterparts. Thankfully, the change is very easy to make if the output is only used internally.
  • Try using a MEMPTR instead of a LONGCHAR, as it has better performance. You might run into weird issues if the data you are working with contains characters from other languages, e.g. Japanese, Arabic, etc.
  • Depending on what you store, it might be better to add a new table with the right fields to the DB instead of reading from and writing to a temp-table, but we’ll go over this next.
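
A minimal sketch of the “empty” read mode (the temp-table and field names are made up):

    DEFINE TEMP-TABLE ttOrder NO-UNDO
        FIELD OrderNum AS INTEGER
        FIELD Total    AS DECIMAL.
    DEFINE VARIABLE lcCache AS LONGCHAR NO-UNDO.

    /* ... lcCache is fetched from the cache record ... */

    /* "empty" empties the target table first, letting READ-JSON skip
       the slower merge-style read modes. */
    TEMP-TABLE ttOrder:READ-JSON("longchar", lcCache, "empty").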

The problem comes when no one measures the impact. Someone decided that caching would be a good idea, so they just went ahead and did it. In a lot of cases like this, simply disabling the caching helped performance. To take it further, it might not even be necessary to copy records into a temp-table – you could get away with simply using the database buffer, especially if the records are only needed in one or two places.

The moral of the story is quite simple: always measure before making performance changes, because at times they might create more issues than they solve.

 

 

Use the Existing Caching Mechanism, Right?

 

You have some data that you need to cache, and the current caching mechanism works by storing a temp-table as a CLOB. I can just keep using this mechanism, right? I mean, it has worked for everything else so far.

Unfortunately, that’s not always the case. There are situations where, instead of storing the contents of a temp-table, it’s better to create a new table with the right schema for what we are caching. If you need all the data in the stored temp-table, and the stored data is diverse enough, storing things as a CLOB can work quite well. In situations where you only need some of the cached records, it’s better to use a new table.

There’s not a lot more to say here – it’s all about minimizing how much data needs to be sent between the DB and the client, and, more importantly, minimizing the time we spend reading/writing JSON.

One other thing to consider is the level at which the cache is stored. Sometimes a cache record is created, but that same cache record is then read again for every single request. In situations like these, there is one simple fix – store the cache at the PAS agent session level. Sure, you’ll need to implement some invalidation mechanism, but instead of reading the same cache record from the database on every request, it will be available in memory.

 

 

Reflect on Everything

 

One of the nice things about OOP is the ability to use reflection to do some interesting things. Then people take it too far: instead of an admittedly very long ASSIGN statement, they use reflection to build something that can do the same thing for any object, in fewer lines of code.

This works fine; the only problem is the associated performance overhead. There’s not much more to add here – just make sure you use reflection only when it is needed. If you know you are using reflection quite a bit, run some profiling just to see whether it has already taken over a good percentage of the time spent in a procedure.

 

 

I Can Just Take a Break, Right?

 

At some point in your code, it can be a good idea to just take a small break and try again. Good examples would be external APIs that aren’t the best at being available, trying to lock commonly updated records, or trying to access a file on disk.

There’s one issue with taking that small break – the profiler doesn’t (or at least didn’t when I last checked) account for PAUSE statements. If a procedure takes 2 seconds of work and you add a 5 second pause, the profiler will confidently tell you that the procedure took 2 seconds to run when it actually took 7.

 

 

I Can Just Exclusive Lock the Record, Right?

 

This is another one of those unexpected things, but it only affects records that you can’t lock immediately. If two users are fighting to update the same record and both try to lock it using EXCLUSIVE-LOCK, one of them will get it, and the other will get it later. The only problem is the amount of time it takes for both users to get and update the record.

If we start counting from the moment the first user got the record, you’re looking at ~1 second before the second user gets it. The only question is where that 1 second of waiting time came from. The answer is most likely how often the client retries to acquire the lock.

Instead of that, it’s better to also use the NO-WAIT clause. That means the second user doesn’t get the record, but they know about it immediately. You can then implement a simple retry mechanism, so that instead of waiting ~1 second, you can retry every 0.1 seconds.
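
A minimal sketch of such a retry loop (the table, key, and retry count are made up; fractional PAUSE values are assumed to be supported by your ABL version):

    DEFINE VARIABLE iAttempt AS INTEGER NO-UNDO.

    DO iAttempt = 1 TO 50:
        FIND FIRST Order WHERE Order.OrderNum = 42 EXCLUSIVE-LOCK NO-WAIT NO-ERROR.
        IF AVAILABLE Order THEN LEAVE.    /* we got the lock */
        IF NOT LOCKED Order THEN LEAVE.   /* record does not exist */
        PAUSE 0.1 NO-MESSAGE.             /* locked: wait and retry */
    END.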

Thankfully, the waiting period of an EXCLUSIVE-LOCK without NO-WAIT does show up properly in the profiler, so you should be able to identify it more easily when it happens.

 

 

Summary

 

There are a lot of performance tools, and even more database parameters that can be tuned for optimal performance. At the same time, there are a lot of ways in which your code might not be as good as it could be. While fixing existing issues might take a long time, at least they are now visible if you are properly using the available tools.

As you make more and more performance changes, you’ll slowly gather a knowledge base of common pitfalls and start noticing them as you write code or do peer reviews, slowly improving the quality of the application.

Implementing all the related processes, setting up all the required tools, and creating any in-house tools that might be necessary will take a while. But if done right, you’ll end up with a solid base on which you can build further and add more and more monitoring specific to your application.

The result will be a faster application and a lot more information on what exactly your application is doing. If you are in the cloud, you’ll also get a smaller bill at the end of the month, which will no doubt make someone very happy.

 

Author: Claudiu Spac, Tech Lead

He is a seasoned tech lead with a decade of experience at Wayfare, specializing in Progress OpenEdge. He has tackled diverse projects, from TV scheduling to warehouse logistics, with a strong emphasis on performance optimization and resolving intricate database challenges.
