Trigger Performance - Monitoring

TL; DR;

Creating a Logger to measure Limit usage in Apex and building Dashboard to monitor improvements (or otherwise) over time.

The Problem

Reviewing the logs is not easy. Sometimes, it’s even impossible. There is a limit to how much can be stored and the rest gets omitted. And having the detailed logs turned on slows everything down too. Finally, you have to actually have the logs turned on. If you’ve not set up a trace flag for the specific user, there will be no logs at all.

When working on improving performance it’s important to be able to compare your before and after. You need to run everything enough times in order to be confident about your results.

Monitoring Apex

I’m facing this challenge right now. Naturally I went looking for some ready made solutions. The best thing I found was this Profiler by Adrian Larson. I can’t really call the code I need to measure directly using his Profiler but the LimitsSnapshot class I found very useful. I had the idea to wrap it in a custom logger, similar to a typical one you would use to persist logs in the database. Then I could add start/stop calls to every part of the code I wanted to measure and collect performance data over time.

Apex Limit Logger

It’s a bit of a “hot needle” thing so bear with me. This is the full class (I will pick out and explain several concepts below). I had to slightly extend the LimitsSnapshot too by adding a few fields and parameters. It’s trivial though so no need to show here.

public without sharing class ApexLimitLogger {
   private static List<LimitsSnapshot> limitSnapshotStack = new List<LimitsSnapshot>();
   private static List<Log__c> measurements = new List<Log__c>();
   private static final String TRANSACTION_KEY = EncodingUtil.convertToHex(
       Crypto.generateDigest('MD5', Blob.valueOf(DateTime.now().getTime().format()))
   );
   private static Integer counter = 0;
   private static final Boolean IS_ASYNC = System.isBatch() || System.isFuture() || System.isQueueable() || System.isScheduled();
 
   private Boolean isOn;
   private String fieldName;
 
   private ApexLimitLogger(SObjectField settingField) {
       Object relevantSetting = settings.get(settingField);
       this.isOn = relevantSetting != null && Boolean.valueOf(relevantSetting);
       this.fieldName = settingField.getDescribe().getName();
   }
 
   @TestVisible
   public static Logging_Settings__c settings {
       get {
           if (settings == null) {
               settings = Logging_Settings__c.getInstance();
           }
           return settings;
       }
       private set;
   }
 
   public static ApexLimitLogger instance(SObjectField settingField) {
       return new ApexLimitLogger(settingField);
   }
 
   public void start(String measureKey, Integer size) {
       if (!this.isOn) {
           return;
       }
       LimitsSnapshot measurement = LimitsSnapshot.getInstance(measureKey, size, counter++);
       limitSnapshotStack.add(measurement);
   }
 
   public void stop(String measureKey) {
       if (!this.isOn) {
           return;
       }
 
       if (limitSnapshotStack.isEmpty() || !limitSnapshotStack.get(limitSnapshotStack.size() - 1).name.equals(measureKey)) {
           Logger.addWarning('ApexLimitLogger: Cannot take measurement, not preiously started: ' + measureKey);
           return;
       }
       LimitsSnapshot startingPoint = limitSnapshotStack.remove(limitSnapshotStack.size() - 1);
       LimitsSnapshot measurement = LimitsSnapshot.since(startingPoint);
 
       Log__c entry = fillTransactionContext(measurement.newSObject());
       entry.CallStackLevel__c = limitSnapshotStack.size();
       entry.SettingName__c = this.fieldName;
       measurements.add(entry);
   }
 
   public void save() {
       insert measurements;
       measurements.clear();
   }
 
   private Log__c fillTransactionContext(Log__c entry) {
       entry.TransactionKey__c = TRANSACTION_KEY;
       entry.LogType__c = 'ApexLimitLogger';
       entry.Timestamp__c = System.now();
       entry.RunningUser__c = UserInfo.getUserId();
       entry.Async__c = IS_ASYNC;
       return entry;
   }
}

Main Concept

The start() method will take a snapshot of the Limits class (how much was consumed up to this point) and put it on a stack. The stop() method will then pop the stack, take another snapshot and do a diff between them to see how much of the Limits was used within the measured part of the code.

Using a stack allows nesting measurements. Like measuring the whole Trigger execution, as well as some of its parts separately. It just has to hold that you start and stop measuring in the LIFO order so that the measures match up. A string “measureKey” is used to identify the common start/stop point. The caller is in charge of producing the right one.

I’ve added an extra field which identifies this nesting and can be used to filter records in reports. When a measure is started I store the stack size to identify the nesting. This is important because the nested logs duplicate the usage of their parents.. We can’t just sum them all together.

..
if (limitSnapshotStack.isEmpty() || !limitSnapshotStack.get(limitSnapshotStack.size() - 1).name.equals(measureKey)) {
    Logger.addWarning('ApexLimitLogger: Cannot take measurement, not previously started: ' + measureKey);
    return; //exception is a bit harsh
}
LimitsSnapshot startingPoint = limitSnapshotStack.remove(limitSnapshotStack.size() - 1);
LimitsSnapshot measurement = LimitsSnapshot.since(startingPoint);
 
Log__c entry = fillTransactionContext(measurement.newSObject());
entry.CallStackLevel__c = limitSnapshotStack.size();
..

Transaction Context

A unique transaction key is generated once per transaction and used in all the logs created. This is how I group the entries of one execution context together. It’s just a random string because there is no other Id that I could find. In some async contexts you can get a JobId, but not in Future and definitely not in Trigger executions or API calls.

private static final String TRANSACTION_KEY = EncodingUtil.convertToHex(
    Crypto.generateDigest('MD5', Blob.valueOf(DateTime.now().getTime().format()))
);

Ordering Log Records

Another extra piece of data required is an Ordinal number. Because of the nesting possibilities and also because of potential overlaps of measurements, I don’t really know which measure is going to be saved first. The logs are difficult to make sense of when they don’t appear in the order in which the measuring started. So I added a simple counter to use in the ordering of reports.

private static Integer counter = 0;
...
public void start(String measureKey, Integer size) {
    ...
    LimitsSnapshot measurement = LimitsSnapshot.getInstance(measureKey, size, counter++);
    limitSnapshotStack.add(measurement);
}

More Context for Analysis

There is some extra info being stored alongside the Limits.

Size helps to put the measurements into perspective later (E.g. how many records were changed in the trigger when it was consuming so many resources). The caller decides what gets stored here as the logger can’t infer anything. Just pass in Trigger.new.size() or whatever the variation is in the calling arguments of the measured code.

Running user is good info and telling the difference between Async and Sync context will be useful too.

Setting Toggle

Finally it’s important to be able to turn this logging on and off easily in case there are problems. Performance cost of this logic is not huge, but it is not 0. We can also have storage shortages and all sorts. Finally, if no-one is looking at the data anymore it makes no sense to record it.

I’m using a hierarchy Custom Setting to do this of course. There is a Checkbox field for every group of measurements I want to be able to control together. Some basic @TestVisible wrapping is in place to be able to mock the settings in tests easily and having all tests skip logging by default. Fairly obvious.

private ApexLimitLogger(SObjectField settingField) {
    Object relevantSetting = settings.get(settingField);
    this.isOn = relevantSetting != null && Boolean.valueOf(relevantSetting);
}
 
@TestVisible
public static Logging_Settings__c settings {
    get {
        if (settings == null) {
            settings = Logging_Settings__c.getInstance();
        }
        return settings;
    }
    private set; }
...
public void start(String measureKey, Integer size) {
    if (!this.isOn) {
        return;
    }
...

Measuring Apex

Finally this is the code that is needed in order to measure some Apex. First in a Trigger.

trigger OpportunityTrigger on Opportunity(after delete, after insert, after update, before insert, before update) {
   ApexLimitLogger log = ApexLimitLogger.instance(Schema.Logging_Settings__c.Measure_OpportunityTrigger__c);
   log.start(Trigger.operationType + '-OpportunityTrigger', (Trigger.isDelete ? Trigger.old.size() : Trigger.new.size()));
 
   fflib_SObjectDomain.triggerHandler(OpportunitiesTriggerHandler.class);
 
   log.stop(Trigger.operationType + '-OpportunityTrigger');
   log.save();
}

And in a custom API service. This would be the same in @AuraEnabled or @Invocable methods.

@RestResource(urlMapping='/some/url/confirm')
global without sharing class MyBookingRestService {
   global class ConfirmBooking {
       public String recordId { get; set; }
       public String responseMode { get; set; }
       public String mode { get; set; }
       public String orderId { get; set; }
   }
 
   global class ConfirmBookingResponse {
       public String status { get; set; }
       public String message { get; set; }
   }
 
   private static ApexLimitLogger perfLog = ApexLimitLogger.instance(Schema.Logging_Settings__c.Measure_BookingConfirmationAPI__c);
 
   @HttpPost
   global static ConfirmBookingResponse confirmBooking(ConfirmBooking request) {
       perfLog.start('MyBookingRestService.confirmBooking', 1);
       try {
           // do the magic
 
           perfLog.stop('MyBookingRestService.confirmBooking');
           perfLog.save();
           return (ConfirmBookingResponse) JSON.deserialize(stringResponse, ConfirmBookingResponse.class);
       } catch (Exception e) {
	         // not logging here, only interested in successful calls
           ConfirmBookingResponse errorResponse = new ConfirmBookingResponse();
           errorResponse.status = 'error';
           errorResponse.message = e.getMessage();
           return errorResponse;
       }
   }
}

It’s quite intrusive, I agree. I haven’t spent all that much time trying to make this more elegant and sophisticated. Surely there are ways to design the code better so that this logger can be “attached” or whatever. Right now this does not bother me. I need to measure. And I don’t intend to go PROD with this. It’s only going to be used in the main testing Full Copy sandbox.

There are some arguments against this of course. It means testing a version of the code that is not technically the same as what will go to PROD eventually. The difference is minimal though and can be believed to be inconsequential. Also, in some cases there are more testing sandboxes so there could be places to do performance testing with this logger and UAT without it.

Data

What does the data look like? Good question. There’s two scenarios here.

When we wrap a clear entry point like @AuraEnabled or @Invocable method or a Rest Web Service we end up with a single log entry with the call stack level of 0 for each invocation. And there may also be various other ones with a higher number if the executed code contains other log calls. This is perfect, as I can filter on the Call Stack Level and average the times or queries in my reports.

Measuring Triggers (regular save operations by users) as the entry point though, that’s a lot trickier. There isn’t anywhere to put these log start and stop calls that would wrap the whole transaction.

I added it to the trigger most outer Apex code, against best practice directly in the Trigger. Still each of the Operation types will be logged separately (BEFORE INSERT, AFTER INSERT). That’s not so bad, the entries will all have Call Stack Level of 0 so we can just add them up for the same Transaction Key. After all that’s what it was for. The problem comes when you want to find an Average for (e.g.) CPU Time of all the transactions each day.

The report has to Group By Transaction Key and SUM – that’s the 1 transaction. But then how to do an Average of those Group Levels sums?

I’m still looking for a way to do this. Investigating PARENTGROUPVAL functions and what not. But that’s proving very very time consuming and I need to be measuring. Refactoring needs to move to Test soon.

So for now, I’ve just created a little apex script (which I will surely convert into a proper service soon) that I run every day to create aggregated logs adding all the 0 level entries of a given transaction. Then I don’t have to do the SUM and can just average the “single” log records.

Map<String, Log__c> reference = new Map<String, Log__c>();
for (Log__c e : [
   SELECT
       Id,
       TransactionId__c,
       OperationName__c,
       RunningUser__c,
       Timestamp__c,
       AggregateQueries__c,
       Ordinal__c,
       CallStackLevel__c,
       CPU_Time__c,
       Callouts__c,
       Queries__c,
       Heap_Size__c,
       ...
       Size__c
   FROM Log__c
   WHERE
       Exception_Type__c = 'Measure_OpportunityTriggers__c'
       AND Call_Stack_Level__c = 0
       AND (NOT TransactionId__c LIKE 'aggregate%')
       AND CreatedDate > 2022-07-14T00:00:00Z
]) {
   reference.put(e.TransactionId__c, e);
}
 
List<Log__c> aggregates = new List<Log__c>();
for (AggregateResult ar : [
   SELECT
       TransactionId__c,
       COUNT(Id) ids,
       MAX(Size__c) size,
       SUM(CPU_Time__c) cpu,
       SUM(Queries__c) queries,
       SUM(Elapsed_ms__c) time,
       SUM(Heap_Size__c) heap,
       ...
   FROM Log__c
   WHERE
       Exception_Type__c = 'Measure_OpportunityTriggers__c'
       AND Call_Stack_Level__c = 0
       AND (NOT TransactionId__c LIKE 'aggregate%')
       AND CreatedDate > 2022-07-14T00:00:00Z
   GROUP BY TransactionId__c
]) {
   Log__c original = reference.get((String) ar.get('TransactionId__c'));
   Log__c ag = new Log__c();
   ag.TransactionId__c = 'aggregate-' + ar.get('TransactionId__c');
   ag.Call_Stack_Level__c = 0;
   ag.Running_User__c = original.Running_User__c;
   ag.Timestamp__c = original.Timestamp__c;
   ag.ADvendio__Type__c = original.ADvendio__Type__c;
   ag.SettingName__c = 'Measure_CampaignTriggers__c';
   ag.Async__c = original.Async__c;
   ag.Size__c = Integer.valueOf(ar.get('size'));
   ag.CPU_Time__c = Integer.valueOf(ar.get('cpu'));
   ag.Queries__c = Integer.valueOf(ar.get('queries'));
   ag.Elapsed_ms__c = Integer.valueOf(ar.get('time'));
   ag.Heap_Size__c = Integer.valueof(ar.get('heap'));
   ...
   aggregates.add(ag);
}
upsert aggregates TransactionId__c;

I’m trying to obfuscate some real Field and SObject names here so I may have misspelled some things in the code snippets 🙂

And this is my final product. It looks good I think. There’s been some improvements since I first built it. If you think this would be useful for you too, let me know, I might create a package 😉