TL;DR
Clearing the Test Execution history from a sandbox automatically, as a workaround for a known issue where the CLI fails to read test results with a "Cannot read properties of undefined (reading 'Status')" error. A lesson in misdirection by expectation, and in the need for attention to detail when troubleshooting.
Context
In our team we have a GitHub Workflow that runs on merge to develop and deploys into an integration sandbox. Running tests successfully as part of this deployment is of course important. A few months back I changed our pipeline from a source:deploy with tests to one without tests, followed by an independent apex:test:run step.
I think it's more appropriate this way. The org should always reflect its branch, and if tests fail, the only way they will be fixed (usually) is by another merge into the branch anyway. The main reason, though, was that it allowed me to save the coverage reports and upload them to SonarCloud. We can now see the development of our test coverage over time, which is great.
(I was referring to this post by Aaron Winters quite a bit then)
Problem
All was well until a couple of weeks ago, when the job started to fail very frequently while reading the test results. There is nothing wrong with the tests themselves (they usually all pass), but the CLI just falls over with this error:
ERROR running force:apex:test:run: Cannot read properties of undefined (reading 'Status')
It matches a known issue well. Our tests are not failing, and the workaround of clearing the test history usually helps just the same. Doing that every now and then is fine, but the problem slowly started to get on our nerves as it happened more and more often. So I decided to try and apply the workaround automatically.
Easy Solution
I didn't want to spend too much time on this. There is a workaround, and it is a recognised known issue after all, even though those can remain "in review" for a long time.
Some basic googling didn't turn up anyone having done this already. I looked into the Tooling API a little, but only found a way to delete one test result record at a time. Ideally, I wanted to use the CLI anyway: the CI job is already authenticated, so it would be easier. And it is actually possible in the end, and not so hard at all.
First of all, I can use the force:data:soql:query command, which offers a --usetoolingapi option. A simple query on the ApexTestResult object, exported to a CSV, gets me a list of Ids to delete.
sfdx force:data:soql:query --usetoolingapi --query="SELECT Id FROM ApexTestResult" --resultformat=csv > tests.csv
Then, to delete them, I can use the force:data:bulk:delete command, which creates a Bulk API delete job. It works fine with the ApexTestResult object; there isn't even a tooling/REST API switch needed. (Thanks Michal for the hint here.)
sfdx force:data:bulk:delete --csvfile=tests.csv --sobjecttype=ApexTestResult
The only catch is that the deletion is an asynchronous process. This is not a problem though; I just have to switch the order of things. Instead of making sure there are no tests in the history before starting, I clean up after myself when the validation is over.
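Since the workaround lives in a script anyway, the delete step could be made slightly more defensive. This is a sketch of an optional guard that is not part of the original workflow (the file name and sample Ids are my own stand-ins): only submit the bulk job when the query actually returned rows, given that the exported CSV always contains a header line.

```shell
# Stand-in for the output of the force:data:soql:query step, so this
# sketch is self-contained (in the real job the CLI writes this file):
printf 'Id\n707xx0000000001AAA\n707xx0000000002AAA\n' > tests.csv

# The exported CSV has a header line, so more than one line means real data.
rows=$(($(wc -l < tests.csv) - 1))
if [ "$rows" -gt 0 ]; then
  echo "deleting $rows test results"
  # sfdx force:data:bulk:delete --csvfile=tests.csv --sobjecttype=ApexTestResult --targetusername=int_org
  # The delete command prints the bulk job id, which could later be passed
  # to force:data:bulk:status to check on the asynchronous job if needed.
else
  echo "no test results to delete"
fi
```

This would skip creating an empty bulk job on a freshly cleaned org, at the cost of one extra line-counting step.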
GitHub Workflow
The steps in the pipeline job are:
- Deploy source
- Run Tests
- Download Test Results
- Create BULK API delete job
In case any tests fail, the job fails and stops. That means delete doesn’t happen either and I have the Test Results available to investigate.
Just in case there are some issues with the last two steps (we should be nowhere near the Bulk API limits, but you never know), I make them optional. They don't really need to fail the workflow and stop subsequent jobs. I also want to be able to run the pipeline without these two steps, so I've added an input variable for that.
Here's what it looks like in the YAML file:
name: Deploy to INT
on:
  push:
    branches:
      - 'develop'
    paths:
      - 'force-app/**'
  workflow_dispatch:
    inputs:
      delete-test-results:
        type: boolean
        description: Delete Test Results from Org
        default: true
        required: false
...
      - name: 'Deploy to Int'
        run: sfdx force:source:deploy --sourcepath=force-app --targetusername=int_org --wait=120
      - name: 'Run Tests in Int'
        run: sfdx force:apex:test:run --codecoverage --resultformat=json --outputdir=./tests/apex --testlevel=RunLocalTests --targetusername=int_org --wait=120
      - name: 'Make report available'
        uses: actions/upload-artifact@v2
        with:
          name: apex-code-coverage
          path: tests/apex/test-result-codecoverage.json
      - name: 'Cleanup Tests - download'
        if: ${{ github.event.inputs.delete-test-results != 'false' }}
        continue-on-error: true
        run: sfdx force:data:soql:query --usetoolingapi --query="SELECT Id FROM ApexTestResult" --targetusername=int_org --resultformat=csv > tests-results.csv
      - name: 'Cleanup Tests - delete'
        if: ${{ github.event.inputs.delete-test-results != 'false' }}
        continue-on-error: true
        run: sfdx force:data:bulk:delete --csvfile=tests-results.csv --sobjecttype=ApexTestResult --targetusername=int_org
Subtle but Important Details
One last "gotcha" to notice. Even though the input is declared as a Boolean, it actually behaves like a String when accessed via github.event.inputs.xxx. So the explicit check against 'false' is really important here. I could just use inputs.xxx of course, but there is this other point:
GitHub Docs Note: The workflow will also receive the inputs in the github.event.inputs context. The information in the inputs context and github.event.inputs context is identical except that the inputs context preserves Boolean values as Booleans instead of converting them to strings.
Inputs are only available with the manual workflow_dispatch trigger, so they will be blank on push. That's why the negative check, i.e. != 'false', works with both 'true' and an empty value. I will admit I haven't tested this thoroughly, but a missing Boolean would most likely evaluate to false, and I needed true.
if: ${{ github.event.inputs.delete-test-results != 'false' }}
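To summarise how the condition behaves across the two triggers, here is the same step with my own annotations; the values in the comments reflect my reading of the docs quoted above, not something I have exhaustively verified:

```yaml
# On workflow_dispatch with the box ticked:
#   inputs.delete-test-results              -> true   (real boolean)
#   github.event.inputs.delete-test-results -> 'true' (string)
# On push, inputs don't exist at all:
#   github.event.inputs.delete-test-results -> ''     (empty)
# The negative check is therefore true for 'true' and for empty,
# and only false when the box was explicitly unticked:
- name: 'Cleanup Tests - download'
  if: ${{ github.event.inputs.delete-test-results != 'false' }}
```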
And that was supposed to be it.
The Real Solution
We'd had some other problems causing our pipelines to fail, so it took a while to notice, but the problem didn't actually go away. I spent a week monitoring and testing, and finally got another clue: running the tests without calculating code coverage worked fine every time. (I had also recently added an option for that, since these jobs had to be re-run so many times.)
I kept googling, re-running and re-testing until I noticed something I had completely missed at the very beginning. Or rather, something I had not paid enough attention to. As I said, our problems did not match the known issue 100%.
The Test Execution results contained lines with a red cross, but clicking into the detail showed that the tests inside had actually succeeded. I was not looking hard enough, though, because I had only come there to delete the history as the workaround suggested. Seeing something not quite right seemed expected.
Reading the result carefully finally led me to the reason for our problems. I had recently unchecked the "Disable Parallel Testing" checkbox: our tests worked fine in parallel, and with no coverage calculated they ran in a couple of minutes. I had forgotten about having done that before our problems started, though. And as it turns out, this is exactly what throws the CLI out of whack.
"Code coverage from running this test class was not applied due to a conflicting recompilation …" fits "Cannot read properties of undefined (reading 'Status')" quite well: you can't read something that was never produced. I could still be wrong, but re-checking the box fixed the pipelines!
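As an aside, parallelism can also be controlled per test class rather than org-wide, via the isParallel parameter of the @isTest annotation. A minimal sketch (the class name and test body are made up for illustration):

```apex
// Hypothetical example: with isParallel=true this test class may run in
// parallel even when "Disable Parallel Testing" is checked org-wide.
@isTest(isParallel=true)
private class AccountServiceTest {
    @isTest
    static void createsAccount() {
        Account a = new Account(Name = 'Test');
        insert a;
        System.assertNotEquals(null, a.Id);
    }
}
```

That would allow keeping the org-wide checkbox safely checked while still letting well-behaved test classes run in parallel.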
Lesson Learned
Always read an error message in full, even if you think it's not relevant. Having expectations about what may be causing your problems can speed up the search, but it can also misdirect you. But you know that… so do I. Every now and then a reminder is necessary. 🙂