A bug has been reported in production and there’s no obvious path to the problem. What are you going to do? A build process is failing and the errors make no sense. What are you going to do? A report is showing you things in the data that just shouldn’t happen. What are you going to do?
This kind of debugging is distinct from the kind you do when you’re writing code. This kind of debugging has to treat the system as an artefact to be explored. It doesn’t matter whether you wrote the code, or someone else did – it’s suddenly so far outside your expectations that you don’t know how to tackle it. You’re starting from the outside and trying to find the pieces that aren’t right.
Let’s look at some ideas for what you can do…
Scope out the size of the problem
The first question to ask yourself is, how big a problem is this? Does this affect 1 record, or 1 million records? Does it mean that a rarely used report is showing the wrong numbers, or that business-critical functionality is broken? Is it going to get worse?
Finding this out will be good for both you and your customer. If your customer thinks that it’s a terrible problem, but it turns out to have limited scope, then this will help to calm the situation and reassure them. If you realise that the problem is bad, and could get bigger, you might need to take part of the system off-line while you sort things out.
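As a rough sketch of how you might size the problem from Anonymous Apex, the snippet below counts the records matching a suspected "bad" condition and breaks them down by creation day to see whether the problem is growing. My_Object__c and the Status__c filter are borrowed from the example later in this post and are only stand-ins. Swap in whatever identifies an affected record in your org.

```apex
// Assumption: My_Object__c / Status__c = 'Activate' stands in for "an affected record".
Integer affected = [SELECT COUNT() FROM My_Object__c WHERE Status__c = 'Activate'];
Integer total = [SELECT COUNT() FROM My_Object__c];
System.debug('Affected records: ' + affected + ' of ' + total);

// Break the affected records down by creation day to see if the problem is getting worse.
for (AggregateResult ar : [
    SELECT DAY_ONLY(CreatedDate) createdDay, COUNT(Id) recordCount
    FROM My_Object__c
    WHERE Status__c = 'Activate'
    GROUP BY DAY_ONLY(CreatedDate)
    ORDER BY DAY_ONLY(CreatedDate)
]) {
    System.debug(ar.get('createdDay') + ': ' + ar.get('recordCount'));
}
```

Governor limits still apply to these queries, so on very large objects you may need a narrower filter.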
Get movement!
You’ll never plow a field by turning it over in your mind
Sitting and staring at the problem won’t help for long. If you can’t see it immediately, then start trying to find the shape of the problem. Generate some theories and test them. If you can’t attack the problem directly yet, can you write some tests to make sure that parts of the system work the way you think they do? Make a cup of tea. Walk the dog. Whatever you do, don’t just sit and stare.
Attack it from both ends
Suppose we have a record where changing one field and saving will cause an error, but we don’t know why. Try taking things away from that record to simplify the situation. Keep taking them away until the error disappears, or you can’t take anything else away.
Also work from the other end. Start with a simple record which does not cause an error, and keep making it more like the one we know causes an error until you find the point where the error starts coming in.
One way or another, you’re likely to see what causes the problem, or at least narrow down its scope.
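Here is one way the "taking things away" half might look in Anonymous Apex. It is only a sketch: the object, the field names in fieldsToStrip, and the 'xxx' Id are placeholders, and it assumes that the error shows up as a catchable exception on update. Each attempt is wrapped in a savepoint so that nothing is saved.

```apex
Id problemId = 'xxx'; // placeholder: replace with the Id of the problem record

// Hypothetical fields that we suspect might be involved.
List<String> fieldsToStrip = new List<String>{ 'Amount__c', 'Owner_Notes__c', 'Category__c' };

My_Object__c problem = [
    SELECT Id, Status__c, Amount__c, Owner_Notes__c, Category__c
    FROM My_Object__c
    WHERE Id = :problemId
];

// Work on an in-memory copy and clear one more field on each pass.
My_Object__c candidate = problem.clone(true, true);
for (String fieldName : fieldsToStrip) {
    candidate.put(fieldName, null);

    Savepoint sp = Database.setSavepoint();
    try {
        candidate.Status__c = 'Activate'; // the change that triggers the error
        update candidate;
        System.debug('Error disappeared after clearing ' + fieldName);
        break;
    } catch (Exception e) {
        System.debug('Still failing with ' + fieldName + ' cleared: ' + e.getMessage());
    } finally {
        Database.rollback(sp); // never keep the experimental changes
    }
}
```

Working from the other end is the mirror image: start from a freshly built record that saves cleanly, and copy field values across from the problem record one at a time until the error appears.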
Binary Search
This is a laborious, old-school measure but it can be useful. It is easiest to understand if we consider something like a single Lightning Component. Suppose your component crashes on load and you can’t get any sort of stack-trace from it. To find where the problem lies, start by commenting out half of the component, and load again.
You now know which half of the component causes the error: if it still fails, then the error is in the half you left active. Otherwise, the error is in the half you commented out.
If it still failed, comment out half of what remained active. If it succeeded, uncomment half of what you just commented out. Either way, the next load tells you which quarter of the component causes the error.
You can keep repeating until the section that you’re commenting / uncommenting is so small that you can see where the problem is.
Fans of computer science algorithms will recognise this as a binary search.
You can apply this methodology to wider system problems, but the “commenting out” step may be more involved and you might not be able to chop things into such clean halves. Regardless, anything which narrows down the problem is still useful.
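As an illustration of the same idea applied to data rather than markup, the Anonymous Apex sketch below bisects a list of records to find the one that makes a bulk update fail. It rests on some loud assumptions: the object and filter are placeholders, the update of the full list is already known to fail, exactly one record is the culprit, and the failure surfaces as a catchable exception.

```apex
// Hypothetical set of suspect records. We already know that updating all of them fails.
List<My_Object__c> suspects = [SELECT Id FROM My_Object__c WHERE Status__c = 'Activate' LIMIT 64];

// Invariant: the culprit is somewhere in suspects[low] .. suspects[high - 1].
Integer low = 0;
Integer high = suspects.size();

while (high - low > 1) {
    Integer mid = low + (high - low) / 2;

    // Try only the first half of the current range.
    List<My_Object__c> firstHalf = new List<My_Object__c>();
    for (Integer i = low; i < mid; i++) {
        firstHalf.add(suspects[i]);
    }

    Boolean failed = false;
    Savepoint sp = Database.setSavepoint();
    try {
        update firstHalf;
    } catch (Exception e) {
        failed = true;
    } finally {
        Database.rollback(sp);
    }

    if (failed) {
        high = mid; // the culprit is in the half we just tried
    } else {
        low = mid;  // the culprit is in the half we left out
    }
}

if (!suspects.isEmpty()) {
    System.debug('Culprit: ' + suspects[low].Id);
}
```

For ordinary save errors, Database.update(records, false) reports a result per record and is simpler. Bisection earns its keep when the failure doesn’t point at a particular row.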
Use anonymous Apex with rollback and assert
You can use the Anonymous Apex window in Developer Console to recreate the problem. By using rollbacks, you can ensure that no permanent changes are recorded. By using assertions, you can check values along the way. So you might have something like this:
    Savepoint sp = Database.setSavepoint();

    My_Object__c testRecord = new My_Object__c(Id = 'xxx', Status__c = 'Activate');
    // Maybe do some more manipulation here to explore the issue
    update testRecord;

    testRecord = [SELECT Status__c FROM My_Object__c WHERE Id = :testRecord];
    System.assertEquals('Complete', testRecord.Status__c);

    Database.rollback(sp);
Now, we can add other assertions to quickly check other assumptions. We can modify some other records before enacting the problematic part. If the code crashes, or assertions fail, then Salesforce rolls back the transaction for us. If not, our final line rolls it back anyway. This gives you a great platform to explore the issue.
Use SeeAllData=true in a sandbox
When you have a particular problem record in a sandbox, you can explore the problem by writing a temporary test with @IsTest(SeeAllData=true). The test can then just query for this specific record. Normally, we wouldn’t use SeeAllData=true, but a temporary test like this can be a useful tool.
By writing a test, you can iterate on your ideas more quickly than you could by testing manually. It also gives you access to the code coverage report, which may be a good tip-off.
Once you have such a test, you can use the “Attack it from both ends” tactic. You can write another test without SeeAllData=true, and try to make its data more and more like the real-world record until it exhibits the same failing behaviour. Once you’ve done that, you’ve already got a regression test to keep for the future.
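A temporary test class along these lines might look like the sketch below. The class name, method names, the 'xxx' Id, and the 'Draft' starting status are all made up for illustration. The object and the expected 'Complete' status are borrowed from the Anonymous Apex example above.

```apex
@IsTest
private class TempProblemRecordTest {

    // Temporary: reads real sandbox data. Delete this method once the problem is understood.
    @IsTest(SeeAllData=true)
    static void realRecordShowsTheProblem() {
        Id problemId = 'xxx'; // placeholder: the Id of the problem record in this sandbox
        My_Object__c problem = [SELECT Id, Status__c FROM My_Object__c WHERE Id = :problemId];

        problem.Status__c = 'Activate';
        update problem; // changes made inside a test are not committed

        problem = [SELECT Status__c FROM My_Object__c WHERE Id = :problemId];
        System.assertEquals('Complete', problem.Status__c);
    }

    // Candidate regression test: builds its own data, with no SeeAllData.
    // Keep making this record more like the real one until it fails in the same way.
    @IsTest
    static void syntheticRecordShowsTheProblem() {
        My_Object__c synthetic = new My_Object__c(Status__c = 'Draft'); // hypothetical starting values
        insert synthetic;

        synthetic.Status__c = 'Activate';
        update synthetic;

        synthetic = [SELECT Status__c FROM My_Object__c WHERE Id = :synthetic.Id];
        System.assertEquals('Complete', synthetic.Status__c);
    }
}
```

Putting SeeAllData=true on the single method, rather than on the whole class, keeps the second test isolated from org data.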
Copy package code to local copies
If you are writing packages (traditional managed packages or namespaced unlocked packages), your packaged code can be difficult or impossible to log and tweak in the target org. If you have narrowed it down to a few classes, and you have access to the source of those classes, then make a local un-namespaced copy of them in the target org. Then, you can write tests which explicitly invoke your local copies, or use Anonymous Apex to drive them (you might need to bring a few more supporting classes in if they are not global).
Doing this means that you can iterate your ideas much more quickly without having to rebuild package versions. Or you can identify that the problem is related to crossing the namespace barrier (i.e. the local copies work with no modification).
Create a minimal reproduction
If you know where the problem is, but you still don’t know why the system is behaving the way it is, then try to create the smallest possible system which demonstrates the problem.
Do this in a dev org or scratch org. The process will often help your understanding. If you manage to create a minimal reproduction and you’re still stuck, then you’ve got something you can take to colleagues, support forums, or Salesforce Support.
Write a question for Salesforce StackExchange
(And maybe even post it)
Have a look at the tips for asking questions on Salesforce StackExchange. Then, try to compose your problem as a decent question.
By the time you have stated the problem clearly, and tried to rule out the obvious, you may have solved it without ever having to hit “post”. Much like rubber duck programming, writing out or talking through the problem in terms that someone else can understand forces you to organise your thoughts.
What about debuggers?
I haven’t talked about debuggers because I don’t like them very much. They have their place, but it’s important to remember that coding and debugging are thinking activities. They are not typing or clicking activities. So, I’m not a big fan of having a tool where you can perform lots of displacement activity by clicking and typing instead of thinking.
By the time you’ve added a couple of watch-expressions and a couple of conditional breakpoints, you have to ask yourself: Would it have been quicker to just print out some values to the log?
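For what it’s worth, "printing some values to the log" can be as small as the sketch below. The object, fields, and 'xxx' Id are placeholders. Logging at ERROR level is just one habit for making the lines easy to find in a busy debug log.

```apex
Id problemId = 'xxx'; // placeholder: replace with the Id of the record you are investigating

My_Object__c rec = [SELECT Id, Status__c, LastModifiedDate FROM My_Object__c WHERE Id = :problemId];

// Dump the whole record as formatted JSON so that every queried field is visible at once.
System.debug(LoggingLevel.ERROR, 'Before update: ' + JSON.serializePretty(rec));
```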
Let us know what you think and get in touch.