🧵
1/ In a previous thread I introduced an idea I call function call graphing. The goal is to represent all of the ways that an adversary could achieve a specific action (Process Read in this case) through APIs. Let's talk through one potential defensive use case.
3/ The idea for function call graphs actually came about in the wake of the MITRE EDR Evaluations. The evaluation protocol and subsequent interpretation of results by vendors caused me to become hyper-focused on identifying a measurable way to evaluate detection coverage.
3/ I knew that there were many ways in which attackers could alter their tools to make them appear different while still achieving the desired behavior. In many cases, it's not practical or even possible to test EVERY variation, which means we MUST select a subset of variations.
4/ The problem with selecting a subset of variations is that it isn't obvious which subset you should choose. We have a tendency to measure our tests quantitatively (we have 7 Kerberoasting tests), but not qualitatively (we chose these 7 tests because...).
5/ Testing is extremely important in detection engineering and purple teaming to ensure that detection rules are meeting their goal. The problem is that there isn't much rigor used in selecting which variations to test.
6/ In my experience, the selection process involves identifying common or novel implementations and integrating them into the testing framework. This isn't a terrible strategy, but it leads to a common error: the tests in the set end up more similar than different.
7/ Consider these images. Imagine that the outer circle represents the scope of possible iterations for a behavior & the purple circles represent our test coverage. Both include the exact same five purple circles, yet we can see a substantial difference in overall coverage.
8/ These images illustrate that some subsets of possible iterations provide better coverage than others. For instance, when it comes to OS Credential Dumping: LSASS Memory there are many known tools that we might use as part of our test.
9/ Imagine having the option of choosing between 2 sets of implementations for this behavior. The 1st set includes mimikatz & Invoke-Mimikatz while the 2nd set includes mimikatz & Out-Minidump. Which would you choose? I'd choose the 2nd set because the options are MORE different.
10/ You see, Invoke-Mimikatz is basically just a wrapper that injects mimikatz into memory. This means that besides the mechanism of code execution, they are functionally equivalent. Out-Minidump on the other hand uses a completely different API function to achieve its objective.
11/ Let's take a look at a function call graph for the Process Read action. This shows all of the KNOWN functional paths that applications can take to read the virtual memory of a process. Notice that using this graph it is fairly easy to see that Out-Minidump is more different.
12/ Okay, so it seems there might be some utility in using these function call graphs to find a better set of test implementations, but can we find the BEST set? If so, can we calculate it mathematically instead of relying on intuition?
13/ Take a moment to consider how you might answer this question. Each red circle is an entry point into the graph and all paths are directional. That means there are eight different options. If you were asked to select the 3 most optimal starting points, which would you choose?
14/ Now that you've committed, let me tell you the methodology that I've found for answering this question. As mentioned above, it seemed obvious to me that given the choice between mimikatz & Invoke-Mimikatz vs. mimikatz & Out-Minidump the choice would be Out-Minidump, but why?
15/ The only explanation I could find in my head was that mimikatz & Out-Minidump seemed more "different" or "dissimilar". Maybe that was enough to start searching for answers. My first question was whether it was possible to measure the dissimilarity of paths in a graph.
17/ Edit distance (ED) is a way to quantify how dissimilar two strings are. It counts the minimum number of operations that are required to transform one string to the other. This is useful in many applications such as spell checking, DNA analysis, & malware naming.
18/ Levenshtein distance is the canonical ED algorithm and measures three types of operations:
Deletion (D) - svXchost -> svchost
Insertion (I) - svchst -> svchost
Substitution (S) - svRhost -> svchost
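Here's a quick sketch of the classic dynamic-programming implementation of Levenshtein distance (my tooling for this is PowerShell, but Python keeps the example short):

```python
def levenshtein(a, b):
    """Minimum number of single-character deletions, insertions,
    and substitutions needed to turn string a into string b."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))  # distances from the empty prefix of a
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute (free on match)
            )
        prev = cur
    return prev[n]

print(levenshtein("svXchost", "svchost"))  # 1 (one deletion)
print(levenshtein("svchst", "svchost"))    # 1 (one insertion)
print(levenshtein("svRhost", "svchost"))   # 1 (one substitution)
```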
19/ Okay, so ED is used to measure the number of operations it takes to convert one string to another, but we aren't dealing with strings... we're dealing with paths in a graph. Well, what if we converted each path into a string and then compared those? Seems simple, right?
20/ Using this idea we can convert each node to a letter and then create strings for each possible path. For instance, the path that starts at dbghelp!MiniDumpWriteDump (this is the path Out-Minidump uses) would be ADFGH. The path used by ntdll!NtReadVirtualMemory would be GH.
21/ We can then calculate the string for every possible path by analyzing all paths starting at nodes with red borders. This analysis results in 8 possible paths, shown below:
ADFGH (Out-Minidump)
DFGH
FGH
GH
H (dumpert)
BEFGH
EFGH
CEFGH (mimikatz)
22/ A key feature of ED is that it is used for 1 to 1 comparisons, not 1 to many. Therefore, it is important we select a starting path to compare from. Since mimikatz is the canonical example of this technique we will use its path which is CEFGH.
23/ We can then use the algorithm to calculate the ED between CEFGH and the other paths. Below are the results:
ADFGH -> 2
DFGH -> 2
FGH -> 2
GH -> 3
H -> 4 ✅
BEFGH -> 1
EFGH -> 1
It turns out that H is the path that is the most dissimilar from CEFGH according to the ED.
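To make the comparison concrete, here's a Python sketch that scores every path against mimikatz's CEFGH (the path strings are from the graph above; the code is my illustration, not the PowerShell from the gist):

```python
def levenshtein(a, b):
    """Classic edit distance: minimum deletions, insertions, substitutions."""
    m, n = len(a), len(b)
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # substitute (free on match)
            )
        prev = cur
    return prev[n]

target = "CEFGH"  # mimikatz's path
others = ["ADFGH", "DFGH", "FGH", "GH", "H", "BEFGH", "EFGH"]
scores = {p: levenshtein(target, p) for p in others}
for p, d in scores.items():
    print(f"{p} -> {d}")
```

Sorting `scores` by value confirms H is the farthest path from CEFGH under plain ED.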
24/ Conceptually, the shorter the ED the more the paths have in common, but you might notice that this isn't always a fair comparison. For instance, both ADFGH & FGH have an ED of 2, yet two edits on a five-node path intuitively shouldn't count the same as two edits on a three-node path.
25/ According to the literature this is a common issue. I found this paper¹ which introduces an algorithm called Contextual Normalized Edit Distance (CED) which assigns a weight to each edit operation based on the context surrounding the operation.
[1]ieeexplore.ieee.org/document/4498345
26/ The paper says, "a number of authors argue that in practice, having to rewrite twice on a string of length 2 is not the same as having to rewrite twice on a string of length 200." CED divides each edit operation locally by the length of the string on which it is applied.
27/ We can recalculate the similarity after implementing the CED in PowerShell:
ADFGH -> 0.4
DFGH -> 0.4
FGH -> 0.45
GH -> 0.78
H -> 1.28 ✅
BEFGH -> 0.2
EFGH -> 0.2
Notice that CED scored ADFGH slightly differently than FGH even though their ED was the same.
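Here's a Python sketch of my reading of the CED: each operation's unit cost is divided by the length of the intermediate string it is applied to. With that convention the scores above fall out; note the exact handling of insertion cost is my assumption from the paper's description, and this is an illustration, not the gist's PowerShell:

```python
def ced(x, y):
    """Contextual normalized edit distance (sketch).

    DP over (i, j) = (chars of x consumed, chars of y produced). At that
    state the working string is y[:j] + x[i:], so each edit's unit cost is
    divided by its length at the time the edit is applied.
    """
    m, n = len(x), len(y)
    INF = float("inf")
    d = [[INF] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            cur = d[i][j]
            if cur == INF:
                continue
            L = max(j + (m - i), 1)  # length of the working string here
            if i < m:                # delete x[i]
                d[i + 1][j] = min(d[i + 1][j], cur + 1.0 / L)
            if j < n:                # insert y[j] (pre-insertion length: assumption)
                d[i][j + 1] = min(d[i][j + 1], cur + 1.0 / L)
            if i < m and j < n:      # match or substitute
                cost = 0.0 if x[i] == y[j] else 1.0 / L
                d[i + 1][j + 1] = min(d[i + 1][j + 1], cur + cost)
    return d[m][n]

scores = {p: round(ced("CEFGH", p), 2)
          for p in ["ADFGH", "DFGH", "FGH", "GH", "H", "BEFGH", "EFGH"]}
for p, s in scores.items():
    print(f"{p} -> {s}")
```

For example, H's score of 1.28 is just 1/5 + 1/4 + 1/3 + 1/2: four deletions applied to strings of shrinking length.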
28/ After calculating each path's CED relative to CEFGH we can see that H is the most dissimilar. This means that if we were to choose only two paths to test we would choose CEFGH and H. Now what if we want to add a third? CED is for 1:1 string comparison not 1:1:1.
29/ To solve this problem, we can calculate the CED of all strings relative to CEFGH and the CED of all strings relative to H. We can then add each string's CED relative to CEFGH to its CED relative to H and divide by two, giving us an average CED score.
31/ As you're probably already thinking, this calculation can be applied recursively to rank order all paths. I wrote a script called Get-LeastSimilarPath to do just that. If we had to pick only 3 of the 8 possible paths, we'd choose CEFGH, H, & ADFGH to maximize our coverage.
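That greedy "add the path with the highest average CED to everything already chosen" loop can be sketched in a few lines of Python (it re-declares a CED helper so it runs standalone; `pick_paths` is an illustrative name of mine, not the function from the gist, and the insertion-cost convention in `ced` is my assumption):

```python
def ced(x, y):
    """Contextual normalized ED: each edit's unit cost divided by the
    length of the intermediate string it is applied to (sketch)."""
    m, n = len(x), len(y)
    INF = float("inf")
    d = [[INF] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(m + 1):
        for j in range(n + 1):
            cur = d[i][j]
            if cur == INF:
                continue
            L = max(j + (m - i), 1)  # working string length at state (i, j)
            if i < m:                # delete x[i]
                d[i + 1][j] = min(d[i + 1][j], cur + 1.0 / L)
            if j < n:                # insert y[j]
                d[i][j + 1] = min(d[i][j + 1], cur + 1.0 / L)
            if i < m and j < n:      # match or substitute
                cost = 0.0 if x[i] == y[j] else 1.0 / L
                d[i + 1][j + 1] = min(d[i + 1][j + 1], cur + cost)
    return d[m][n]

def pick_paths(paths, k, start):
    """Greedily pick k maximally dissimilar paths: seed with `start`, then
    repeatedly add the path whose average CED to the chosen set is largest."""
    chosen = [start]
    remaining = [p for p in paths if p != start]
    while len(chosen) < k and remaining:
        best = max(remaining,
                   key=lambda p: sum(ced(c, p) for c in chosen) / len(chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen

all_paths = ["ADFGH", "DFGH", "FGH", "GH", "H", "BEFGH", "EFGH", "CEFGH"]
print(pick_paths(all_paths, 3, "CEFGH"))
```

Under these cost assumptions the greedy pick reproduces the CEFGH, H, ADFGH ordering.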
32/ The goal of this thread was to show that the function call graph enables us to do more than simply predict functional modifications that attackers might make to avoid detection. It also allows us to intelligently select the variants that provide the most additional coverage.
33/ Okay, so I told you all of that to tell you this. When I showed this approach to @jsecurity101 he had a really cool idea. He integrated the different paths in the format of @redcanary's AtomicTestHarness project.
34/ This means the test harness comes with an implementation of each path (except the syscall for now) in the Process Read function call graph. You can then pipe the output of Get-LeastSimilarPath into Invoke-ATHDumpLsass and it will automatically execute the variant for the paths you select.
35/ The cool thing is that the function call graph is relatively static meaning that you can take a function call graph that I make and it will be accurate for your organization in almost all scenarios. This makes it a really good opportunity for community collaboration.
36/ If you made it this far, I've included the code for Levenshtein Distance, Contextual Normalized Edit Distance, Get-LeastSimilarPath, & Invoke-ATHDumpLsass in the gist referenced below¹. Please play around with it; any feedback is welcome!
[1]gist.github.com/jaredcatkinson/014fd61dab5b01abfc791923b4a6efec