Great dataset and analysis! Can a package detect whether it is running inside a sandbox, or inside this sandbox in particular? How do you deal with code paths that are only executed in certain environments (e.g. Windows)?
Sandbox detection is a pretty large cat-and-mouse game between malware authors and researchers. OTOH, for “is this a suspicious package” (vs. a detailed analysis of what exactly it does), attempts at detecting a sandbox would probably already be a strong signal.
That depends a lot on how it does it. Depending on the sandbox environment, there are some quite lightweight probes that look like they’re just detecting the environment. Some desktop malware detects whether it’s running in a VM but that’s less useful for this kind of malware because it is likely to want to run when in a VM.
The unused code path thing is more interesting to consider. If I were creating a malicious package, I’d write it so that it did something useful but took some input that, in common use, would be likely to come from an untrusted source. If that input contained a magic UUID or something, I’d enter the evil code paths and only then open all of your secret files and send them somewhere else. For example, something processing images or PDFs would look for a particular flag in the metadata and behave well until presented with an image with that metadata. Once you’d used the library in some big system, I’d upload my image and exfiltrate all of your data. With guided fuzzing, you might be able to generate an input that would trigger this case. If I required the malicious input to be signed with a private key and embedded the public key in the package, you probably wouldn’t, because your fuzzer would need to forge a valid signature without knowing the private key, which would mean that it was able to break the cryptosystem.
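To make the fuzzing point concrete, here is a rough sketch, not anyone's real tooling. Everything in it is hypothetical: `MAGIC` stands in for the magic UUID, `guided_search` is a toy stand-in for a coverage-guided fuzzer with byte-level comparison feedback, and a SHA-256 digest check is swapped in for the signature check because Python's standard library has no public-key verification. The property that matters is the same: the hash guard gives the search no partial progress to climb.

```python
import hashlib
from typing import Callable, Optional

# Hypothetical trigger value, standing in for the "magic UUID".
MAGIC = b"\xde\xad\xbe\xef"
# Hypothetical digest of a trigger that only the package author knows.
TRIGGER_DIGEST = hashlib.sha256(b"secret known only to the author").hexdigest()


def magic_guard(data: bytes) -> int:
    """Return how many leading bytes match MAGIC (a coverage-like signal)."""
    matched = 0
    for got, want in zip(data, MAGIC):
        if got != want:
            break
        matched += 1
    return matched


def hash_guard(data: bytes) -> int:
    """Return 1 only if the whole input hashes to the embedded digest."""
    return int(hashlib.sha256(data).hexdigest() == TRIGGER_DIGEST)


def guided_search(progress: Callable[[bytes], int], length: int,
                  goal: int, budget: int) -> Optional[bytes]:
    """Mutate one byte at a time, keeping any change that raises progress."""
    best = bytearray(length)
    best_score = progress(bytes(best))
    tries = 0
    while best_score < goal and tries < budget:
        improved = False
        for pos in range(length):
            for value in range(256):
                tries += 1
                candidate = bytearray(best)
                candidate[pos] = value
                score = progress(bytes(candidate))
                if score > best_score:
                    best, best_score = candidate, score
                    improved = True
                    break
        if not improved:
            break  # no single-byte mutation helps: the search is stuck
    return bytes(best) if best_score >= goal else None
```

With these toy guards, `guided_search(magic_guard, 4, 4, 10_000)` recovers `MAGIC` in under a thousand tries, while `guided_search(hash_guard, 4, 1, 10_000)` gives up and returns `None`, since every mutation looks equally useless until the exact trigger is hit.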
Cool stuff. It will be interesting to see what strace digs up for files opened - snooping around for SSH keys for instance.
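For what it's worth, a first pass over the strace output could be as simple as the sketch below. It assumes the usual `open`/`openat` line format that strace prints, and the watch list in `SENSITIVE_MARKERS` is a made-up starting point, not something taken from the dataset. Paths containing escaped quotes are not handled.

```python
import re

# Hypothetical watch list; extend it for whatever secrets you care about.
SENSITIVE_MARKERS = ("/.ssh/", "/.aws/", "/.gnupg/", "/.netrc")

# Matches open("path", ...) and openat(AT_FDCWD, "path", ...) and captures the path.
OPEN_CALL = re.compile(r'\b(?:open|openat)\((?:[^,"]+,\s*)?"([^"]*)"')


def sensitive_opens(strace_lines):
    """Return the paths from open/openat calls that hit the watch list."""
    hits = []
    for line in strace_lines:
        match = OPEN_CALL.search(line)
        if match and any(marker in match.group(1) for marker in SENSITIVE_MARKERS):
            hits.append(match.group(1))
    return hits


sample = [
    'openat(AT_FDCWD, "/usr/lib/python3/os.py", O_RDONLY|O_CLOEXEC) = 3',
    '[pid 4242] openat(AT_FDCWD, "/home/dev/.ssh/id_rsa", O_RDONLY) = 4',
    'open("/home/dev/.netrc", O_RDONLY) = -1 ENOENT (No such file or directory)',
    'read(4, "...", 4096) = 1024',
]
print(sensitive_opens(sample))
```

Failed opens are kept on purpose: a package probing for `~/.netrc` is interesting even if the file isn't there.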