CVE-2026-31431: Copy-Fail

Task 1: Introduction


Set up your virtual environment

To successfully complete this room, you'll need to set up your virtual environment. This involves starting the Lab Machine, ensuring you're equipped with the necessary tools and access to tackle the challenges ahead.

Most local privilege escalation exploits are fragile. They depend on precise kernel version offsets, require winning a race condition reliably, or crash the system when timing slips. CVE-2026-31431, nicknamed Copy Fail, is none of those things. A 732-byte Python script with no external packages, run by any unprivileged local user, returns a root shell in seconds.

The researcher's own description of the primitive frames it concisely: "an unprivileged local user can write four controlled bytes into the page cache of any readable file on a system, and use that to gain root."

Copy Fail was assigned CVE-2026-31431 and a CVSS v3.1 base score of 7.8 (High). It was reported by Taeyang Lee, a researcher at Theori, who paired his own domain insight into the crypto subsystem with Xint Code, an AI-assisted code analysis tool also developed at Theori, to scan the kernel's crypto codepaths and surface the bug. The full disclosure is published on the Xint research blog. The kernel security team was notified on 23 March 2026, with public disclosure following on 29 April 2026. The vulnerability sat silently in virtually every mainstream distribution for nine years, from a 2017 kernel optimisation through to the April 2026 patch.

If you are familiar with CVE-2022-0847 (Dirty Pipe), the class of primitive that drives Copy Fail will already be familiar. Both exploits write into the in-memory page cache of a file without touching the on-disk content, and both bypass file integrity monitoring as a result. The underlying mechanisms differ. Dirty Pipe used a flaw in the pipe subsystem, whereas Copy Fail abuses the kernel's AF_ALG crypto socket combined with a specific AEAD template. Despite the different entry points, both fall into the same attack surface class of kernel-level page cache writes via splice(). The two are compared below.

Property	Dirty Pipe (CVE-2022-0847)	Copy Fail (CVE-2026-31431)
Kernel subsystem	pipe / splice	AF_ALG crypto / splice
Write primitive	Arbitrary via pipe flag bug	Controlled 4-byte via authencesn scratch
Race condition required	Yes (originally)	No
Kernel offsets needed	No	No
On-disk file changed	No	No
File integrity bypass	Yes	Yes

The "No" in the race condition row is what makes Copy Fail substantially more reliable as a weapon. A race-based exploit may fail a fifth of the time, demand a retry loop, or hang the system on a bad attempt. The Copy Fail write is deterministic. It either lands cleanly or it errors without side effects. That reliability drove the rapid weaponisation observed after disclosure, with multiple public reimplementations appearing within 24 hours.

Learning Objectives

By the end of this room, you will be able to:

Explain the page cache write primitive that makes Copy Fail exploitable and why it bypasses file integrity monitoring
Identify the four kernel components (page cache, AF_ALG, authencesn, splice) that combine to create the vulnerability
Execute the proof-of-concept as an unprivileged user and obtain a root shell
Describe the primary detection signals and apply the modprobe mitigation on Ubuntu and Debian systems

Prerequisites

This room assumes comfort with the Linux command line, including basic shell navigation and reading command output, as covered in Linux Fundamentals. It also assumes a working understanding of local privilege escalation concepts as introduced in Linux Privilege Escalation.

Connecting to the Machine

Click the Start Lab Machine button at the top of this task and allow about a minute for it to boot. The room launches an in-browser split-view terminal that is already logged in as the unprivileged user karen, so no further setup is required to follow along. If you prefer to connect from your own terminal over SSH, the credentials are provided in Task 3. Use whichever terminal suits you best for the practical steps in Task 3.

Answer the questions below

I have successfully started my machine.

Task 2: The Vulnerability

Copy Fail emerges from the interaction of four independent components of the kernel, each of which behaves correctly in isolation. The vulnerability does not exist in any single component. It exists in what happens when they meet, specifically when a 2017 optimisation intended to make one of them faster failed to account for a data path that was not considered at the time.

In this section, we cover each component in the order it appears in the exploit, before describing how the combined behaviour produces the write primitive.

The Page Cache

The page cache is a region of kernel memory that holds copies of recently-read file contents to avoid repeated disk reads. When any process reads a file on Linux, the kernel reads it from disk once into the page cache, and every subsequent read of that file by any process is served from the cached copy.

The cache is shared. All processes on the same system, including every container sharing the host kernel, see the same cached pages for a given file. Writes to a file ordinarily update both the on-disk copy and the cached copy. However, the cached copy can also be modified directly without any change being written back to disk.

A useful analogy is a photocopier with an internal print buffer. The original document remains on the glass, but every print job is reproduced from the buffer rather than from the glass. If the buffer contents are tampered with, every printout is wrong. Inspecting the original document on the glass shows nothing out of place.

This matters because file integrity tools such as AIDE, Tripwire, and IMA work by hashing the on-disk file. They read from disk, not from memory. If the page cache has been corrupted but the disk is untouched, every check passes. The hash check reports the file as clean, while the binary the kernel actually loads when the file is executed is the corrupted in-memory version.
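
The baseline-and-compare model that such tools follow can be sketched in a few lines of Python. This is a simplified illustration of the idea only, not how AIDE, Tripwire, or IMA are implemented, and the helper names file_sha256 and matches_baseline are hypothetical.

```python
import hashlib


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large binaries do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_baseline(path, baseline_hex):
    """Compare the file's current digest with a previously stored baseline."""
    return file_sha256(path) == baseline_hex
```

A monitoring job would record file_sha256() for each watched binary once, then call matches_baseline() on a schedule and alert on any False result.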

Diagram showing a file on disk with an arrow pointing to the page cache in memory. All processes read from the page cache. The exploit writes to the page cache (labeled: attacker writes here). sha256sum reads from disk (labeled: integrity tools read here). The on-disk file is shown as unchanged.

AF_ALG: The Kernel Crypto Socket

AF_ALG (Address Family: Algorithm, socket family number 38) is a Linux socket interface that exposes the kernel's cryptographic subsystem to userspace. Introduced in kernel 2.6.38, it allows programs to request AEAD encryption and decryption, hashing, and symmetric cipher operations through standard socket calls such as bind(), sendmsg(), and recvmsg().
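
To make the socket-style workflow concrete, the sketch below asks the kernel for a SHA-256 digest through AF_ALG, a benign and common use of the interface. It assumes a Linux host whose kernel exposes the hash interface and a Python build with AF_ALG support. The wrapper name kernel_sha256 is hypothetical, and it returns None wherever the interface is unavailable.

```python
import hashlib
import socket


def kernel_sha256(data):
    """Hash `data` with the kernel's sha256 via AF_ALG, or return None if unavailable."""
    if not hasattr(socket, "AF_ALG"):
        return None  # not Linux, or Python built without AF_ALG support
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as algo:
            algo.bind(("hash", "sha256"))  # select algorithm type and name
            op, _ = algo.accept()          # operation socket for one request
            with op:
                op.sendall(data)           # feed the input to the kernel
                return op.recv(32)         # a SHA-256 digest is 32 bytes
    except OSError:
        return None  # e.g. interface blocked, filtered, or not compiled in


if __name__ == "__main__":
    digest = kernel_sha256(b"abc")
    if digest is not None:
        # Cross-check the kernel result against the userspace implementation.
        print(digest == hashlib.sha256(b"abc").digest())
```

The bind() call is where the caller names the algorithm, which is why the algorithm type and name are useful context when reviewing AF_ALG activity.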

The property that matters for this exploit is access. AF_ALG is available to unprivileged users by default. No CAP_SYS_ADMIN or other special capability is required to open such a socket. Legitimate software that uses AF_ALG is a short and well-defined set, including cryptsetup, systemd-cryptsetup, kcapi-enc, kcapi-dgst, kcapi-mac, kcapi-speed, and the charon and charon-systemd daemons used by strongSwan for IPsec. This list becomes the baseline for detecting anomalous AF_ALG usage in Task 4.

authencesn and the Scratch Write

authencesn is a Linux kernel AEAD template used for IPsec with Extended Sequence Number (ESN) support. During an AEAD decryption operation, authencesn writes 4 bytes (the lower 32 bits of the Extended Sequence Number, stored as seqno_lo) at the offset assoclen + cryptlen within the output buffer. This write is a scratch operation the algorithm performs to rearrange the sequence number fields before verifying the authentication tag.

The detail that makes this exploitable is the order of operations. The scratch write happens before HMAC tag verification. The algorithm writes its 4 bytes into the output buffer first, then checks whether the HMAC tag is valid. If the HMAC check fails, and in this exploit it is deliberately made to fail, the error is returned to userspace. However, the write has already landed. There is no rollback, no cleanup of the output buffer, and no mechanism to undo a write that has already completed.

A common confusion point is whether a failing HMAC verification implies the operation as a whole was rolled back. It does not. The write happened first. The failure only means the authentication check at the end of the operation rejected the data. The 4 bytes written as a scratch value during processing are already in the output buffer by the time the HMAC result is evaluated.
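
The ordering can be illustrated with a toy model that runs entirely in userspace on an ordinary bytearray. It is not kernel code and does not touch AF_ALG. The function name toy_decrypt, the buffer layout, and the big-endian packing of the 4-byte value are all assumptions made for the illustration.

```python
import hashlib
import hmac
import struct


def toy_decrypt(out, assoclen, cryptlen, seqno_lo, key, message, tag):
    """Toy model only: scratch write first, tag check second, no rollback."""
    offset = assoclen + cryptlen
    # Step 1: the 4-byte scratch value lands in the output buffer.
    # Big-endian packing is an assumption made for this illustration.
    out[offset:offset + 4] = struct.pack(">I", seqno_lo)
    # Step 2: only now is the authentication tag checked.
    expected = hmac.new(key, message, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        raise ValueError("authentication failed")  # `out` stays modified


buffer = bytearray(16)
try:
    toy_decrypt(buffer, assoclen=8, cryptlen=4, seqno_lo=0x41424344,
                key=b"k", message=b"m", tag=b"\x00" * 32)
except ValueError:
    pass  # the caller sees an error, as userspace does in the real case
print(buffer[12:16])  # the scratch bytes persist despite the failure
```

The error path raises without restoring the buffer, which mirrors the missing rollback described above.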

Both the value and the destination of the write are attacker-controlled. The attacker controls the value through seqno_lo, which is derived from fields in the message they construct. The attacker controls the offset within the output buffer through assoclen and cryptlen. This gives full control over both what is written and where.

splice() and Page References

splice() is a Linux system call that moves data between two file descriptors by transferring page references rather than copying the data. When you splice from a regular file into a socket, the kernel does not copy the file contents. It hands the socket a reference to the same memory pages already used by the file's page cache.

This zero-copy behaviour is what makes splice() fast. It is also what makes it dangerous in this context. After a splice() from /usr/bin/su into an AF_ALG socket, the AF_ALG pipeline holds a reference to the live page cache pages backing /usr/bin/su. Those pages are still kernel-owned page cache pages, not a user-controlled copy.
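
A benign way to see splice() at work is to move a few bytes from a regular file into a pipe and read them back. The sketch assumes Linux and Python 3.10 or later, where os.splice exists. The helper name splice_file_to_pipe is hypothetical, and it returns None when os.splice is missing.

```python
import os


def splice_file_to_pipe(path, count):
    """Move up to `count` bytes from a file into a pipe with splice(), then read them."""
    if not hasattr(os, "splice"):
        return None  # os.splice needs Linux and Python 3.10 or later
    read_end, write_end = os.pipe()
    fd = os.open(path, os.O_RDONLY)
    try:
        # The kernel attaches the file's pages to the pipe rather than
        # copying the bytes through a userspace buffer.
        moved = os.splice(fd, write_end, count, offset_src=0)
        return os.read(read_end, moved) if moved else b""
    finally:
        for descriptor in (fd, read_end, write_end):
            os.close(descriptor)
```

Keep count well below the pipe capacity (64 KiB by default on most systems) so the call does not block waiting for a reader.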

The 2017 Optimisation

Before 2017, algif_aead (the AEAD implementation in the AF_ALG subsystem) maintained separate source and destination scatterlists for cryptographic operations. A scatterlist is a linked list of memory pages describing where input data comes from and where output should be written.

Commit 72548b093ee3 in kernel 4.14 introduced an in-place optimisation. Instead of keeping separate source and destination scatterlists, the code merged them and set req->src = req->dst. For the common case, where the user supplies data via a normal write, this was safe. Both sides of the assignment referenced user-controlled memory.

The optimisation was written without considering the splice() data path. When pages arrive via splice() from a file, those pages are not user-controlled memory. They are the kernel's own live page cache pages for that file. Setting req->src = req->dst to the same scatterlist meant the "output" scatterlist now pointed directly at the file's page cache pages. The authencesn scratch write targeting the "output" buffer was now writing into the kernel's page cache.

The Resulting Primitive

Combining all four components produces a controlled 4-byte write to an arbitrary offset within the page cache of any file the attacker can open for reading. The attacker controls the written value through seqno_lo and controls the offset through the splice() position combined with assoclen and cryptlen. The HMAC verification fails afterwards, returning an error to userspace, but the write has already completed.

The exploit repeats this primitive approximately 40 times to write successive 4-byte chunks of shellcode into the target file's cached pages. Each iteration opens a fresh AF_ALG socket, splices the target file at the next offset, and calls recvmsg() to trigger the scratch write. Each call corrupts exactly 4 bytes of the page cache. After roughly 40 calls, enough shellcode is in place to overwrite the target binary's execution path.

There is no race condition, no kernel version-specific offset table, and no external tool dependency. The same script runs on every vulnerable kernel, which spans 4.14 (released November 2017) through 6.18.21 in the 6.18 series, and 6.19.0 through 6.19.11 in the 6.19 series, covering nine years of distributions.
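
For triage, the version ranges above can be turned into a quick check. This is a minimal sketch that compares upstream version numbers only. It does not account for vendor or stable-series backports, so a result of True means "check the vendor advisory", not "confirmed vulnerable". The function names are hypothetical.

```python
import re


def parse_kernel_release(release):
    """Turn a release string such as '6.8.0-31-generic' into (6, 8, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    return int(major), int(minor), int(patch or 0)


def in_affected_upstream_range(release):
    """True if the upstream version falls inside the ranges quoted above."""
    version = parse_kernel_release(release)
    if version[:2] == (6, 19):
        return version <= (6, 19, 11)          # 6.19.0 through 6.19.11
    return (4, 14, 0) <= version <= (6, 18, 21)  # 4.14 through 6.18.21
```

On a live host the input would typically come from platform.release(), which returns the same string as uname -r.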

Answer the questions below

What year was the optimization introduced that created this vulnerability?

Which AEAD algorithm template performs the scratch write that corrupts the page cache?

What system call transfers page cache pages into the AF_ALG socket without copying them?

The HMAC verification fails after the scratch write is performed. Does this undo the page cache corruption? (Answer Format: Yay or Nay)

Task 3: Exploitation

The proof-of-concept (PoC) script pre-placed on the machine targets /usr/bin/su. It overwrites the cached copy of that binary with shellcode, after which any execution of su runs that shellcode with setuid root privileges. The complete operation takes a few seconds. The exploit used in this task is hosted on GitHub.

When the Start Lab Machine button is pressed, the room opens a split-view terminal in the browser, already logged in as the unprivileged user karen. To use that terminal, run the steps below directly. To connect from your own machine over SSH instead, use the username karen and the password copyfail2026:

ssh karen@MACHINE_IP

Either approach lands you at the same prompt. The steps below should be followed in order in whichever terminal you choose.

Step 1: Confirm Your Context

Confirm the current user has no elevated privileges:

karen@ubuntu:~$ id
uid=1001(karen) gid=1001(karen) groups=1001(karen)

This is the starting context the exploit requires. No elevated privileges, no special groups, just an ordinary local user.

Step 2: Inspect the Proof of Concept

The exploit script lives at /home/karen/exploit.py. Before running it, take a quick look at its high-level structure:

head -30 /home/karen/exploit.py

The script uses only Python standard library modules. The os module supplies splice() and execve(), socket provides the AF_ALG socket calls, and zlib is used for CRC calculations when building the authentication key blob. There are no pip packages and no compiled extensions. The script runs on any Python 3.10 or later installation. Ubuntu 24.04 ships Python 3.12, so os.splice is available out of the box.

At a high level, the script performs the following steps in sequence:

Opens an AF_ALG socket bound to authencesn(hmac(sha256),cbc(aes))
Calculates the target offset within /usr/bin/su where each shellcode chunk should land
Constructs the AAD so that bytes 4-7 carry the shellcode value to write as seqno_lo
Calls splice() to feed page cache pages from /usr/bin/su into the socket at the calculated offset
Calls recvmsg() to trigger the AEAD decryption, at which point authencesn performs its scratch write and the shellcode bytes land in the page cache
Repeats approximately 40 times to write successive 4-byte shellcode chunks
Calls os.execve("/usr/bin/su", ...) so the kernel loads from the corrupted cache and the shellcode runs

Vertical 5-step exploitation flow diagram. Step 1: socket(AF_ALG) opens AF_ALG socket. Step 2: splice(target_file) brings page cache pages into the crypto pipeline. Step 3: recvmsg() called 40 times, each writing 4 bytes of shellcode into the page cache. Step 4: execve(/usr/bin/su) loads corrupted binary from page cache. Step 5: Root shell obtained.

Step 3: Run the Proof of Concept

Execute the script:

python3 /home/karen/exploit.py

The script prints progress as it writes each 4-byte chunk. After the writes complete, it calls execve() on the target binary. The shell prompt should change and the effective user ID becomes root:

karen@ubuntu:~$ python3 /home/karen/exploit.py
# id
uid=0(root) gid=0(root) groups=0(root)

Step 4: Read the Flag

root@ubuntu:~# cat /root/flag.txt
THM{xxxxxxxxxxxxxxxxxxxxxxxxxxx}

Step 5: The sha256sum Moment

While still in the root shell, verify the file just exploited against its expected on-disk hash:

root@ubuntu:~# sha256sum /usr/bin/su
c4d2e053445c5f89d13b68bb54de8d67358e1aa20a2b8f0688cb8a47a32edbdf  /usr/bin/su

That hash matches a freshly-installed, unmodified system. AIDE, Tripwire, and IMA would all report this file as unmodified. A file integrity scan run at this moment returns clean.

The exploit only modified the in-memory page cache copy. The on-disk binary at /usr/bin/su was never written to. File integrity tools read from disk, compute the hash from disk, and compare against a baseline taken from disk. None of that touches the page cache. The corruption used to gain root is invisible to every filesystem-based check.

This is a core property of the vulnerability rather than an incidental side effect. Any detection strategy that relies on watching for changes to setuid binaries will miss this exploit entirely.

Note: Copy Fail is not the first exploit to bypass file integrity monitoring this way. Dirty Pipe (CVE-2022-0847) had the same property. The page cache as an attack surface for file integrity bypass is a recurring theme that predates both CVEs. What distinguishes Copy Fail is its reliability. The absence of a race condition means an attacker can run it once and expect it to work.

Step 6: Exit the Root Shell

exit

When the root shell was spawned, execution passed through the corrupted page cache. As part of cleanup, the PoC calls posix_fadvise(POSIX_FADV_DONTNEED) on /usr/bin/su. This hints to the kernel that those pages are no longer needed, and the kernel evicts them from the page cache. The next time any process reads /usr/bin/su, the kernel reloads the original, clean binary from disk.

For a defender, this means the exploitation window is extremely narrow. The corrupted pages exist in memory only between the last recvmsg() call and the POSIX_FADV_DONTNEED cleanup, a window measured in seconds. By the time a filesystem-based alert fires and a responder opens a terminal, the page cache already shows the original binary. There is nothing to find on disk and nothing in the current page cache. Detection must come from observing the exploitation as it happens, which is covered in Task 4.

Note: Re-running the PoC in Task 4 to verify the mitigation blocks it is supported. The cleanup step evicted the corrupted pages, so the next execution of /usr/bin/su loads cleanly from disk.

Answer the questions below

What is the content of /root/flag.txt?

Task 4: Detection and Remediation

The exploitation window for Copy Fail is seconds wide. The on-disk file is never modified, the page cache is clean again once cleanup runs, and the PoC uses only standard library calls that blend in with normal process activity. Filesystem monitoring, file integrity checks, and binary signature validation all miss this entirely.

What remains detectable is the process behaviour, specifically the sequence of system calls that no legitimate application produces at the volumes the exploit requires.

Detection

AF_ALG Socket Creation: The Primary Signal

The exploit must call socket(AF_ALG, SOCK_SEQPACKET, 0) approximately 40 times in rapid succession, once for each 4-byte write. Each iteration opens a fresh AF_ALG socket and writes 4 bytes, so roughly 160 bytes of shellcode require about 40 iterations.

The SOCK_SEQPACKET detail needs a caveat for detection precision. In the mainline kernel, AF_ALG sockets are created with SOCK_SEQPACKET whichever algorithm type they are later bound to, so the socket type on its own does not separate AEAD use from the hashing and symmetric cipher operations that produce background noise on busy systems. The higher-fidelity filter for this exploit chain is the bind() that follows, which names the algorithm type aead and the authencesn template, combined with the identity of the calling process.

The list of legitimate processes that open AF_ALG AEAD sockets is short and well known. It includes cryptsetup, systemd-cryptsetup, kcapi-enc, kcapi-dgst, kcapi-mac, kcapi-speed, bluez, iwd, and the charon and charon-systemd daemons used by strongSwan for IPsec. Any process outside that list creating an AF_ALG SOCK_SEQPACKET socket is unusual. A process outside that list creating 40 or more such sockets in a few seconds is almost certainly running this exploit or a derivative of it.
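
The allowlist-plus-burst logic described above can be sketched as follows. The event tuples are assumed to come from whatever telemetry source is in use, the 5-second window is an assumed value standing in for "a few seconds", and the function name find_bursts is hypothetical.

```python
from collections import defaultdict, deque

# Process names taken from the allowlist in the text above.
EXPECTED_AF_ALG_PROCESSES = {
    "cryptsetup", "systemd-cryptsetup", "kcapi-enc", "kcapi-dgst",
    "kcapi-mac", "kcapi-speed", "bluez", "iwd", "charon", "charon-systemd",
}


def find_bursts(events, threshold=40, window_seconds=5.0):
    """Return PIDs of unexpected processes that open `threshold` or more
    AF_ALG sockets within `window_seconds`.

    `events` is an iterable of (timestamp, pid, process_name) tuples,
    sorted by timestamp, one per AF_ALG socket creation.
    """
    recent = defaultdict(deque)  # pid -> timestamps inside the window
    flagged = set()
    for timestamp, pid, name in events:
        if name in EXPECTED_AF_ALG_PROCESSES:
            continue
        times = recent[pid]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while timestamp - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(pid)
    return flagged
```

A single socket from an unexpected process is already worth an alert. The burst check adds a second, higher-confidence tier on top of that.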

The socket(AF_ALG, ...) call uses domain value 38. In auditd, that translates to -F a0=38. The first rule below fires on the very first socket call, before any page cache corruption has occurred, and the second records splice() calls so the two can be correlated:

-a always,exit -F arch=b64 -S socket -F a0=38 -k copy_fail_af_alg
-a always,exit -F arch=b64 -S splice -k copy_fail_splice
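
Records produced by these rules can be summarised per process with a short script. The sketch below assumes the usual key=value layout of audit SYSCALL records, in which syscall arguments are printed in hexadecimal (so AF_ALG appears as a0=26). The SAMPLE lines are hypothetical, shortened records written for this example, and the function names are hypothetical too.

```python
import re
from collections import Counter

FIELD = re.compile(r'(\w+)=("[^"]*"|\S+)')

# Hypothetical, shortened records for illustration only.
SAMPLE = [
    'type=SYSCALL msg=audit(1777450000.101:901): arch=c000003e syscall=41 '
    'success=yes exit=3 a0=26 a1=5 a2=0 pid=2142 uid=1001 comm="python3" '
    'key="copy_fail_af_alg"',
    'type=SYSCALL msg=audit(1777450000.142:902): arch=c000003e syscall=41 '
    'success=yes exit=3 a0=26 a1=5 a2=0 pid=2142 uid=1001 comm="python3" '
    'key="copy_fail_af_alg"',
    'type=SYSCALL msg=audit(1777450000.150:903): arch=c000003e syscall=275 '
    'success=yes exit=4 a0=4 a1=0 a2=6 pid=2142 uid=1001 comm="python3" '
    'key="copy_fail_splice"',
]


def parse_audit_line(line):
    """Split one audit record into a dict of its key=value fields."""
    return {key: value.strip('"') for key, value in FIELD.findall(line)}


def af_alg_socket_counts(lines, key="copy_fail_af_alg"):
    """Count AF_ALG socket() records per (pid, comm) for the given rule key."""
    counts = Counter()
    for line in lines:
        record = parse_audit_line(line)
        if record.get("type") != "SYSCALL" or record.get("key") != key:
            continue
        # Arguments are logged in hex: 0x26 == 38 == AF_ALG.
        if int(record.get("a0", "0"), 16) != 38:
            continue
        counts[(record.get("pid"), record.get("comm"))] += 1
    return counts


print(af_alg_socket_counts(SAMPLE))
```

On a real host the lines would be read from the audit log or from ausearch output filtered on the rule key.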

Key Syscalls to Monitor

Syscall	What to Watch For	Relevance
socket(AF_ALG, ...)	Any process outside expected allowlist	Primary signal, fires before corruption
splice()	Splice from a setuid binary FD into a socket FD	Page cache pages entering the crypto pipeline
recvmsg() on AF_ALG fd	~40 calls in seconds from same PID	Each call writes 4 bytes of shellcode
posix_fadvise(DONTNEED)	Called on a setuid binary shortly after AF_ALG activity	Attacker cleanup, page eviction

Falco Rule Outline

For environments running Falco or a compatible eBPF monitoring tool, the detection logic resembles the rule below. A live Falco installation is not required to follow the example. The rule is included as a reference for how the detection logic is encoded:

- macro: expected_af_alg_processes
  condition: >
    proc.name in (kcapi-enc, kcapi-dgst, kcapi-mac, cryptsetup,
                  kcapi-speed, charon, charon-systemd)

- rule: Potential Copy Fail Exploit (AF_ALG Socket Creation)
  desc: >
    Detects AF_ALG socket creation (family 38) by unexpected processes.
    Primary vector for CVE-2026-31431 (Copy Fail) LPE.
  condition: >
    evt.type = socket and
    evt.arg.domain = AF_ALG and
    evt.res >= 0 and
    not expected_af_alg_processes
  output: >
    Anomalous AF_ALG socket created
    (user=%user.name uid=%user.loginuid command=%proc.cmdline
     pid=%proc.pid container_id=%container.id image=%container.image.repository)
  priority: CRITICAL
  tags: [host, container, exploit, privilege_escalation, cve_2026_31431]

The rule fires on the first socket() call from any process not in the allowlist. At this point, no page cache corruption has occurred yet. An IDS or SIEM that correlates repeated AF_ALG alerts from the same PID with a subsequent execve() of a setuid binary in the same short time window has a high-confidence exploit chain indicator.

Detection Timing

The window for finding evidence after the fact is very short. The posix_fadvise(DONTNEED) cleanup evicts the corrupted pages. By the time a responder opens the machine for triage, the page cache is clean, the disk is clean, and there are no modified binaries to find.

Detection for this class of vulnerability works by watching process behaviour at the time of exploitation, not by examining system state afterwards. Correlating recvmsg() calls on an AF_ALG file descriptor with execve(/usr/bin/su) in the same narrow time window is the indicator to build on. Filesystem forensics alone will not find this.
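
A minimal sketch of that correlation is shown below. It assumes events have already been normalised into tuples by the telemetry pipeline. The event kind labels, the 10-second window, and the function name correlate are assumptions made for the example.

```python
def correlate(events, window_seconds=10.0, setuid_targets=("/usr/bin/su",)):
    """Pair AF_ALG recvmsg() activity with a later execve() of a setuid binary.

    `events` is an iterable of (timestamp, pid, kind, detail) tuples sorted
    by timestamp, where kind is "af_alg_recvmsg" or "execve" and detail is
    the executed path for execve events.
    Returns a list of (pid, path, seconds_since_last_recvmsg) hits.
    """
    last_recvmsg = {}  # pid -> timestamp of most recent AF_ALG recvmsg()
    hits = []
    for timestamp, pid, kind, detail in events:
        if kind == "af_alg_recvmsg":
            last_recvmsg[pid] = timestamp
        elif kind == "execve" and detail in setuid_targets:
            seen = last_recvmsg.get(pid)
            # execve() keeps the PID, so matching on the PID is sufficient
            # when the exploit process replaces itself with the target.
            if seen is not None and timestamp - seen <= window_seconds:
                hits.append((pid, detail, timestamp - seen))
    return hits
```

Matching on PID alone misses a variant that forks before calling execve(), so a production rule would likely also follow parent and child relationships.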

MITRE ATT&CK Mapping

Technique	MITRE ID	Primary Signal
Local Privilege Escalation via kernel flaw	T1068	AF_ALG socket creation by unexpected process
Escape to Host from Container	T1611	Container ID in Falco alert and subsequent host-level activity
Setuid binary abuse	T1548.001	execve of setuid binary after AF_ALG activity
Indicator Removal via page eviction	T1070	posix_fadvise(DONTNEED) called on setuid binary

Mitigation

The vulnerability lives in the algif_aead kernel module. Disabling that module removes the exploit's ability to open an AF_ALG socket bound to authencesn. The permanent fix is a kernel update to 6.18.22, 6.19.12, or 7.0 (or a vendor backport of the same fix), with the mainline patch landing as commit a664bf3d603d on 1 April 2026. Until your distribution ships a patched kernel, the modprobe blacklist is the recommended interim mitigation on Ubuntu and Debian systems.

Step 1: Verify the Module Is Loadable

First confirm that algif_aead is a loadable module on this system rather than being compiled directly into the kernel:

modinfo algif_aead

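The command should return module information, including a filename line. If that line points to a .ko file under /lib/modules (possibly compressed), algif_aead is a loadable module and the modprobe blacklist approach will work. Newer modinfo versions may print (builtin) as the filename instead, which indicates the module is compiled into the kernel.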
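
The same question can be answered programmatically by looking the module up in the modules.builtin listing that ships with each installed kernel. This is a minimal sketch assuming the standard /lib/modules/<release>/modules.builtin layout. The helper names listed_in and is_builtin are hypothetical.

```python
import os
import platform


def listed_in(listing_text, module):
    """True if a modules.builtin style listing names `module`."""
    wanted = module.replace("-", "_")
    for line in listing_text.splitlines():
        # Entries look like 'kernel/crypto/algif_aead.ko'.
        name = os.path.basename(line.strip()).split(".ko", 1)[0]
        if name.replace("-", "_") == wanted:
            return True
    return False


def is_builtin(module, release=None):
    """Check modules.builtin for `module`; return None if the file is missing."""
    release = release or platform.release()
    path = f"/lib/modules/{release}/modules.builtin"
    try:
        with open(path) as handle:
            return listed_in(handle.read(), module)
    except OSError:
        return None
```

A result of True means the modprobe approach will not help on that kernel, False means the module is not built in, and None means the listing could not be read.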

Step 2: Apply the Modprobe Blacklist

echo "install algif_aead /bin/false" | sudo tee /etc/modprobe.d/disable-algif-aead.conf
sudo rmmod algif_aead 2>/dev/null || true

The first command writes a configuration file telling modprobe to run /bin/false instead of actually loading algif_aead. The second unloads the module if it is currently in memory. The || true on the rmmod line prevents a harmless error from stopping the command if the module is already unloaded.
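
Across a fleet, it can be useful to confirm that the configuration line is actually present. The sketch below checks modprobe configuration text for an install rule that redirects the module to /bin/false or /bin/true. The function name blocks_module is hypothetical, and comment handling is simplified to cutting each line at the first # character.

```python
def blocks_module(config_text, module):
    """True if the text contains 'install <module> /bin/false' (or /bin/true)."""
    wanted = module.replace("-", "_")
    for raw_line in config_text.splitlines():
        # Simplification: treat everything after '#' as a comment.
        fields = raw_line.split("#", 1)[0].split()
        if len(fields) < 3 or fields[0] != "install":
            continue
        # modprobe treats '-' and '_' in module names as equivalent.
        if fields[1].replace("-", "_") != wanted:
            continue
        if fields[2] in ("/bin/false", "/bin/true"):
            return True
    return False
```

In practice the function would be applied to the contents of every file under /etc/modprobe.d, for example by reading each path returned by glob.glob("/etc/modprobe.d/*.conf").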

Step 3: Verify the Block Is Active

sudo modprobe algif_aead

This command should now return an error. The module load is blocked.

Step 4: Confirm the PoC Fails

Re-run the exploit script:

python3 /home/karen/exploit.py

The script should now fail at the very first step. Binding an AF_ALG socket to the authencesn AEAD template returns an error because algif_aead cannot be loaded, and the rest of the exploit chain never executes. This is the expected outcome once the mitigation is applied.

Warning: The modprobe blacklist only works on distributions where algif_aead is a loadable kernel module. On RHEL, CentOS, and AlmaLinux, algif_aead is compiled directly into the kernel (CONFIG_CRYPTO_USER_API_AEAD=y). The modprobe configuration file is silently ignored on these systems because the module load never goes through modprobe. RHEL-family systems require the grubby approach instead:

sudo grubby --update-kernel=ALL --args="initcall_blacklist=algif_aead_init"
sudo reboot

Run sudo grubby --info=ALL | grep initcall_blacklist to verify the argument was applied before rebooting. The change can be reverted with --remove-args="initcall_blacklist=algif_aead_init" once the kernel patch has been applied.

Patch Status at Disclosure

The mainline fix was committed on 1 April 2026 (commit a664bf3d603d), nearly a month before public disclosure on 29 April. The fix reverts the 2017 in-place optimisation entirely, restoring separate req->src and req->dst scatterlists so that page cache pages from splice() can never end up in the output scatterlist. No vendor distribution had shipped a patched kernel on disclosure day. AlmaLinux was the first to release a patched kernel, on 1 May 2026. Ubuntu, Debian, RHEL, SUSE, and others followed in the days and weeks after.

The affected kernels range from 4.14 through 6.18.21 in the 6.18 series and from 6.19.0 through 6.19.11 in the 6.19 series, covering every mainstream distribution released between late 2017 and the April 2026 patch.

Answer the questions below

What command checks whether algif_aead is a loadable module or compiled into the kernel?

Task 5: Conclusion

Takeaways

The page cache is shared infrastructure. One process writing to it affects every process sharing the same kernel, including containers that appear to be fully isolated by namespaces and capability restrictions. Isolation at the namespace layer does not extend down to the page cache.

Exploits with no race condition and no kernel offsets change the operational risk window. Public reimplementations of Copy Fail in C, Rust, Go, and arm64 surfaced on GitHub within days of disclosure, alongside the original Python release. Weapons this reliable are used quickly, not gradually.

File integrity tools hash from disk, while this exploit writes to memory. For this class of vulnerability, detection requires syscall-level monitoring, whether through auditd rules watching socket(AF_ALG) calls, eBPF-based tools such as Falco tracking process behaviour, or kernel-level telemetry correlating the recvmsg() volume with the subsequent execve(). Filesystem monitoring on its own is not sufficient.

The kernel update to 6.18.22, 6.19.12, or 7.0 should be applied as soon as the distribution makes it available. The modprobe blacklist is an effective interim control on Ubuntu and Debian, but it is not a substitute for patching.

Answer the questions below

I can now exploit CVE-2026-31431!
