堆溢出(1)-堆基础

由于可能即将转战其他时间紧迫的项目，开始填坑免得刚学的就忘了。本文主要讲解堆的结构，关于堆溢出部分后面再写。

什么是堆

堆是linux程序运行时的重要内存部分。程序运行过程中，堆提供动态分配的内存，允许程序申请大小未知的内存(废话)。

堆其实就是程序虚拟地址空间的一块连续的线性区域，由低地址向高地址方向增长。一般称管理堆的那部分程序为堆管理器。(又是废话)

下面是老生常谈的linux内存分布图。

目前常见的堆实现主要有下面几个

dlmalloc
- 通用堆分配器
ptmalloc2
- libc使用，衍生自dlmalloc，提供多线程支持
jemalloc
- Android/FireFox等使用
tcmalloc
- google chrome使用
libumem
- Solaris使用
Windows 10 - segment heap
- 自不必多言

本文既然是做pwn，研究的自然就是libc使用的ptmalloc2。

关于ptmalloc2的多线程

在ptmalloc2中，主线程创建的堆叫main arena，不同线程维护不同的堆，称作per thread arena。

Arena数量受CPU限制:

2*核心数 (32bit)
8*核心数 (64bit)

堆管理相关的主要数据结构

想了解堆，先从管理堆的结构开始。主要有下面几个：

arena
malloc_state
chunks
bins

arena不是一个重点，这里不作详细说明。主要关注剩余三个部分。而因为malloc_state算是一个存放所有堆相关信息的结构体，提及这个就得先说清楚堆到底什么样子，所以下面先介绍chunk。

chunk

chunk是内存块的结构，其定义位于malloc.c。堆内存就是通过一个个chunk构成的。无论chunk是大是小，总是遵从下面的结构。

struct malloc_chunk {
  /*前一chunk如果是空闲，则表示它的大小，否则被其用来存储数据*/
  INTERNAL_SIZE_T      mchunk_prev_size;  /* Size of previous chunk (if free).*/
  /* 当前块的大小 size的对齐为0x10  后三位为AWP标志位 */
  INTERNAL_SIZE_T      mchunk_size;       /* Size in bytes, including overhead.*/

  /*双向链表*/
  struct malloc_chunk* fd;         /* double links -- used only if free. 前向指针*/
  struct malloc_chunk* bk;        /*后向指针*/
  /*最小的chunk大小必须包含前四个域*/
  /* Only used for large blocks: pointer to next larger size.  */
  /*这两个结构在large block中使用*/
  struct malloc_chunk* fd_nextsize; /* double links -- used only if free. */
  struct malloc_chunk* bk_nextsize;
};

源码中给出了一些更形象的图.

对于已分配的chunk主要是三个部分，prev_size，size(包括后三位标志位)，user date。因为当一个chunk被分配时，它将会从双链表上拆下，它的fd和bk自然就失去了作用。mem即malloc函数返回给用户的指针。同样的，当这个chunk不在是空闲chunk时，下一个chunnk的prev_size部分自然也没有任何作用，所以也可以被用户使用。

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk, if unallocated (P clear)  |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of chunk, in bytes                     |A|M|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             User data starts here...                          .
	    .                                                               .
	    .             (malloc_usable_size() bytes)                      .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             (size of chunk, but used for application data)    |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of next chunk, in bytes                |A|0|1|
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

对于一个空闲的chunk，其结构大致如下(其实和已分配的遵循同样的结构；这里并没有展示出fd_nextsize和bk_nextsize部分)，这里的head和foot标记的是chunk的size部分，只是为了方便对chunk的size进行操作，在源码中存在两个宏分别是set_head和set_foot就是指的对这两个位置进行设置。

    chunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of previous chunk, if unallocated (P clear)  |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `head:' |             Size of chunk, in bytes                     |A|0|P|
      mem-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Forward pointer to next chunk in list             |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Back pointer to previous chunk in list            |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Unused space (may be 0 bytes long)                .
	    .                                                               .
	    .                                                               |
nextchunk-> +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    `foot:' |             Size of chunk, in bytes                           |
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
	    |             Size of next chunk, in bytes                |A|0|0|
	    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

size表示的是当前chunk的大小，其中后三位用作标志位，含义如下

P
- PREV_INUSE：如果上一个chunk在使用则set(不包括fastbin)
A
- NON_MAIN_ARENA
M
- IS_MMAPPED:当前chunk是不是通过mmap得到的

malloc_state

malloc_state就是内存中用来管理堆的数据结构，源码中表现形式如下

/*保存堆状态的结构体malloc_state*/
struct malloc_state
{
  /* Serialize access.  */
  /*该变量用于控制程序串行访问同一个分配区，当一个线程获取了分配区之后，其它线程要想访问该分配区，就必须等待该线程分配完成后才能够使用。*/
  __libc_lock_define (, mutex);

  /* Flags (formerly in max_fast).  */
  /*flags 记录了分配区的一些标志*/
  int flags;

  /* Set if the fastbin chunks contain recently inserted free blocks.  */
  /* Note this is a bool but not all targets support atomics on booleans.  */
  int have_fastchunks;/*是否存在fastbin*/

  /* Fastbins */
  mfastbinptr fastbinsY[NFASTBINS];/*记录fastbin*/

  /* Base of the topmost chunk -- not otherwise kept in a bin */
  mchunkptr top;/*记录top chunk*/

  /* The remainder from the most recent split of a small request */
  mchunkptr last_remainder;/*分割后剩余部分*/

  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];/* unsorted bin small bin large bin */

  /* Bitmap of bins */
  unsigned int binmap[BINMAPSIZE];/*标识某个bin是否空的map */

  /* Linked list */
  struct malloc_state *next;/*与下一个malloc_state形成双链表*/

  /* Linked list for free arenas.  Access to this field is serialized
     by free_list_lock in arena.c.  */
  struct malloc_state *next_free;

  /* Number of threads attached to this arena.  0 if the arena is on
     the free list.  Access to this field is serialized by
     free_list_lock in arena.c.  */
  INTERNAL_SIZE_T attached_threads;

  /* Memory allocated from the system in this arena.  */
  INTERNAL_SIZE_T system_mem;
  INTERNAL_SIZE_T max_system_mem;
};

malloc_state中有下面几个关键的域

have_fastchunks
- 标识了当前是否存在fast chunk
fastbinsY
- 存放了所有fastbin链的数组
top
- 存放了当前top chunk位置
last_remainder
- 指向最后一次分割后的残留部分
bins
- 用来记录管理和组织空闲chunk的链表的数组
binmap
- 记录每个bin是否为空

关于每个域所存放的值的具体意义后面将会分别解释。

bins

bins是用来管理和组织空闲内存块的链表结构。ptmalloc2共有127个bin。其中62个small bin，64个large bin以及一个unsorted bin。malloc_state的bins数组存放了所有的bins信息。

#define NBINS             128 /*bin的总数  unsorted bin(1)+smallbin(62)+largebin(63)+1 */
malloc_state{
  ...
  /* Normal bins packed as described above */
  mchunkptr bins[NBINS * 2 - 2];/* unsorted bin small bin large bin */
  ...
}

bins数组中总共有254个bin元素，每一个bin都是双链表之后的，用两个元素存储,其中bin[0]bin[1]保存了unsorted bin链表的头指针.bin[2]和bin[3]合起来指示了一个bin双链表(后面会讲到这就是堆的第一个small bin)，其中bin[2]是头指针，bin[3]是尾指针。

回想一下chunk的结构，为了方便编程统一处理，可以将这样连续两个bins的元素也当一个chunk，只是这个chunk是一个没有size和prev_size部分，而是只有fd和bk，这样处理起来就会更加方便一点。

small bin

small bins总共有62组，每组中(或者说每个bin中)的chunk都是相同的大小，组之间以0x10字节作为大小间距(x86系统下为8字节),也就是说每个bin本身是一个先进先出的双向链表。而前面说过bins中从bin[2]开始每一对元素表示一个bin，也就是说small bin在bins中的存放就是从bin[2]bin[3]到bin[124]bin[125]。

x64系统下，small bin中的chunk大小为小于1024(0x400)，而前面说过，chunk对齐是0x10，所以small chunk范围为[0x20,0x3f0](x64下smallbin的的范围，x86下small chunk的size要小于512字节)。为什么最小是0x20，通过源码可以明白。

/* The smallest possible chunk */
/*定义了最小的chunk，这里可以看到最小的chunk大小为chunk到fd_nextsize偏移
 *所以最小的chunk必须有prev_size size fd bk 四个部分 即0x20
 */
#define MIN_CHUNK_SIZE        (offsetof(struct malloc_chunk, fd_nextsize))

large bin

large bin总共有63组，与small bin不同的是，每个large bin中的bin链并不是都存放着相同大小的chunk(其实这个可以想到，因为chunk变大，这样存放会导致bin的内存开销变大)。每个large bin均是一个先进先出的双向链表，在bins数组中是从bin[126]bin[127]到bin[250]bin[251](这里其实有个疑惑，bins总共254个元素，看来最后一组bin[252]bin[253]好像并没使用？)。

前面提到small bin中chunk的最大要小于1024(x64),large bin的最小size其实就是1024，即0x400。large bin每一组组内的间距在源码中给出如下：

/*
  ...
  32 bins of size      64    0x40
  16 bins of size     512    0x200
  8 bins of size    4096    0x1000
  4 bins of size   32768    0x8000
  2 bins of size  262144    0x40000
  1 bin  of size what's left
  ...
*/

这一段是什么意思呢，也就是说，对前32个large bin 他们的组内间隔最大均为0x40，即第一个largebinbin[126]bin[127]范围就应该是[0x400,0x440]，后面的以此类推。

largebin链表是存在顺序的，chunk是按照大小从大到小排列，同时这些chunk也会通过其尾部的fd_nextsize和bk_nextsize串起来，此外为了节省操作，如果有相同大小的chunk加入bin中不会进入这个nextsize链，并且这个chunk会被放入largebin已存在相同大小chunk的后面(源码中可以看到chunk放入largebin的过程是一个反向扫描链表的过程)。

unsorted bin

unsorted bin简言之就是一个垃圾桶，以双向循环链表的形式存在，bins数组以bin[0]bin[1]存放其链表的头。free后的chunk部分会被放入unsorted bin，在malloc时满足一定条件会进行垃圾清理，将其中的chunk进行整理(后面会专门讲malloc与free的流程)。

fastbins

fastbins为了速度而产生(后面还有了tcache)，它在设计上和其他的bins是明显不同的，在malloc_state中fastbinsY数组存放的就是每个fastbin链的头指针。

fastbin中的chunk以单链表的形式构成chunk链
对每个fastbin的链，其中的chunk大小相同。
fastbin采用后进先出的(后释放的会被先申请)
由于是单链表，fastbin的bk指针并无作用，只会用到fd指针
fastbin中的chunk的prev_inuse一直处于set状态，保证他们不会与相邻的空闲chunk合并

malloc_state的fastbinsY数组中存放的就是每个fastbin链的头指针。

下面是glibc源代码中涉及到fastbin的部分内容,为了速度，大量的以宏的形式存在

/*取malloc_state中对应索引的fastbin头部*/
#define fastbin(ar_ptr, idx) ((ar_ptr)->fastbinsY[idx])

/* offset 2 to use otherwise unindexable first 2 bins */
/*这是对fastbin index的计算 malloc会进行fastbin的size检查，进行fastbin attack时需要保证chunk的size合法  得到的index在正确范围*/
#define fastbin_index(sz) \
  ((((unsigned int) (sz)) >> (SIZE_SZ == 8 ? 4 : 3)) - 2)


/* The maximum fastbin request size we support */
/*支持的最大fastbin大小*/
#define MAX_FAST_SIZE     (80 * SIZE_SZ / 4)

/*fastbin数量*/
#define NFASTBINS  (fastbin_index (request2size (MAX_FAST_SIZE)) + 1)
/* 当释放的 chunk 与该 chunk 相邻的空闲 chunk 合并后的大小大于 FASTBIN_CONSOLIDATION_THRESHOLD 时，内存碎片可能比较多了，
 * 就需要把 fast bins 中的 chunk 都进行合并，以减少内存碎片对系统的影响
 */
#define FASTBIN_CONSOLIDATION_THRESHOLD  (65536UL)

默认情况下，在x86系统上，大小为16-64字节，在x64系统中为32-128字节。

top chunk

对于top chunk，在malloc_state中以top指针存放其地址。top chunk不属于任何的bin，并且位于arena的最高地址处，其中不包含任何的空闲或者分配的chunk。简单的理解方式就是如果arena被分配了10000，前面的chunk用了9000，那么这剩下的1000就属于top chunk。只有当无chunk可分配时，才会去使用top chunk。初始化时使用unsorted chunk作为top chunk。

last remainder

last remainder是最后一次分割后剩余的部分。在malloc时会去unsorted bin检查last_remainder,如果满足了一定条件，便会将其分裂，并将剩余的部分标记为last_remainder。这个没啥好说的，对于堆溢出来说不太影响。

binmap

binmap是记录每个bin(不包括fastbin)是否为空的数据结构，可以快速判断bin的情况。binmap中总共有4个元素，每个元素(就叫做map吧)有32位，每一位标识了一个bin的情况。

binmap在源码中表现形式如下，源码中还提供了一些宏来计算某个idx的bin的binmap情况。

/*
   Binmap

    To help compensate for the large number of bins, a one-level index
    structure is used for bin-by-bin searching.  `binmap' is a
    bitvector recording whether bins are definitely empty so they can
    be skipped over during during traversals.  The bits are NOT always
    cleared as soon as bins are empty, but instead only
    when they are noticed to be empty during traversal in malloc.

    binmap是一个位向量，用于记录bin是否绝对为空，以便在遍历期间可以将其跳过。 
    bin不为空时，并不总是立即清除这些位，而是仅当在malloc中遍历时发现它们为空时才清除

 */

/* Conservatively use 32 bits per map word, even if on 64bit system */
/*每个map设置为32位 可以记录32个bin*/
#define BINMAPSHIFT      5
#define BITSPERMAP       (1U << BINMAPSHIFT)/*0x100000  -> 32*/
#define BINMAPSIZE       (NBINS / BITSPERMAP)/*总共需要  bin总数/32  这么多个map*/
/*下面是从binmap中获取某个bin是否为空的相关操作*/
#define idx2block(i)     ((i) >> BINMAPSHIFT)
#define idx2bit(i)       ((1U << ((i) & ((1U << BINMAPSHIFT) - 1))))

#define mark_bin(m, i)    ((m)->binmap[idx2block (i)] |= idx2bit (i))
#define unmark_bin(m, i)  ((m)->binmap[idx2block (i)] &= ~(idx2bit (i)))
#define get_binmap(m, i)  ((m)->binmap[idx2block (i)] & idx2bit (i))

tcache

tcache是在glibc v2.26(ubuntu 17.10)后引入的新的堆管理结构，通过tcache_put()与tcache_get()从tcache中插入与取出链表。tcache出现以前，fastbin是libc的亲儿子，但tcache出现后其优先级甚至高于fast，并且由此带来了一些malloc和free的改变。

small bin范围大小内的chunk都会使用tcache，其行为几乎和fastbin类似，每个tcache链上最多只能有七个chunk。在第一次malloc时会malloc一块内存放在堆头，在其中存放tcache的一些结构体。

tcache主要有两个重要的结构体tcache_entry和tcache_perthread_struct。

tcache_entry是一个单链表结构，跟fastbin很像，也是后进先出单链表，之前的tcache_entry中是只存放指向下一个tcache(或者说chunk的mem)的指针的，后面加入了key域防止double free(free后key会被置为NULL)。而key指向的就是当前线程管理tcache结构体tcache_perthread_struct。

这里嫖一张ctf-wiki的图。

tcache_perthread_struct结构体中维护了两个域entries和counts，entries存放的是所有tcache链的头部，counts是每个tcache链的chunk数量。counts总是不大于7的。tcache链的数量最多有64个。

#if USE_TCACHE
/* We want 64 entries.  This is an arbitrary limit, which tunables can reduce.  */
# define TCACHE_MAX_BINS		64/*tcache链数量*/
# define MAX_TCACHE_SIZE	tidx2usize (TCACHE_MAX_BINS-1)

/* Only used to pre-fill the tunables.  */
# define tidx2usize(idx)	(((size_t) idx) * MALLOC_ALIGNMENT + MINSIZE - SIZE_SZ)

/* When "x" is from chunksize().  */
# define csize2tidx(x) (((x) - MINSIZE + MALLOC_ALIGNMENT - 1) / MALLOC_ALIGNMENT)
/* When "x" is a user-provided size.  */
# define usize2tidx(x) csize2tidx (request2size (x))

/* With rounding and alignment, the bins are...
   idx 0   bytes 0..24 (64-bit) or 0..12 (32-bit)
   idx 1   bytes 25..40 or 13..20
   idx 2   bytes 41..56 or 21..28
   etc.  */

/* This is another arbitrary limit, which tunables can change.  Each
   tcache bin will hold at most this number of chunks.  */
/*每个tcache链上chunk的最大值*/
# define TCACHE_FILL_COUNT 7

/* Maximum chunks in tcache bins for tunables.  This value must fit the range
   of tcache->counts[] entries, else they may overflow.  */
# define MAX_TCACHE_COUNT UINT16_MAX
#endif
...
#if USE_TCACHE

/* We overlay this structure on the user-data portion of a chunk when
   the chunk is stored in the per-thread cache.  */
/*tcache entry结构体 用于链接空闲的chunk结构体，其中的next指针指向下一个相同大小的chunk*/
typedef struct tcache_entry
{
  struct tcache_entry *next;/* next指向的是chunk的user data的开始，而非chunk的开始*/
  /* This field exists to detect double frees.  */
  /*这个域用来检测double free*/
  struct tcache_perthread_struct *key;
} tcache_entry;

/* There is one of these for each thread, which contains the
   per-thread cache (hence "tcache_perthread_struct").  Keeping
   overall size low is mildly important.  Note that COUNTS and ENTRIES
   are redundant (we could have just counted the linked list each
   time), this is for performance reasons.  */
/*每个thread维护一个tcache_prethread_struct*/
typedef struct tcache_perthread_struct
{
  uint16_t counts[TCACHE_MAX_BINS];//计数器 记录tcache_entry链上空闲chunk的数目，每条链上最多有7个chunk
  tcache_entry *entries[TCACHE_MAX_BINS];//tcache_entry数组，每个都是以单向链表的方式链接了大小相同的的处于空闲状态或者说free 后的chunk
} tcache_perthread_struct;

static __thread bool tcache_shutting_down = false;
static __thread tcache_perthread_struct *tcache = NULL;

/* Caller must ensure that we know tc_idx is valid and there's room
   for more chunks.  */
/*将一个chunk放回tcache中*/
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
  /*chunk转为mem指针*/
  tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

  /* Mark this chunk as "in the tcache" so the test in _int_free will
     detect a double free.  */
  //设置放回头部
  e->key = tcache;

  e->next = tcache->entries[tc_idx];
  tcache->entries[tc_idx] = e;
  ++(tcache->counts[tc_idx]);//计数器自增
}

/* Caller must ensure that we know tc_idx is valid and there's
   available chunks to remove.  */
/*传入index  获取对应tcache链的chunk*/
static __always_inline void *
tcache_get (size_t tc_idx)
{
  tcache_entry *e = tcache->entries[tc_idx];//得到对应的链表头
  tcache->entries[tc_idx] = e->next;//将第一个tcache拿掉
  --(tcache->counts[tc_idx]);//count减一
  e->key = NULL;//拿出的tcache chunk 其key指针从指向tcache_perthreadStruct结构体置为NULL
  return (void *) e;
}

static void
tcache_thread_shutdown (void)
{
  int i;
  tcache_perthread_struct *tcache_tmp = tcache;

  if (!tcache)
    return;

  /* Disable the tcache and prevent it from being reinitialized.  */
  tcache = NULL;
  tcache_shutting_down = true;

  /* Free all of the entries and the tcache itself back to the arena
     heap for coalescing.  */
  for (i = 0; i < TCACHE_MAX_BINS; ++i)
    {
      while (tcache_tmp->entries[i])
	{
	  tcache_entry *e = tcache_tmp->entries[i];
	  tcache_tmp->entries[i] = e->next;
	  __libc_free (e);
	}
    }

  __libc_free (tcache_tmp);
}

关于堆的结构部分大概就这么多，以后想起什么再补充。下一篇介绍堆的分配和释放，也就是malloc和free的具体过程。