Dumping /proc/kcore in 2019
In this post I will explain how to use /proc/kcore to read the physical memory (RAM) of a Linux system from userland. I also explore how and why a previous tool for this task (getkcore from volatility) fails under newer (post-4.8) kernels. In parallel I present a simple demonstration tool to dump the physical memory of an x86-64 system under more recent Linux versions.
But first, let's have a look at two virtual files on a Linux system: /proc/kcore and /proc/iomem.
/proc/kcore
/proc/kcore is a file in the virtual /proc filesystem of a Linux machine. It is created by the kernel in fs/proc/kcore.c and allows read access to all of the kernel's virtual memory space from userland.
When you look at it, the first thing you will notice is its seemingly gigantic size (as displayed by ls -l), often reaching into the hundreds of terabytes (and much larger than all installed memory devices combined).
In fact it’s not occupying any disk space at all:
as with all other files in /proc, its content is generated on the fly by the kernel whenever the file is read.
And it can only be read (writing is neither allowed nor implemented in the kernel), and only with root privileges.
Internally it has the format of an ELF core dump file (ELF Type 4/ET_CORE). That means that it has the same format as a core file from a crashed process; but instead of capturing the (static) state of a single process at the moment of the crash, it provides a real time view into the state of the whole system.
A few words about ELF: the Executable and Linking Format is the file format for executables, shared objects (libraries) and core dumps on Unix-like operating systems. Its inner workings are quite complex (and actually Turing complete), but only two of its many structures are of interest here:
- The ELF header (Elf64_Ehdr): It’s at the start of every ELF file. We need two pieces of information from it: the location and number of entries of the program header table.
- The program headers (Elf64_Phdr): An ELF file contains an array of program header structures. There are various subtypes of program headers, but we only care about the ones marked as PT_LOAD. Each of these headers describes a loadable segment - a part of the file that is loaded into memory. In /proc/kcore, they describe where in the file each portion of the system memory can be found.
More information about ELF can be found here or in the elf manpage.
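To make this a bit more concrete, here is a minimal standalone sketch (not part of the tool presented later, and with error handling kept to a minimum; it only uses the standard structures from <elf.h>) that reads the ELF header of /proc/kcore and lists its loadable segments:

#include <elf.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // /proc/kcore is only readable as root
    FILE *f = fopen("/proc/kcore","r");
    if (f == NULL)
    {
        perror("fopen /proc/kcore");
        return 1;
    }

    // the ELF header sits at the very beginning of the file
    Elf64_Ehdr ehdr;
    if (fread(&ehdr,sizeof(ehdr),1,f) != 1)
    {
        fprintf(stderr,"Could not read ELF header\n");
        return 1;
    }

    // e_phoff and e_phnum give us the location and number of entries
    // of the program header table
    Elf64_Phdr *phdrs = calloc(ehdr.e_phnum,sizeof(Elf64_Phdr));
    fseek(f,ehdr.e_phoff,SEEK_SET);
    if (fread(phdrs,sizeof(Elf64_Phdr),ehdr.e_phnum,f) != ehdr.e_phnum)
    {
        fprintf(stderr,"Could not read program headers\n");
        return 1;
    }

    // print every loadable (PT_LOAD) segment: where it sits in the
    // kernel's virtual address space, where it starts in the file
    // and how big it is
    for (int i = 0; i < ehdr.e_phnum; i++)
    {
        if (phdrs[i].p_type != PT_LOAD)
            continue;
        printf("vaddr 0x%lx offset 0x%lx size 0x%lx\n",
               phdrs[i].p_vaddr, phdrs[i].p_offset, phdrs[i].p_memsz);
    }

    free(phdrs);
    fclose(f);
    return 0;
}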
Every read access to the /proc/kcore virtual file is handled by the function read_kcore in the Linux kernel. If the read access includes the ELF header, program headers or ELF notes, they are generated on the fly and returned to the user. Otherwise, the function matches the file offset to be read to a virtual memory address and copies the content over.
On x86-64 systems, Linux maintains a complete one-to-one map of all physical memory in the kernel's virtual address space (kernel identity mapping/paging, see here or here). So by reading the right ranges of kernel virtual memory, one can get a complete copy of the contents of the physical memory of that system.
/proc/iomem
The physical RAM of a system is not necessarily at the beginning of the physical address space, nor is it always one contiguous block. To determine which parts of the address space are actual memory (as opposed to memory mapped I/O), we turn to /proc/iomem.
/proc/iomem is another virtual file, created in kernel/resource.c. It lists the various I/O memory regions that are mapped into the physical address space, including the RAM (see here).
Finding the RAM is quite easy: all RAM ranges are named 'System RAM' in /proc/iomem's output, which also gives us the (physical) addresses of the first and last byte of each range.
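The relevant lines typically look something like this (the exact ranges differ from machine to machine, and nested sub-entries are omitted here; this is only meant to illustrate the format the tool parses later):

00001000-0009ffff : System RAM
00100000-bffdffff : System RAM
100000000-23fffffff : System RAM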
getkcore
Getkcore is a tool for dumping /proc/kcore that comes with volatility. The problem with it is simple: it does not work with newer Linux kernels. On a system with a kernel version greater than or equal to 4.8, it just creates the output file, but does not write any data to it (and produces no error message).
So: what goes wrong?
Pre kernel 4.8
Before kernel version 4.8, the kernel's virtual mapping of the physical address ranges started at the (constant) virtual address 0xffff880000000000. This meant that, for example, a memory page at the physical address 0x100000 always appeared in the kernel's virtual address space at the address 0x100000 + 0xffff880000000000 = 0xffff880000100000. Translating a physical address to its counterpart in the kernel identity mapping (and vice versa) was therefore trivial: one just needed to add (or subtract) the constant 0xffff880000000000.
Getkcore relied heavily on this. Its algorithm was as follows:
- Get physical address ranges from /proc/iomem
- Add the static offset 0xffff880000000000 to get the corresponding virtual address in the kernel identity mapping
- Search the program headers of /proc/kcore for a segment with this virtual address
- Dump the segment’s content
(Look here for details)
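In code, the second and third step of this list boil down to something like the following (a simplified C sketch of the idea, not getkcore's actual code):

// pre-4.8 assumption: the kernel identity mapping always starts here
#define KERNEL_IDENTITY_BASE 0xffff880000000000ULL

// returns the index of the PT_LOAD segment whose virtual address matches
// the given physical range, or -1 if there is none - which is exactly
// what happens on a kernel >= 4.8 with a randomised mapping
int find_segment_pre48(Elf64_Phdr *prog_hdr, int num_hdrs, uint64_t phys_start)
{
    for (int i = 0; i < num_hdrs; i++)
    {
        if (prog_hdr[i].p_type == PT_LOAD &&
            prog_hdr[i].p_vaddr == phys_start + KERNEL_IDENTITY_BASE)
            return i;
    }
    return -1;
}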
The problem: with KASLR (more precisely: the KASLR variant introduced in 4.8), the offset is no longer the constant 0xffff880000000000. Instead, the offset is (somewhat) randomised (you can find a more detailed explanation and visualisation here).
So when getkcore has the list of RAM ranges from /proc/iomem, it assumes their position in the virtual address space is exactly 0xffff880000000000 bytes above the physical address. It then searches the program headers of /proc/kcore for a segment with a virtual address that matches the physical address of that range plus 0xffff880000000000. And when it cannot find one (there is none, because the base address was randomized), it fails silently.
(Just to note: it's not my intention to bash getkcore or the volatility project, from which I learned a lot. I just want to describe my journey from “Hey, that’s cool, I wonder how it works!” via “Why isn’t this working?” to “How could I make it work again?”)
Post kernel 4.8
So, with a kernel greater than or equal to 4.8, there is a random component in the start address of the kernel identity mapping. This means we can no longer expect to find it at a constant, known address in the kernel's virtual memory space. Consequently, the simple offset-adding technique for matching the physical system RAM addresses from /proc/iomem to the right segment's virtual address does not work anymore.
But, luckily for us, /proc/kcore is nice and gives us another way to do this!
The ELF program header contains a field p_paddr. The ELF man page states that this field contains the segment's physical address on systems 'for which physical addressing is relevant'. On x86 (both 32-bit and 64-bit), this is generally not the case. But in /proc/kcore, it is used exactly as the name suggests: it contains the physical address of the segment.
And as the physical addresses of the System RAM are still available via /proc/iomem, we can once again trivially match each physical RAM range to the corresponding segment in /proc/kcore.
So the new approach goes like this:
- get physical address ranges from /proc/iomem
- find a segment in /proc/kcore that has this physical address
- dump segment content
The code
I wrote a small program to do this:
First, we need to find out at which physical addresses there is actually system RAM (and not some form of memory mapped I/O). We do this by parsing /proc/iomem, looking for lines containing “System RAM”. For each of these lines, we note the start and end addresses in a structure.
int get_system_ram_addrs(struct addr_range *addrs)
{
    FILE *fd;
    char *lineptr = malloc(512);
    size_t n = 512;
    int count = 0;

    if((fd = fopen(IOMEM_FILENAME,"r")) == NULL)
    {
        fprintf(stderr,"Could not open %s\n",IOMEM_FILENAME);
        exit(-1);
    }

    int index = 0;
    while(getline(&lineptr,&n,fd) != -1)
    {
        // RAM ranges are labelled "System RAM" in /proc/iomem
        if(strstr(lineptr,"System RAM"))
        {
            uint64_t start;
            uint64_t end;
            // each line starts with "<start>-<end>" in hex
            sscanf(lineptr,"%lx-%lx",&start,&end);
            addrs[count].index = index;
            addrs[count].start = start;
            addrs[count].end = end;
            if (++count >= MAX_PHYS_RANGES) {
                fprintf(stderr,"Too many physical ranges\n");
                exit(-1);
            }
        }
        // index counts the top-level (non-indented) entries seen so far
        if (lineptr[0] != ' ')
            index++;
    }
    fclose(fd);
    free(lineptr);
    return count;
}
Next, we need to identify the segments in /proc/kcore that match the physical ranges we found. That's done by comparing the start address from /proc/iomem to the p_paddr member of each Elf64_Phdr (program header) structure in /proc/kcore. We save the physical address, the offset in the kcore file and the segment size.
int match_phdrs( Elf64_Phdr *prog_hdr,
                 unsigned int num_hdrs,
                 struct addr_range *ranges,
                 unsigned int num_phys_ranges,
                 struct section *sections)
{
    int sections_filled_in = 0;
    for (int i=0;i<num_hdrs;i++)
    {
        for (int j=0;j<num_phys_ranges;j++)
        {
            // a segment belongs to a RAM range if its physical address
            // matches the range's start address from /proc/iomem
            if (prog_hdr[i].p_paddr == ranges[j].start)
            {
                sections[sections_filled_in].phys_base = ranges[j].start;
                sections[sections_filled_in].file_offset = prog_hdr[i].p_offset;
                sections[sections_filled_in].size = prog_hdr[i].p_memsz;
                sections_filled_in++;
            }
        }
    }
    return sections_filled_in;
}
After that, the dumping of the physical memory can begin. For the dump I chose the LiME file format. In a LiME file, each memory range is preceded by a header (containing the range's physical start and end addresses), followed by its data. The header for the next range follows directly after that.
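The range header itself is a small, packed structure. The tool uses a definition roughly along these lines (the field names are the ones used in the code below; see the LiME documentation for the canonical layout):

typedef struct {
    unsigned int magic;           // 0x4C694D45, the ASCII bytes of "LiME"
    unsigned int version;         // header version, currently 1
    unsigned long long s_addr;    // physical start address of the range
    unsigned long long e_addr;    // physical end address of the range
    unsigned char reserved[8];    // padding, set to zero
} __attribute__((packed)) lime_mem_range_header;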
In this function we create the header and write it to the file. Then we copy the memory content from the right offset in /proc/kcore to the dump file. We do this for every memory range we found.
int write_lime(int kcore_fd,int out_fd,struct section *sections,int num_ranges)
{
    lime_mem_range_header lime_header;
    lime_header.magic = 0x4C694D45;   // "LiME" magic value
    lime_header.version = 1;
    memset(&lime_header.reserved,0x00,8);

    for (int i=0;i<num_ranges;i++)
    {
        lime_header.s_addr = sections[i].phys_base;
        lime_header.e_addr = sections[i].phys_base + sections[i].size - 1;
        // write lime_mem_range_header
        write(out_fd,&lime_header,sizeof(lime_mem_range_header));
        printf("Copying section %d (0x%llx - 0x%llx)\n",i,lime_header.s_addr,lime_header.e_addr);
        // copy content over: seek to the segment's offset in /proc/kcore
        // and copy its data into the dump file
        lseek64(kcore_fd, sections[i].file_offset, SEEK_SET);
        copy_loop(out_fd,kcore_fd,sections[i].size);
    }
    return 0;
}
And that’s basically all the code needed to dump the physical memory to disk.
I skipped over the main function, because it just ties the other functions together. And there is one other function, copy_loop, which does exactly what its name says: it copies data from one file descriptor to another in a loop. Using sendfile or copy_file_range would have been more efficient, but both gave me some strange errors when I used them on /proc/kcore, so I chose to do it on my own.
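For completeness, a minimal copy_loop in that spirit could look like this (the signature is inferred from the call in write_lime above; the buffer size is an arbitrary choice):

// copy 'size' bytes from in_fd to out_fd using a plain read/write loop
void copy_loop(int out_fd, int in_fd, uint64_t size)
{
    static char buf[1048576];   // 1 MiB copy buffer
    uint64_t remaining = size;

    while (remaining > 0)
    {
        size_t chunk = remaining < sizeof(buf) ? remaining : sizeof(buf);
        ssize_t bytes_read = read(in_fd, buf, chunk);
        if (bytes_read <= 0)
        {
            fprintf(stderr,"Read from /proc/kcore failed\n");
            exit(-1);
        }
        // write() may write less than asked, so loop until the chunk is out
        ssize_t written = 0;
        while (written < bytes_read)
        {
            ssize_t w = write(out_fd, buf + written, bytes_read - written);
            if (w < 0)
            {
                fprintf(stderr,"Write to dump file failed\n");
                exit(-1);
            }
            written += w;
        }
        remaining -= bytes_read;
    }
}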
But apart from this, that’s basically it.
The tool
You can find the complete program here.
I’m releasing this program here as a simple Proof-of-Concept tool. It’s intended to demonstrate how physical RAM can be dumped via /proc/kcore. It has undergone only minimal testing (as in: works on my machine) and should not be considered a ready-to-use forensics tool.
Let me repeat this:
THIS TOOL HAS NOT UNDERGONE MUCH TESTING! USE IT AT YOUR OWN RISK!
If you try to use this tool in a real forensics case it might ruin your evidence, set your screen on fire and eat your cat. Or it might not. You have been warned.
For memory acquisition in a real-world scenario you should check out pmem, which uses a similar approach, or LiME.