In my previous post I tried to give a high level overview of how I learned everything I know about programming.
In this post I am going to dive into a specific example. The general
way I have always approached things is by taking them apart and
putting them back together. Along the way I bounce between theory and
practice. That means I will do some research to see what other people
have discovered before, learn the concepts and theories, and then I
will go back to the code and figure out how to apply them. In this
post we’re going to learn how to implement malloc and free.
I started using C back in the early 90s. You may have picked it up
yourself at some point. I learned about the stack and the heap.
When you initialize a variable in a scope it takes memory from the
stack. When you want to initialize a value on the heap you need to
use malloc. I had learned that the stack grows, “from the top” of
the program’s memory. The heap, “grows from the bottom.”
Why is that? I had never really needed to know. Such facts were enough for all of the programs and work I’ve done. And so I decided I wanted to find out.
Use What You Know
Forgive me if I’m wrong about this but I think there’s a generation or
two of folks who were brought up in an era where Google was the tool
people were taught to use when they wanted to know something. If you
wanted to learn how to implement malloc or how to implement a
compiler you would start with a query. And then you look for the
answer.
When StackOverflow began to reign supreme on the Internet, for programmers, this was often the first and final destination. Copy and paste the answer. Move on.
That’s not how I work.
Like when I was a kid, I take things apart. I try to figure out how they work by putting them back together. Fortunately for us, we’re using computers and writing code. It’s hard to break anything. My poor family had piles of broken Speak-And-Spells and radios to deal with in the wake of my quest to learn. These days I have hundreds of little snippets and unfinished programs.
For figuring out how to implement malloc let’s just take the type
signature and figure out how to implement our own using what we know.
Taking Stock
So I already know that when I execute a program on my operating system it creates a process. On my Linux system that program is a binary file in the ELF format. The OS allocates a page of virtual memory, maps my program into that memory, sets up the program counter to point at the main entry point, and probably does some other bits of bookkeeping. Then things go.
I also know the type signatures of the malloc and free functions I
normally use from the standard C library:
void* malloc(size_t);
void free(void*);From looking at this and what I know about using these functions, I
start to pick out what I’m going to need to know in order to implement
these functions myself. In a typical program where I might use
malloc, I use it like this:
uint32_t* values = (uint32_t*)malloc(INIT_VALUES_SIZE * sizeof(uint32_t));I can have many declarations like this in my program. When I later
need to return that memory I call free like this:
free(values);What’s happening under the hood here?
Create a Hypothesis
I’m only using the scientific method in the loosest sense. If you’re a science teacher, you can probably tell I wasn’t a great student. I’m sorry.
The return value of malloc is a void pointer that I cast to a
pointer of the type I want to use it as. The malloc function’s
argument is a size_t: the number of bytes I want to allocate.
When I later want to return that memory to my process I call free.
This function takes a pointer I had previously allocated.
What do I know about pointers in C? The value of a pointer is an
address in memory. It contains no other information: no size, no
type. So how does free know how much memory to return if I only
pass it an address and no size information?
I think there’s probably some kind of book-keeping going on that is
shared between malloc and free. That would mean that if I use
malloc to allocate 13 bytes my process might actually allocate more
than 13 bytes to account for that book keeping data.
What else do I need to know in order to implement malloc? How do I
get memory?
Explore, Research
I recall that one thing I know about processes is that the operating system gives it a page of virtual address space. The operating system gives the process memory, in other words. Which means I should probably figure out how to ask the operating system for more memory.
Now I know of a function called mmap because I’ve used it before for
mapping large files into memory. Maybe I can use that? Time to do
some research.
It turns out that my operating system, Linux, has plenty of built-in
manuals and free source code for me to search. So my first stop is to
the source code for
malloc
itself in the GNU C standard library.
I skim the top of the file which has great overview documentation and
find that it does use mmap but also a system call I hadn’t
encountered before, sbrk:
The ISO C standard says that it must be
unsigned, but a few systems are known not to adhere to this.
Additionally, even when size_t is unsigned, sbrk (which is by
default used to obtain memory from system) accepts signed
arguments, and may not be able to handle size_t-wide arguments
with negative sign bit.
Which I immediately check the manual page for:
$ man sbrk
And bingo:
brk() and sbrk() change the location of the program break, which de‐
fines the end of the process's data segment (i.e., the program break is
the first location after the end of the uninitialized data segment).
Increasing the program break has the effect of allocating memory to the
process; decreasing the break deallocates memory.
Ah, interesting! So from what I know about the ELF format and how a program’s memory is mapped into the process memory when the program is executed, the program break is a pointer to the end of the data segment of my program code.
If we use sbrk, as the manual clearly tells us, this should allocate
memory for us from the bottom up because we’re effectively moving
the address that points to the end of the data segment, giving us more
memory.
We’ve transformed what we knew as a fact, there is a heap in our processes memory space and it grows from the bottom, and figured out the mechanism that makes that work!
Experiment by Hacking
Let’s just write some code! I’m going to start with a simple program
that will use my_alloc and my_free:
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
int main() {
uint32_t* nums = (uint32_t*)my_alloc(3 * sizeof(uint32_t));
nums[0] = 16;
nums[1] = 32;
nums[2] = 64;
for (int i = 0; i < 3; ++i) {
printf("nums[%d] = %d\n", i, nums[i]);
}
my_free(nums);
return 0;
}Okay and now I will add an implementation for my_alloc and my_free
using a bit of what I learned from the previous phases:
// I figure we need some kind of book keeping, not sure what it needs to
// keep yet but it's something to do with the size and I need to return
// a pointer.
struct heap_t {
size_t size;
void* data;
};
// I notice in the API of malloc/free that we don't pass around this book
// keeping data. So I assume it must be some kind of global data available
// to the functions.
static struct heap_t heap = {
.size = 0,
.data = NULL,
};
// I'm just going to experiment here with `sbrk` here.
void* my_alloc(size_t size) {
void* data = sbrk(size);
if (*(int*)data == -1) {
printf("Nope\n");
return NULL;
}
heap.size = size;
heap.data = data;
return data;
}At this point, I haven’t just read full implementation of glibc’s
malloc; I only skimmed it for clues. I could, and probably will
read through it at some point. But I want to write code at this
point. I want to test my hypothesis and learn if my understanding is
correct, incomplete though it is. I want to take small steps and make
progress and get the reward of learning more.
I need to implement free now which should “undo” the effect of my
allocation. From the manual on sbrk there’s also a related function
called brk which sets the program break. The manual says that
increasing the address of it will effectively deallocate memory
which is what we want.
Which is where I think we start to see the need for book-keeping. In
my simple program after calling my_alloc, my program gets a pointer
and we know that pointers don’t carry any size information. So how is
my_free, which only takes a pointer as an argument, going to know
what to pass to brk to set the program break back and return the
memory? That’s what the heap_t static global is for:
void my_free(void* p) {
int ok = brk(p + heap.size);
if (ok != 0) {
printf("Oh no\n");
return;
}
}Now I can compile this program and run it, low and behold it prints out what I expect:
$ cc -ggdb3 -o heap heap.c
$ ./heap
nums[0] = 16
nums[1] = 32
nums[2] = 64
Is this a good allocator? Heck no. It’s not even a complete one!
But it does demonstrate to me how sbrk and brk work and gives me
new ideas to try out!
Wash, Rinse, Repeat
Now we know something new: we can get more memory from our processes’ memory by moving the program break pointer. I have more questions. Consider this program:
int main()
{
uint32_t* values = (uint32_t*)my_alloc(INIT_VALUES_SIZE * sizeof (uint32_t));
init_values(values);
uint32_t* flags = (uint32_t*)my_alloc(INIT_FLAGS_SIZE * sizeof (uint32_t));
init_flags(flags);
bool ok = process(values, flags);
if (!ok) {
my_free(values);
my_free(flags);
printf("There was a problem!\n");
return -1;
}
do_something_else(values, flags);
return 0;
}There are two calls to my_alloc here. In a real program there
could be several and they could be called upon to allocate memory
dynamically. Our implementation of my_alloc and my_free won’t
work for these kinds of programs because it only allocates one chunk
of data of a particular size and can only free that very same chunk.
And if my program frees a pointer to one chunk it might allocate more later… and we should be able to re-use that memory!
So what will we need to do in order to handle this and make our
my_alloc and my_free more like malloc and free from the
standard library?
Left As An Exercise For the Reader
This is one of the primary ways I learn new things in programming.
It goes like this:
- I ask a question
- I take stock of what I know
- I figure out what is missing, what I need to know
- I might explore and do a little research
- But I start writing code
- I learn a little something and I formulate new questions
- And I answer those…
I repeat this process until I reach my goal. In this case I figured out that I would need to keep track of what memory is free to allocate and what is used. I gravitated towards using a doubly-linked list of free chunks.
And during this process I did look through the standard library
implementation of malloc and I did some research online and learned
about the free list data structure. I was on the right track, my
hypothesis confirmed!
From here there are more places to branch out, more questions to ask. There are problems one encounters when using free lists such as fragmentation. How can we solve that problem? Are there other allocation strategies and book keeping data structures we could use to avoid the performance penalties of a linked data structure and fragmentation?
The answer to these questions uses the same framework as above!
When you want to become the best programmer you can be the skill to learn isn’t how to search for the answer or ask an LLM.
What you want to develop is the skill to find the answers on your own. There’s no short cut to this process. You want to learn how to break down the problem and tell a computer how to solve it for you.
The best way to gain this skill is to practice it. When you have a question don’t ask for the answer. Learn how to answer the question for yourself.
Use your searching skills when you want to compare your notes with the research of others. You will find, in the case of allocators, that there are many ways to write them and there is decades of research here. That may lead you to more questions.
Try answering a few for yourself before you jump to the answer from someone else. You might occasionally find a novel solution to a problem others have been having!
Best of luck out there and happy hacking.