Calloc VS Malloc+Memset

These days, I needed to create a dynamic array and initialize all its elements to 0. I usually do this in one of the two most common ways: malloc+memset or calloc. Today I wondered whether there is any performance penalty for using calloc (which I find a bit easier), whether malloc+memset is slower, or whether both techniques perform about the same. I googled it a bit and found many interesting discussions (on Stack Overflow, as usual :)). Here, I present an example benchmark and the explanation given for its results.

calloc:

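#include <stdlib.h>

/* 256 MiB per block; parenthesized so the macro expands safely in any expression */
#define BLOCK_SIZE (1024 * 1024 * 256)

int main(void)
{
        int i = 0;
        char *buf[10];
        /* allocate ten zero-initialized 256 MiB blocks (2.5 GiB in total) */
        while (i < 10)
        {
                buf[i] = calloc(1, BLOCK_SIZE);
                i++;
        }

        return 0;
}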

Output of the calloc benchmark:

time ./a.out
real 0m0.079s
user 0m0.023s
sys 0m0.056s

malloc+memset:

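#include <stdlib.h>
#include <string.h>

/* 256 MiB per block; parenthesized so the macro expands safely in any expression */
#define BLOCK_SIZE (1024 * 1024 * 256)

int main(void)
{
        int i = 0;
        char *buf[10];
        /* allocate ten 256 MiB blocks, then explicitly zero each one */
        while (i < 10)
        {
                buf[i] = malloc(BLOCK_SIZE);
                if (buf[i] != NULL)   /* never memset a NULL pointer */
                        memset(buf[i], 0, BLOCK_SIZE);
                i++;
        }

        return 0;
}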

Output of the malloc+memset benchmark:

time ./a.out
real 0m0.741s
user 0m0.201s
sys 0m0.530s

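Hm, I was really surprised by these results, so I looked for a proper explanation, and I think I found one: there is some ‘kernel magic’ involved. Zeroing a large region of memory (like the ten 256 MiB blocks in our benchmark) takes a lot of time, and this is where the kernel cheats :). For large requests, the allocator gets memory straight from the kernel (with glibc on Linux, via mmap), and the kernel guarantees that these fresh pages read as zero, so calloc does not have to clear them again. The kernel also does not hand out physical pages right away. A page of memory that is already zeroed is set aside, and until a page of the new allocation is first written, reading it is served from that single page of physical RAM, which is shared among all processes on the system. So the allocation doesn't actually use any memory yet.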
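The zero-fill guarantee can be checked directly. The sketch below assumes a Linux/POSIX system where mmap supports MAP_ANONYMOUS; the function name fresh_mapping_is_zeroed is made up for this example. It maps a fresh anonymous region, which the kernel hands out zero-filled, and checks every byte without ever calling memset.

```c
#define _DEFAULT_SOURCE /* makes glibc's <sys/mman.h> declare MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: maps len bytes of fresh anonymous memory and
   checks, by reading only, that every byte is zero.
   Returns 1 if all bytes are zero, 0 if not, -1 if mmap fails. */
int fresh_mapping_is_zeroed(size_t len)
{
    unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    int all_zero = 1;
    for (size_t i = 0; i < len; i++) {
        if (p[i] != 0) {
            all_zero = 0;
            break;
        }
    }

    munmap(p, len);
    return all_zero;
}
```

For instance, fresh_mapping_is_zeroed((size_t)1 << 20) should return 1. Because the loop only reads, on Linux those reads are typically satisfied by the shared zero page rather than by newly allocated physical memory.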

The “memset” implementation touches every page in the allocation, which results in much higher memory usage: writing to a page forces the kernel to allocate a real physical page for it now, instead of waiting until you actually use it.
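One way to see the difference in touched pages is to count minor page faults around each allocation. This is a rough sketch assuming Linux with glibc, where getrusage() fills in the ru_minflt field; faults_calloc, faults_malloc_memset and zero_fill are hypothetical names. memset is called through a volatile function pointer because an optimizing compiler such as GCC may otherwise merge malloc followed by memset(…, 0, …) into a single calloc call, which would defeat the comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Minor (soft) page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;
    return ru.ru_minflt;
}

/* Calling memset through a volatile pointer stops the compiler from
   fusing malloc + memset(..., 0, ...) into calloc behind our back. */
static void *(*volatile zero_fill)(void *, int, size_t) = memset;

/* Page faults caused by calloc(1, n); -1 if the allocation fails. */
long faults_calloc(size_t n)
{
    long before = minor_faults();
    char *p = calloc(1, n);
    if (p == NULL)
        return -1;
    long after = minor_faults();
    free(p);
    return after - before;
}

/* Page faults caused by malloc(n) followed by zeroing every byte. */
long faults_malloc_memset(size_t n)
{
    long before = minor_faults();
    char *p = malloc(n);
    if (p == NULL)
        return -1;
    zero_fill(p, 0, n);
    long after = minor_faults();
    free(p);
    return after - before;
}
```

With a 64 MiB block, faults_malloc_memset((size_t)64 << 20) should report many more faults than faults_calloc((size_t)64 << 20). The exact numbers depend on the page size and on whether transparent huge pages are enabled, so only the relative difference is meaningful.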

The “calloc” implementation just changes a few page tables, consumes very little actual memory, writes to very little memory, and returns.
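The flip side is that calloc postpones the work rather than eliminating it: the first write to each page still triggers a page fault, and the kernel has to supply a zeroed physical page at that point. The sketch below (again assuming Linux/glibc; faults_on_first_write is a hypothetical name) counts the faults taken by calloc itself and then the faults taken by writing one byte to every page.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (soft) page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return 0;
    return ru.ru_minflt;
}

/* Allocates n zeroed bytes with calloc, stores the page faults taken by
   calloc itself in *alloc_faults, then writes one byte to every page and
   returns the faults those writes caused (-1 if calloc fails). */
long faults_on_first_write(size_t n, long *alloc_faults)
{
    long page = sysconf(_SC_PAGESIZE);
    if (page <= 0)
        page = 4096;                 /* fallback guess if sysconf fails */

    long t0 = minor_faults();
    char *p = calloc(1, n);
    if (p == NULL)
        return -1;
    long t1 = minor_faults();

    volatile char *v = p;            /* volatile: the stores must really happen */
    for (size_t i = 0; i < n; i += (size_t)page)
        v[i] = 1;
    long t2 = minor_faults();

    free(p);
    *alloc_faults = t1 - t0;
    return t2 - t1;
}
```

So calloc wins big when much of the block is never written, and the gap narrows when the program ends up writing every page anyway.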

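P.S. If the results are different on your machine, check the implementation details of calloc, malloc and memset in your C library. If I am correct, some C runtime libraries (such as those shipped with certain compilers) have ‘poor’ calloc implementations that simply call malloc and then memset, and so gain nothing from this trick.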
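To make the P.S. a bit more concrete, here is a toy calloc showing two things a careful implementation typically does: it refuses element counts whose product overflows, and it skips the memset when the memory comes fresh from the kernel. This is a hypothetical sketch (toy_calloc, toy_free and TOY_MMAP_THRESHOLD are invented names), not glibc's actual code, and it assumes a POSIX system with MAP_ANONYMOUS. glibc uses a comparable mmap threshold, 128 KiB by default, which it adjusts at runtime.

```c
#define _DEFAULT_SOURCE /* makes glibc's <sys/mman.h> declare MAP_ANONYMOUS */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical cut-off: requests at least this large go straight to mmap. */
#define TOY_MMAP_THRESHOLD ((size_t)128 * 1024)

/* Header in front of every block so toy_free knows how it was obtained;
   the max_align_t member keeps the user pointer suitably aligned. */
typedef union {
    struct { size_t total; int mapped; } info;
    max_align_t align;
} toy_header;

void *toy_calloc(size_t nmemb, size_t size)
{
    if (size != 0 && nmemb > SIZE_MAX / size)
        return NULL;                       /* nmemb * size would overflow */
    size_t bytes = nmemb * size;
    if (bytes > SIZE_MAX - sizeof(toy_header))
        return NULL;
    size_t total = bytes + sizeof(toy_header);

    toy_header *h;
    if (total >= TOY_MMAP_THRESHOLD) {
        /* Fresh anonymous mappings are zero-filled by the kernel: no memset. */
        void *m = mmap(NULL, total, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (m == MAP_FAILED)
            return NULL;
        h = m;
        h->info.mapped = 1;
    } else {
        /* Small blocks may reuse dirty heap memory, so they must be cleared. */
        h = malloc(total);
        if (h == NULL)
            return NULL;
        memset(h, 0, total);
        h->info.mapped = 0;
    }
    h->info.total = total;
    return h + 1;
}

void toy_free(void *ptr)
{
    if (ptr == NULL)
        return;
    toy_header *h = (toy_header *)ptr - 1;
    if (h->info.mapped)
        munmap(h, h->info.total);
    else
        free(h);
}
```

In these terms, a ‘poor’ calloc is one that always takes the malloc+memset branch, even for large blocks the kernel has just zeroed.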

Sources:

http://stackoverflow.com/questions/2605476/calloc-v-s-malloc-and-time-efficiency
http://stackoverflow.com/questions/2688466/why-mallocmemset-is-slower-than-calloc
http://stackoverflow.com/questions/19813072/calloc-slower-than-malloc-memset
http://stackoverflow.com/questions/4316696/difference-in-uses-between-malloc-and-calloc/4319790#4319790

This entry was posted in CS.