[GiNaC-list] little optimization => lots of Memory usage

Alexei Sheplyakov alexei.sheplyakov at gmail.com
Fri Oct 22 14:45:24 CEST 2010


Hello,

On Wed, Oct 20, 2010 at 8:47 PM, Cristobal Navarro <axischire at gmail.com> wrote:

> yes, it adds some memoery usage, but doing the math i found that it isnt too much,
> for the 8000x8000 problem, each string should be in the worst case length 21
> (for example string s = "Ld0x1d2x3d4x5d6x7d8x9" ), assuming std::string is 1
> byte per length unit:
> stringMemory <= 21 bytes * 64M <=  1344Mbytes <= 1.3GB

I'm afraid your calculation is wrong.

Let's forget about GiNaC for a moment and estimate the memory
usage of map<string, map<string, long> >. std::map is a red-black
tree. Every node has 3 pointers (2 children and parent), color (int),
and the actual data (key and value). So, every node consumes

3*sizeof(void *) + sizeof(int) + padding
        + sizeof(string) + sizeof(long) + strlen(TypicalString)

= 4*sizeof(void *) + sizeof(int) + padding
        + sizeof(long) + strlen(TypicalString)

(std::string is essentialy a char* as far as memory usage is
concerned). We got N trees each of N nodes, thus we need

(4*sizeof(void *) + sizeof(int) + padding + sizeof(long) +
strlen(TypicalString))*N^2

bytes. On x86_64 this will be (48 + strlen(TypicalString))*N^2.
If typical string is 16 characters long, this equals 64*N^2

# N     kb
1000    62500
2000    250000
3000    562500
4000    1000000


To check this theory consider the following program (it doesn't use GiNaC):

// mapmap.cpp
#include <map>
#include <string>
#include <sstream>
#include <iostream>
#include <cstdlib>
#include <unistd.h>
#include <sys/resource.h>
using namespace std;

int main(int argc, char** argv)
{
        cout << "sizeof(string) " << sizeof(string) << endl;
        const int dflt_size = 800;
        int size = dflt_size;
        if (argc >= 2)
                size = atoi(argv[1]);
        if (size <= 10)
                size = dflt_size;

        map<string, map<string, long> > big_map;
        for (int i = 0; i < size; ++i) {
                map<string, long> m;
                for (int j = 0; j < size; ++j) {
                        ostringstream oss;
                        oss << "Row " << i << " Col" << j; // ~ 16 chars
                        m[oss.str()] = i + j;
                }
                ostringstream oss;
                oss << "Row" << i;
                big_map[oss.str()] = m;
        }
        struct rusage rus;
        getrusage(RUSAGE_SELF, &rus);
        cout << size << "\t" << rus.ru_maxrss << std::endl << std::flush;
        return 0;
}


$ g++ -O2 -Wall -pipe -march=native -o mapmap mapmap.cpp
$ for i in 1000 2000 3000 4000; do ./mapmap $i; done


On my system (Athlon 64 X2 running Debian GNU/Linux) it prints

# N     kb
1000    112096
2000    450804
3000    1019464
4000    1833620

Experimental values are ~ 2x bigger than ones predicted by theory.
Perhaps I've missed a pointer to vtbl somewhere. Anyway, it's clear
that memory usage is O(N^2), and N = 8000 would consume
~ 2^2 * 2Gb = 8Gb (I'm unable to check this experimentally -- my
computer has 2Gb of RAM).

Conclusion: the binary tree is not suitable large amounts of data.
Perhaps you need to use a different data structure.

Also, use ex instead of ex* (ex is actually a wrapped pointer).

Hope this helps,
         Alexei


More information about the GiNaC-list mailing list