[GiNaC-list] A bad idea: GiNaC as the symbolic manipulation engine in Sage

Alexei Sheplyakov varg at theor.jinr.ru
Mon Aug 11 20:25:58 CEST 2008


On Mon, Aug 11, 2008 at 05:09:30PM +0200, Burcin Erocal wrote:

> Are you denying that GiNaC is the best "symbol manipulation engine"
> I can find?

That depends on a definition of a "symbol manipulation engine", i.e. what
kind of features do you expect from it (i.e. GiNaC does not implement
polynomial factorization, which is necessary for symbolic integration).

Also to use GiNaC (or Maxima) in an efficient manner you need to forget
your Mathematica (Maple, whatever) experience and learn from scratch.
 
> Your statements also seem to suggest that there is something wrong with
> interactive use.

GiNaC is not designed for this kind of use. Interactive manipulation is kind
of pointless if  the expressions in question consists of 10^6 -- 10^8
of terms, isn't it?

> OK, so I chose a bad example for a benchmark, and this was pointed out
> before.

I think the benchmark is OK. My point is a bit different: the method you've
used to construct the sum is very inefficient, at least for GiNaC and Maxima.
Apparently it's also inefficient for Mathematica (I guess it's also inefficient
for almost any CAS too, but I can't tell for sure):

$ cat mma_bench1.m 

A = 0;
For[i = 0, i <= 10000, i = i + 1, A = A + (x+y)^i; ];

$ time math < mma_bench.m >/dev/null

real    0m15.535s
user    0m15.509s
sys     0m0.016s

Hence Mathematica provides a more efficient method, it's called Sum:

$ cat mma_bench2.m
A = Sum[(x+y)^i, {i, 0, 10000}];

$ time math < mma_bench2.m >/dev/null

real    0m0.035s
user    0m0.028s
sys     0m0.004s

The point is: although both methods are correct, the non-obvious one
performs _3 orders of magnitude_ better. Likewise, GiNaC (and Maxima)
also provide efficient methods for doing some operations. You might want
to learn them before doing some changes to data structures.

> When I said modifying data structures, I meant that you can keep the
> arguments of an operator in a different structure, instead of a list.

I understand (BTW GiNaC uses arrays for that, not lists).

> This would probably only make sense on very large expressions, and at
> this point, I don't know what the penalty for smaller expressions would
> be.

I don't agree. I think the large expression should be kept as flat as
possible for (at least) two reasons:

1. Otherwise eval() and other inherently recursive functions/methods
   would run out of stack.
2. Any other structure requires more memory to store the same terms.
 
> As for different ways of storing the arguments of an operator, we can
> look at methods for storing and calculating with polynomials. One
> method is to use a hash table to keep the monomials

The same method was used in ancient versions of GiNaC. That was a bad idea.
It makes code much more complicated and does not really improve performance
(in some situation it's even worse). 

N.B. You might want to grep GiNaC source for EXPAIRSEQ_USE_HASHTAB.

To reiterate, a flat N-ary tree (which is what GiNaC::expairseq
essentially is) seems to be the best way to store big expressions.
 
> I got the above timing on this machine:
> 
> $ cat /proc/cpuinfo  | grep name
> model name      : Intel(R) Pentium(R) D CPU 3.40GHz

Although the build time depends on optimization options and a number of other
factors, 25 minutes is way too long for such a box (even if you build both
static and shared libraries). Unless you use some insane optimization options,
this looks like a compiler and/or linker bug (see e.g.
http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=376066). What are exactly
the compiler and binutils versions (as in g++ --version; ld --version)?

Anyway, there are a number of ways to speed up the build. Jens already
mentioned --disable-static. Another useful option is to run build in 
a parallel manner, i.e. invoking make as

make -j N 

where N \approx number of CPU cores + 1, and appending '-pipe' option to
CXXFLAGS. Note: this is useful even if your CPU is single core, because
CLN consists of lots of small source files.

Yet another method is to use a fast(er) shell, such as ash or pdksh, i.e.

SHELL=/bin/ash CONFIG_SHELL=/bin/ash /bin/ash configure --disable-static

This works because GNU libtool (which is responsible for building GiNaC
and CLN, as well as a number of other Free and even not so free libraries)
is a shell script. Of course, ash is not suitable as an interactive shell,
but it's very good at parsing/running big scripts (such as GNU libtool).

@developers: libtool version 2.2 is reportedly much faster than 1.5
(which is what we use), so we might want to upgrade to that version
when releasing next version of GiNaC.


Best regards,
	Alexei

-- 
All science is either physics or stamp collecting.

-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 827 bytes
Desc: Digital signature
Url : http://www.cebix.net/pipermail/ginac-list/attachments/20080811/44e2660e/attachment.sig 


More information about the GiNaC-list mailing list