[GiNaC-list] Term ordering and compiling C++ code

Mon May 24 14:23:18 CEST 2010

All,

I have written a class to solve the problem.  I expect to post it here in the next day or two.  Because I need to parse each expression anew every time it occurs, it is slow to generate printable C++ code, but it does work, and the code compiles and executes very fast.  (Slow is relative, of course.  On my problem, for which GiNaC generates 50MB of code, this new class takes 5 minutes.  The result is about 700K of code.)

The parsing would be much faster if each node in a GiNaC expression tree had a unique identifier.  Just to see how much faster, I pretended ex::gethash() was unique.  Run time dropped from 5 minutes to well under 1 second.
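To illustrate why a per-node identifier makes such a difference, here is a minimal sketch on a toy expression DAG. It does not use GiNaC: the `Node` type and the functions `walk_naive`, `walk_memo` and `squared_chain` are hypothetical names invented for this example, and the node address stands in for the serial number requested above. When subexpressions are heavily shared, re-walking every occurrence costs time proportional to the size of the expanded tree, while a lookup keyed on a unique identifier visits each distinct node once.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

// Toy binary DAG node (hypothetical; not a GiNaC type). A leaf has no children.
struct Node {
    std::shared_ptr<Node> left, right;
};

// No identifier available: every occurrence of a shared node is walked again.
std::size_t walk_naive(const Node* n) {
    if (!n) return 0;
    return 1 + walk_naive(n->left.get()) + walk_naive(n->right.get());
}

// Identifier available (here: the node address): each distinct node is
// processed once and given a temporary name; later occurrences are skipped.
std::size_t walk_memo(const Node* n,
                      std::unordered_map<const Node*, std::string>& names) {
    if (!n || names.count(n)) return 0;
    std::size_t visited = 1 + walk_memo(n->left.get(), names)
                            + walk_memo(n->right.get(), names);
    names.emplace(n, "t" + std::to_string(names.size()));
    return visited;
}

// Builds x, x*x, (x*x)*(x*x), ... where both operands are the same shared node.
std::shared_ptr<Node> squared_chain(int depth) {
    assert(depth >= 0);
    auto n = std::make_shared<Node>();
    for (int i = 0; i < depth; ++i) {
        auto parent = std::make_shared<Node>();
        parent->left = n;
        parent->right = n;
        n = parent;
    }
    return n;
}
```

For `squared_chain(d)` the naive walk performs 2^(d+1) - 1 visits, while the memoized walk performs d + 1. A hash value that is not guaranteed unique cannot safely replace the address in this role, because two different nodes with the same hash would be given the same name.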

Is there a channel other than this list through which I can request that a serial number be added to expression tree nodes?

Thanks,
Doug


--- On Mon, 5/24/10, jros <jros at unavarra.es> wrote:

From: jros <jros at unavarra.es>
Subject: Re: [GiNaC-list] Term ordering and compiling C++ code
To: "GiNaC discussion list" <ginac-list at ginac.de>
Date: Monday, May 24, 2010, 2:17 AM

On Sat, 2010-05-22 at 23:19 +0200, Richard B. Kreckel wrote:

Hi!

jros wrote:
> Although I don't have a solution for your problem, as I'm myself
> addressing similar problems, matching common subexpressions to
> variables in a bottom-up manner, I think that such functionality
> is implicitly implemented in GiNaC.
> 
> If I understand GiNaC's internal structure correctly, subexpressions
> common to two expressions
> are frequently shared internally, to save memory.

This is entirely correct, but...

> So it must be possible to write a print algorithm that goes through
> the tree of one or more expressions and replaces every shared
> subexpression (say, a sum or a product) with a variable, which in
> turn is assigned an expression that is printed in the same way,
> using recursion.

...first, this sharing is entirely transparent for the user...

You mean that we cannot look at the smart pointer of an expression, or of some of its subexpressions such as add and mul, to see whether they are referenced by more than one expression?

The idea would be: if a subexpression is referenced more than once, we can call it an atom, print the enclosing expression (as C code) in terms of the atom (avoiding expansion of the subexpression), and also print the atom's definition (as C code).
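The idea of treating every multiply-referenced subexpression as an atom can be sketched on a toy DAG as follows. This is only an illustration under stated assumptions: `Node`, `make`, `count_refs` and `atom_candidates` are hypothetical names, nothing here is GiNaC API, and the sketch counts references by traversal rather than by reading a smart pointer's reference count.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy DAG node (hypothetical; not a GiNaC class).
struct Node {
    std::string label;                        // printable form, for the demo only
    std::vector<std::shared_ptr<Node>> args;  // empty for leaves
};
using NodePtr = std::shared_ptr<Node>;

NodePtr make(const std::string& label, std::vector<NodePtr> args = {}) {
    return std::make_shared<Node>(Node{label, std::move(args)});
}

// Count how many times each node is reached from a parent (or as the root).
void count_refs(const NodePtr& n, std::map<const Node*, int>& refs) {
    assert(n);
    if (++refs[n.get()] > 1) return;  // operands were counted on the first visit
    for (const NodePtr& c : n->args) count_refs(c, refs);
}

// Labels of the non-leaf nodes referenced more than once: the atom candidates.
std::set<std::string> atom_candidates(const NodePtr& root) {
    std::map<const Node*, int> refs;
    count_refs(root, refs);
    std::set<std::string> out;
    for (const auto& kv : refs)
        if (kv.second > 1 && !kv.first->args.empty()) out.insert(kv.first->label);
    return out;
}
```

If the same node for c+d+e is an operand of two different products, `atom_candidates` reports it once. A printer could then emit a definition for that atom and refer to it by name wherever it occurs.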

It would be nice to be able to print expressions in GiNaC in a way that shows the level of sharing in use.

Is sharing also enforced when solving linear equations? I think this is a really important place for optimization, if it is done.

> Allowing or disallowing some kinds of automatic simplification (so
> that the expected amount of subexpression sharing increases) could
> probably help to obtain improved results.

...and second, sharing is currently not pursued aggressively! Rather, it 
is exploited whenever it is trivial to do so. (The product rule of 
differentiation is an example where exactly the same terms pop up again 
and again so exploiting sharing comes at no cost.)

It would be nice if the aggressiveness of sharing could be changed at runtime, without affecting performance, at least at the level of aggressiveness of the current implementation.

In this example

e1=a+b

e2=a+b+c

I suppose that no sharing of the form e2=e1+c is detected here, since finding it would be computationally expensive. But if instead we write

e2=e1+c

then e1 is referenced by e2? (I suppose so.)

If this is the case, I consider the level of sharing to be respectably good :) . I mean, just defining the expressions with care would give good results.

The sharing of expressions under diff is really nice.

If parenthesization is kept to a maximum (no expansions are made unless they are needed), as I think is the case, the sharing structure is preserved as long as possible, and that is also very good.

So, in my opinion, using what I understand to be the current level of sharing, with no changes at all, would allow C output code to be printed in a very optimized way.

> I wonder what do the developers think about this.

Well, I think that if the size of generated code is so prohibitively 
large and compiler CSE doesn't help you may be better off writing your 
own algorithm collecting similar terms in an associative array. You 
could then artificially introduce temporaries, in order to help the 
compiler. This would boil down to a more aggressive variant of GiNaC's 
subexpression sharing. What do you think?

I suppose you refer to http://en.wikipedia.org/wiki/Associative_array (I'm not familiar with this).

What would be the "key" and the "value"?

I think you propose to go beyond GiNaC's own level of sharing, trying to find common expressions that are not shared by GiNaC. Is that right?

So you start traversing the structure and push into the associative array every subexpression you find. The "key" is the subexpression and the "value" is a newly defined symbol for the atom (and maybe the number of references to this atom); then you substitute the new atom for the subexpression in the expression.

If the subexpression was already in the array, no new insertion is made and the subexpression is replaced by the matching atom (the value).

I suppose that before pushing a subexpression, you first apply the previous recipe to it recursively (so it is atomized before being pushed).
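The recipe just described can be sketched with a std::map as the associative array. This is a minimal illustration on a toy expression tree, not GiNaC code: `Node`, `leaf`, `nary` and `Atomizer` are hypothetical names, the key is simply the text of the already-atomized subexpression, and operand order is assumed to be canonical already (the sketch does not sort operands). A negated product is modelled, as an assumption, by a leading factor written "-1".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy expression node (hypothetical; not a GiNaC type).
struct Node {
    char op;                                  // '+', '*', or 0 for a leaf
    std::string name;                         // leaf name, e.g. "a"
    std::vector<std::shared_ptr<Node>> args;  // operands of '+' or '*'
};
using NodePtr = std::shared_ptr<Node>;

NodePtr leaf(const std::string& n) {
    return std::make_shared<Node>(Node{0, n, {}});
}
NodePtr nary(char op, std::vector<NodePtr> a) {
    return std::make_shared<Node>(Node{op, "", std::move(a)});
}

struct Atomizer {
    std::map<std::string, std::string> table;  // key: atomized text, value: atom
    std::vector<std::string> defs;             // definitions, in dependency order

    // Returns the name standing for e: the leaf name itself, or an atom.
    std::string atomize(const NodePtr& e) {
        if (e->op == 0) return e->name;
        assert(!e->args.empty());
        std::string key;
        for (std::size_t i = 0; i < e->args.size(); ++i) {
            if (i) key += e->op;
            key += atomize(e->args[i]);         // operands are atomized first
        }
        auto it = table.find(key);
        if (it != table.end()) return it->second;  // already known: reuse atom
        std::string atom = "atom" + std::to_string(table.size() + 1);
        table.emplace(key, atom);
        defs.push_back("double " + atom + " = " + key + ";");
        return atom;
    }
};
```

Atomizing a sum of a and the product of -1, b, (c+d+e) and f fills `defs` with three lines defining atom1 as c+d+e, atom2 as -1*b*atom1*f and atom3 as a+atom2. Atomizing c+d+e again returns atom1 without adding a definition.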

I suppose that an important issue is deciding when an expression should be considered atomized, and I think this depends on the internal structure of GiNaC. For example, if a - b*(c + d + e)*f is an expression, we get:

atom1= c + d + e

atom2=-b*atom1*f

atom3=a + atom2

I suppose that after this, if the expression

b *( c +d +e) *f

is atomized, a new atom will appear

atom4=b*atom1*f

This happens instead of reusing the existing atom by taking into account that b*(c + d + e)*f = -atom2.

I suppose that avoiding it would require something like inserting the subexpression only if neither it nor its negative is already in the array.
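One way to implement "insert only if neither the subexpression nor its negative is present" is to split each product into a sign and an unsigned body, key the table on the body alone, and remember the sign with which the atom was first defined. The sketch below assumes the caller has already separated the sign from the body; `SignAwareTable` and `reference` are hypothetical names, not GiNaC API.

```cpp
#include <cassert>
#include <map>
#include <string>

struct SignAwareTable {
    struct Entry {
        int sign;          // sign the atom was defined with: +1 or -1
        std::string atom;  // atom name
    };
    std::map<std::string, Entry> table;  // key: unsigned body text

    // Returns how to refer to sign*body: "atomN" or "-atomN".
    std::string reference(int sign, const std::string& body) {
        assert(sign == 1 || sign == -1);
        auto it = table.find(body);
        if (it == table.end()) {
            // Neither the body nor its negative is known: define a new atom.
            std::string atom = "atom" + std::to_string(table.size() + 1);
            it = table.emplace(body, Entry{sign, atom}).first;
        }
        // Same sign as the definition: plain atom. Opposite sign: negate it.
        return (it->second.sign == sign ? "" : "-") + it->second.atom;
    }
};
```

With c+d+e registered first as atom1, `reference(-1, "b*atom1*f")` defines atom2, and a later `reference(1, "b*atom1*f")` yields "-atom2" instead of creating atom4.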

Nevertheless, it seems to me that sharing in GiNaC deals with this automatically (or almost).

Fine-grained atomization such as

atom1=c+d

atom2=atom1+e

seems more difficult and inefficient to implement, due to the internal expression representation.

In my particular implementation, each time an operation is performed the result is atomized, so all the expressions are kept atomized.

To that end I define a special type, atom (derived from symbol), which holds a pointer to my equivalent (although less optimal) implementation of the associative array.

This special type allows things like making diff and print work flawlessly with atomized expressions (as if they were not atomized). As I do not dare to overload all the GiNaC operators, I only overload the operators of a matrix class in which all the operations that I need are performed.

This implementation (certainly improvable) is optimal in some senses (single representation, maximum sharing and minimum memory, low cost of atomization); nevertheless, fine-tuning it requires deep knowledge of the internals of GiNaC.

Thank you very much,

Javier

Bye
   -richy.
