Macros are evil (tm)

Fri Jun 15 17:22:40 CEST 2001

Hi all,

This is gonna be long.  If you just want to know what you soon need to
do to your source files and are not interested in why, please jump to
the end of this email.

The usual way of implementing polymorphic methods accepting ex
arguments in GiNaC involves checking the arguments for their types.
This results in switch-like statements of the form

ex mul::somemethod(const ex & other)
{
    if (is_ex_of_type(other, add)) {
        // do this...
    } else if (is_ex_of_type(other, mul)) {
        // do that...
    } else {
        // do something else...
    }
}

Okay, language lawyers (*) usually regard such switch statements as
bad.  But since C++ does not have generic multiple dispatch, their
usual answer is some variation of home-grown double dispatch, maybe
by bloating the add and mul classes with overloaded methods like
this:

class basic {
    // ...
    virtual ex somemethod(const add & other);
    virtual ex somemethod(const mul & other);
    ex somemethod(const ex & other);
};

// same for add and mul

ex mul::somemethod(const ex & other)
{
    return other.bp->somemethod(*this);
}

So the proper implementation is reached through two successive
lookups in the virtual function table.  Nothing prohibits us from
doing it this way in GiNaC, but for the objects we have been dealing
with in a CAS so far we have always chosen the switch-like
implementation and found it much more accessible and readable.  Just
consider the various print functions that were recently changed to
accept an object of type print_context (or of a type derived from it)
in order to format the output properly.  The different output formats
need to be implemented somewhere, so why not deal with all of them in
foo::print()?

What is this guy ranting about?!?  Who cares?

Cool, if you don't care.  The only thing that has really bothered me
so far is the use of macros in this place.  A macro definition of
is_of_type(obj, type)
1) lives outside the namespace and may collide at some point,
2) accepts all kinds of funny arguments with barely any possibility of
   compile-time checking,
3) does not allow overloading, so there is an is_of_type and another
   is_ex_of_type with the same semantics, and
4) is generally crap with respect to readability and makes steam come
   out of the ears of language lawyers.

A better approach would be to use a template here.  We could write
is_a<numeric>(foo) where foo is either something derived from basic or
an ex and it will produce the expected outcome.  There will of course
also be an is_exactly_a<tensor>(bar) matching only tensors and not
classes derived from tensor.  We can implement it in exactly the same
way as the macros were implemented, for instance like this:

template <class T> inline bool is_a(const basic & obj)
{
    return dynamic_cast<const T *>(&obj)!=0;
}
template <class T> inline bool is_a(const ex & obj)
{
    return is_a<T>(*obj.bp);
}

The only cause for concern is 

template <class T> inline bool is_exactly_a(const class basic & obj)
{ 
    const T foo; return foo.tinfo()==obj.tinfo();
}

because it has to construct a temporary object.  But this is not a
big deal, since we are allowed to write template specializations; for
instance, in file add.h we would write down

template<> inline bool is_exactly_a<add>(const basic & obj)
{
    return obj.tinfo()==TINFO_add;
}

giving us all the performance back.  We all know that "An Inline
Function is As Fast As a Macro" (an actual section title in GCC's
info page).  So we should do it.  Now.

Surprise!  The inliner in GCC-2.95 does a very poor job of flow
analysis inside if statements when inlined functions return a boolean
(or an integer, it doesn't matter).  Consider this code:

struct ABC {
    virtual ~ABC() {}
};
struct A : public ABC {};

template <class T> inline bool is_a(const ABC & obj)
{
    return (dynamic_cast<const T *>(&obj)!=0);
}

#define is_of_type(OBJ,TYPE) \
    (dynamic_cast<const TYPE *>(&OBJ)!=0)

#ifdef USE_MACRO
int test(const ABC & e)
{
    if (is_of_type(e,A))
        return 1;
    return 0;
}
#else
int test(const ABC & e)
{
    if (is_a<A>(e))
        return 1;
    return 0;
}
#endif

When USE_MACRO is defined at preprocessing time, the compiler
generates:

00000000 <test(ABC const &)>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   8b 45 08                mov    0x8(%ebp),%eax
   9:   85 c0                   test   %eax,%eax
   b:   74 28                   je     35 <test(ABC const &)+0x35>
   d:   83 c4 f8                add    $0xfffffff8,%esp
  10:   50                      push   %eax
  11:   68 00 00 00 00          push   $0x0
                        12: R_386_32    ABC type_info function
  16:   8b 10                   mov    (%eax),%edx
  18:   03 02                   add    (%edx),%eax
  1a:   50                      push   %eax
  1b:   6a 01                   push   $0x1
  1d:   68 00 00 00 00          push   $0x0
                        1e: R_386_32    A type_info function
  22:   ff 72 04                pushl  0x4(%edx)
  25:   e8 fc ff ff ff          call   26 <test(ABC const &)+0x26>
                        26: R_386_PC32  __dynamic_cast
  2a:   85 c0                   test   %eax,%eax
  2c:   74 07                   je     35 <test(ABC const &)+0x35>
  2e:   b8 01 00 00 00          mov    $0x1,%eax
  33:   eb 02                   jmp    37 <test(ABC const &)+0x37>
  35:   31 c0                   xor    %eax,%eax
  37:   c9                      leave
  38:   c3                      ret

while in the inline case it generates this entirely contorted code:

00000000 <test(ABC const &)>:
   0:   55                      push   %ebp
   1:   89 e5                   mov    %esp,%ebp
   3:   83 ec 08                sub    $0x8,%esp
   6:   8b 45 08                mov    0x8(%ebp),%eax
   9:   85 c0                   test   %eax,%eax
   b:   74 24                   je     31 <test(ABC const &)+0x31>
   d:   83 c4 f8                add    $0xfffffff8,%esp
  10:   50                      push   %eax
  11:   68 00 00 00 00          push   $0x0
                        12: R_386_32    ABC type_info function
  16:   8b 10                   mov    (%eax),%edx
  18:   03 02                   add    (%edx),%eax
  1a:   50                      push   %eax
  1b:   6a 01                   push   $0x1
  1d:   68 00 00 00 00          push   $0x0
                        1e: R_386_32    A type_info function
  22:   ff 72 04                pushl  0x4(%edx)
  25:   e8 fc ff ff ff          call   26 <test(ABC const &)+0x26>
                        26: R_386_PC32  __dynamic_cast
  2a:   85 c0                   test   %eax,%eax
  2c:   0f 95 c0                setne  %al
  2f:   eb 02                   jmp    33 <test(ABC const &)+0x33>
  31:   b0 00                   mov    $0x0,%al
  33:   84 c0                   test   %al,%al
  35:   75 09                   jne    40 <test(ABC const &)+0x40>
  37:   31 c0                   xor    %eax,%eax
  39:   eb 0a                   jmp    45 <test(ABC const &)+0x45>
  3b:   90                      nop
  3c:   8d 74 26 00             lea    0x0(%esi,1),%esi
  40:   b8 01 00 00 00          mov    $0x1,%eax
  45:   c9                      leave
  46:   c3                      ret

This turned out to be the cause of the roughly 25% performance hit I
saw when I first tried replacing the macros with templates.  It also
turns out that the new GCC-3.0 produces better and roughly equivalent
code in both cases, the templated version being even slightly
superior as far as I can see.

So, this is what we'll do (remember that GCC-3.0 is going to be
released today): each of the old macros will be supplemented by its
template counterpart, with specializations of is_exactly_a<>() for
all critical cases.  Inside the library we'll stick with the macros
in the time-critical parts for a while, until GCC-3.0 catches on.
(BTW: GCC-3.0 produces code that is roughly 10%-30% faster on the
GiNaC benchmarks.  Rejoice and upgrade!)  Eventually these macros will
be phased out entirely.

I just finished checking the changes into CVS.  For my applications I
used a Perl script to convert from the macros to the new inline
template functions; it is supplied herewith WITHOUT ANY WARRANTY
WHATSOEVER.  It actually works for converting the GiNaC library, but
you should of course make a backup first.  Once GiNaC 0.9.1 rolls out
(or if you are running from CVS), please apply this converter to your
source files.

----------8<------------------8<---------------------8<-----------------
#! /usr/bin/perl -w
use strict;

# Usage: pass the name of one C++ source file; it is rewritten in place.
my $file = $ARGV[0];
if (!defined $file) {
    print STDERR "*** no file given\n";
    exit 1;
}
print STDERR "replacing in file $file\n";
# Put the temporary file next to the original, so this also works when
# $file contains a directory part and rename() stays on one filesystem.
my $tmpfile = "$file.tmp";

open(my $in,  '<', $file)    or die "Can't open source file: $!\n";
open(my $out, '>', $tmpfile) or die "Can't open temporary file: $!\n";
while (<$in>) {

    # is_exactly_of_type(foo,bar)  ->  is_exactly_a<bar>(foo)
    s/is_exactly_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_exactly_a\<$2\>\($1\)/g;

    # is_ex_exactly_of_type(foo,bar)  ->  is_exactly_a<bar>(foo)
    s/is_ex_exactly_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_exactly_a\<$2\>\($1\)/g;

    # is_of_type(foo,bar) ->  is_a<bar>(foo)
    s/is_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_a\<$2\>\($1\)/g;

    # is_ex_of_type(foo,bar)  ->  is_a<bar>(foo)
    s/is_ex_of_type\(([\*\.a-zA-Z_0-9\(\)\[\]\+\-\>]+)[, ]+([a-zA-Z_0-9]+)\)/is_a\<$2\>\($1\)/g;

    print $out $_;
}
close $in;
close $out or die "Can't write temporary file: $!\n";
rename($tmpfile, $file) or die "Can't replace $file: $!\n";
---------->8------------------>8--------------------->8-----------------

Regards
     -richy.

(*) "The first thing we do, let's kill all the language lawyers."
    Henry VI, part II, taken from TC++PL-3, chapt 2.
--
Richard Kreckel
<Richard.Kreckel at Uni-Mainz.DE>
<http://wwwthep.physik.uni-mainz.de/~kreckel/>