Next: , Previous: Substituting expressions, Up: Methods and functions


5.4 Pattern matching and advanced substitutions

GiNaC allows the use of patterns for checking whether an expression is of a certain form or contains subexpressions of a certain form, and for substituting expressions in a more general way.

A pattern is an algebraic expression that optionally contains wildcards. A wildcard is a special kind of object (of class wildcard) that represents an arbitrary expression. Every wildcard has a label which is an unsigned integer number to allow having multiple different wildcards in a pattern. Wildcards are printed as ‘$label’ (this is also the way they are specified in ginsh). In C++ code, wildcard objects are created with the call

     ex wild(unsigned label = 0);

which is simply a wrapper for the wildcard() constructor with a shorter name.

Some examples for patterns:

Constructed as Output as
wild() $0
pow(x,wild()) x^$0
atan2(wild(1),wild(2)) atan2($1,$2)
indexed(A,idx(wild(),3)) A.$0

Notes:

5.4.1 Matching expressions

The most basic application of patterns is to check whether an expression matches a given pattern. This is done by the function

     bool ex::match(const ex & pattern);
     bool ex::match(const ex & pattern, exmap& repls);

This function returns true when the expression matches the pattern and false if it doesn't. If used in the second form, the actual subexpressions matched by the wildcards get returned in the associative array repls with ‘wildcard’ as a key. If match() returns false, repls remains unmodified.

The matching algorithm works as follows:

Sums (add) and products (mul) are treated in a special way to account for their commutativity and associativity:

In general, having more than one single wildcard as a term of a sum or a factor of a product (such as ‘a+$0+$1’) will lead to unpredictable or ambiguous results.

Here are some examples in ginsh to demonstrate how it works (the match() function in ginsh returns ‘FAIL’ if the match fails, and the list of wildcard replacements otherwise):

     > match((x+y)^a,(x+y)^a);
     {}
     > match((x+y)^a,(x+y)^b);
     FAIL
     > match((x+y)^a,$1^$2);
     {$1==x+y,$2==a}
     > match((x+y)^a,$1^$1);
     FAIL
     > match((x+y)^(x+y),$1^$1);
     {$1==x+y}
     > match((x+y)^(x+y),$1^$2);
     {$1==x+y,$2==x+y}
     > match((a+b)*(a+c),($1+b)*($1+c));
     {$1==a}
     > match((a+b)*(a+c),(a+$1)*(a+$2));
     {$1==b,$2==c}
       (Unpredictable. The result might also be [$1==c,$2==b].)
     > match((a+b)*(a+c),($1+$2)*($1+$3));
       (The result is undefined. Due to the sequential nature of the algorithm
        and the re-ordering of terms in GiNaC, the match for the first factor
        may be {$1==a,$2==b} in which case the match for the second factor
        succeeds, or it may be {$1==b,$2==a} which causes the second match to
        fail.)
     > match(a*(x+y)+a*z+b,a*$1+$2);
       (This is also ambiguous and may return either {$1==z,$2==a*(x+y)+b} or
        {$1=x+y,$2=a*z+b}.)
     > match(a+b+c+d+e+f,c);
     FAIL
     > match(a+b+c+d+e+f,c+$0);
     {$0==a+e+b+f+d}
     > match(a+b+c+d+e+f,c+e+$0);
     {$0==a+b+f+d}
     > match(a+b,a+b+$0);
     {$0==0}
     > match(a*b^2,a^$1*b^$2);
     FAIL
       (The matching is syntactic, not algebraic, and "a" doesn't match "a^$1"
        even though a==a^1.)
     > match(x*atan2(x,x^2),$0*atan2($0,$0^2));
     {$0==x}
     > match(atan2(y,x^2),atan2(y,$0));
     {$0==x^2}

5.4.2 Matching parts of expressions

A more general way to look for patterns in expressions is provided by the member function

     bool ex::has(const ex & pattern);

This function checks whether a pattern is matched by an expression itself or by any of its subexpressions.

Again some examples in ginsh for illustration (in ginsh, has() returns ‘1’ for true and ‘0’ for false):

     > has(x*sin(x+y+2*a),y);
     1
     > has(x*sin(x+y+2*a),x+y);
     0
       (This is because in GiNaC, "x+y" is not a subexpression of "x+y+2*a" (which
        has the subexpressions "x", "y" and "2*a".)
     > has(x*sin(x+y+2*a),x+y+$1);
     1
       (But this is possible.)
     > has(x*sin(2*(x+y)+2*a),x+y);
     0
       (This fails because "2*(x+y)" automatically gets converted to "2*x+2*y" of
        which "x+y" is not a subexpression.)
     > has(x+1,x^$1);
     0
       (Although x^1==x and x^0==1, neither "x" nor "1" are actually of the form
        "x^something".)
     > has(4*x^2-x+3,$1*x);
     1
     > has(4*x^2+x+3,$1*x);
     0
       (Another possible pitfall. The first expression matches because the term
        "-x" has the form "(-1)*x" in GiNaC. To check whether a polynomial
        contains a linear term you should use the coeff() function instead.)

The method

     bool ex::find(const ex & pattern, exset& found);

works a bit like has() but it doesn't stop upon finding the first match. Instead, it appends all found matches to the specified list. If there are multiple occurrences of the same expression, it is entered only once to the list. find() returns false if no matches were found (in ginsh, it returns an empty list):

     > find(1+x+x^2+x^3,x);
     {x}
     > find(1+x+x^2+x^3,y);
     {}
     > find(1+x+x^2+x^3,x^$1);
     {x^3,x^2}
       (Note the absence of "x".)
     > expand((sin(x)+sin(y))*(a+b));
     sin(y)*a+sin(x)*b+sin(x)*a+sin(y)*b
     > find(%,sin($1));
     {sin(y),sin(x)}

5.4.3 Substituting expressions

Probably the most useful application of patterns is to use them for substituting expressions with the subs() method. Wildcards can be used in the search patterns as well as in the replacement expressions, where they get replaced by the expressions matched by them. subs() doesn't know anything about algebra; it performs purely syntactic substitutions.

Some examples:

     > subs(a^2+b^2+(x+y)^2,$1^2==$1^3);
     b^3+a^3+(x+y)^3
     > subs(a^4+b^4+(x+y)^4,$1^2==$1^3);
     b^4+a^4+(x+y)^4
     > subs((a+b+c)^2,a+b==x);
     (a+b+c)^2
     > subs((a+b+c)^2,a+b+$1==x+$1);
     (x+c)^2
     > subs(a+2*b,a+b==x);
     a+2*b
     > subs(4*x^3-2*x^2+5*x-1,x==a);
     -1+5*a-2*a^2+4*a^3
     > subs(4*x^3-2*x^2+5*x-1,x^$0==a^$0);
     -1+5*x-2*a^2+4*a^3
     > subs(sin(1+sin(x)),sin($1)==cos($1));
     cos(1+cos(x))
     > expand(subs(a*sin(x+y)^2+a*cos(x+y)^2+b,cos($1)^2==1-sin($1)^2));
     a+b

The last example would be written in C++ in this way:

     {
         symbol a("a"), b("b"), x("x"), y("y");
         e = a*pow(sin(x+y), 2) + a*pow(cos(x+y), 2) + b;
         e = e.subs(pow(cos(wild()), 2) == 1-pow(sin(wild()), 2));
         cout << e.expand() << endl;
          // -> a+b
     }

5.4.4 The option algebraic

Both has() and subs() take an optional argument to pass them extra options. This section describes what happens if you give the former the option has_options::algebraic or the latter subs_options::algebraic. In that case the matching condition for powers and multiplications is changed in such a way that they become more intuitive. Intuition says that x*y is a part of x*y*z. If you use these options you will find that (x*y*z).has(x*y, has_options::algebraic) indeed returns true. Besides matching some of the factors of a product also powers match as often as is possible without getting negative exponents. For example (x^5*y^2*z).subs(x^2*y^2==c, subs_options::algebraic) will return x*c^2*z. This also works with negative powers: (x^(-3)*y^(-2)*z).subs(1/(x*y)==c, subs_options::algebraic) will return x^(-1)*c^2*z.

Please notice: this only works for multiplications and not for locating x+y within x+y+z.