Next: , Previous: Substituting expressions, Up: Methods and functions

### 5.4 Pattern matching and advanced substitutions

GiNaC allows the use of patterns for checking whether an expression is of a certain form or contains subexpressions of a certain form, and for substituting expressions in a more general way.

A pattern is an algebraic expression that optionally contains wildcards. A wildcard is a special kind of object (of class `wildcard`) that represents an arbitrary expression. Every wildcard has a label which is an unsigned integer number to allow having multiple different wildcards in a pattern. Wildcards are printed as ‘\$label’ (this is also the way they are specified in ginsh). In C++ code, wildcard objects are created with the call

```     ex wild(unsigned label = 0);
```

which is simply a wrapper for the `wildcard()` constructor with a shorter name.

Some examples for patterns:

 Constructed as Output as `wild()` ‘\$0’ `pow(x,wild())` ‘x^\$0’ `atan2(wild(1),wild(2))` ‘atan2(\$1,\$2)’ `indexed(A,idx(wild(),3))` ‘A.\$0’

Notes:

• Wildcards behave like symbols and are subject to the same algebraic rules. E.g., ‘\$0+2*\$0’ is automatically transformed to ‘3*\$0’.
• As shown in the last example, to use wildcards for indices you have to use them as the value of an `idx` object. This is because indices must always be of class `idx` (or a subclass).
• Wildcards only represent expressions or subexpressions. It is not possible to use them as placeholders for other properties like index dimension or variance, representation labels, symmetry of indexed objects etc.
• Because wildcards are commutative, it is not possible to use wildcards as part of noncommutative products.
• A pattern does not have to contain wildcards. ‘x’ and ‘x+y’ are also valid patterns.

#### 5.4.1 Matching expressions

The most basic application of patterns is to check whether an expression matches a given pattern. This is done by the function

```     bool ex::match(const ex & pattern);
bool ex::match(const ex & pattern, exmap& repls);
```

This function returns `true` when the expression matches the pattern and `false` if it doesn't. If used in the second form, the actual subexpressions matched by the wildcards get returned in the associative array `repls` with ‘wildcard’ as a key. If `match()` returns false, `repls` remains unmodified.

The matching algorithm works as follows:

• A single wildcard matches any expression. If one wildcard appears multiple times in a pattern, it must match the same expression in all places (e.g. ‘\$0’ matches anything, and ‘\$0*(\$0+1)’ matches ‘x*(x+1)’ but not ‘x*(y+1)’).
• If the expression is not of the same class as the pattern, the match fails (i.e. a sum only matches a sum, a function only matches a function, etc.).
• If the pattern is a function, it only matches the same function (i.e. ‘sin(\$0)’ matches ‘sin(x)’ but doesn't match ‘exp(x)’).
• Except for sums and products, the match fails if the number of subexpressions (`nops()`) is not equal to the number of subexpressions of the pattern.
• If there are no subexpressions, the expressions and the pattern must be equal (in the sense of `is_equal()`).
• Except for sums and products, each subexpression (`op()`) must match the corresponding subexpression of the pattern.

Sums (`add`) and products (`mul`) are treated in a special way to account for their commutativity and associativity:

• If the pattern contains a term or factor that is a single wildcard, this one is used as the global wildcard. If there is more than one such wildcard, one of them is chosen as the global wildcard in a random way.
• Every term/factor of the pattern, except the global wildcard, is matched against every term of the expression in sequence. If no match is found, the whole match fails. Terms that did match are not considered in further matches.
• If there are no unmatched terms left, the match succeeds. Otherwise the match fails unless there is a global wildcard in the pattern, in which case this wildcard matches the remaining terms.

In general, having more than one single wildcard as a term of a sum or a factor of a product (such as ‘a+\$0+\$1’) will lead to unpredictable or ambiguous results.

Here are some examples in ginsh to demonstrate how it works (the `match()` function in ginsh returns ‘FAIL’ if the match fails, and the list of wildcard replacements otherwise):

```     > match((x+y)^a,(x+y)^a);
{}
> match((x+y)^a,(x+y)^b);
FAIL
> match((x+y)^a,\$1^\$2);
{\$1==x+y,\$2==a}
> match((x+y)^a,\$1^\$1);
FAIL
> match((x+y)^(x+y),\$1^\$1);
{\$1==x+y}
> match((x+y)^(x+y),\$1^\$2);
{\$1==x+y,\$2==x+y}
> match((a+b)*(a+c),(\$1+b)*(\$1+c));
{\$1==a}
> match((a+b)*(a+c),(a+\$1)*(a+\$2));
{\$1==b,\$2==c}
(Unpredictable. The result might also be [\$1==c,\$2==b].)
> match((a+b)*(a+c),(\$1+\$2)*(\$1+\$3));
(The result is undefined. Due to the sequential nature of the algorithm
and the re-ordering of terms in GiNaC, the match for the first factor
may be {\$1==a,\$2==b} in which case the match for the second factor
succeeds, or it may be {\$1==b,\$2==a} which causes the second match to
fail.)
> match(a*(x+y)+a*z+b,a*\$1+\$2);
(This is also ambiguous and may return either {\$1==z,\$2==a*(x+y)+b} or
{\$1=x+y,\$2=a*z+b}.)
> match(a+b+c+d+e+f,c);
FAIL
> match(a+b+c+d+e+f,c+\$0);
{\$0==a+e+b+f+d}
> match(a+b+c+d+e+f,c+e+\$0);
{\$0==a+b+f+d}
> match(a+b,a+b+\$0);
{\$0==0}
> match(a*b^2,a^\$1*b^\$2);
FAIL
(The matching is syntactic, not algebraic, and "a" doesn't match "a^\$1"
even though a==a^1.)
> match(x*atan2(x,x^2),\$0*atan2(\$0,\$0^2));
{\$0==x}
> match(atan2(y,x^2),atan2(y,\$0));
{\$0==x^2}
```

#### 5.4.2 Matching parts of expressions

A more general way to look for patterns in expressions is provided by the member function

```     bool ex::has(const ex & pattern);
```

This function checks whether a pattern is matched by an expression itself or by any of its subexpressions.

Again some examples in ginsh for illustration (in ginsh, `has()` returns ‘1’ for `true` and ‘0’ for `false`):

```     > has(x*sin(x+y+2*a),y);
1
> has(x*sin(x+y+2*a),x+y);
0
(This is because in GiNaC, "x+y" is not a subexpression of "x+y+2*a" (which
has the subexpressions "x", "y" and "2*a".)
> has(x*sin(x+y+2*a),x+y+\$1);
1
(But this is possible.)
> has(x*sin(2*(x+y)+2*a),x+y);
0
(This fails because "2*(x+y)" automatically gets converted to "2*x+2*y" of
which "x+y" is not a subexpression.)
> has(x+1,x^\$1);
0
(Although x^1==x and x^0==1, neither "x" nor "1" are actually of the form
"x^something".)
> has(4*x^2-x+3,\$1*x);
1
> has(4*x^2+x+3,\$1*x);
0
(Another possible pitfall. The first expression matches because the term
"-x" has the form "(-1)*x" in GiNaC. To check whether a polynomial
contains a linear term you should use the coeff() function instead.)
```

The method

```     bool ex::find(const ex & pattern, exset& found);
```

works a bit like `has()` but it doesn't stop upon finding the first match. Instead, it appends all found matches to the specified list. If there are multiple occurrences of the same expression, it is entered only once to the list. `find()` returns false if no matches were found (in ginsh, it returns an empty list):

```     > find(1+x+x^2+x^3,x);
{x}
> find(1+x+x^2+x^3,y);
{}
> find(1+x+x^2+x^3,x^\$1);
{x^3,x^2}
(Note the absence of "x".)
> expand((sin(x)+sin(y))*(a+b));
sin(y)*a+sin(x)*b+sin(x)*a+sin(y)*b
> find(%,sin(\$1));
{sin(y),sin(x)}
```

#### 5.4.3 Substituting expressions

Probably the most useful application of patterns is to use them for substituting expressions with the `subs()` method. Wildcards can be used in the search patterns as well as in the replacement expressions, where they get replaced by the expressions matched by them. `subs()` doesn't know anything about algebra; it performs purely syntactic substitutions.

Some examples:

```     > subs(a^2+b^2+(x+y)^2,\$1^2==\$1^3);
b^3+a^3+(x+y)^3
> subs(a^4+b^4+(x+y)^4,\$1^2==\$1^3);
b^4+a^4+(x+y)^4
> subs((a+b+c)^2,a+b==x);
(a+b+c)^2
> subs((a+b+c)^2,a+b+\$1==x+\$1);
(x+c)^2
> subs(a+2*b,a+b==x);
a+2*b
> subs(4*x^3-2*x^2+5*x-1,x==a);
-1+5*a-2*a^2+4*a^3
> subs(4*x^3-2*x^2+5*x-1,x^\$0==a^\$0);
-1+5*x-2*a^2+4*a^3
> subs(sin(1+sin(x)),sin(\$1)==cos(\$1));
cos(1+cos(x))
> expand(subs(a*sin(x+y)^2+a*cos(x+y)^2+b,cos(\$1)^2==1-sin(\$1)^2));
a+b
```

The last example would be written in C++ in this way:

```     {
symbol a("a"), b("b"), x("x"), y("y");
e = a*pow(sin(x+y), 2) + a*pow(cos(x+y), 2) + b;
e = e.subs(pow(cos(wild()), 2) == 1-pow(sin(wild()), 2));
cout << e.expand() << endl;
// -> a+b
}
```

#### 5.4.4 The option algebraic

Both `has()` and `subs()` take an optional argument to pass them extra options. This section describes what happens if you give the former the option `has_options::algebraic` or the latter `subs_options::algebraic`. In that case the matching condition for powers and multiplications is changed in such a way that they become more intuitive. Intuition says that `x*y` is a part of `x*y*z`. If you use these options you will find that `(x*y*z).has(x*y, has_options::algebraic)` indeed returns true. Besides matching some of the factors of a product also powers match as often as is possible without getting negative exponents. For example `(x^5*y^2*z).subs(x^2*y^2==c, subs_options::algebraic)` will return `x*c^2*z`. This also works with negative powers: `(x^(-3)*y^(-2)*z).subs(1/(x*y)==c, subs_options::algebraic)` will return `x^(-1)*c^2*z`.

Please notice: this only works for multiplications and not for locating `x+y` within `x+y+z`.