Atomic Rule
ast-grep has three categories of rules. Let's start with the most basic one: atomic rule.
Atomic rule defines the most basic matching rule that determines whether one syntax node matches the rule or not. There are four kinds of atomic rule: pattern, kind, regex and nthChild.
pattern
Pattern will match one single syntax node according to the pattern syntax.
rule:
  pattern: console.log($GREETING)
The above rule will match code like console.log('Hello World').
By default, a string pattern is parsed and matched as a whole.
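As a rough sketch of where such a rule usually lives, the fragment below places it in a small rule file. The id value is a made-up name for illustration, and the id and language fields are assumptions about the surrounding rule configuration rather than something this page defines.

```yaml
# Hypothetical rule file; only the `rule` section comes from the text above.
id: find-console-log      # illustrative name
language: JavaScript
rule:
  pattern: console.log($GREETING)
```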
Pattern Object
It is not always possible to select certain code with a simple string pattern. Pattern code can be invalid, incomplete or ambiguous for the parser since it lacks context.
For example, to select a class field in JavaScript, writing $FIELD = $INIT will not work because it will be parsed as assignment_expression. See playground.
We can also use a pattern object to specify a sub-syntax node to match within a larger context. The object has three properties: context, selector and strictness.
- context: defines the surrounding code that helps to resolve any ambiguity in the syntax.
- selector: defines the sub-syntax node kind that is the actual matcher of the pattern.
- strictness: optional. Defines how strictly the pattern will match against nodes.
Let's see how a pattern object can solve the ambiguity in the class field example above.
The pattern object below instructs ast-grep to select the field_definition node as the pattern target.
pattern:
  selector: field_definition
  context: class A { $FIELD = $INIT }
ast-grep works like this:
- First, the code in context, class A { $FIELD = $INIT }, is parsed as a class declaration.
- Then, it looks for the field_definition node, specified by selector, in the parsed tree.
- The selected $FIELD = $INIT is matched against code as the pattern.
In this way, the pattern is parsed as field_definition instead of assignment_expression. See playground in action.
Other examples are function call in Go and function parameter in Rust.
strictness
You can also use a pattern object to control the matching strategy with the strictness field.
By default, ast-grep uses a smart strategy to match pattern against the AST node. All nodes in the pattern must be matched, but it will skip unnamed nodes in target code.
For the definition of named and unnamed nodes, please refer to the core concepts doc.
For example, the following pattern function $A() {} will match both plain function and async function in JavaScript. See playground.
// function $A() {}
function foo() {} // matched
async function bar() {} // matched
This is because the keyword async is an unnamed node in the AST, so the async in the code to search is skipped. As long as function, $A and {} are matched, the pattern is considered matched.
However, this is not always the desired behavior. ast-grep provides strictness to control the matching strategy. At the moment, it provides these options, ordered from the most strict to the least strict:
- cst: All nodes in the pattern and target code must be matched. No node is skipped.
- smart: All nodes in the pattern must be matched, but it will skip unnamed nodes in target code. This is the default behavior.
- ast: Only named AST nodes in both pattern and target code are matched. All unnamed nodes are skipped.
- relaxed: Named AST nodes in both pattern and target code are matched. Comments and unnamed nodes are ignored.
- signature: Only named AST nodes' kinds are matched. Comments, unnamed nodes and text are ignored.
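To make the options more concrete, here is a minimal sketch of a pattern object that opts into cst. Assuming the behavior described above, it should still match function foo() {} but no longer match async function bar() {}, because the unnamed async node in the target code is not skipped.

```yaml
rule:
  pattern:
    # the pattern source; no selector is used in this sketch
    context: function $A() {}
    # require every node, named or unnamed, to match
    strictness: cst
```

The looser options (ast, relaxed, signature) are set in the same place; which one fits depends on how much of the surrounding syntax should be ignored.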
Deep Dive and More Examples
strictness is an advanced feature that you may not need in most cases.
If you are interested in more examples and details, please refer to the deep dive doc on ast-grep's match algorithm.
kind
Sometimes it is not easy to write a pattern because it is hard to construct the valid syntax.
For example, if we want to match class property declaration in JavaScript like class A { a = 1 }, writing a = 1 will not match the property because it is parsed as assigning to a variable.
Instead, we can use kind to specify the AST node type defined in tree-sitter parser.
kind rule accepts the tree-sitter node's name, like if_statement and expression. You can refer to ast-grep playground for relevant kind names.
Back to our example, we can look up class property's kind from the playground.
rule:
  kind: field_definition
It will match the following code successfully (playground link).
class Test {
  a = 123 // match this line
}
Here are some situations where you can effectively use kind:
- Pattern code is ambiguous to parse, e.g. {} in JavaScript can be either an object or a code block.
- It is too hard to enumerate all patterns of an AST kind node, e.g. matching all Java/TypeScript class declarations would require covering all modifiers, generics, extends and implements.
- Patterns only appear within a specific context, e.g. the class property definition.
kind + pattern is different from pattern object
You may want to use kind to change how pattern is parsed. However, ast-grep rules are independent of each other.
To change the parsing behavior of pattern, you should use pattern object with context and selector field. See this FAQ.
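The difference can be sketched with the class field example from earlier. Both snippets are illustrative fragments, not complete rule files.

```yaml
# Likely not what you want: `pattern` is still parsed on its own as an
# assignment_expression, so the same node cannot also be a field_definition.
rule:
  kind: field_definition
  pattern: $FIELD = $INIT
---
# Pattern object: the pattern is parsed inside `context`, then the
# field_definition node is picked out by `selector`.
rule:
  pattern:
    context: class A { $FIELD = $INIT }
    selector: field_definition
```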
regex
The regex atomic rule will match the AST node by its text against a Rust regular expression.
rule:
  regex: '\w+'
TIP
The regular expression is written in Rust syntax, not the popular PCRE-like syntax, so some features such as arbitrary look-ahead and back references are not available.
You should almost always combine regex with other atomic rules to make sure the regular expression is applied to the correct AST node. Regex matching is quite expensive and cannot be optimized based on AST node kinds, whereas kind and pattern rules are applied only to nodes with a specific kind_id for optimized performance.
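A minimal sketch of that advice: the rule below pairs regex with kind so that only identifier nodes are considered. The identifier kind name and the lowercase-only expression are illustrative assumptions; check the playground for the kind names of your language.

```yaml
rule:
  kind: identifier   # restrict the match to identifier nodes
  regex: ^[a-z]+$    # node text made only of lowercase ASCII letters
```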
nthChild
nthChild is a rule to find nodes based on their indexes in the parent node's children list. In other words, it selects nodes based on their position among all sibling nodes within a parent node. It is very helpful in finding nodes without children or nodes appearing in specific positions.
nthChild is heavily inspired by CSS's nth-child pseudo-class, and it accepts similar forms of arguments.
# a number to match the exact nth child
nthChild: 3
# An+B style string to match position based on formula
nthChild: 2n+1
# object style nthChild rule
nthChild:
  # accepts number or An+B style string
  position: 2n+1
  # optional, count index from the end of sibling list
  reverse: true # default is false
  # optional, filter the sibling node list based on rule
  ofRule:
    kind: function_declaration # accepts ast-grep rule
TIP
- nthChild's index is 1-based, not 0-based, as in the CSS selector.
- nthChild's node list only includes named nodes, not unnamed nodes.
Example
The following rule will match the second number in the JavaScript array.
rule:
  kind: number
  nthChild: 2
It will match the following code:
const arr = [ 1, 2, 3, ]
//               |- match this number
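For the same array, here is a sketch that uses the object form to count from the end of the sibling list. Since only named nodes are counted, the trailing comma should not affect the index.

```yaml
rule:
  kind: number
  nthChild:
    position: 1
    reverse: true   # count from the end of the sibling list
# In `const arr = [ 1, 2, 3, ]` this is expected to select `3`.
```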
Tips for Writing Rules
Since one rule will have only one AST node in one match, it is recommended to first write the atomic rule that matches the desired node.
Suppose we want to write a rule which finds functions without a return type. For example, this code would trigger an error:
const foo = () => {
  return 1;
}
The first step to compose a rule is to find the target. In this case, we can first use kind: arrow_function to find the function node. Then we can use other rules to filter out candidate nodes that do have a return type.
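One possible shape for such a rule, as a sketch only: it assumes the TypeScript language and relies on the not and has rules, which belong to the other rule categories rather than to atomic rules. The return_type field and type_annotation kind are assumptions based on the TypeScript grammar; verify them in the playground before relying on them.

```yaml
# Assumes the rule file sets `language: TypeScript`.
rule:
  kind: arrow_function        # step 1: find the target node
  not:                        # step 2: drop candidates that have a return type
    has:
      field: return_type
      kind: type_annotation
```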
Another trick for writing cleaner rules is to use sub-rules as fields. Please refer to composite rule for more details.