Atomic Rule
ast-grep has three categories of rules. Let's start with the most basic one: atomic rule.
An atomic rule defines the most basic matching rule that determines whether one syntax node matches the rule or not. There are three kinds of atomic rules: pattern, kind and regex.
pattern
Pattern will match one single syntax node according to the pattern syntax.
rule:
  pattern: console.log($GREETING)
The above rule will match code like console.log('Hello World').
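As a rough illustration, the rule could sit inside a complete rule file like the sketch below. The id is a made-up name for this example. The comments describe expected behavior on the assumption that $GREETING, being a single meta variable, stands for exactly one syntax node.

```yaml
# Sketch of a full rule file; `no-console-log` is a hypothetical id.
id: no-console-log
language: JavaScript
rule:
  pattern: console.log($GREETING)
# Expected to match:     console.log('Hello World')
# Expected not to match: console.log('Hello', name)   (two arguments)
```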
By default, a string pattern is parsed and matched as a whole.
We can also use an object to specify a sub-syntax node to match within a larger context. It consists of an object with two properties: context and selector.
- context: defines the surrounding code that helps to resolve any ambiguity in the syntax.
- selector: defines the sub-syntax node kind that is the actual matcher of the pattern.
For example, to select a class field in JavaScript, writing $FIELD = $INIT will not work because it will be parsed as an assignment_expression.
However, we can provide more code to avoid the ambiguity, and instruct ast-grep to select the field_definition node as the pattern target.
pattern:
  selector: field_definition
  context: class A { $FIELD = $INIT }
Other examples are function call in Go and function parameter in Rust.
kind
Sometimes it is not easy to write a pattern because it is hard to construct the valid syntax.
For example, if we want to match a class property declaration in JavaScript like class A { a = 1 }, writing a = 1 will not match the property because it is parsed as an assignment to a variable.
Instead, we can use kind to specify the AST node type defined by the tree-sitter parser.
A kind rule accepts the tree-sitter node's name, like if_statement and expression. You can refer to the ast-grep playground for relevant kind names.
Back to our example, we can look up class property's kind from the playground.
rule:
  kind: field_definition
It will match the following code successfully.
class Test {
  a = 123 // match this line
}
Here are some situations where you can use kind effectively:
- Pattern code is ambiguous to parse, e.g. {} in JavaScript can be either an object or a code block.
- It is too hard to enumerate all patterns of an AST kind node, e.g. matching all Java/TypeScript class declarations would require including all modifiers, generics, extends and implements.
- Patterns only appear within a specific context, e.g. the class property definition.
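A minimal sketch of the second situation, assuming the TypeScript grammar names the node class_declaration (worth confirming in the playground). A single kind rule then stands in for every combination of modifiers, generics and heritage clauses. The id is hypothetical, and some variants, such as abstract classes, may be reported under a different kind name.

```yaml
# Sketch only; `find-class-declaration` is a made-up id and
# `class_declaration` is an assumed kind name to verify in the playground.
id: find-class-declaration
language: TypeScript
rule:
  kind: class_declaration
# Intended to cover, for example:
#   class A {}
#   export class B<T> extends A implements C {}
```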
regex
The regex atomic rule will match the AST node by its text against a Rust regular expression.
rule:
  regex: '\w+'
TIP
The regular expression is written in Rust syntax, not the popular PCRE-like syntax, so some features, such as arbitrary look-ahead and back references, are not available.
You should almost always combine regex with other atomic rules to make sure the regular expression is applied to the correct AST node. Regex matching is quite expensive and cannot be optimized based on AST node kinds, whereas kind and pattern rules are applied only to nodes with a specific kind_id, which allows optimized performance.
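A hedged sketch of such a combination: kind narrows the candidates to identifier nodes first, so the regular expression is only tested against their text. The id is hypothetical, and identifier is an assumed kind name for the JavaScript grammar. The explicit ^ and $ anchors state the intent to match the node's whole text, without relying on how partial matches are handled.

```yaml
# Sketch only; `lowercase-identifier` is a made-up id.
id: lowercase-identifier
language: JavaScript
rule:
  kind: identifier   # restrict to the intended node kind first
  regex: ^[a-z]+$    # then test the node's text, anchored at both ends
```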
Tips for Writing Rules
Since a rule matches exactly one AST node per match, it is recommended to first write the atomic rule that matches the desired node.
Suppose we want to write a rule which finds functions without a return type. For example, this code would trigger an error:
const foo = () => {
  return 1;
}
The first step in composing a rule is to find the target. In this case, we can first use kind: arrow_function to find the function node. Then we can use other rules to filter out candidate nodes that do have a return type.
Another trick for writing cleaner rules is to use sub-rules as fields. Please refer to composite rule for more details.
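The two steps for the return-type example might be sketched as follows. This is an illustration under assumptions, not a definitive rule. The id is hypothetical, and it assumes the TypeScript grammar exposes the return type through a return_type field holding a type_annotation node. Other return-type forms, such as type predicates, may use a different kind and would need extra handling.

```yaml
# Sketch only; the id, field name and kind names are assumptions
# to verify against the playground.
id: arrow-function-without-return-type
language: TypeScript
rule:
  kind: arrow_function        # step 1: find the target node
  not:                        # step 2: discard candidates with a return type
    has:
      field: return_type
      kind: type_annotation
```

Under these assumptions, const foo = () => { return 1; } would be reported, while const foo = (): number => { return 1; } would not.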