
Atomic Rule

ast-grep has three categories of rules. Let's start with the most basic one: the atomic rule.

An atomic rule is the most basic kind of matching rule: it determines whether a single syntax node matches or not. There are three kinds of atomic rules: pattern, kind and regex.

pattern

A pattern rule matches a single syntax node according to the pattern syntax.

yaml
rule:
  pattern: console.log($GREETING)

The above rule will match code like console.log('Hello World').

By default, a string pattern is parsed and matched as a whole.

We can also use an object to specify a sub-syntax node to match within a larger context. The object has two properties: context and selector.

  • context: defines the surrounding code that helps to resolve any ambiguity in the syntax.
  • selector: defines the sub-syntax node kind that is the actual matcher of the pattern.

For example, to select a class field in JavaScript, writing $FIELD = $INIT will not work because it will be parsed as an assignment_expression.

However, we can provide more code to avoid the ambiguity, and instruct ast-grep to select the field_definition node as the pattern target.

yaml
pattern:
  selector: field_definition
  context: class A { $FIELD = $INIT }

Other examples are function calls in Go and function parameters in Rust.
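The Go case can be sketched with the same context and selector object. A bare pattern such as fmt.Println($A) is ambiguous in Go because a call looks the same as a type conversion like int(3.14), so the sketch below wraps the call in a function body and selects the call_expression node. The wrapper function name t is arbitrary and only serves as context; it is worth confirming the node kind in the playground for your parser version.

```yaml
# A sketch, not a definitive rule: match calls to fmt.Println in Go.
# `func t() { ... }` is throwaway context; only the selected node is matched.
language: go
rule:
  pattern:
    context: 'func t() { fmt.Println($A) }'
    selector: call_expression
```

Only the call_expression inside the context is used as the matcher, so calls are matched wherever they appear, not just inside a function named t.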

kind

Sometimes it is not easy to write a pattern because it is hard to construct the valid syntax.

For example, if we want to match a class property declaration in JavaScript like class A { a = 1 }, writing a = 1 will not match the property because it is parsed as an assignment to a variable.

Instead, we can use kind to specify the AST node type defined in the tree-sitter parser.

The kind rule accepts a tree-sitter node's name, like if_statement and expression. You can refer to the ast-grep playground for relevant kind names.

Back to our example, we can look up the class property's kind from the playground.

yaml
rule:
  kind: field_definition

It will match the following code successfully (playground link).

js
class Test {
  a = 123 // match this line
}

Here are some situations where kind is particularly useful:

  1. The pattern code is ambiguous to parse, e.g. {} in JavaScript can be either an object or a code block.
  2. It is too hard to enumerate all patterns of an AST node kind, e.g. matching all Java/TypeScript class declarations would require covering every combination of modifiers, generics, extends and implements.
  3. The pattern only appears within a specific context, e.g. the class property definition.

regex

The regex atomic rule will match the AST node by its text against a Rust regular expression.

yaml
rule:
  regex: '\w+'

TIP

The regular expression is written in Rust syntax, not the popular PCRE-like syntax, so some features such as arbitrary look-ahead and back references are not available.

You should almost always combine regex with other atomic rules to make sure the regular expression is applied to the correct AST node. Regex matching is quite expensive and cannot be optimized based on AST node kinds, whereas kind and pattern rules are only applied to nodes with a specific kind_id, which allows optimized performance.
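As a minimal sketch of that advice, the rule below pairs regex with kind so the regular expression is only tested against identifier nodes. It assumes a JavaScript-like grammar whose identifier kind is named identifier; the all-caps naming convention is only an illustration.

```yaml
# A sketch: find identifiers written in SCREAMING_SNAKE_CASE, e.g. MAX_SIZE.
# Fields in one rule object must all match, so `kind` narrows the
# candidate nodes that the regex is applied to.
rule:
  kind: identifier
  regex: '^[A-Z][A-Z0-9_]*$'
```

The single quotes keep YAML from interpreting backslashes, which matters once the expression uses escapes such as \w or \d.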

Tips for Writing Rules

Since a rule matches exactly one AST node per match, it is recommended to first write the atomic rule that matches the desired node.

Suppose we want to write a rule which finds functions without a return type. For example, the following code should be reported:

ts
const foo = () => {
	return 1;
}

The first step in composing a rule is to find the target. In this case, we can first use kind: arrow_function to find the function node. Then we can use other rules to filter out candidate nodes that do have a return type.

Another trick for writing cleaner rules is to use sub-rules as fields. Please refer to composite rule for more details.
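The two steps above might look like the following sketch. It assumes the TypeScript grammar exposes the annotation through a return_type field holding a type_annotation node; both names are assumptions to check in the playground. The not and has rules used for filtering are not atomic rules and are covered in their own sections.

```yaml
# A sketch, under the assumptions stated above.
language: typescript
rule:
  # Step 1: the atomic rule picks the target node.
  kind: arrow_function
  # Step 2: drop candidates that already declare a return type.
  not:
    has:
      field: return_type
      kind: type_annotation
```

With this shape, const foo = () => { return 1; } would be reported, while const foo = (): number => { return 1; } would not.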
