
Custom Language Support

Experimental Feature

Custom language in ast-grep is an experimental option. Use it with caution!

In this guide, we will show you how to use a custom language that is not built into ast-grep.

We will use Mojo 🔥 as an example!


Tree-sitter is a popular parser generator library that ast-grep uses to support many languages. However, not all Tree-sitter compatible languages are shipped with the ast-grep command line tool.

If you want to use a custom language that is not built into ast-grep, you can compile it as a dynamic library first and load it via custom language registration.

There will be three steps to achieve this:

  1. Install tree-sitter CLI and prepare the grammar file.
  2. Compile the custom language as a dynamic library.
  3. Register the custom language in ast-grep project config.

Pro Tip

You can also reuse the dynamic library compiled by neovim. See this link to find where the parsers are.

Prepare Tree-sitter Tool and Parser

Before you can compile a custom language as a dynamic library, you need to install the Tree-sitter CLI tool and get the Tree-sitter grammar for your custom language.

The recommended way to install the Tree-sitter CLI tool is via npm:

bash
npm install -g tree-sitter-cli

Alternative installation methods are also available in the official doc.

For the Tree-sitter grammar, you can either write your own or find one from the Tree-sitter grammars repository.

Since Mojo is a new language, we cannot find an existing repo for it. But I have created a mock grammar for Mojo.

You can clone it for the sake of this tutorial. It is forked from the Python grammar and contains very little Mojo syntax (just the struct and fn keywords).

bash
git clone https://github.com/HerringtonDarkholme/tree-sitter-mojo.git

Compile the Parser as Dynamic Library

Once we have prepared the tool and the grammar, we can compile the parser as a dynamic library.

There are no official instructions on how to do this on the internet, but we can get some hints from Tree-sitter's source code.

One way is to set an environment variable called TREE_SITTER_LIBDIR to the path where you want to store the dynamic library, and then run tree-sitter test in the directory of your custom language parser.

This will generate a dynamic library at the TREE_SITTER_LIBDIR path.

For example:

sh
cd path/to/mojo/parser
export TREE_SITTER_LIBDIR=path/to/your/dir
tree-sitter test
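After running the commands above, it can be handy to confirm that a library actually landed in the target directory. The sketch below is only an illustration: `list_parser_libs` is a hypothetical helper name, and the set of extensions it checks (.so, .dylib, .dll) is an assumption, since the file name and extension produced may differ across platforms and tree-sitter versions.

```shell
# Hypothetical helper: list shared libraries found in a directory.
# $1 = directory to inspect (e.g. the value of TREE_SITTER_LIBDIR).
list_parser_libs() {
  dir=$1
  for f in "$dir"/*.so "$dir"/*.dylib "$dir"/*.dll; do
    # Unmatched globs stay literal, so test that the file really exists.
    if [ -e "$f" ]; then
      echo "$f"
    fi
  done
  return 0
}

# Usage (needs tree-sitter on PATH, so it is left commented out):
#   export TREE_SITTER_LIBDIR="$PWD/parsers"   # absolute path, our assumption
#   mkdir -p "$TREE_SITTER_LIBDIR"
#   (cd path/to/mojo/parser && tree-sitter test)
#   list_parser_libs "$TREE_SITTER_LIBDIR"
```

If the helper prints nothing, the build most likely did not produce a library in that directory.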

Another way is to use the following commands to compile the parser manually:

shell
gcc -shared -fPIC -fno-exceptions -g -I {header_path} -o {lib_path} -O2 {scanner_path} -xc {parser_path} {other_flags}

where {header_path} is the path to the folder containing the header files of your custom language parser (usually src) and {lib_path} is the path where you want to store the dynamic library (in this case mojo.so). {scanner_path} and {parser_path} are the C or C++ source files (.c or .cc) of your parser. Add any other gcc flags your parser needs as {other_flags}.

For example, in mojo's case, the full command will be:

shell
gcc -shared -fPIC -fno-exceptions -g -I 'src' -o mojo.so -O2 src/scanner.cc -xc src/parser.c -lstdc++
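Not every grammar ships a C++ scanner: some have a scanner.c, and some have no external scanner at all. As a rough sketch, the helper below picks the gcc arguments based on which scanner file exists and prints the command instead of running it. `build_parser_cmd` is a hypothetical name, the src/ layout is assumed to match the Mojo example, and paths containing spaces are not handled.

```shell
# Hypothetical helper: print a gcc command for a grammar directory.
# $1 = grammar directory containing src/parser.c
# $2 = output path of the dynamic library
build_parser_cmd() {
  dir=$1
  out=$2
  cmd="gcc -shared -fPIC -fno-exceptions -g -I $dir/src -o $out -O2"
  if [ -f "$dir/src/scanner.cc" ]; then
    # C++ scanner: same shape as the Mojo command, linking libstdc++.
    cmd="$cmd $dir/src/scanner.cc -xc $dir/src/parser.c -lstdc++"
  elif [ -f "$dir/src/scanner.c" ]; then
    # C scanner: both files are compiled as C, no C++ runtime needed.
    cmd="$cmd -xc $dir/src/scanner.c $dir/src/parser.c"
  else
    # No external scanner: only the generated parser is compiled.
    cmd="$cmd -xc $dir/src/parser.c"
  fi
  echo "$cmd"
}

# Usage: print the command first, review it, then run it yourself.
#   build_parser_cmd path/to/mojo/parser mojo.so
```

Printing the command rather than executing it keeps the sketch safe to try on a machine without gcc installed.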

WARNING

Using tree-sitter-cli is the preferred way to compile the dynamic library.

Register Language in sgconfig.yml

Once you have compiled the dynamic library for your custom language, you need to register it in the sgconfig.yml file. You can use the command sg new to create a project and find the configuration file in the project root.

You need to add a new entry under the customLanguages key with the name of your custom language and some properties:

yaml
# sgconfig.yml
ruleDirs: ["./rules"]
customLanguages:
  mojo:
    libraryPath: mojo.so     # path to dynamic library
    extensions: [mojo, 🔥]   # file extensions for this language
    expandoChar: _           # optional char to replace $ in your pattern

The libraryPath property specifies the path to the dynamic library relative to the sgconfig.yml file or an absolute path. The extensions property specifies a list of file extensions for this language. The expandoChar property is optional and specifies a character that can be used instead of $ for meta-variables in your pattern.
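To make the path rule concrete, here is a small sketch of how a relative libraryPath could be resolved against the location of sgconfig.yml. It mirrors the description above and is not ast-grep's actual code; `resolve_library_path` is a hypothetical helper and the /proj paths are made up for illustration.

```shell
# Hypothetical helper: resolve libraryPath the way the text describes.
# $1 = path to sgconfig.yml, $2 = value of libraryPath
resolve_library_path() {
  config=$1
  lib=$2
  case "$lib" in
    # Absolute paths are used as they are.
    /*) echo "$lib" ;;
    # Relative paths are taken relative to the config file's directory.
    *) echo "$(dirname "$config")/$lib" ;;
  esac
}

resolve_library_path /proj/sgconfig.yml mojo.so          # prints /proj/mojo.so
resolve_library_path /proj/sgconfig.yml /opt/lib/mojo.so # prints /opt/lib/mojo.so
```

If ast-grep reports that it cannot load the library, checking that the resolved path points to an existing file is a reasonable first step.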

What's expandoChar?

ast-grep requires a pattern to be a valid syntactical construct, but $VAR might not be a valid expression in some languages. expandoChar will replace $ in the pattern so it can be parsed successfully by Tree-sitter.

For example, $VAR is not valid in Mojo (just as it is not in Python), so we need to replace it with _VAR. You can check the expandoChar of ast-grep's built-in languages here.
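The substitution can be pictured with a one-line text replacement. This is a conceptual illustration only: `expand_pattern` is a hypothetical helper, and ast-grep's real preprocessing may be more selective than swapping every `$` character.

```shell
# Conceptual illustration: swap `$` for the expando character.
# $1 = pattern as written by the user, $2 = expandoChar (one character)
expand_pattern() {
  printf '%s\n' "$1" | tr '$' "$2"
}

expand_pattern 'print($ARG)' '_'   # prints print(_ARG)
```

The rewritten text print(_ARG) is an ordinary call with an identifier argument, which a Python-like grammar can parse, whereas print($ARG) is not.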

Use It!

Now you are ready to use your custom language with ast-grep! You can use it like any other supported language, either with the -l flag or with the language property in your rule.

For example, to search for all occurrences of print in mojo files, you can run:

bash
sg -p "print" -l mojo

Or you can write a rule in yaml like this:

yaml
id: my-first-mojo-rule
language: mojo  # the name we registered before!
severity: hint
rule:
  pattern: print

And that's it! You have successfully used a custom language with ast-grep!

Inspect Parser Output

Due to limited bandwidth, ast-grep does not support pretty-printing concrete syntax trees.

However, you can use tree-sitter-cli to dump the syntax tree of your file.

bash
tree-sitter parse [file_path]

Quiz Time

Can you make ast-grep parse main.ʕ◔ϖ◔ʔ as Golang?

Answer.
