Custom Language Support
Experimental Feature
Custom language in ast-grep is an experimental option. Use it with caution!
In this guide, we will show you how to use a custom language that is not built into ast-grep.
We will use Mojo 🔥 as an example!
Tree-sitter is a popular parser generator library that ast-grep uses to support many languages. However, not all Tree-sitter compatible languages are shipped with the ast-grep command line tool.
If you want to use a custom language that is not built into ast-grep, you can compile it as a dynamic library first and load it via custom language registration.
There will be three steps to achieve this:
- Install tree-sitter CLI and prepare the grammar file.
- Compile the custom language as a dynamic library.
- Register the custom language in ast-grep project config.
Pro Tip
You can also reuse the dynamic library compiled by neovim. See this link to find where the parsers are.
Prepare Tree-sitter Tool and Parser
Before you can compile a custom language as a dynamic library, you need to install the Tree-sitter CLI tool and get the Tree-sitter grammar for your custom language.
The recommended way to install the Tree-sitter CLI tool is via npm:
npm install -g tree-sitter-cli
Alternative installation methods are also available in the official doc.
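Before moving on, it can help to confirm that the tools are actually on your PATH. The snippet below is a minimal sketch: the helper name have_tool is hypothetical, and gcc is checked only because the manual compile command later in this guide uses it.

```shell
#!/bin/sh
# Report whether a command is available on PATH.
# `have_tool` is a hypothetical helper name used only in this sketch.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Tools this guide relies on: the Tree-sitter CLI and a C/C++ compiler.
for tool in tree-sitter gcc; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```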
For the Tree-sitter grammar, you can either write your own or find one from the Tree-sitter grammars repository.
Since Mojo is a new language, we cannot find an existing repo for it. But I have created a mock grammar for Mojo.
You can clone it for the sake of this tutorial. It is forked from the Python grammar and contains barely any Mojo syntax (just the struct/fn keywords).
git clone https://github.com/HerringtonDarkholme/tree-sitter-mojo.git
Compile the Parser as Dynamic Library
Once we have prepared the tool and the grammar, we can compile the parser as a dynamic library. The official and preferred way to do this is the tree-sitter build command provided by tree-sitter-cli.
tree-sitter build --output mojo.so
The build command compiles your parser into a dynamically-loadable library as a shared object (.so, .dylib, or .dll).
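Since the conventional extension differs per platform, you may want to pick the output name from the current operating system. The following is a small sketch under that assumption; lib_ext is a hypothetical helper, and the script only prints the build command instead of running it.

```shell
#!/bin/sh
# Map a kernel name (as printed by `uname -s`) to the conventional
# shared-library extension. `lib_ext` is a hypothetical helper name.
lib_ext() {
  case "$1" in
    Darwin) echo "dylib" ;;
    MINGW*|MSYS*|CYGWIN*) echo "dll" ;;
    *) echo "so" ;;
  esac
}

# Print (not run) the build command for the current machine.
out="mojo.$(lib_ext "$(uname -s)")"
echo "tree-sitter build --output $out"
```

Whatever name you choose here is the one that later goes into libraryPath.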
Alternatively, you can compile the parser manually with gcc:
gcc -shared -fPIC -fno-exceptions -g -I {header_path} -o {lib_path} -O2 {scanner_path} -xc {parser_path} {other_flags}
where {header_path} is the path to the folder containing the header files of your custom language parser (usually src), and {lib_path} is the path where you want to store the dynamic library (in this case mojo.so). {scanner_path} and {parser_path} are the c or cc files of your parser. You also need to include other gcc flags if needed.
For example, in Mojo's case, the full command will be:
gcc -shared -fPIC -fno-exceptions -g -I 'src' -o mojo.so -O2 src/scanner.cc -xc src/parser.c -lstdc++
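Not every grammar has a C++ scanner; some have a C scanner or none at all. As a rough sketch, the helper below assembles the command from the template above depending on which scanner file exists. It assumes the usual src/ layout and paths without spaces, gcc_command is a hypothetical name, and the command is printed rather than executed.

```shell
#!/bin/sh
# Print a gcc command for the grammar in directory $1, writing the library to $2.
# Assumes src/parser.c plus an optional src/scanner.cc or src/scanner.c.
gcc_command() {
  src="$1/src"
  scanner=""
  libs=""
  if [ -f "$src/scanner.cc" ]; then
    scanner=" $src/scanner.cc"   # C++ scanner: link the C++ runtime too
    libs=" -lstdc++"
  elif [ -f "$src/scanner.c" ]; then
    scanner=" $src/scanner.c"    # C scanner: no extra library needed
  fi
  echo "gcc -shared -fPIC -fno-exceptions -g -I $src -o $2 -O2$scanner -xc $src/parser.c$libs"
}

# Show the command for a grammar checked out in the current directory.
gcc_command . mojo.so
```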
Old tree-sitter does not have build command
Previously, there were no official instructions on how to do this, but we can get some hints from Tree-sitter's source code.
One way is to set an environment variable called TREE_SITTER_LIBDIR to the path where you want to store the dynamic library, and then run tree-sitter test in the directory of your custom language parser.
This will generate a dynamic library at the TREE_SITTER_LIBDIR path.
For example:
cd path/to/mojo/parser
export TREE_SITTER_LIBDIR=path/to/your/dir
tree-sitter test
Register Language in sgconfig.yml
Once you have compiled the dynamic library for your custom language, you need to register it in the sgconfig.yml file. You can use the command ast-grep new to create a project and find the configuration file in the project root.
You need to add a new entry under the customLanguages key with the name of your custom language and some properties:
# sgconfig.yml
ruleDirs: ["./rules"]
customLanguages:
  mojo:
    libraryPath: mojo.so    # path to dynamic library
    extensions: [mojo, 🔥]  # file extensions for this language
    expandoChar: _          # optional char to replace $ in your pattern
The libraryPath property specifies the path to the dynamic library, either relative to the sgconfig.yml file or as an absolute path. The extensions property specifies a list of file extensions for this language. The expandoChar property is optional and specifies a character that can be used instead of $ for meta-variables in your pattern.
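A common mistake is a libraryPath that does not point where you expect. The sketch below only illustrates the resolution rule described above (absolute paths as-is, relative paths joined to the directory holding sgconfig.yml); it is not ast-grep's own code, and resolve_library_path is a hypothetical helper.

```shell
#!/bin/sh
# Resolve a libraryPath value: $1 is the directory containing sgconfig.yml,
# $2 is the libraryPath as written in the config.
resolve_library_path() {
  case "$2" in
    /*) echo "$2" ;;        # absolute path: use as-is
    *)  echo "$1/$2" ;;     # relative path: join to the config directory
  esac
}

# Check whether the library from the example config exists next to it,
# assuming sgconfig.yml lives in the current directory.
lib="$(resolve_library_path "$PWD" "mojo.so")"
if [ -f "$lib" ]; then
  echo "library found: $lib"
else
  echo "library missing: $lib"
fi
```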
What's expandoChar?
ast-grep requires a pattern to be a valid syntactic construct, but $VAR might not be a valid expression in some languages. expandoChar replaces $ in the pattern so it can be parsed successfully by Tree-sitter.
For example, $VAR is not valid in Mojo, so we need to replace it with _VAR. You can check the expandoChar of ast-grep's built-in languages here.
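To make the substitution concrete, here is a tiny illustration of its effect on a pattern string. It only mimics the replacement described above and is not how ast-grep implements it; expand_pattern is a hypothetical helper.

```shell
#!/bin/sh
# Replace every `$` in pattern $1 with the expando character $2.
expand_pattern() {
  printf '%s\n' "$1" | tr '$' "$2"
}

expand_pattern 'print($MSG)' '_'   # prints: print(_MSG)
```

With the underscore as expandoChar, the text handed to the parser is an ordinary identifier, which a Python-derived grammar accepts.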
Use It!
Now you are ready to use your custom language with ast-grep! You can use it like any other supported language, with the -l flag or the language property in your rule.
For example, to search for all occurrences of print in Mojo files, you can run:
ast-grep -p "print" -l mojo
Or you can write a rule in yaml like this:
id: my-first-mojo-rule
language: mojo # the name we registered before!
severity: hint
rule:
  pattern: print
And that's it! You have successfully used a custom language with ast-grep!
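If you would like to try the rule end to end, the following sketch lays out a scratch project with the rule under ./rules (matching ruleDirs in the config shown earlier) and a sample file. The directory, the file name hello.mojo, and the helper setup_demo are all hypothetical; the script does not invoke ast-grep itself.

```shell
#!/bin/sh
# Create a scratch project in directory $1: one rule and one sample Mojo file.
setup_demo() {
  mkdir -p "$1/rules"
  cat > "$1/rules/my-first-mojo-rule.yml" <<'EOF'
id: my-first-mojo-rule
language: mojo
severity: hint
rule:
  pattern: print
EOF
  printf 'print("hello")\n' > "$1/hello.mojo"
}

demo_dir="$(mktemp -d)"
setup_demo "$demo_dir"
echo "project created in $demo_dir"
echo "add sgconfig.yml and mojo.so there, then run: ast-grep scan"
```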
Inspect Parser Output
Due to limited bandwidth, ast-grep does not support pretty-printing concrete syntax trees.
However, you can use tree-sitter-cli to dump the syntax tree for your file.
tree-sitter parse [file_path]