Add New Language to ast-grep
Thank you for your interest in adding a new language to ast-grep! We appreciate your contribution to this project. Adding new languages will make the tool more useful and accessible to a wider range of users.
However, there are some requirements and constraints that you need to consider before you start. This guide will help you understand the process and the standards for adding a new language to ast-grep.
Requirements and Constraints
To keep ast-grep lightweight and fast, we have several factors to consider when adding a new language. As a rule of thumb, we want to keep the binary size of ast-grep under 10MB after zip compression.
- Popularity of the language. While the popularity of a language does not necessarily reflect its merits, our limited size budget allows us to support only languages that are widely used and have a large user base. Online sources like the TIOBE index or GitHub Octoverse can help you check the popularity of a language.
- Quality of the Tree-sitter grammar. ast-grep relies on Tree-sitter, a parser generator tool and a parsing library, to support different languages. The Tree-sitter grammar for the new language should be well-written, up-to-date, and regularly maintained. You can search for Tree-sitter grammars on GitHub or on crates.io.
- Size of the grammar. The new language's grammar should not be too complicated; otherwise it may take too much space away from other languages. You can also check the current size of ast-grep on the releases page.
- Availability of the grammar on crates.io. To ease the maintenance burden, we prefer to use grammars that are published on crates.io, Rust's package registry. If your grammar is not on crates.io, you need to publish it yourself or ask the author to do so.
Don't worry if your language is not supported by ast-grep. You can try ast-grep's custom language support and register your own Tree-sitter parser!
If your language satisfies the requirements above, congratulations! Let's see how to add it to ast-grep.
Add to ast-grep Core
ast-grep has several distinct use cases: a CLI tool, an n-api library and a web playground.
Adding a language involves two steps. The first step is to add the language to ast-grep core. The core repository is a multi-crate workspace hosted on GitHub. The relevant crate is language, which defines the supported languages and their tree-sitter grammars.
We will use Ruby as an example to show how to add a new language to ast-grep core. You can see the commit as a reference.
Add Dependencies
- Add the tree-sitter-[lang] crate as a dependency to the Cargo.toml in the language crate.
# Cargo.toml
[dependencies]
...
tree-sitter-ruby = {version = "0.20.0", optional = true }
...
Note that the optional attribute is required here.
- Add the tree-sitter-[lang] dependency to the builtin-parser list.
# Cargo.toml
[features]
builtin-parser = [
...
"tree-sitter-ruby",
...
]
The builtin-parser feature is used by the command line tool. The web playground does not use the builtin parser, so the dependency must be optional.
Implement Parser
- Add the parser function in parsers.rs, where tree-sitter grammars are imported.
#[cfg(feature = "builtin-parser")]
mod parser_implementation {
...
pub fn language_ruby() -> TSLanguage {
tree_sitter_ruby::language().into()
}
...
}
#[cfg(not(feature = "builtin-parser"))]
mod parser_implementation {
impl_parsers!(
...
language_ruby,
...
);
}
Note that there are two places to add the function: one under #[cfg(feature = "builtin-parser")] and the other under #[cfg(not(feature = "builtin-parser"))].
- Implement the Language trait by using a macro in lib.rs.
// lib.rs
impl_lang_expando!(Ruby, language_ruby, 'µ');
There are two macros, impl_lang_expando and impl_lang, that generate the methods required by the ast-grep Language trait.
You need to choose one of them for the new language. If the language does not allow $ as a valid identifier character and you need to customize the expando_char, use impl_lang_expando; otherwise, use impl_lang.
You can reference the comment here for more information.
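To illustrate why a custom expando_char matters, here is a minimal sketch of the substitution idea. It assumes that the $ introducing a meta variable such as $A or $$$BODY is swapped for the expando character before the pattern reaches the grammar, so the parser sees an ordinary identifier. The function name replace_meta_var_sigil is hypothetical and the logic is deliberately simplified; it is not ast-grep's actual implementation.

```rust
/// Hypothetical helper (not part of ast-grep): replace the `$` that
/// introduces a meta variable with `expando`, leaving other `$` untouched.
///
/// Simplifying assumption: a `$` starts a meta variable only when it is
/// followed by an ASCII uppercase letter, `_`, or another `$`.
pub fn replace_meta_var_sigil(pattern: &str, expando: char) -> String {
    let mut out = String::with_capacity(pattern.len());
    let mut chars = pattern.chars().peekable();
    while let Some(c) = chars.next() {
        // Look at the following character without consuming it.
        let starts_meta_var = c == '$'
            && chars
                .peek()
                .map_or(false, |n| n.is_ascii_uppercase() || *n == '_' || *n == '$');
        out.push(if starts_meta_var { expando } else { c });
    }
    out
}
```

With 'µ' as the expando character, the pattern `$A.each { $$$BODY }` becomes `µA.each { µµµBODY }`, while a Ruby global such as `$stdout` is left alone because it starts with a lowercase letter.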
Register the New Language
- Add the new language to the SupportLang enum.
// lib.rs
pub enum SupportLang {
...
Ruby,
...
}
- Add the new language to execute_lang_method.
// lib.rs
macro_rules! execute_lang_method {
($me: path, $method: ident, $($pname:tt),*) => {
use SupportLang as S;
match $me {
...
S::Ruby => Ruby.$method($($pname,)*),
}
}
}
- Add the new language to all_langs, alias, extension and file_types.
See this commit for the detailed code change.
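The last registration step has no snippet above, so the following is a rough sketch of what registering aliases and file extensions can look like. Everything here is an assumption made for illustration: DemoLang, aliases, extensions, from_alias and from_path are hypothetical names, and the listed alias and extension values are examples. The real functions in the language crate may be shaped differently, so follow the referenced commit for the actual change.

```rust
use std::path::Path;

/// Hypothetical miniature of a language enum; not the real `SupportLang`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DemoLang {
    Python,
    Ruby,
}

impl DemoLang {
    /// Every registered language, so callers can iterate over them.
    pub fn all_langs() -> &'static [DemoLang] {
        &[DemoLang::Python, DemoLang::Ruby]
    }

    /// Names a user may type to select the language (illustrative values).
    pub fn aliases(self) -> &'static [&'static str] {
        match self {
            DemoLang::Python => &["py", "python"],
            DemoLang::Ruby => &["rb", "ruby"],
        }
    }

    /// File extensions that map to the language (illustrative values).
    pub fn extensions(self) -> &'static [&'static str] {
        match self {
            DemoLang::Python => &["py"],
            DemoLang::Ruby => &["rb"],
        }
    }

    /// Look a language up by alias, ignoring ASCII case.
    pub fn from_alias(name: &str) -> Option<DemoLang> {
        Self::all_langs()
            .iter()
            .copied()
            .find(|lang| lang.aliases().iter().any(|a| a.eq_ignore_ascii_case(name)))
    }

    /// Guess the language of a file from its extension.
    pub fn from_path(path: &Path) -> Option<DemoLang> {
        let ext = path.extension()?.to_str()?;
        Self::all_langs()
            .iter()
            .copied()
            .find(|lang| lang.extensions().iter().any(|e| *e == ext))
    }
}
```

In this sketch a new language touches four places (the enum, all_langs, aliases and extensions), which mirrors why the guide asks you to update each of those spots; forgetting one would leave the language defined but not discoverable.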
Find existing languages as reference
The rule of thumb to add a new language is to find a reference language that is already included in the language crate. Then add your new language by searching and following the existing language.
Add to ast-grep Playground
Adding a new language to the web playground is a little more complex.
The playground has a standalone repository and we need to change code there.
Prepare WASM
- Set up Tree-sitter
First, we need to set up the Tree-sitter development tools. You can refer to the Tree-sitter setup section in this link.
- Build WASM file
Then, in your parser repository, use this command to build a WASM file.
tree-sitter generate # if grammar is not generated before
tree-sitter build-wasm # newer tree-sitter CLI releases use: tree-sitter build --wasm
Note that you may need to install Docker to build WASM files.
- Move the WASM file to the website's public folder.
You can also see other languages' WASM files in the public directory. The file name follows the format tree-sitter-[lang].wasm. The name will be used later in parserPaths.
Add language in Rust
You need to add the language in wasm_lang.rs. More specifically, you need to add a new enum variant to WasmLang, handle the new variant in execute_lang_method and add it to the FromStr implementation.
// new variant
pub enum WasmLang {
// ...
Swift,
}
// handle variant in macro
macro_rules! execute_lang_method {
($me: path, $method: ident, $($pname:tt),*) => {
use WasmLang as W;
match $me {
W::Swift => L::Swift.$method($($pname,)*),
}
}
}
// impl FromStr
impl FromStr for WasmLang {
// ...
fn from_str(s: &str) -> Result<Self, Self::Err> {
Ok(match s {
"swift" => Swift,
})
}
}
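The excerpts above elide most of the surrounding code, so here is a self-contained sketch of how the three pieces fit together: an enum variant, a dispatch macro that forwards a method call to the matching language type, and a FromStr implementation with an explicit error for unknown names. All names (DemoLanguage, DemoWasmLang, dispatch_lang_method, UnknownLang) are hypothetical stand-ins, and the macro is simplified compared to the real execute_lang_method.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the trait the real macro dispatches to.
trait DemoLanguage {
    fn expando_char(&self) -> char;
}

// One unit struct per language, each with its own trait implementation.
struct JavaScript;
struct Ruby;

impl DemoLanguage for JavaScript {
    fn expando_char(&self) -> char {
        '$'
    }
}

impl DemoLanguage for Ruby {
    fn expando_char(&self) -> char {
        'µ'
    }
}

/// Hypothetical miniature of the playground's language enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum DemoWasmLang {
    JavaScript,
    Ruby,
}

// Forward `$method` to the struct that matches the enum variant.
// A new variant needs a new arm here, or the match stops compiling.
macro_rules! dispatch_lang_method {
    ($me:expr, $method:ident $(, $arg:expr)*) => {
        match $me {
            DemoWasmLang::JavaScript => JavaScript.$method($($arg),*),
            DemoWasmLang::Ruby => Ruby.$method($($arg),*),
        }
    };
}

impl DemoLanguage for DemoWasmLang {
    fn expando_char(&self) -> char {
        dispatch_lang_method!(self, expando_char)
    }
}

/// Error returned when a name does not match any registered language.
#[derive(Debug, PartialEq, Eq)]
struct UnknownLang(String);

impl FromStr for DemoWasmLang {
    type Err = UnknownLang;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "javascript" => Ok(DemoWasmLang::JavaScript),
            "ruby" => Ok(DemoWasmLang::Ruby),
            other => Err(UnknownLang(other.to_string())),
        }
    }
}
```

Because the match inside the macro is exhaustive, the compiler flags a missing arm as soon as a variant is added. The string match in from_str has no such safety net: a forgotten name only surfaces at runtime as an UnknownLang error, which is why that step is easy to miss.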
Add language in TypeScript
Finally, you need to add the language in TypeScript to make it available in the playground. The file is lang.ts. There are two changes to make.
// Add language parserPaths
const parserPaths = {
// ...
swift: 'tree-sitter-swift.wasm',
}
// Add language display name
export const languageDisplayNames: Record<SupportedLang, string> = {
// ...
swift: 'Swift',
}
You can see Swift's support as the reference commit.