https://github.com/nrc/stupid-stats

Tutorial and demo of rust compiler replacement tooling
Host: GitHub
URL: https://github.com/nrc/stupid-stats
Owner: nrc
Created: 2015-02-20T23:12:41.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2021-03-23T00:53:03.000Z (over 4 years ago)
Last Synced: 2025-03-19T02:44:17.665Z (8 months ago)
Language: Rust
Size: 34.2 KB
Stars: 56
Watchers: 2
Forks: 10
Open Issues: 0
Metadata Files:
- Readme: README.md
README

# A tutorial on creating a drop-in replacement for rustc.

Many tools benefit from being a drop-in replacement for a compiler. By this, I
mean that any user of the tool can use `mytool` in all the ways they would
normally use `rustc` - whether manually compiling a single file or as part of a
complex make project or Cargo build, etc. That could be a lot of work;
rustc, like most compilers, takes a large number of command line arguments which
can affect compilation in complex and interacting ways. Emulating all of this
behaviour in your tool is annoying at best, especially if you are making many
of the same calls into librustc that the compiler makes.

The kind of things I have in mind are tools like rustdoc or a future rustfmt.
These want to operate as closely as possible to real compilation, but have
totally different outputs (documentation and formatted source code,
respectively). Another use case is a customised compiler. Say you want to add a
custom code generation phase after macro expansion; creating a new tool
should then be easier than forking the compiler (and keeping it up to date as
the compiler evolves).

I have gradually been trying to improve the API of librustc to make drop-in
tools easier to create (many others have also helped improve these
interfaces over the same time frame). It is now pretty simple to make a tool
which is as close to rustc as you want it to be. In this tutorial I'll show
how.

Note/warning: everything I talk about in this tutorial is internal API for
rustc. It is all extremely unstable and likely to change often and in
unpredictable ways. Maintaining a tool which uses these APIs will be
non-trivial, although hopefully easier than maintaining one that does similar
things without using them.

This tutorial starts with a very high level view of the rustc compilation
process and of some of the code that drives compilation. Then I'll describe how
that process can be customised. In the final section of the tutorial, I'll go
through an example - stupid-stats - which shows how to build a drop-in tool.

## Overview of the compilation process

Compilation using rustc happens in several phases. We start with parsing, which
includes lexing. The output of this phase is an AST (abstract syntax tree).
There is a single AST for each crate (indeed, the entire compilation process
operates over a single crate). Parsing abstracts away details about individual
files, which will all have been read into the AST in this phase. At this stage
the AST includes all macro uses, attributes will still be present, and nothing
will have been eliminated due to `cfg`s.

The next phase is configuration and macro expansion. This can be thought of as a
function over the AST. The unexpanded AST goes in and an expanded AST comes out.
Macros and syntax extensions are expanded, and `cfg` attributes will cause some
code to disappear. The resulting AST won't have any macros or macro uses left
in.

The code for these first two phases is in [libsyntax](https://github.com/rust-lang/rust/tree/master/src/libsyntax).

After this phase, the compiler allocates ids to each node in the AST
(technically not every node, but most of them). If we are writing out
dependencies, that happens now.

The next big phase is analysis. This is the most complex phase and
uses the bulk of the code in rustc. This includes name resolution, type
checking, borrow checking, type and lifetime inference, trait selection, method
selection, linting, and so forth. Most error detection is done in this phase
(although parse errors are found during parsing). The 'output' of this phase is
a bunch of side tables containing semantic information about the source program.
The analysis code is in [librustc](https://github.com/rust-lang/rust/tree/master/src/librustc)
and a bunch of other crates with the 'librustc_' prefix.

Next is translation, which translates the AST (and all those side tables) into
LLVM IR (intermediate representation). We do this by calling into the LLVM
libraries, rather than actually writing IR directly to a file. The code for this is in
[librustc_trans](https://github.com/rust-lang/rust/tree/master/src/librustc_trans).

The next phase is running the LLVM backend. This runs LLVM's optimisation passes
on the generated IR and then generates machine code. The result is object files.
This phase is all done by LLVM; it is not really part of the Rust compiler. The
interface between LLVM and rustc is in [librustc_llvm](https://github.com/rust-lang/rust/tree/master/src/librustc_llvm).

Finally, we link the object files into an executable. Again we outsource this to
other programs and it's not really part of the Rust compiler. The interface is
in [librustc_back](https://github.com/rust-lang/rust/tree/master/src/librustc_back)
(which also contains some things used primarily during translation).

All these phases are coordinated by the driver. To see the exact sequence, look
at the `compile_input` function in [librustc_driver/driver.rs](https://github.com/rust-lang/rust/tree/master/src/librustc_driver/driver.rs).

The driver (which is found in [librustc_driver](https://github.com/rust-lang/rust/tree/master/src/librustc_driver))
handles all the highest level coordination of compilation - handling command
line arguments, maintaining compilation state (primarily in the `Session`), and
calling the appropriate code to run each phase of compilation. It also handles
high level coordination of pretty printing and testing. To create a drop-in
compiler replacement or a custom compiler, we leave most of compilation
alone and customise the driver using its APIs.

## The driver customisation APIs

There are two primary ways to customise compilation - high level control of the
driver using `CompilerCalls` and controlling each phase of compilation using a
`CompileController`. The former lets you customise handling of command line
arguments etc.; the latter lets you stop compilation early or execute code
between phases.

### `CompilerCalls`

`CompilerCalls` is a trait that you implement in your tool. It contains a fairly
ad-hoc set of methods to hook into the processing of command line
arguments and the driving of the compiler. For details, see the comments in
[librustc_driver/lib.rs](https://github.com/rust-lang/rust/tree/master/src/librustc_driver/lib.rs).
I'll summarise the methods here.

`early_callback` and `late_callback` let you call arbitrary code at different
points - early is after command line arguments have been parsed, but before
anything is done with them; late is pretty much the last thing before
compilation starts, i.e., after all processing of command line arguments, etc. is
done. Currently, you get to choose whether compilation stops or continues at
each point, but you don't get to change anything the driver has done. You can
record some info for later, or perform other actions of your own.

`some_input` and `no_input` give you an opportunity to modify the primary input
to the compiler (usually the input is a file containing the top module for a
crate, but it could also be a string). You could record the input or perform
other actions of your own.

Ignore `parse_pretty`; it is unfortunate and will hopefully be improved. There
is a default implementation, so you can pretend it doesn't exist.

`build_controller` returns a `CompileController` object for more fine-grained
control of compilation; it is described next.

We might add more options in the future.

### `CompileController`

`CompileController` is a struct consisting of `PhaseController`s and flags.
Currently, there is only one flag, `make_glob_map`, which signals whether to produce
a map of glob imports (used by save-analysis and potentially other tools). There
are probably flags in the session that should be moved here.

There is a `PhaseController` for each of the phases described in the above
summary of compilation (and we could add more in the future for finer-grained
control). They are all `after_` a phase because they are checked at the end of a
phase (again, that might change), e.g., `CompileController::after_parse`
controls what happens immediately after parsing (and before macro expansion).
Each `PhaseController` contains a flag called `stop` which indicates whether
compilation should stop or continue, and a callback to be executed at the point
indicated by the phase. The callback is called whether or not compilation
continues.
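To make the stop-flag-plus-callback idea concrete, here is a minimal, self-contained sketch. The `Compilation`, `PhaseController`, and `run_phases` definitions below are simplified stand-ins written for this example (the callback takes a phase name rather than any compiler state); they are not the real driver types, and the real driver loop is certainly more involved.

```rust
// Toy stand-ins for the driver types described above. These are NOT the real
// rustc_driver definitions, only a model of the idea.

#[derive(Clone, Copy, Debug, PartialEq)]
enum Compilation {
    Stop,
    Continue,
}

/// One hook: whether to stop after the phase, and code to run at that point.
struct PhaseController<'a> {
    stop: Compilation,
    callback: Box<dyn FnMut(&str) + 'a>,
}

impl<'a> PhaseController<'a> {
    /// Default behaviour: keep compiling and do nothing in the callback.
    fn basic() -> PhaseController<'a> {
        PhaseController {
            stop: Compilation::Continue,
            callback: Box::new(|_| {}),
        }
    }
}

/// Hypothetical driver loop: run each phase, call its hook, then honour `stop`.
/// Returns the names of the phases that actually ran.
fn run_phases(controllers: &mut [(&'static str, PhaseController<'_>)]) -> Vec<&'static str> {
    let mut ran = Vec::new();
    for (name, controller) in controllers.iter_mut() {
        // The phase itself would run here.
        ran.push(*name);
        // The callback is called whether or not compilation continues.
        (controller.callback)(*name);
        if controller.stop == Compilation::Stop {
            break;
        }
    }
    ran
}
```

Setting `stop` to `Compilation::Stop` on the first controller means later phases never run, but that controller's callback still fires once.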

Information about the state of compilation is passed to these callbacks in a
`CompileState` object. This contains all the information the compiler has. Note
that this state information is immutable - your callback can only execute code
using the compiler state, it can't modify the state. (If there is demand, we
could change that.) The state available to a callback depends on where during
compilation the callback is called. For example, after parsing there is an AST
but no semantic analysis (because the AST has not been analysed yet). After
translation, there is translation info, but no AST or analysis info (since these
have been consumed/forgotten).
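One way to picture phase-dependent state is a struct of `Option` fields, where a callback has to check what is present. The sketch below is a toy model under that assumption: only the `krate` field name comes from the example later in this tutorial, while `Ast`, `Analysis`, `Translation`, the other fields, the constructors, and `describe` are invented for illustration.

```rust
// Hypothetical, simplified model of per-phase state. It does not mirror the
// real `CompileState` definition.
struct Ast {
    items: usize,
}

// Never constructed in this sketch; it only gives the `analysis` field a type.
#[allow(dead_code)]
struct Analysis;

struct Translation {
    modules: usize,
}

struct CompileState<'a> {
    krate: Option<&'a Ast>,
    analysis: Option<&'a Analysis>,
    trans: Option<&'a Translation>,
}

impl<'a> CompileState<'a> {
    /// After parsing: an AST, but no analysis or translation info yet.
    fn after_parse(krate: &'a Ast) -> Self {
        CompileState { krate: Some(krate), analysis: None, trans: None }
    }

    /// After translation: only translation info is left.
    fn after_translation(trans: &'a Translation) -> Self {
        CompileState { krate: None, analysis: None, trans: Some(trans) }
    }
}

/// A callback only reads the state, and must check which parts are present.
fn describe(state: &CompileState<'_>) -> String {
    let mut parts = Vec::new();
    if let Some(krate) = state.krate {
        parts.push(format!("AST with {} items", krate.items));
    }
    if state.analysis.is_some() {
        parts.push("analysis".to_string());
    }
    if let Some(trans) = state.trans {
        parts.push(format!("{} translated modules", trans.modules));
    }
    parts.join(", ")
}
```

Because the callback receives a shared reference, it can inspect whatever is there but cannot change it, which matches the immutability described above.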

## An example - stupid-stats

Our example tool is very simple: it collects some basic and not very
useful statistics about a program; it is called stupid-stats. You can find
the (more heavily commented) complete source for the example on [GitHub](https://github.com/nick29581/stupid-stats/blob/master/src).
To build, just do `cargo build`. To run on a file `foo.rs`, do
`cargo run foo.rs` (assuming you have a Rust program called `foo.rs`). You can
also pass any command line arguments that you would normally pass to rustc. When
you run it you'll see output similar to

```
In crate: foo,

Found 12 uses of `println!`;
The most common number of arguments is 1 (67% of all functions);
25% of functions have four or more arguments.
```

To make things easier, when we talk about functions, we're excluding methods and
closures.

You can also use the executable as a drop-in replacement for rustc, because
after all, that is the whole point of this exercise. So, however you use rustc
in your makefile setup, you can use `target/stupid` (or whatever executable you
end up with) instead. That might mean setting an environment variable or it
might mean renaming your executable to `rustc` and setting your PATH. Similarly,
if you're using Cargo, you'll need to rename the executable to rustc and set the
PATH. Alternatively, you should be able to use
[multirust](https://github.com/brson/multirust) to get around all the PATH stuff
(although I haven't actually tried that).

(Note that this example prints to stdout. I'm not entirely sure what Cargo does
with stdout from rustc under different circumstances. If you don't see any
output, try inserting a `panic!` after the `println!`s to error out, then Cargo
should dump stupid-stats' stdout to Cargo's stdout).

Let's start with the `main` function for our tool; it is pretty simple:

```
fn main() {
    let args: Vec<_> = std::env::args().collect();
    rustc_driver::run_compiler(&args, &mut StupidCalls::new());
    std::env::set_exit_status(0);
}
```

The first line grabs any command line arguments. The second line calls the
compiler driver with those arguments. The final line sets the exit code for the
program.

The only interesting thing is the `StupidCalls` object we pass to the driver.
This is our implementation of the `CompilerCalls` trait and is what will make
this tool different from rustc.

`StupidCalls` is a mostly empty struct:

```
struct StupidCalls {
    default_calls: RustcDefaultCalls,
}
```

This tool is so simple that it doesn't need to store any data here, but usually
you would. We embed a `RustcDefaultCalls` object to delegate to in our impl when
we want exactly the same behaviour as the Rust compiler. Mostly you don't want
to do that (or at least don't need to) in a tool. However, Cargo calls rustc
with `--print file-names`, so we delegate in `late_callback` and `no_input`
to keep Cargo happy.
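The delegation pattern itself is ordinary Rust and can be shown without any compiler internals. In the sketch below, the `Calls` trait, `DefaultCalls`, `ToolCalls`, and their fields are all hypothetical miniatures invented for this example; in particular, what the real `RustcDefaultCalls` does with `--print file-names` is only imitated here by setting a flag.

```rust
// Hypothetical miniature of the delegation idea. The trait and its method
// signature are invented for illustration and are not the real `CompilerCalls`.

#[derive(Debug, PartialEq)]
enum Compilation {
    Stop,
    Continue,
}

trait Calls {
    fn late_callback(&mut self, args: &[String]) -> Compilation;
}

/// Stand-in for the default implementation: it notices `--print file-names`.
struct DefaultCalls {
    handled_print: bool,
}

impl Calls for DefaultCalls {
    fn late_callback(&mut self, args: &[String]) -> Compilation {
        if args.windows(2).any(|w| w[0] == "--print" && w[1] == "file-names") {
            // A real implementation would print something useful here.
            self.handled_print = true;
            return Compilation::Stop;
        }
        Compilation::Continue
    }
}

/// The tool embeds the default implementation and forwards to it.
struct ToolCalls {
    default_calls: DefaultCalls,
    late_calls: usize,
}

impl Calls for ToolCalls {
    fn late_callback(&mut self, args: &[String]) -> Compilation {
        self.late_calls += 1;
        // Delegate so that build tools get the behaviour they expect...
        self.default_calls.late_callback(args);
        // ...but ignore the result and always carry on, as this sketch chooses to.
        Compilation::Continue
    }
}
```

Embedding the default object as a field keeps the tool's own state (here `late_calls`) separate from whatever the default implementation tracks.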

Most of the rest of the impl of `CompilerCalls` is trivial:

```
impl<'a> CompilerCalls<'a> for StupidCalls {
    fn early_callback(&mut self,
                      _: &getopts::Matches,
                      _: &config::Options,
                      _: &diagnostics::registry::Registry,
                      _: ErrorOutputType)
                      -> Compilation {
        Compilation::Continue
    }

    fn late_callback(&mut self,
                     t: &TransCrate,
                     m: &getopts::Matches,
                     s: &Session,
                     c: &CrateStore,
                     i: &Input,
                     odir: &Option<PathBuf>,
                     ofile: &Option<PathBuf>)
                     -> Compilation {
        self.default_calls.late_callback(t, m, s, c, i, odir, ofile);
        Compilation::Continue
    }

    fn some_input(&mut self,
                  input: Input,
                  input_path: Option<PathBuf>)
                  -> (Input, Option<PathBuf>) {
        (input, input_path)
    }

    fn no_input(&mut self,
                m: &getopts::Matches,
                o: &config::Options,
                odir: &Option<PathBuf>,
                ofile: &Option<PathBuf>,
                r: &diagnostics::registry::Registry)
                -> Option<(Input, Option<PathBuf>)> {
        self.default_calls.no_input(m, o, odir, ofile, r);

        // This is not optimal error handling.
        panic!("No input supplied to stupid-stats");
    }

    fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
        ...
    }
}
```

We don't do anything for either of the callbacks, nor do we change the input if
the user supplies it. If they don't, we just `panic!`. This is the simplest way
to handle the error, but it is not very user-friendly; a real tool would give a
constructive message or perform a default action.
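As a rough idea of what a more constructive message could look like, here is a sketch that works on plain command line strings rather than the driver's types. The `find_input` helper, its "ends with `.rs`" heuristic, and the usage text are all assumptions made up for this example; they are not part of stupid-stats.

```rust
/// Hypothetical helper: find the first argument that looks like a source
/// file, or produce a usage message instead of panicking.
fn find_input(args: &[String]) -> Result<&str, String> {
    args.iter()
        .skip(1) // args[0] is the program name
        .map(|arg| arg.as_str())
        .find(|arg| arg.ends_with(".rs"))
        .ok_or_else(|| {
            let prog = args.first().map(|s| s.as_str()).unwrap_or("stupid-stats");
            format!(
                "error: no input file supplied\nusage: {} [rustc options] <input.rs>",
                prog
            )
        })
}
```

A `main` function could print the `Err` string to stderr and exit with a non-zero status via `std::process::exit`, so that build tools notice the failure.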

In `build_controller` we construct our `CompileController`. We only want to
parse, and we want to inspect macros before expansion, so we make compilation
stop after the first phase (parsing). The callback after that phase is where the
tool does its actual work by walking the AST. We do that by creating an AST
visitor and making it walk the AST from the top (the crate root). Once we've
walked the crate, we print the stats we've collected:

```
fn build_controller(&mut self, _: &Session) -> driver::CompileController<'a> {
    // We mostly want to do what rustc does, which is what basic() will return.
    let mut control = driver::CompileController::basic();
    // But we only need the AST, so we can stop compilation after parsing.
    control.after_parse.stop = Compilation::Stop;

    // And when we stop after parsing we'll call this closure.
    // Note that this will give us an AST before macro expansions, which is
    // not usually what you want.
    control.after_parse.callback = box |state| {
        // Which extracts information about the compiled crate...
        let krate = state.krate.unwrap();

        // ...and walks the AST, collecting stats.
        let mut visitor = StupidVisitor::new();
        visit::walk_crate(&mut visitor, krate);

        // And finally prints out the stupid stats that we collected.
        let cratename = match attr::find_crate_name(&krate.attrs[]) {
            Some(name) => name.to_string(),
            None => String::from_str("unknown_crate"),
        };
        println!("In crate: {},\n", cratename);
        println!("Found {} uses of `println!`;", visitor.println_count);

        let (common, common_percent, four_percent) = visitor.compute_arg_stats();
        println!("The most common number of arguments is {} ({:.0}% of all functions);",
                 common, common_percent);
        println!("{:.0}% of functions have four or more arguments.", four_percent);
    };

    control
}
```

That is all it takes to create your own drop-in compiler replacement or custom
compiler! For the sake of completeness I'll go over the rest of the stupid-stats
tool.

```
struct StupidVisitor {
    println_count: usize,
    arg_counts: Vec<usize>,
}
```

The `StupidVisitor` struct just keeps track of the number of `println!`s it has
seen and the count for each number of arguments. It implements
`syntax::visit::Visitor` to walk the AST. Mostly we just use the default
methods, which walk the AST taking no action. We override `visit_item` and
`visit_mac` to implement custom behaviour when we walk into items (items include
functions, modules, traits, structs, and so forth; we're only interested in
functions) and macros:

```
impl<'v> visit::Visitor<'v> for StupidVisitor {
    fn visit_item(&mut self, i: &'v ast::Item) {
        match i.node {
            ast::Item_::ItemFn(ref decl, _, _, _, _) => {
                // Record the number of args.
                self.increment_args(decl.inputs.len());
            }
            _ => {}
        }

        // Keep walking.
        visit::walk_item(self, i)
    }

    fn visit_mac(&mut self, mac: &'v ast::Mac) {
        // Find its name and check if it is "println".
        let ast::Mac_::MacInvocTT(ref path, _, _) = mac.node;
        if path_to_string(path) == "println" {
            self.println_count += 1;
        }

        // Keep walking.
        visit::walk_mac(self, mac)
    }
}
```
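If the "default methods walk, overrides act and then keep walking" pattern is unfamiliar, the following toy version may help. It is a self-contained sketch: `Item`, `Expr`, `Visitor`, `walk_item`, and `Counter` are invented for this example and are far simpler than the real `syntax::ast` and `syntax::visit` definitions.

```rust
// A toy AST and visitor, loosely modelled on the pattern above. None of these
// types are the real `syntax::ast` or `syntax::visit` definitions.
enum Item {
    Fn { args: usize, body: Vec<Expr> },
    Mod { items: Vec<Item> },
}

enum Expr {
    MacroCall { name: String },
    Other,
}

trait Visitor: Sized {
    // Default methods take no action of their own; `visit_item` just walks on.
    fn visit_item(&mut self, item: &Item) {
        walk_item(self, item)
    }
    fn visit_expr(&mut self, _expr: &Expr) {}
}

/// Visits everything nested directly inside `item`.
fn walk_item<V: Visitor>(visitor: &mut V, item: &Item) {
    match item {
        Item::Fn { body, .. } => {
            for expr in body {
                visitor.visit_expr(expr);
            }
        }
        Item::Mod { items } => {
            for nested in items {
                visitor.visit_item(nested);
            }
        }
    }
}

struct Counter {
    println_count: usize,
    fn_arg_counts: Vec<usize>,
}

impl Visitor for Counter {
    fn visit_item(&mut self, item: &Item) {
        if let Item::Fn { args, .. } = item {
            // Record the number of args.
            self.fn_arg_counts.push(*args);
        }
        // Keep walking, otherwise nested items and bodies are never seen.
        walk_item(self, item)
    }

    fn visit_expr(&mut self, expr: &Expr) {
        if let Expr::MacroCall { name } = expr {
            if name == "println" {
                self.println_count += 1;
            }
        }
    }
}
```

Forgetting the `walk_item` call in an override is the classic mistake with this pattern: the visitor would then stop at the first item it handles.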

The `increment_args` method increments the correct count in
`StupidVisitor::arg_counts`. After we're done walking, `compute_arg_stats` does
some pretty basic maths to come up with the stats we want about arguments.
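For reference, one plausible way to write that bookkeeping is sketched below. This is my reconstruction for illustration, not a copy of the repository's code: the standalone `ArgStats` struct, the tie-breaking rule (the smallest argument count wins), and the handling of an empty crate are assumptions.

```rust
/// Sketch of the argument bookkeeping: `arg_counts[n]` holds the number of
/// functions seen with exactly `n` arguments.
struct ArgStats {
    arg_counts: Vec<usize>,
}

impl ArgStats {
    fn new() -> ArgStats {
        ArgStats { arg_counts: Vec::new() }
    }

    /// Records one function with `args` arguments, growing the table if needed.
    fn increment_args(&mut self, args: usize) {
        if self.arg_counts.len() <= args {
            self.arg_counts.resize(args + 1, 0);
        }
        self.arg_counts[args] += 1;
    }

    /// Returns (most common argument count, its share in percent,
    /// share of functions with four or more arguments in percent).
    fn compute_arg_stats(&self) -> (usize, f64, f64) {
        let total: usize = self.arg_counts.iter().sum();
        if total == 0 {
            // Assumption: report zeros when no functions were seen.
            return (0, 0.0, 0.0);
        }

        // Strict `>` means the smallest argument count wins ties.
        let mut common = 0;
        for (n, &count) in self.arg_counts.iter().enumerate() {
            if count > self.arg_counts[common] {
                common = n;
            }
        }

        let four_plus: usize = self.arg_counts.iter().skip(4).sum();
        let percent = |count: usize| 100.0 * count as f64 / total as f64;
        (common, percent(self.arg_counts[common]), percent(four_plus))
    }
}
```

Formatting the two percentages with `{:.0}`, as the callback does, rounds them to whole numbers for display.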

## What next?

These APIs are pretty new and have a long way to go until they're really good.
If there are improvements you'd like to see or things you'd like to be able to
do, let me know in a comment or [GitHub issue](https://github.com/rust-lang/rust/issues).

In particular, it's not clear to me exactly what extra flexibility is required.
If you have an existing tool that would be suited to this setup, please try it
out and let me know if you have problems.

It'd be great to see Rustdoc converted to using these APIs, if that is possible
(although long term, I'd prefer to see Rustdoc run on the output from
save-analysis, rather than doing its own analysis). Other parts of the compiler
(e.g., pretty printing, testing) could be refactored to use these APIs
internally (I already changed save-analysis to use `CompileController`). I've
been experimenting with a prototype rustfmt which also uses these APIs.