-
Thank you for looking into this. I did a similar profile a while back and optimized the prefix transformer a bit (because it was the slowest part back then), but apart from that, there is still a lot of potential!

I should also clarify my "we do care about the startup speed" statement. What I mean is the following: I really like command line applications that feel snappy. To be honest, most people will probably not care whether the Numbat CLI takes 50 ms or 5 ms. But there is a threshold: startup times tend to become noticeable once you get closer to 100 ms or 200 ms, and 500 ms already feels slow. I personally can feel a difference.

Also, since the startup time is dominated by the interpreter, speeding up startup time means optimizing the interpreter, which is also relevant for executing larger amounts of code.
👍
Okay, here it's probably worth taking a closer look. This consists of two parts: the actual compilation step (AST => bytecode), and then running the bytecode VM.
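Very roughly, the shape is something like this (a simplified sketch with made-up names and opcodes, not the actual implementation):

```rust
// Simplified sketch of a two-stage pipeline: compile an AST to bytecode,
// then run it on a stack-based VM. All names here are made up.

enum Expr {
    Number(f64),
    Add(Box<Expr>, Box<Expr>),
}

enum Op {
    PushConst(f64),
    Add,
}

// Stage 1: flatten the AST into a linear list of instructions.
fn compile(expr: &Expr, out: &mut Vec<Op>) {
    match expr {
        Expr::Number(x) => out.push(Op::PushConst(*x)),
        Expr::Add(lhs, rhs) => {
            compile(lhs, out);
            compile(rhs, out);
            out.push(Op::Add);
        }
    }
}

// Stage 2: execute the instructions on a small value stack.
fn run(ops: &[Op]) -> f64 {
    let mut stack = Vec::new();
    for op in ops {
        match op {
            Op::PushConst(x) => stack.push(*x),
            Op::Add => {
                let rhs = stack.pop().unwrap();
                let lhs = stack.pop().unwrap();
                stack.push(lhs + rhs);
            }
        }
    }
    stack.pop().unwrap()
}

fn main() {
    let ast = Expr::Add(Box::new(Expr::Number(1.0)), Box::new(Expr::Number(2.0)));
    let mut ops = Vec::new();
    compile(&ast, &mut ops);
    println!("{}", run(&ops)); // 3
}
```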
Yeah, that's a special part of Numbat. Maybe it would be possible to integrate this into the parser, but I found it easier to implement as a separate stage. But that also comes at a cost (traversing and rewriting the full AST). The job of the prefix transformer is to do name resolution, and in particular: to distinguish between variables and units. If we see an identifier like `km`, we need to decide whether it refers to a variable or to a prefixed unit (kilo + metre).
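In other words, something along these lines (just an illustrative sketch; the names and the shadowing order are assumptions, not the real code):

```rust
// Illustrative sketch of prefix-based name resolution: an identifier is either
// a known variable, a known unit, or a metric prefix followed by a unit.
// Names and the lookup order are assumptions, not Numbat's actual rules.

use std::collections::HashSet;

#[derive(Debug)]
enum Resolved {
    Variable(String),
    Unit { prefix: Option<&'static str>, unit: String },
    Unknown(String),
}

fn resolve(ident: &str, variables: &HashSet<String>, units: &HashSet<String>) -> Resolved {
    if variables.contains(ident) {
        return Resolved::Variable(ident.to_string());
    }
    if units.contains(ident) {
        return Resolved::Unit { prefix: None, unit: ident.to_string() };
    }
    // Try to split off a metric prefix (only a few prefixes shown here).
    for prefix in ["kilo", "milli", "micro"] {
        if let Some(rest) = ident.strip_prefix(prefix) {
            if units.contains(rest) {
                return Resolved::Unit { prefix: Some(prefix), unit: rest.to_string() };
            }
        }
    }
    Resolved::Unknown(ident.to_string())
}

fn main() {
    let variables: HashSet<String> = ["x".to_string()].into_iter().collect();
    let units: HashSet<String> = ["metre".to_string()].into_iter().collect();
    println!("{:?}", resolve("kilometre", &variables, &units)); // Unit { prefix: Some("kilo"), unit: "metre" }
    println!("{:?}", resolve("x", &variables, &units));         // Variable("x")
}
```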
The typechecker also does a full rewrite of the parse tree (currently: …).
Yes. Like Python's `.pyc` files.
-
Out of curiosity, what're you using for benchmarking in those screenshots?
-
Hey, in #210, you mentioned that "we do care about the startup speed".
I thought about it a little bit. First of all, I profiled the prelude import; here is a quick overview:
![image](https://private-user-images.githubusercontent.com/7032172/277117022-ca8d0969-70a6-4d1f-86f3-aa55712a8ba6.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNzc0MTQsIm5iZiI6MTczOTI3NzExNCwicGF0aCI6Ii83MDMyMTcyLzI3NzExNzAyMi1jYThkMDk2OS03MGE2LTRkMWYtODZmMy1hYTU1NzEyYThiYTYucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTIzMTU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YjhjNmNiNTNkZmY0ZDM4ZThiNTkyNTFlMTY3YzAxMDlmYzViZWQ5ODJkMTM3MTBkY2Y0ZGMxNTQyZjIzNmM4MSZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.mHSV8hbEUXeau-UxzSjJYbQW4F0E9RJ3lXPYfBMEoD4)
As expected, the time is pretty much all spent in the interpreter, and not in the importer or something else unrelated; that's nice.
Now, let’s dive deep into the interpret method:
![image](https://private-user-images.githubusercontent.com/7032172/277117126-2d453161-ce45-49b8-b8d7-7163ba7798c4.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkyNzc0MTQsIm5iZiI6MTczOTI3NzExNCwicGF0aCI6Ii83MDMyMTcyLzI3NzExNzEyNi0yZDQ1MzE2MS1jZTQ1LTQ5YjgtYjhkNy03MTYzYmE3Nzk4YzQucG5nP1gtQW16LUFsZ29yaXRobT1BV1M0LUhNQUMtU0hBMjU2JlgtQW16LUNyZWRlbnRpYWw9QUtJQVZDT0RZTFNBNTNQUUs0WkElMkYyMDI1MDIxMSUyRnVzLWVhc3QtMSUyRnMzJTJGYXdzNF9yZXF1ZXN0JlgtQW16LURhdGU9MjAyNTAyMTFUMTIzMTU0WiZYLUFtei1FeHBpcmVzPTMwMCZYLUFtei1TaWduYXR1cmU9YmUzMGM5YjU5Yzc1MGRkYzg3ZTM0OGIyNzZiZWE0YWRiMDgxNGIwMWY4NWU1NWNmNGNiZTRhYjAxYjNiYzc5MyZYLUFtei1TaWduZWRIZWFkZXJzPWhvc3QifQ.5zIVsrv4qudzBzFAIzIDjRGpV0giUP7_LvJkw20hCUo)
The interesting parts are:

- `prefix_transformer`. I have no idea what it is and don't know if it's expected (and I would love to know what it is 👀).

One idea to optimize the startup time could be to pre-compile the prelude to bytecode and add the ability to import bytecode directly into a context.
That should halve the startup time straight away.
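Very roughly, something like this (a hypothetical sketch only; `Program`, the cache file name, and the serde/bincode usage are assumptions for illustration, not an existing Numbat API):

```rust
// Hypothetical sketch: cache the compiled prelude as serialized bytecode and
// load it directly at startup, skipping parse/typecheck/compile on the fast path.
// `Program`, the cache file name, and the serde/bincode usage are all assumptions.

use serde::{Deserialize, Serialize};
use std::fs;
use std::path::Path;

#[derive(Serialize, Deserialize)]
struct Program {
    ops: Vec<u8>,        // placeholder for real bytecode instructions
    constants: Vec<f64>, // placeholder for the constant pool
}

fn load_or_compile_prelude(cache: &Path) -> Program {
    // Fast path: a pre-compiled prelude exists on disk (like Python's .pyc files).
    if let Ok(bytes) = fs::read(cache) {
        if let Ok(program) = bincode::deserialize::<Program>(&bytes) {
            return program;
        }
    }
    // Slow path: compile from source, then write the cache for next time.
    let program = compile_prelude_from_source();
    if let Ok(bytes) = bincode::serialize(&program) {
        let _ = fs::write(cache, bytes);
    }
    program
}

fn compile_prelude_from_source() -> Program {
    // Stand-in for the real parse -> typecheck -> compile pipeline.
    Program { ops: vec![], constants: vec![] }
}

fn main() {
    let prelude = load_or_compile_prelude(Path::new("prelude.nbc")); // hypothetical cache file
    println!("prelude has {} constants", prelude.constants.len());
}
```

The cache would of course also need some kind of version stamp (Numbat version and/or a hash of the prelude sources) so it gets invalidated and recompiled when stale.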