
M4 parse lines

Mark Overmeer edited this page Aug 21, 2025 · 4 revisions

In-memory parser

Email messages were originally expected to be stored in (mbox) files, and hence to be read from files. Nowadays, however, many messages are kept in databases or arrive in the program some other way. The existing trick for these situations, a work-around via Mail::Box::FastScalar, is slow. By implementing Mail::Box::Parser::Lines, an in-memory message parser, we avoid the trick. Will this pay off?

Initial timings

Each figure is the average of 10 runs on a random 62MB mailbox with 4122 messages. The changes are actually not in Mail-Box itself, but in the Mail-Message distribution 3.018.

                                    User       System
Mail-Box release 3.011 from file    9.846s     0.172s
Mail-Box 3.012 to be from file      9.531s     0.167s   3.3% faster

Separate messages in-memory 3.011  11.496s     0.007s
Separate msgs in-mem 3.012 to be   11.088s     0.010s   3.7% faster
Separate msgs in-mem 3.012 ref str 11.192s     0.008s   2.7% faster

In this (old) case, the in-memory messages are parsed via the pseudo-file interface offered by Mail::Box::FastScalar. This object makes a scalar behave like a file, and it is therefore slower than a real file.
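To make the difference between the two approaches concrete, here is a minimal sketch in core Perl. It is an illustration only: Perl's built-in in-memory file handle stands in for Mail::Box::FastScalar, and the plain split stands in for a parser that works on lines directly. Neither is the actual code of Mail::Box::FastScalar or Mail::Box::Parser::Lines, and the sample message and the helper names lines_via_handle and lines_via_split are made up.

```perl
use strict;
use warnings;

# One tiny message held in memory; the content is made up for illustration.
my $message = "Subject: test\nFrom: me\@example.com\n\nbody line 1\nbody line 2\n";

# Pseudo-file approach: wrap the scalar in a file handle and read it
# line by line, the way a file-based parser would.
# (Stand-in for Mail::Box::FastScalar, not its real implementation.)
sub lines_via_handle {
    my ($text) = @_;
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    my @lines;
    while (my $line = <$fh>) {
        push @lines, $line;
    }
    close $fh;
    return \@lines;
}

# Direct approach: split the scalar into lines once, keeping the line
# endings, and work on the resulting array without any file interface.
sub lines_via_split {
    my ($text) = @_;
    my @lines = split /^/, $text;
    return \@lines;
}

my $via_handle = lines_via_handle($message);
my $via_split  = lines_via_split($message);
printf "%d lines via handle, %d lines via split\n",
    scalar @$via_handle, scalar @$via_split;
```

Both routes yield the same lines. The only difference is whether each line is fetched through a file-like layer or taken from an array.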

Not included in the last three numbers is the 0.3 - 0.5 seconds needed to read the mailbox file and split it into separate messages.
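For readers unfamiliar with that splitting step, the following is a rough sketch of how an mbox can be cut into separate in-memory messages. It is not the code used for the timings. The mailbox content is hypothetical, and the rule is simplified: a careful splitter would also check that each "From " separator line follows a blank line.

```perl
use strict;
use warnings;

# Hypothetical two-message mailbox; a real one would be read from disk.
my $mbox = join '',
    "From alice\@example.com Thu Aug 21 10:00:00 2025\n",
    "Subject: first\n",
    "\n",
    "Hello\n",
    "\n",
    "From bob\@example.com Thu Aug 21 11:00:00 2025\n",
    "Subject: second\n",
    "\n",
    "World\n";

# Split before every line that starts with "From ", keeping that separator
# line with the message it introduces.  Simplification: no check that the
# separator is preceded by a blank line.
my @messages = split /^(?=From )/m, $mbox;

printf "%d messages\n", scalar @messages;
```

Each element of @messages is then one message as a string, which is the kind of input the in-memory timings start from.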

Introduction of Mail::Box::Parser::Lines

With the new parser, which reimplements message parsing without the file interfaces, we get

::Lines with str                    5.859s     0.007s   49% faster (2x)
::Lines with ref str                5.868s     0.000s   49% faster (2x)
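The percentages in the tables can be reproduced with a few lines of arithmetic. The sketch below rests on an assumption inferred from the numbers, not stated on this page: the small gains (3.3%, 3.7%, 2.7%) appear to be speed ratios against the 3.011 run, while the 49% appears to be the reduction in user time against the 3.011 in-memory run. The helper names speed_gain_pct and time_saved_pct are made up for this example.

```perl
use strict;
use warnings;

# Gain in speed: how much more work fits in the same time.
sub speed_gain_pct {
    my ($old, $new) = @_;
    return ($old / $new - 1) * 100;
}

# Time saved: how much shorter the run became.
sub time_saved_pct {
    my ($old, $new) = @_;
    return (1 - $new / $old) * 100;
}

# User times in seconds, copied from the tables above.
my $file_old  = 9.846;     # Mail-Box 3.011 from file
my $file_new  = 9.531;     # Mail-Box 3.012 from file
my $mem_old   = 11.496;    # 3.011 in-memory, via Mail::Box::FastScalar
my $lines_str = 5.859;     # ::Lines with str

printf "from file: %.1f%% speed gain\n",
    speed_gain_pct($file_old, $file_new);
printf "::Lines:   %.0f%% time saved, factor %.2f\n",
    time_saved_pct($mem_old, $lines_str), $mem_old / $lines_str;
```

Under this reading, the "(2x)" in the table is the ratio of the old to the new user time, rounded up.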

Concluding: the file-handle simulation was simple but expensive.

Conclusion

This new implementation is useful: parsing in-memory messages now takes about half the user time.
