Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Perfect hashing? #348

Open
NiLuJe opened this issue Jun 29, 2020 · 1 comment
Open

Perfect hashing? #348

NiLuJe opened this issue Jun 29, 2020 · 1 comment

Comments

@NiLuJe
Copy link
Member

NiLuJe commented Jun 29, 2020

Just a random drive-by so that I don't forget about it ;).

Inspired by ArtifexSoftware/mupdf@6bbfe28 in MµPDF, I'm wondering if we couldn't also make use of something similar in a few places to speed things up.

I vaguely remember you switched some similar stuff to a binary search recently, @poire-z ?

@poire-z
Copy link
Contributor

poire-z commented Jun 29, 2020

I have no real idea how/what gperf really does :/
But looking at the mupdf patch:

  • they replaced some late string comparisons with some id/hash comparisons - we already do that, by using some sequential id (when parsing the stylesheet text into an intermediate binary form) from this enum:

    enum css_decl_code {
    cssd_unknown,
    cssd_display,
    cssd_white_space,
    cssd_text_align,
    cssd_text_align_last,
    cssd_text_decoration,

  • they use that gperf stuff when parsing - we indeed do many case insensitive string comparisons:

    static const char * css_decl_name[] = {
    "",
    "display",
    "white-space",
    "text-align",
    "text-align-last",
    "text-decoration",

    static css_decl_code parse_property_name( const char * & res )
    {
    const char * str = res;
    for (int i=1; css_decl_name[i]; i++)
    {
    if (substr_icompare( css_decl_name[i], str )) // css property case should not matter (eg: "Font-Weight:")
    {
    // found!
    skip_spaces(str);
    if ( substr_compare( ":", str )) {
    #ifdef DUMP_CSS_PARSING
    CRLog::trace("property name: %s", lString8(res, str-res).c_str() );
    #endif
    skip_spaces(str);
    res = str;
    return (css_decl_code)i;

I guess we could benefit from gperf for that 2nd part, but the benefit will not be much (vs the added gperf opacity/complexity when debugging): in my experience debugging and timing things, the time spent parsing a 30K or 500K OPS/publisher.css is really peanuts compared to the time spent checking/applying it to all the nodes of a book (even if parsing a 500K css takes 2 seconds, checking/applying the thousand rules can take minutes).
(I thought I had mentionned that at #276 (comment) and #276 (comment) , but didn't explicitely).
One case where it might help is when you have in an EPUB 4000 small html each linking the same huge CSS: we'll be parsing 4000 times that huge CSS, and it might be more expansive than applying them to the few dozens nodes in each HTML.

So, I don't think the benefit will really be noticable.
(But feel free to go at it if you wish, and it doesn't make the code too much unreadable.)
(Also, dunno gperf seems to have an option for ignoring case sensitivity, which we should for CSS properties, but MuPDF did not specify it it seems.)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants