Perfect hashing? #348

NiLuJe · 2020-06-29T18:14:40Z

Just a random drive-by so that I don't forget about it ;).

Inspired by ArtifexSoftware/mupdf@6bbfe28 in MµPDF, I'm wondering if we couldn't also make use of something similar in a few places to speed things up.

I vaguely remember you switched some similar stuff to a binary search recently, @poire-z ?

poire-z · 2020-06-29T18:51:09Z

I have no real idea how/what gperf really does :/
But looking at the mupdf patch:

they replaced some late string comparisons with some id/hash comparisons - we already do that, by using some sequential id (when parsing the stylesheet text into an intermediate binary form) from this enum:

crengine/crengine/src/lvstsheet.cpp

Lines 30 to 36 in 0dee202

    
    enum css_decl_code {
        cssd_unknown,
        cssd_display,
        cssd_white_space,
        cssd_text_align,
        cssd_text_align_last,
        cssd_text_decoration,

they use that gperf stuff when parsing - we indeed do many case insensitive string comparisons:

crengine/crengine/src/lvstsheet.cpp

Lines 123 to 129 in 0dee202

    
    static const char * css_decl_name[] = {
        "",
        "display",
        "white-space",
        "text-align",
        "text-align-last",
        "text-decoration",

crengine/crengine/src/lvstsheet.cpp

Lines 297 to 312 in 0dee202

    
    static css_decl_code parse_property_name( const char * & res )
    {
        const char * str = res;
        for (int i=1; css_decl_name[i]; i++)
        {
            if (substr_icompare( css_decl_name[i], str )) // css property case should not matter (eg: "Font-Weight:")
            {
                // found!
                skip_spaces(str);
                if ( substr_compare( ":", str )) {
    #ifdef DUMP_CSS_PARSING
                    CRLog::trace("property name: %s", lString8(res, str-res).c_str() );
    #endif
                    skip_spaces(str);
                    res = str;
                    return (css_decl_code)i;

I guess we could benefit from gperf for that 2nd part, but the benefit would be small (vs. the added gperf opacity/complexity when debugging): in my experience debugging and timing things, the time spent parsing a 30K or 500K OPS/publisher.css is really peanuts compared to the time spent checking/applying it to all the nodes of a book (even if parsing a 500K CSS takes 2 seconds, checking/applying the thousand rules can take minutes).
(I thought I had mentioned that at #276 (comment) and #276 (comment), but I didn't say it explicitly.)
One case where it might help is when an EPUB has 4000 small HTML files each linking the same huge CSS: we'll be parsing that huge CSS 4000 times, and that might be more expensive than applying it to the few dozen nodes in each HTML file.

So, I don't think the benefit will really be noticeable.
(But feel free to go at it if you wish, as long as it doesn't make the code too unreadable.)
(Also, gperf seems to have an option for ignoring case, which we would need for CSS properties, but it seems MuPDF did not specify it.)
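
For illustration only, here is a rough sketch of what a hashed, case-insensitive lookup for that 2nd part could look like. It is not gperf output and not a perfect hash: it swaps in a plain `std::unordered_map` keyed on the lowercased name, just to show the shape of replacing the linear `substr_icompare` scan. The enum is a hypothetical subset of the real one, `property_table` and `parse_property_name_hashed` are made-up names, and the whitespace skipping is much simpler than the real `skip_spaces`.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical subset of the css_decl_code enum from lvstsheet.cpp.
enum css_decl_code {
    cssd_unknown,
    cssd_display,
    cssd_white_space,
    cssd_text_align,
    cssd_text_align_last,
    cssd_text_decoration
};

// Hypothetical helper: lowercase name -> code table, built once on first use.
const std::unordered_map<std::string, css_decl_code> & property_table()
{
    static const std::unordered_map<std::string, css_decl_code> table = {
        { "display",         cssd_display },
        { "white-space",     cssd_white_space },
        { "text-align",      cssd_text_align },
        { "text-align-last", cssd_text_align_last },
        { "text-decoration", cssd_text_decoration },
    };
    return table;
}

// Parses "name :" at cursor. On success, advances cursor past the colon and
// any following spaces/tabs and returns the property code. On failure,
// leaves cursor untouched and returns cssd_unknown.
css_decl_code parse_property_name_hashed( const char * & cursor )
{
    assert(cursor != nullptr);
    const char * p = cursor;

    // Collect the identifier, lowercased so that "Font-Weight" and
    // "font-weight" hash to the same key.
    std::string name;
    while (std::isalnum((unsigned char)*p) || *p == '-') {
        name.push_back((char)std::tolower((unsigned char)*p));
        ++p;
    }

    // Simplified whitespace skipping (assumption: spaces and tabs only).
    while (*p == ' ' || *p == '\t') ++p;
    if (*p != ':')
        return cssd_unknown;
    ++p;
    while (*p == ' ' || *p == '\t') ++p;

    // One hash lookup instead of one string comparison per known property.
    auto it = property_table().find(name);
    if (it == property_table().end())
        return cssd_unknown;

    cursor = p;
    return it->second;
}
```

Because the whole identifier is read before the lookup, "text-align-last:" can't be tried against "text-align" first, as a prefix-based scan could do. A gperf-generated function would replace the map with a collision-free table, at the cost of a generated source file to maintain.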

NiLuJe added the enhancement label Jun 29, 2020

poire-z mentioned this issue Oct 31, 2023

Improve css class matching #545

Merged
