Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance suggestions #30

Open
abrom opened this issue Jan 22, 2020 · 3 comments
Open

Performance suggestions #30

abrom opened this issue Jan 22, 2020 · 3 comments

Comments

@abrom
Copy link
Contributor

abrom commented Jan 22, 2020

I've noticed some pretty nasty performance issues related to the use of the Text::Hyphen package. When passed in really long words it can take 5+ minutes to 'visualise'. Given there is no memoization of the Text::Hyphen dictionary (and it's caches) then calling each of the various methods that use your syllable_count method can end up taking hours to calculate what is otherwise a pretty simple piece of text (albeit with a few errant 'long' words).

Suggested solution would be to memoize the dictionary. Given that each of the methods in the TextStat class are class methods it does complicate things a little. An option would be to move each of the the class methods into instance methods then create wrappers on the class to create an instance (to keep backward compatibility).

ie

def self.sentence_count(text)
  new.sentence_count text
end

def sentence_count(text)
  ...
end

Of course could also use some meta programming to simplify the various class method calls too.

This would allow the dictionary to be memoized and minimise its cache being flushed, reducing runtime from hours to minutes.

Happy to put together a PR

@kupolak
Copy link
Owner

kupolak commented Feb 7, 2020

Good points. I know there may be some problems with performance due to the fact that the original python Texstat library utilizes LRU caching.

@abrom
Copy link
Contributor Author

abrom commented Feb 8, 2020

Here are the changes i've been using for the past few weeks with great success:

master...Studiosity:cache-dictionary

Note I've been sure to leave the existing interface untouched whilst opening up using the tools with great speed-ups:

text_service = TextStat.new 'ru'
reading_ease = text_service.flesch_reading_ease text
smog_index = text_service.smog_index text
...

Another (slight) improvement would be to pass the text block in through the initialiser to save passing it to each instance method..

Again, happy to put this in a PR if you're interested.

@kupolak
Copy link
Owner

kupolak commented Feb 8, 2020

PRs welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants