Skip to content

Update words.x #275

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: master
Choose a base branch
from
Open

Update words.x #275

wants to merge 2 commits into from

Conversation

PiotrJustyna
Copy link

@PiotrJustyna PiotrJustyna commented May 27, 2025

A simple comment update to bring it up to date.

Additionally, I changed the function's return type from () to String as it's easier on the eye while executing the lexer.

If this is useful to anybody, here's a link to a sandbox repository I created for alex: https://github.com/PiotrJustyna/alex-sandbox

Copy link
Member

@andreasabel andreasabel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion.

I am not sure about this PR, not a strict improvement...

examples/words.x Outdated
-- Performance test; run with input /usr/dict/words, for example
-- run with:
-- alexScanTokens " 123 abc "
-- for example
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this was intended as a performance test, running on a large collection of words.
Agreed, the instructions are not portable, e.g. /usr/dict/words does not exist on macOS.
However, the intention can be reconstructed from the comment.

The new comment loses the intention of this example.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the response.

I see, I didn't realize that those words come with the os. I did some research and did find them on my macos under /usr/share/dict/words but I use haskell primary in an alpine container and those words do not ship with the os by default. They are available though as a package: https://pkgs.alpinelinux.org/packages?name=words&branch=edge&repo=&arch=&origin=&flagged=&maintainer= and when installed, are available here: /usr/share/dict/british-english (there are many languages there).

I'm happy to revert my changes not to affect the intended use of the example (performance test), but probably a descriptive comment and basic instructions how to execute such test would make sense? I feel it would be quite beneficial for those new to alex like myself.

What do you think?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks for finding /usr/share/dict/words; it would be good to add this to the existing comment.

but probably a descriptive comment and basic instructions how to execute such test would make sense?

Yes, I agree.
Maybe you can make a longer comment at the beginning of words.x that

  • explains its purpose as a benchmark and points to /usr{/share}/dict/words
  • explains how to run it in ghci (the thing you wanted to contribute)

I would also think that the "word" <> prefixing would be better removed, just returning the list of lexed strings.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, let me get started on that comment - I feel it would add a lot of value.

You're right - since this is intended to be a performance test, adding the "word: " <> feels unnecessary. Strings feel like a good return type.

@andreasabel
Copy link
Member

FYI: benchmark run:

time ./words.bin +RTS -s < /usr/share/dict/words
235976
   1,749,384,800 bytes allocated in the heap
      10,807,296 bytes copied during GC
          62,848 bytes maximum residency (2 sample(s))
          35,456 bytes maximum slop
               6 MiB total memory in use (0 MiB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       418 colls,     0 par    0.004s   0.004s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.000s   0.001s     0.0006s    0.0012s

  INIT    time    0.004s  (  0.004s elapsed)
  MUT     time    0.204s  (  0.203s elapsed)
  GC      time    0.004s  (  0.005s elapsed)
  EXIT    time    0.000s  (  0.012s elapsed)
  Total   time    0.212s  (  0.224s elapsed)

  %GC     time       0.0%  (0.0% elapsed)

  Alloc rate    8,577,728,309 bytes per MUT second

  Productivity  96.2% of total user, 90.7% of total elapsed

0.21s user 0.02s system 47% cpu 0.487 total

The benchmark run isn't affected noticeably by the changes in this PR.

@PiotrJustyna
Copy link
Author

PiotrJustyna commented May 28, 2025

Comment done. Feels like we could use a more general guide on how to get started with alex, but that's maybe for another day. Thanks for reviewing again.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants