Skip to content

PMD Comments Removal #889

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 42 commits into
base: master
Choose a base branch
from
Open

Conversation

Malmahrouqi3
Copy link
Collaborator

@Malmahrouqi3 Malmahrouqi3 commented Jun 16, 2025

Description

This is subsequent to (#882) with the mere addition of filtering out inline comments and commented lines. For the PMD details, check out the original issue (#646)

To show off the difference:
Filter-off: pmd-old branch https://github.com/Malmahrouqi3/MFC-mo2/actions/runs/15693861383/job/44214858621
Filter-on: pmd-new branch https://github.com/Malmahrouqi3/MFC-mo2/actions/runs/15694477828/job/44216673771

else
# Overwrite the original file with the processed content
mv "$TMP_FILE" "$file"
echo -e "Successfully processed $file"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove

@Malmahrouqi3
Copy link
Collaborator Author

Malmahrouqi3 commented Jun 16, 2025

@sbryngelson the number of violations is identical (63) whether the filter is on or off.

@sbryngelson
Copy link
Member

strange

@sbryngelson
Copy link
Member

idk about the number of violations, but the violations are different. This looks like an improvement to me.

-e '/^[ \t]*[cC*dD]/s/.*//' \
-e 's/([^"'\'']*("[^"]*"[^"'\'']*|'\''[^'\'']*'\''[^"'\'']*)*[^"'\'']*)[!].*$/\1/' \
"$file" > "$TMP_FILE"

Copy link
Member

@sbryngelson sbryngelson Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also remove line breaks: https://chatgpt.com/s/t_6850a2d138c0819187272dc044cfb208

sed -E '
  :a                      # ① label for looping
  /&$/ {                  # ② if the *current* line ends with “&”
        N                 #    read the *next* line into the pattern space
        s/&[[:space:]]*\n[[:space:]]*&/ /   # ③ delete the trailing “&”,
                                            #    the following newline, the
                                            #    leading “&” on the next line,
                                            #    and join them with one space
        ba                # ④ repeat until the next line no longer starts
                          #    with “&” (handles runs of many “&” lines)
  }
  s/&//g                  # ⑤ finally strip any stray “&” left in the file
'  input.txt  >  output.txt

(i haven't checked to see if this works myself). PS this will mess up the line numbers but whatever.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might break for openacc line continuations which have something like

!$acc <......> &
!$acc <......>

idk what it will do to code that looks like this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It deems them as comments as far as I know

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

supposedly this will work for the acc statements but the first command might break acc w/ line continuation. i would run the acc one first, then the one above

sed -E '
  :a
  /[[:space:]]*&[[:space:]]*$/ { N; s/[[:space:]]*&[[:space:]]*\n[[:space:]]*/ /; ba }
  s/^[[:space:]]*!\$acc[[:space:]]*/!$acc /
  :b
  s/[[:space:]]+!\$acc[[:space:]]*/ /g
  tb
'  source.f90  >  clean.f90

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It deems them as comments as far as I know

it has no notion of comments, it just looks at text. i'm trying to make the text as simple as possible, and having line continuations in different places is not a meaningful code difference so we should remove them before CPD

Copy link
Collaborator Author

@Malmahrouqi3 Malmahrouqi3 Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There were some syntax errors and I got rid off em. Also I lowered the number of tokens=20 kinda semi-helpful cuz at 40 we only see big blocks mostly that are barely affected by comments. Further, I added these flags to avoid dummy errors/violations. --no-fail-on-violation \--no-fail-on-error so the test will fail iff pmd.yml itself is messed up - syntax wise.

Current implementation

                  sed -E '
                    :a
                    /&$/ {
                          N
                          s/&[[:space:]]*\n[[:space:]]*&/ /
                          ba
                    }
                    s/&//g
                  ' "$file" | \
                  sed -E \
                      -e '/^\s*!/s/.*//' \
                      -e '/^[cC*dD]/s/.*//' \
                      -e '/^[ \t]*[cC*dD]/s/.*//' \
                      -e 's/([^"'\'']*("[^"]*"[^"'\'']*|'\''[^'\'']*'\''[^"'\'']*)*[^"'\'']*)[!].*$/\1/' \
                      > "$TMP_FILE"

Copy link
Collaborator Author

@Malmahrouqi3 Malmahrouqi3 Jun 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(i haven't checked to see if this works myself). PS this will mess up the line numbers but whatever.

You can just copy and paste the redundant lines and search them up. It ain't gonna be a wild hunting job.

@sbryngelson
Copy link
Member

something isn't right with concatenating the line continuations because this is a repeated pattern in pmd:

                                q_sf(j, k, l) = q_sf(j, k, l) 
                                                + q_prim_vf0(mom_idx%beg)%sf(j, k, l)*fd_coeff_x(r, j)* 
                                                q_prim_vf0(mom_idx%beg)%sf(r + j, k, l) 
                                                + q_prim_vf0(mom_idx%beg + 1)%sf(j, k, l)*fd_coeff_y(r, k)* 
                                                q_prim_vf0(mom_idx%beg)%sf(j, r + k, l)

if they were concatted then this would be on one line

@Malmahrouqi3
Copy link
Collaborator Author

I guess it should now presumably detect more duplicate lines if they are exactly the same operation but split out into lines differently.

q_sf(j, k, l) = q_sf(j, k, l)+q_prim_vf0(mom_idx%beg)%sf(j, k, l)*fd_coeff_x(r, j)*q_prim_vf0(mom_idx%beg)%sf(r+j, k, l)+q_prim_vf0(mom_idx%beg+1)%sf(j, k, l)*fd_coeff_y(r, k)*q_prim_vf0(mom_idx%beg)%sf(j, r+k, l)+q_prim_vf0(mom_idx%end)%sf(j, k, l)*fd_coeff_z(r, l)*q_prim_vf0(mom_idx%beg)%sf(j, k, r+l)/y_cc(k)

@sbryngelson
Copy link
Member

yes that's what i was thinking. you could even strip spaces (maybe?)

@sbryngelson
Copy link
Member

this is actually a very valuable tool!

@Malmahrouqi3
Copy link
Collaborator Author

I left behind = .or. .and. and few subtle things.
Other than that, all spaces should be taken off around math/comparison operators, inside indexing parentheses and brackets.

@sbryngelson
Copy link
Member

I left behind = .or. .and. and few subtle things. Other than that, all spaces should be taken off around math/comparison operators, inside indexing parentheses and brackets.

yes agreed. so you already did this or not yet?

@Malmahrouqi3
Copy link
Collaborator Author

yup, you can check out the last commit PMD check.

@Malmahrouqi3
Copy link
Collaborator Author

Filter Full Implementation

                  sed -E '
                    # First handle & continuation style (modern Fortran)
                    :ampersand_loop
                    /&[[:space:]]*$/ {
                      N
                      s/&[[:space:]]*\n[[:space:]]*(&)?/ /g
                      tampersand_loop
                    }

                    # Handle fixed-form continuation (column 6 indicator)
                    :fixed_form_loop
                    /^[[:space:]]{0,5}[^[:space:]!&]/ {
                      N
                      s/\n[[:space:]]{5}[^[:space:]]/ /g
                      tfixed_form_loop
                    }

                    # Remove any remaining continuation markers
                    s/&//g

                    # Normalize spacing - replace multiple spaces with single space
                    s/[[:space:]]{2,}/ /g

                    # Remove spaces around mathematical operators
                    s/[[:space:]]*\*[[:space:]]*/*/g
                    s/[[:space:]]*\+[[:space:]]*/+/g
                    s/[[:space:]]*-[[:space:]]*/-/g
                    s/[[:space:]]*\/[[:space:]]*/\//g
                    s/[[:space:]]*\*\*[[:space:]]*/\*\*/g

                    # Remove spaces in common Fortran constructs (array indexing, function calls)
                    s/\([[:space:]]*([^,)[:space:]]+)[[:space:]]*,/(\1,/g      # First argument
                    s/,[[:space:]]*([^,)[:space:]]+)[[:space:]]*,/,\1,/g       # Middle arguments
                    s/,[[:space:]]*([^,)[:space:]]+)[[:space:]]*\)/,\1)/g      # Last argument
                    s/\([[:space:]]*([^,)[:space:]]+)[[:space:]]*\)/(\1)/g     # Single argument

                    # Remove spaces around brackets and parentheses
                    s/\[[[:space:]]*/</g
                    s/\[[[:space:]]*/>/g
                    s/\[[[:space:]]*/</g
                    s/[[:space:]]*\]/]/g
                    s/\([[:space:]]*/(/g
                    s/[[:space:]]*\)/)/g

                    # Remove spaces around comparison operators
                    s/[[:space:]]*<=[[:space:]]*/</g
                    s/[[:space:]]*>=[[:space:]]*/>/g
                    s/[[:space:]]*<[[:space:]]*/</g
                    s/[[:space:]]*>[[:space:]]*/>/g
                    s/[[:space:]]*==[[:space:]]*/==/g

                    # Remove full-line comments
                    /^\s*!/d
                    /^[cC*dD]/d
                    /^[ \t]*[cC*dD]/d

                    # Remove end-of-line comments, preserving quoted strings
                    s/([^"'\''\\]*("[^"]*")?('\''[^'\'']*'\''?)?[^"'\''\\]*)[!].*$/\1/
                  ' "$file" > "$TMP_FILE"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants