Skip to content

fix #337 - line splicing in comment not handled properly #431

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 21 additions & 8 deletions simplecpp.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -767,17 +767,30 @@ void simplecpp::TokenList::readfile(Stream &stream, const std::string &filename,

// comment
else if (ch == '/' && stream.peekChar() == '/') {
while (stream.good() && ch != '\r' && ch != '\n') {
while (stream.good() && ch != '\n') {
currentToken += ch;
ch = stream.readChar();
if(ch == '\\') {
TokenString tmp;
char tmp_ch = ch;
while((stream.good()) && (tmp_ch == '\\' || tmp_ch == ' ' || tmp_ch == '\t')) {
tmp += tmp_ch;
tmp_ch = stream.readChar();
}
if(!stream.good()) {
break;
}

if(tmp_ch != '\n') {
currentToken += tmp;
} else {
++multiline;
tmp_ch = stream.readChar();
}
ch = tmp_ch;
}
}
const std::string::size_type pos = currentToken.find_last_not_of(" \t");
if (pos < currentToken.size() - 1U && currentToken[pos] == '\\')
portabilityBackslash(outputList, files, location);
if (currentToken[currentToken.size() - 1U] == '\\') {
++multiline;
currentToken.erase(currentToken.size() - 1U);
} else {
if (ch == '\n') {
stream.ungetChar();
}
}
Expand Down
23 changes: 20 additions & 3 deletions test.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -176,8 +176,8 @@ static void backslash()

readfile("//123 \\\n456", &outputList);
ASSERT_EQUALS("", toString(outputList));
readfile("//123 \\ \n456", &outputList);
ASSERT_EQUALS("file0,1,portability_backslash,Combination 'backslash space newline' is not portable.\n", toString(outputList));
//readfile("//123 \\ \n456", &outputList);
//ASSERT_EQUALS("file0,1,portability_backslash,Combination 'backslash space newline' is not portable.\n", toString(outputList));

outputList.clear();
readfile("#define A \\\n123", &outputList);
Expand Down Expand Up @@ -436,7 +436,24 @@ static void comment_multiline()
const char code[] = "#define ABC {// \\\n"
"}\n"
"void f() ABC\n";
ASSERT_EQUALS("\n\nvoid f ( ) { }", preprocess(code));
ASSERT_EQUALS("\n\nvoid f ( ) {", preprocess(code));
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only line splicing test we have?

I think we should test what happens when there are multiple newlines after the :

//  x \\\n
\n
\n
...

and an explicit test for \r\n:

//  x \\\r\n
...

and I wonder what the behavior (according to standard / according to gcc+msvc+etc) is if there is space after the backslash:

// x \\ \n
...

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly how the location is adjusted right now. Will the locations below the line splicing comment be correct? i.e.:

// hello \
world
x=1;

will x=1; line number be 3?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the only line splicing test we have?

I think we should test what happens when there are multiple newlines after the :

//  x \\\n
\n
\n
...

and an explicit test for \r\n:

//  x \\\r\n
...

and I wonder what the behavior (according to standard / according to gcc+msvc+etc) is if there is space after the backslash:

// x \\ \n
...

Hi Danmar,

I wonder what the behavior (according to standard / according to gcc+msvc+etc) is if there is space after the backslash

I checked this with the gcc. The gcc version is 13.3.0.
If there is space after the backslash, the next line will be commented. That is different with the case there is a non-space char after the backslash.

#include <stdio.h>

int main() {
    int x;
    //abc \    
    printf("there is a space\n");
    //abc \x
    printf("there is a char x\n");
    //abc \
    printf("there is nothing\n");
    return 0;
}

Output:
there is a char x

The 1st and 3rd output both didn't happen.
Interetingly, the editor seems have a different behavior with the gcc. I used the geditor on ubuntu. The editor will display the comment line with a color of purple. But it doesn't treat the first print as a comment line.
comment
But obviously I think we should follow the rule of the gcc.

Copy link
Author

@clock999 clock999 Jun 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember exactly how the location is adjusted right now. Will the locations below the line splicing comment be correct? i.e.:

// hello \
world
x=1;

will x=1; line number be 3?

Hi Danmar,
I made a test for how we handle the comment marked by /**/.

readfile("abc/*123\\\n456\n789*/xxx")

output:
abc /*123456789*/ xxx

If there is a '\' in the comments, all the '\n' will be erased. As the multiline is not 0, location.adjust will not be called, and 'xxx' is not a newline, so the loc of 'xxx' will not be added with the multiline.

readfile("abc/*123\n456\n789*/xxx")

output:
abc /*123
456
789*/xxx

If there is only the '\n', multiline is not set. And the loc of the next token is adjusted. The loc will be added by one with '\r', '\n' or '\r\n' in the location.adjust.

For:

// hello \
world
x=1;

According to /**/, '\\n' or '\n' will add multiline by one. As 'x=1' is on a new line, it's line number of loc will be added with the multiline. So it will be 3, which is 'location.line += multiline + 1' ,and multiline = 1. I will update the commit according to the way of handling the multiline comments.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! good that you tested this


const char code1[] = "#define ABC {// \\\r\n"
"}\n"
"void f() ABC\n";
ASSERT_EQUALS("\n\nvoid f ( ) {", preprocess(code1));

const char code2[] = "void f() {// \\ \n}\n";
ASSERT_EQUALS("void f ( ) {", preprocess(code2));

const char code3[] = "void f() {// \\\\\\\t\t\n}\n";
ASSERT_EQUALS("void f ( ) {", preprocess(code3));

const char code4[] = "void f() {// \\\\\\a\n}\n";
ASSERT_EQUALS("void f ( ) {\n}", preprocess(code4));

const char code5[] = "void f() {// \\\n\n\n}\n";
ASSERT_EQUALS("void f ( ) {\n\n\n}", preprocess(code5));
}


Expand Down