Jorge/fix multiword surnames #213

Merged: rkaminsk merged 2 commits into master from jorge/fix-multiword-surnames on Dec 17, 2024

Conversation

jorgefandinno
Contributor

This closes #184 regarding multi-word last names. It also fixes inconsistencies with the von part in many names.
For instance, both K. von Luck and K. {von Luck} appeared in krr.bib and should be written as the former.
Similarly, names with capitalized von parts such as D. {Van Nieuwenborgh} should be written as D. {V}an Nieuwenborgh (https://us.mirrors.cicku.me/ctan/info/bibtex/tamethebeast/ttb_en.pdf, page 26).

If needed, the conversion can be reapplied automatically with the command:

python last_name_format_transition/authfmt_t.py 

The folder last_name_format_transition and its content can be removed after this is merged.

The script authfmt.py has been updated to produce the new format.

@rkaminsk
Contributor

rkaminsk commented Dec 9, 2024

Is this ready to be merged? I looked at the bibfile and it looks okay.

@rkaminsk
Contributor

rkaminsk commented Dec 9, 2024

Names that do not follow bibtex rules but are already present in the
bibliography are parsed correctly. For instance, if {Manuel Ojeda} Aciego is
already present in the bibliography in that format, new entries with Manuel Ojeda Aciego will be parsed correctly. This means that names that do not
follow the bibtex rules only need to be protected by braces the first time they
are introduced in the bibliography.

How do you decide what is the old and new entry?

@jorgefandinno
Contributor Author

jorgefandinno commented Dec 9, 2024

Is this ready to be merged? I looked at the bibfile and it looks okay.

If you agree with the new format, it is ready.

As for the lack of safeguards against wrongly edited names, I trust people know what they are doing. Or do we need some extra safety measure? (see below)

@jorgefandinno
Contributor Author

jorgefandinno commented Dec 9, 2024

Names that do not follow bibtex rules but are already present in the
bibliography are parsed correctly. For instance, if {Manuel Ojeda} Aciego is
already present in the bibliography in that format, new entries with Manuel Ojeda Aciego will be parsed correctly. This means that names that do not
follow the bibtex rules only need to be protected by braces the first time they
are introduced in the bibliography.

How do you decide what is the old and new entry?

It does not distinguish between old and new entries, only between names with multi-word last names and all others.
Without braces, the bibtex rules usually force a name to have a single-word last name (there are exceptions if it also has a von part). With braces or the comma format, a name may have a multi-word last name. The idea is that unedited names can be introduced after the first time. For instance, dblp names are provided without braces, so each name needs to be edited only once.

There is no safety measure if someone decides to introduce a wrongly edited name.

For instance, if the bibliography contains an entry with the name Juan Carlos Nieves, which is correct, and someone introduces a new entry with the name Juan {Carlos Nieves}, which is incorrect, then all entries will be incorrectly rewritten as the latter.
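
The behaviour described above can be sketched roughly as follows. This is only an illustration under assumptions, not the actual logic of authfmt.py: the helper names `plain`, `build_registry` and `normalize` are hypothetical, only brace protection is handled (the comma format is left out), and when several protected spellings of one name exist the last one seen wins.

```python
from typing import Dict, List


def plain(name: str) -> str:
    """Return the name without braces and with single spaces (hypothetical helper)."""
    return " ".join(name.replace("{", "").replace("}", "").split())


def build_registry(names: List[str]) -> Dict[str, str]:
    """Map each brace-free spelling to the protected spelling found in the file."""
    registry: Dict[str, str] = {}
    for name in names:
        if "{" in name:
            # Assumption: any braced spelling is taken as the reference form.
            registry[plain(name)] = name
    return registry


def normalize(names: List[str]) -> List[str]:
    """Rewrite every name to its protected spelling when one is known."""
    registry = build_registry(names)
    return [registry.get(plain(name), name) for name in names]


# A name protected once is reused for later, unedited entries.
ok = normalize(["{Manuel Ojeda} Aciego", "Manuel Ojeda Aciego"])

# The failure mode: one wrongly braced entry rewrites the correct ones.
bad = normalize(["Juan Carlos Nieves", "Juan {Carlos Nieves}"])
```

In this sketch `bad` ends up containing the wrongly braced spelling twice, which is exactly the risk mentioned above.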

I trust people know what they are doing, and that in the worst case, this will not be merged. Or do we need some extra safety measures?

@rkaminsk
Contributor

rkaminsk commented Dec 9, 2024

I trust people know what they are doing, and that in the worst case, this will not be merged. Or do we need some extra safety measures?

To explicitly mark the correct one, we would need a separate file with author names. Since I do not trust that everyone opening PRs here really has an eye for this kind of detail, I would like to delegate this question to @tortinator. He is usually the one doing the final merge.

@tortinator
Contributor

Hi @jorgefandinno and @rkaminsk, thanks for your great efforts! What would be needed as well is a concise description that we could put into our documentation for the other members in the group.
To be honest, I thought that composed last names are safest written with braces, so that strange bibstyles cannot do harm. But now the idea is to loosen this up again..? Are we sure that (most) styles can handle that?

@jorgefandinno
Contributor Author

jorgefandinno commented Dec 10, 2024

Hi @jorgefandinno and @rkaminsk, thanks for your great efforts! What would be needed as well is a concise description that we could put into our documentation for the other members in the group. To be honest, I thought that composed last names are safest written with braces, so that strange bibstyles cannot do harm. But now the idea is to loosen this up again? Sure that (most) styles can handle that?

bibtex distinguishes between the von part and the last part. For instance, D. {V}an Nieuwenborgh is parsed as

first="D." 
von="Van"
last="Nieuwenborgh"

while D. {Van Nieuwenborgh} is parsed as

first="D." 
von=""
last="Van Nieuwenborgh"
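
For reference, the two parses above can be reproduced with a simplified sketch of the bibtex "First von Last" rule. It is a minimal approximation and not the code used in this PR: it ignores the comma formats, the jr part and special characters such as {\'e}, and the function names are hypothetical.

```python
from typing import Dict, List


def split_words(name: str) -> List[str]:
    """Split a name on whitespace that is not inside braces."""
    words: List[str] = []
    current: List[str] = []
    depth = 0
    for ch in name:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        if ch.isspace() and depth == 0:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def is_lower_word(word: str) -> bool:
    """A word counts as lowercase if its first letter outside braces is lowercase."""
    depth = 0
    for ch in word:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0 and ch.isalpha():
            return ch.islower()
    return False  # fully braced (caseless) words are treated like uppercase ones


def split_name(name: str) -> Dict[str, str]:
    """Split a comma-free name into first, von and last parts."""
    words = split_words(name)
    # The final word always belongs to the last part, so it is never a von word.
    lower = [i for i, w in enumerate(words[:-1]) if is_lower_word(w)]
    if lower:
        # von runs from the first to the last lowercase word.
        first, von, last = words[:lower[0]], words[lower[0]:lower[-1] + 1], words[lower[-1] + 1:]
    else:
        first, von, last = words[:-1], [], words[-1:]
    parts = {"first": first, "von": von, "last": last}
    # Braces are dropped here only to make the result easier to read.
    return {k: " ".join(v).replace("{", "").replace("}", "") for k, v in parts.items()}
```

With this sketch, `split_name("D. {V}an Nieuwenborgh")` yields von "Van" and last "Nieuwenborgh", while `split_name("D. {Van Nieuwenborgh}")` yields an empty von part and last "Van Nieuwenborgh".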

My understanding is that entries should be ordered according to the last part, ignoring the von part. Anyway, this is up to the bibliography style (the .bst file) to decide. Both the tplp and aaai styles ignore this rule and produce the same result with both formats. llncs does apply the rule.

D. {V}an Nieuwenborgh gives:

[image: rendered bibliography for this format]

and D. {Van Nieuwenborgh} gives:

[image: rendered bibliography for this format]

It will also help in the future to produce keys of entries automatically. The key for this entry took ni as the first two letters of the surname, not va.
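
To make the effect on ordering and keys concrete, here is a small sketch that starts from already-parsed names. The functions `sort_key` and `key_prefix` are hypothetical; the real ordering is decided by the bibliography style, and the real key scheme of the repository may differ beyond the two-letter prefix mentioned above.

```python
from typing import Dict, List, Tuple

Name = Dict[str, str]


def sort_key(parts: Name) -> Tuple[str, str]:
    """Order by the last part, ignoring the von part (assumed rule)."""
    return (parts["last"].lower(), parts["first"].lower())


def key_prefix(parts: Name, length: int = 2) -> str:
    """First letters of the last part, as a key prefix (hypothetical scheme)."""
    return parts["last"][:length].lower()


luck = {"first": "K.", "von": "von", "last": "Luck"}
nieves = {"first": "Juan Carlos", "von": "", "last": "Nieves"}

# D. {V}an Nieuwenborgh: "Van" is a von part.
with_von = {"first": "D.", "von": "Van", "last": "Nieuwenborgh"}
# D. {Van Nieuwenborgh}: "Van" is part of the last name.
braced = {"first": "D.", "von": "", "last": "Van Nieuwenborgh"}


def ordered_lasts(names: List[Name]) -> List[str]:
    """Return the last parts in sorted order."""
    return [n["last"] for n in sorted(names, key=sort_key)]


print(key_prefix(with_von))                      # ni
print(key_prefix(braced))                        # va
print(ordered_lasts([nieves, with_von, luck]))   # ['Luck', 'Nieuwenborgh', 'Nieves']
print(ordered_lasts([nieves, braced, luck]))     # ['Luck', 'Nieves', 'Van Nieuwenborgh']
```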

@rkaminsk
Contributor

To be honest, I thought that composed last names are safest written with braces, so that strange bibstyles cannot do harm. But now the idea is to loosen this up again..? Sure that (most) styles can handle that?

I'd follow the recommended way here. My two cents regarding broken bibtex styles: that's for the editor to fix.

@jorgefandinno jorgefandinno force-pushed the jorge/fix-multiword-surnames branch from 522de61 to 90483b3 Compare December 11, 2024 15:22
@jorgefandinno
Contributor Author

Hi @jorgefandinno and @rkaminsk, thanks for your great efforts! What would be needed as well is a concise description that we could put into our documentation for the other members in the group.

@tortinator, added a description to the README.

@jorgefandinno
Contributor Author

@tortinator Are we missing something to merge?

@rkaminsk
Contributor

I'll have a look and take care of merging.

@tortinator
Contributor

@tortinator Are we missing something to merge?

All good @jorgefandinno, I am just busy and asked @rkaminsk to take care of putting things in place 😇

@rkaminsk rkaminsk force-pushed the jorge/fix-multiword-surnames branch from b1aaa4e to 3e0d7f6 Compare December 17, 2024 14:40
@rkaminsk
Contributor

Hi @jorgefandinno, I tried to streamline and refactor a bit. If this is okay for you, I would merge.

@rkaminsk rkaminsk force-pushed the jorge/fix-multiword-surnames branch from 4ddd558 to b587a44 Compare December 17, 2024 15:13
@jorgefandinno
Contributor Author

Hi @jorgefandinno, I tried to streamline and refactor a bit. If this is okay for you, I would merge.

It looks good to me.

I would pin the version in the README for:

pip install bibtexparser

because there is a new, API-incompatible version.
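
Besides pinning in the README, a script could also fail fast when an unsupported release is installed. The sketch below assumes that the incompatible release is the 2.x line (so the equivalent pin would be something like `bibtexparser<2`); the function names are hypothetical and nothing like this is part of the PR.

```python
from importlib import metadata


def is_supported(version: str, max_major: int = 1) -> bool:
    """Return True if the major version is at most max_major."""
    major = int(version.split(".", 1)[0])
    return major <= max_major


def check_installed(package: str = "bibtexparser") -> None:
    """Exit with a readable message if the package is missing or too new."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        raise SystemExit(f"{package} is not installed")
    if not is_supported(version):
        # Assumption: releases above 1.x use the new, incompatible API.
        raise SystemExit(f"{package} {version} is not supported, install a 1.x release")
```

Calling `check_installed()` at the top of a script would stop it before any parsing is attempted.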

@rkaminsk
Contributor

Hi @jorgefandinno, I tried to streamline and refactor a bit. If this is okay for you, I would merge.

It looks good to me.

I would pin the version in the README for:

pip install bibtexparser

because there is a new, API-incompatible version.

Seems like we have to upgrade at some point. But let's not do it now.

@rkaminsk rkaminsk force-pushed the jorge/fix-multiword-surnames branch from 4d61f26 to f923f14 Compare December 17, 2024 16:56
@rkaminsk rkaminsk force-pushed the jorge/fix-multiword-surnames branch from f923f14 to 6241634 Compare December 17, 2024 17:00
@rkaminsk rkaminsk merged commit f0e0693 into master Dec 17, 2024
1 check passed
@rkaminsk rkaminsk deleted the jorge/fix-multiword-surnames branch December 17, 2024 17:02
@rkaminsk
Contributor

Done. Hope it will work. 🤞

Development

Successfully merging this pull request may close these issues.

Two words surnames do not work well with TPLP format
3 participants