Switch to an active citations parser dependency to replace pybtex
#3648
Comments
pybtex
Making changes in the We use |
Can you suggest some useful resources that might help? Also, if no one is already working on this, I would love to! |
Hi @prady0t, we currently use We also use For example, the
@article{Chen2020,
author = {Chen, Chang-Hui and Brosa Planella, Ferran and O'Regan, Kieran and Gastol, Dominika and Widanage, W. Dhammika and Kendrick, Emma},
title = {{Development of Experimental Techniques for Parameterization of Multi-scale Lithium-ion Battery Models}},
journal = {Journal of The Electrochemical Society},
volume = {167},
number = {8},
pages = {080534},
year = {2020},
publisher = {The Electrochemical Society},
doi = {10.1149/1945-7111/ab9050},
}
gets converted to UTF-8 plaintext, using
which we use further to print citations or create citation tags. The functions in the
I would suggest starting with a design for a method that takes in some BibTeX entries first (i.e., modify
Note that
P.S. I will label this as a medium-difficulty issue based on the extent of the tasks required. Maybe we can take (some) ideas from the citations workflow for the |
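The conversion of BibTeX field values to UTF-8 plaintext mentioned above can be illustrated with a small standard-library sketch. It is only an approximation under stated assumptions: `latex_to_plaintext` is a hypothetical helper (not part of pybtex, bibtexparser, or PyBaMM), and it handles just a handful of accent commands plus brace removal.

```python
import re
import unicodedata

# Hypothetical mapping from a few LaTeX accent commands to Unicode combining marks.
_ACCENTS = {"'": "\u0301", "`": "\u0300", '"': "\u0308", "^": "\u0302", "~": "\u0303"}

# Matches forms such as \'e and \'{e}; anything more exotic is left untouched.
_ACCENT_RE = re.compile(r"\\(['`\"^~])\{?([A-Za-z])\}?")


def latex_to_plaintext(text: str) -> str:
    """Replace simple accent commands with composed characters, then drop braces."""

    def _compose(match: re.Match) -> str:
        accent, letter = match.groups()
        # NFC normalisation merges the letter and the combining mark into one character.
        return unicodedata.normalize("NFC", letter + _ACCENTS[accent])

    text = _ACCENT_RE.sub(_compose, text)
    return text.replace("{", "").replace("}", "")


print(latex_to_plaintext(r"van der Walt, St{\'{e}}fan J."))
print(latex_to_plaintext("{{Array programming with NumPy}}"))
```

A real replacement would more likely rely on the parser library's own LaTeX decoding rather than a hand-written table like this one.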
Thanks a lot for the detailed info! This would really be helpful |
Can you please explain why we use citations and how they help? For example: here. Also, how does it register a citation above? |
Right, a little bit of background won't hurt. We use this citations workflow to let researchers who use PyBaMM know which papers they are supposed to cite in their publications, as described in the Citing PyBaMM section. A citation is registered through the
An example of a citation being registered can be found by looking into the
This way, the citations class keeps parsing, collecting, and preparing plaintext citations throughout the course of a scientific experiment conducted via PyBaMM's functionality (please refer to the example Jupyter notebooks).
P.S. In hindsight, I feel that the functionality for citation tags (see #2961 (comment)) made things a bit complicated: citations are now parsed not at runtime but at the time of printing them. Maybe we can streamline this data flow as we go forward with this dependency replacement and go back to parsing them for correctness at an appropriate position. |
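The register-then-print flow described above can be sketched roughly as follows. This is a simplified illustration, not PyBaMM's actual citations class: `CitationRegistry`, its method names, and the shortened citation strings are all hypothetical.

```python
class CitationRegistry:
    """Hypothetical stand-in for a citations class; not PyBaMM's real API."""

    def __init__(self, known_entries):
        # Maps a BibTeX key to an already formatted plaintext citation.
        self._known = dict(known_entries)
        # Keys in order of first registration, without duplicates.
        self._registered = []

    def register(self, key):
        """Record that the work identified by `key` was used during this session."""
        if key not in self._known:
            raise KeyError(f"Unknown citation key: {key}")
        if key not in self._registered:
            self._registered.append(key)

    def formatted(self):
        """Return numbered plaintext citations for everything registered so far."""
        return [
            f"[{number}] {self._known[key]}"
            for number, key in enumerate(self._registered, start=1)
        ]


# Shortened placeholder citations built from the entries quoted in this thread.
registry = CitationRegistry(
    {
        "Sulzer2021": "Sulzer et al., Python Battery Mathematical Modelling (PyBaMM), 2021.",
        "Harris2020": "Harris et al., Array programming with NumPy, 2020.",
    }
)
registry.register("Sulzer2021")  # e.g. triggered when a model is created
registry.register("Harris2020")  # e.g. triggered when a solver uses NumPy
registry.register("Sulzer2021")  # repeated registrations are ignored
print("\n".join(registry.formatted()))
```

Validating the key at registration time, as `register` does here, is one way to go back to checking citations early rather than only when they are printed.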
I was successfully able to parse using Line 233 in c3def31
But we can write a custom function to convert the entries to the desired type of string output. The logic could look something like this:
import bibtexparser
bibtex_str = """
@article{Harris2020,
title = {{Array programming with NumPy}},
author = {Harris, Charles R. and Millman, K. Jarrod and van der Walt, St{\'{e}}fan J. and Gommers, Ralf and Virtanen, Pauli and Cournapeau, David and Wieser, Eric and Taylor, Julian and Berg, Sebastian and Smith, Nathaniel J. and others},
journal = {Nature},
volume = {585},
number = {7825},
pages = {357--362},
year = {2020},
publisher = {Nature Publishing Group},
doi = {10.1038/s41586-020-2649-2},
}
@article{Sulzer2021,
title = {{Python Battery Mathematical Modelling (PyBaMM)}},
author = {Sulzer, Valentin and Marquis, Scott G. and Timms, Robert and Robinson, Martin and Chapman, S. Jon},
doi = {10.5334/jors.309},
journal = {Journal of Open Research Software},
publisher = {Software Sustainability Institute},
volume = {9},
number = {1},
pages = {14},
year = {2021}
}
"""
library = bibtexparser.parse_string(bibtex_str)
entries = library.entries

# Fields used in the formatted citation; a missing field is printed as "".
wanted = ("author", "title", "journal", "volume", "year", "pages")

for count, entry in enumerate(entries, start=1):
    # entry.items() yields (key, value) pairs for the entry's fields.
    # Collecting them per entry avoids reusing values from the previous entry.
    fields = {key: value for key, value in entry.items() if key in wanted}
    author = fields.get("author", "")
    title = fields.get("title", "")
    journal = fields.get("journal", "")
    volume = fields.get("volume", "")
    year = fields.get("year", "")
    pages = fields.get("pages", "")
    txt_format = f'[{count}] {author},"{title}" {journal} {volume} ({year}): {pages}'
    print(txt_format)
Output:
Whereas output of
Obviously, we can change the output format as needed. P.S. There's another parser mentioned in the article above, Citeproc-py. We could also look into that if printing like this is not satisfactory. |
Thanks for the article @prady0t, I had a brief look at
Switching to such a dependency would be a step in the wrong direction as we start supporting newer Python versions and dropping older ones every year. OTOH,
P.S. you should avoid hardcoding the types of fields in a BibTeX entry as much as possible and look at a dynamic parser (the |
If we do not use any kind of hardcoding and just print the parsed values as a string, something like this:
count = 0
for entry in entries:
    count += 1
    txt_format = ""  # start from an empty string for every entry
    for key, value in entry.items():
        txt_format = txt_format + " " + value
    print(f"[{count}] {txt_format}")
We get this output:
It does print all values without data loss, but their sequence depends on the BibTeX entry. (Notice how the title comes before the names.) We can fine-tune the output somewhat by applying conditions like:
    if key != "ID" and key != "ENTRYTYPE":
        txt_format = txt_format + " " + value
which will result in this output:
Is this approach good enough? Apart from that, I was able to modify all functions inside |
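One possible middle ground between hardcoding every field and printing fields in whatever order the entry lists them is sketched below. It assumes each entry has already been reduced to a list of `(key, value)` pairs; `PREFERRED_ORDER`, `SKIPPED_KEYS`, and `ordered_values` are hypothetical names, not part of bibtexparser or PyBaMM.

```python
# Hypothetical preferred order; fields not listed here are still kept, just placed last.
PREFERRED_ORDER = ("author", "title", "journal", "volume", "number", "pages", "year", "doi")
# Bookkeeping keys that should not appear in the printed citation.
SKIPPED_KEYS = {"ID", "ENTRYTYPE"}


def ordered_values(items):
    """Return field values in a stable order, independent of the order in the .bib file."""
    fields = {key: value for key, value in items if key not in SKIPPED_KEYS}
    ordered = [fields[key] for key in PREFERRED_ORDER if key in fields]
    # Fields outside the preferred list keep their original relative order.
    ordered += [value for key, value in fields.items() if key not in PREFERRED_ORDER]
    return ordered


items = [
    ("ENTRYTYPE", "article"),
    ("ID", "Sulzer2021"),
    ("title", "Python Battery Mathematical Modelling (PyBaMM)"),
    ("author", "Sulzer, Valentin and Marquis, Scott G."),
    ("publisher", "Software Sustainability Institute"),
    ("year", "2021"),
]
print(" ".join(ordered_values(items)))
```

With this, an entry that lists its title before its authors still prints the authors first, and an unexpected field such as `publisher` is not lost.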
Looks great to me :) I do wonder if we can use the middlewares; it would be great to incorporate them if they can provide better outputs. We are not in a rush about this at this time (it is already patched up temporarily on our end, and we won't add or drop support for another Python version soon), so I advise taking as much time as you need for the new design.
This approach might just work, though the output format is a bit clunky, as you mentioned. Could we try to match the current Pybtex output as closely as possible (the names of the authors first, followed by the paper, where it was published, and then the rest of the fields)?
We will have to test this approach across all the papers currently listed in the
Are you currently able to parse a single |
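For the author-first layout requested above, the BibTeX `author` field first has to be turned into readable names. The sketch below is one rough way to do that with the standard library only; `format_authors` is a hypothetical helper, and its output merely approximates a "First Last" style rather than reproducing Pybtex's exact formatting rules.

```python
def format_authors(author_field: str) -> str:
    """Turn 'Last, First and Last, First' into 'First Last and First Last'."""
    names = []
    for raw in author_field.split(" and "):
        raw = raw.strip()
        if "," in raw:
            # BibTeX 'Last, First' form: swap the two parts.
            last, first = (part.strip() for part in raw.split(",", 1))
            names.append(f"{first} {last}")
        else:
            # Already 'First Last', or the literal 'others' marker.
            names.append(raw)
    if len(names) <= 2:
        return " and ".join(names)
    return ", ".join(names[:-1]) + ", and " + names[-1]


print(format_authors("Sulzer, Valentin and Marquis, Scott G. and Timms, Robert"))
print(format_authors("Chen, Chang-Hui and Kendrick, Emma"))
```

Name particles such as "van der" and suffixes such as "Jr." are not handled specially here; a parser middleware that splits names would be a more robust basis.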
Yes, I am. All models are able to register their respective citations, and they are later printed when
model1 = pybamm.lithium_ion.SPM()
nc.print_citations()
Gives this output:
I've not yet added the string-formatting function in |
Yes, this looks good – thanks for sharing. Once you are ready with the string formatting, feel free to raise a PR! We can go forward with further discussions there, or here itself if you don't wish to raise a PR just yet. |
@agriyakhetarpal I've drafted a PR so that we can see progress. |
Let's start looking for more solutions to this issue – |
If pybtex is dead, should we switch to something else instead? I found this active project for parsing bibtex: https://github.com/sciunto-org/python-bibtexparser
Originally posted by @kratman in #3645 (comment)