NameIt is a software tool that renames research articles in pdf files in a standardised way.
Based on the pdf metadata and on the content of the document's first page, it renames the file with author, year, title, publication, and publisher. It can extract structured data from PDFs using: (1) pdf metadata, (2) the Crossref API, and (3) state-of-the-art layout understanding AI/ML models.
- One or several pdf files (e.g., journal articles)
- The same pdf files are renamed in a standardised way - author, year, title, publication (e.g., journal name), and publisher.
Example: "s13174-015-0028-2.pdf" as downloaded from the publisher would become "Teixeira et al. (2015). Lessons learned from applying social network analysis on an industrial Free/Libre/Open Source Software ecosystem. Journal of Internet Services and Applications. Springer.pdf" as renamed by the NameIt tool.
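The naming pattern in the example above can be sketched in a few lines of Python. This is a minimal illustration, not NameIt's actual code: the function name `build_filename` and the rule "first author's family name plus et al. when there are several authors" are assumptions inferred from the single example shown.

```python
def build_filename(authors, year, title, publication, publisher):
    """Compose 'Author (Year). Title. Publication. Publisher.pdf'.

    Hypothetical helper: `authors` is a list of family names.
    Characters that are invalid in file names (such as "/") are not
    handled here.
    """
    if not authors:
        author_part = "Unknown"          # assumed placeholder
    elif len(authors) == 1:
        author_part = authors[0]
    else:
        author_part = f"{authors[0]} et al."

    parts = [f"{author_part} ({year})", title, publication, publisher]
    # Skip empty fields and drop trailing periods so they are not doubled.
    cleaned = [p.strip().rstrip(".") for p in parts if p and p.strip()]
    return ". ".join(cleaned) + ".pdf"


if __name__ == "__main__":
    # "Coauthor" is a placeholder name used only to trigger "et al.".
    print(build_filename(
        ["Teixeira", "Coauthor"],
        2015,
        "Lessons learned from applying social network analysis on an "
        "industrial Free/Libre/Open Source Software ecosystem",
        "Journal of Internet Services and Applications",
        "Springer",
    ))
```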
- MISSION 1 - Enable researchers across the world to store and find research articles on their computers, servers or shared folders in an easier and faster way.
- MISSION 2 - Advance the standardisation of how research articles in pdf format are named.
- Research articles are easier to get.
- Research articles are stored in a standardised way.
- Faster to find
- Faster to open
- Faster to print (consider sustainability and ecological issues before printing)
- Less duplication of digital resources (note that you are saving space on your hard disk and saving traffic on the web)
- No need to login
- No need to connect to VPN
- No need to resolve some DOI
- No ads
- No paywalls
- No overload with related or non-related information
- You can get the pdf straight before even seeing the HTML web version
- No cookies tracking your behaviour online
- No hidden fingerprinting codes or watermarking schemes that identify the buyers for every copy of each PDF sold
- Easier to understand what is in a file, as its filename encodes information on author, year, title, publication (e.g., journal name), and publisher.
- Easier to link pdf files to entries in software tools supporting systematic literature reviews (see https://aut.ac.nz.libguides.com/systematic_reviews/tools for more information).
- Easier to link pdf files to reference management systems (see https://www.helsinki.fi/en/helsinki-university-library/using-collections/courses-and-workshops/reference-management-software for more information).
- Better interoperability between software tools that deal with research articles in pdf.
- Predictability (i.e., you will not be surprised by how a pdf file is named after being downloaded from a publisher's website or the website of a colleague).
- Easier exchange of collections of research articles between researchers, research groups, libraries and publishers.
- Human users - Researchers or research teams that want to store and exchange files with a standard naming convention.
- Software users - Software tools supporting the management of citations, bibliometrics, references, and literature reviews now have a tool and a standard way to "file name" research articles in PDF format.
- Python 3 - Tested on 3.10 and later
- Habanero
- PyMuPDF
- Transformers
- PyTorch
We support the ext3, ext4, xfs, zfs, NTFS, APFS, HFS+ and exFAT filesystems. Support for FAT32 is pending. NameIt should work on most modern computers running Linux, macOS and Windows. Problems could arise with old USB flash drives and SD cards on old Linux kernels.
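Because the generated names are long and carry full article titles, portability across these filesystems mostly comes down to two things: avoiding characters that some of them reject, and staying within the length limit for a single name. The sketch below shows one way to do that. It is an illustration under assumptions, not NameIt's code: `safe_filename`, the `-` replacement character and the conservative 255-byte cap are all choices made for this example.

```python
import os
import re

# Characters rejected in file names on Windows filesystems (NTFS, exFAT),
# plus control characters. "/" is also the path separator on Linux and macOS.
_FORBIDDEN = re.compile(r'[<>:"/\\|?*\x00-\x1f]')

# Assumed conservative cap on a single file name, in UTF-8 bytes.
MAX_BYTES = 255


def safe_filename(name: str, replacement: str = "-") -> str:
    """Return a name that avoids forbidden characters and overlong names."""
    cleaned = _FORBIDDEN.sub(replacement, name).strip()
    stem, ext = os.path.splitext(cleaned)

    # Truncate the stem, never the extension, on a whole-character boundary.
    budget = MAX_BYTES - len(ext.encode("utf-8"))
    stem = stem.encode("utf-8")[:budget].decode("utf-8", errors="ignore")

    # Windows does not accept names whose stem ends in a space or a period.
    return stem.rstrip(" .") + ext
```

For instance, `safe_filename("Free/Libre/Open Source.pdf")` replaces both slashes and keeps the `.pdf` extension.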
NameIt is pure Python 3 code. It depends on several public packages published on the Python Package Index (PyPI). Those dependencies can then be installed using pip/pip3 - the package installer for Python.
$ pip3 install habanero             # required to call the Crossref API (it returns the article's metadata for a given DOI identifier)
$ pip3 install PyMuPDF              # required to process the first page of a pdf article
$ pip3 install transformers torch   # required to use the LayoutLM AI/ML model
- Clone/Fork the repository or download its source code.
- Install the Habanero, PyMuPDF, Transformers, PyTorch and other dependencies:
  $ pip3 install -r requirements.txt
- Invoke the Python script and pass the file to be renamed as an argument.
- Last tested on Mac and Linux with Python 3.10, 3.11, 3.12 and 3.13.
- You can also pass a folder as an argument and NameIt will attempt to rename all pdf files in that folder.
$ NameIt 4242343.pdf
$ NameIt research-articles-collection
$ NameIt --help
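The file-or-folder behaviour shown in the commands above could be handled along these lines. This is a sketch only: `collect_pdfs` is a hypothetical helper, and the choice not to descend into sub-folders is an assumption, since the text only says "all pdf files in that folder".

```python
from pathlib import Path


def collect_pdfs(target: str) -> list[Path]:
    """Return the pdf files designated by a file or folder argument."""
    path = Path(target)
    if path.is_dir():
        # Non-recursive on purpose (assumption): only direct children.
        return sorted(
            p for p in path.iterdir()
            if p.is_file() and p.suffix.lower() == ".pdf"
        )
    if path.is_file() and path.suffix.lower() == ".pdf":
        return [path]
    return []  # missing path or not a pdf: nothing to rename
```

Comparing the lower-cased suffix means files named `.PDF` are picked up as well.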
By default, the tool first tries to rename the file using the pdf's embedded metadata. If no metadata is found, it looks for a DOI (Digital Object Identifier) on the article's first page and then tries to connect to the Internet to retrieve the metadata associated with that DOI.
If the pdf's metadata is not found and the retrieval of metadata via the DOI fails, the tool still tries to find the author, year, title, publication, and publisher from the article's first page. It looks at the size of the text fonts to distinguish the title from the author information and so on, using the LayoutLMv3 pre-trained multimodal Transformer AI/ML model.
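To give an intuition for why font size is a useful signal, here is a deliberately simple heuristic that takes the largest text on the first page to be the title. It is a stand-in for illustration and not the LayoutLMv3 model that NameIt uses; `guess_title` and the 0.5 pt tolerance are assumptions. The input is the dictionary that PyMuPDF returns from `page.get_text("dict")`, where text is organised as blocks, lines and spans, and each span carries its `text` and font `size`.

```python
def guess_title(page_dict: dict, tolerance: float = 0.5) -> str:
    """Join the spans set in the largest font size on the page."""
    spans = [
        span
        for block in page_dict.get("blocks", [])
        for line in block.get("lines", [])      # image blocks have no lines
        for span in line.get("spans", [])
        if span.get("text", "").strip()
    ]
    if not spans:
        return ""
    largest = max(span["size"] for span in spans)
    title_parts = [
        span["text"].strip()
        for span in spans
        if abs(span["size"] - largest) <= tolerance
    ]
    return " ".join(title_parts)
```

With PyMuPDF installed, it could be called as `guess_title(fitz.open("article.pdf")[0].get_text("dict"))`. A heuristic like this fails when, for example, the journal name is printed larger than the title, which is one reason to prefer a trained layout model.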
In a test with more than 100 journal articles downloaded directly from publishers' websites, we could rename 98 without issues. Author names with uncommon accents and article titles with uncommon characters can be problematic.
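One possible contributor to trouble with accents (an assumption here, not a diagnosis of the failing cases) is that the same accented letter can be encoded either as a single code point or as a base letter followed by a combining mark, and pdf text, APIs and filesystems do not always agree on which form to use. Python's standard `unicodedata` module can bring names to one form, as in this sketch with two hypothetical helpers:

```python
import unicodedata


def normalise_name(name: str) -> str:
    """Compose base letters and combining marks into single code points (NFC)."""
    return unicodedata.normalize("NFC", name)


def ascii_fallback(name: str) -> str:
    """Strip combining accents, e.g. for tools that cannot handle them.

    Letters without a decomposition (such as "ø" or "ß") are left as-is.
    """
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

Normalising before comparing or writing file names avoids two visually identical names being treated as different strings.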
- A GUI version for less technical users is forthcoming <funding needed - funding being applied for - new contributors welcome>.
- Support for wildcards (e.g., NameIt folder1/*pdf Elon*.pdf)
MIT license. Please acknowledge derivative works.
- First created by Jose Teixeira jose.teixeira@abo.fi
- First contributions by Sukrit
- Support from the Academy of Finland via the DiWIL project <see https://web.abo.fi/projekt/diwil/>
- Support from the open-science initiatives of Åbo Akademi