Commit bc80a34

omkarrr2533 and koppor authored
Support html when parsing arXiv identifiers (#14474)
* Support html when parsing arXiv identifiers
  - Added 'html' to ARXIV_PREFIX regex pattern to recognize arxiv.org/html/ URLs
  - Added test cases for HTML URLs with HTTP/HTTPS and with/without version numbers

  Fixes #14451
* Fix checkstyle violations - separate variable declarations
* Discard changes to .idea/codeStyles/Project.xml
* Add CHANGELOG.md entry
* Fix alphabetical ordering

---------

Co-authored-by: Oliver Kopp <[email protected]>
1 parent d61b659 commit bc80a34

3 files changed, +23 -1 lines changed

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -29,6 +29,7 @@ Note that this project **does not** adhere to [Semantic Versioning](https://semv
 - We added the generation of follow-up questions in AI chat. [#12243](https://github.com/JabRef/jabref/issues/12243)
 - We added support for getting bibliographic information based on the arXiv ID or the ISSN. [#14458](https://github.com/JabRef/jabref/pull/14458)
 - We added support for "Search Google Scholar" and "Search Semantic Scholar" to quickly search for a selected entry's title in Google Scholar or Semantic Scholar directly from the main table's context menu [#12268](https://github.com/JabRef/jabref/issues/12268)
+- We added support for `html` when parsing the arXiv identifiers. [#14451](https://github.com/JabRef/jabref/issues/14451)
 - When parsing a plain text citation, we added support for recognizing and extracting arXiv identifiers. [#14455](https://github.com/JabRef/jabref/pull/14455)
 - We introduced a new "Search Engine URL Template" setting in Preferences to allow users to customize their search engine URL templates [#12268](https://github.com/JabRef/jabref/issues/12268)

jablib/src/main/java/org/jabref/model/entry/identifier/ArXivIdentifier.java

Lines changed: 1 addition & 1 deletion
@@ -20,7 +20,7 @@
 public class ArXivIdentifier extends EprintIdentifier {
     private static final Logger LOGGER = LoggerFactory.getLogger(ArXivIdentifier.class);

-    private static final String ARXIV_PREFIX = "http(s)?://arxiv.org/(abs|pdf)/|arxiv|arXiv";
+    private static final String ARXIV_PREFIX = "http(s)?://arxiv.org/(abs|html|pdf)/|arxiv|arXiv";
     private final String identifier;
     private final String classification;
     private final String version;
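
To illustrate why the one-word change to `ARXIV_PREFIX` is enough, here is a minimal standalone sketch. The two prefix strings are copied from the diff above; everything else (the class `ArXivPrefixSketch`, the helpers `idPattern` and `describe`, and the simplified identifier pattern) is hypothetical and is not JabRef's actual parsing code, which may handle more formats than shown here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; not part of JabRef.
class ArXivPrefixSketch {
    // Both prefixes are copied verbatim from the diff above.
    static final String OLD_PREFIX = "http(s)?://arxiv.org/(abs|pdf)/|arxiv|arXiv";
    static final String NEW_PREFIX = "http(s)?://arxiv.org/(abs|html|pdf)/|arxiv|arXiv";

    // Assumed, simplified pattern: optional prefix, new-style id (YYMM.NNNNN), optional version.
    static Pattern idPattern(String prefix) {
        return Pattern.compile("(" + prefix + ")?(?<id>\\d{4}\\.\\d{4,5})(v(?<version>\\d+))?");
    }

    // Returns "id" or "id v<version>", or "no match" if the whole input does not match.
    static String describe(String prefix, String input) {
        Matcher matcher = idPattern(prefix).matcher(input);
        if (!matcher.matches()) {
            return "no match";
        }
        String version = matcher.group("version");
        return version == null ? matcher.group("id") : matcher.group("id") + " v" + version;
    }

    public static void main(String[] args) {
        String url = "https://arxiv.org/html/2511.01348v2";
        System.out.println(describe(OLD_PREFIX, url)); // no match
        System.out.println(describe(NEW_PREFIX, url)); // 2511.01348 v2
        System.out.println(describe(NEW_PREFIX, "http://arxiv.org/html/1502.05795v1")); // 1502.05795 v1
        System.out.println(describe(NEW_PREFIX, "https://arxiv.org/html/2511.01348")); // 2511.01348
    }
}
```

Because the sketch requires the whole input to match, the old prefix rejects the `/html/` URL outright, while the new alternation `(abs|html|pdf)` lets the prefix be consumed so that the identifier and version can be captured.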

jablib/src/test/java/org/jabref/model/entry/identifier/ArXivIdentifierTest.java

Lines changed: 21 additions & 0 deletions
@@ -171,4 +171,25 @@ void constructCorrectURLForEprint() throws URISyntaxException {
         Optional<ArXivIdentifier> parsed = ArXivIdentifier.parse("0706.0001v1");
         assertEquals(Optional.of(new URI("https://arxiv.org/abs/0706.0001v1")), parsed.get().getExternalURI());
     }
+
+    @Test
+    void parseHtmlUrl() {
+        Optional<ArXivIdentifier> parsed = ArXivIdentifier.parse("https://arxiv.org/html/2511.01348v2");
+
+        assertEquals(Optional.of(new ArXivIdentifier("2511.01348", "2", "")), parsed);
+    }
+
+    @Test
+    void parseHtmlUrlWithoutVersion() {
+        Optional<ArXivIdentifier> parsed = ArXivIdentifier.parse("https://arxiv.org/html/2511.01348");
+
+        assertEquals(Optional.of(new ArXivIdentifier("2511.01348", "", "")), parsed);
+    }
+
+    @Test
+    void parseHttpHtmlUrl() {
+        Optional<ArXivIdentifier> parsed = ArXivIdentifier.parse("http://arxiv.org/html/1502.05795v1");
+
+        assertEquals(Optional.of(new ArXivIdentifier("1502.05795", "1", "")), parsed);
+    }
 }