Handle "clusters" on paper extraction #85

bzz · 2022-09-26T12:27:05Z

When extracting publications (papers) from emails, a class of papers whose links in the email look like

https://scholar.google.com/scholar?cluster=14905208172666766997&hl=en&oi=scholaralrt&hist=KBiQzPUAAAAJ:3103465405719670724:AAGBfm3tO_7Uk2dTXZseJcyJq0Kjaug97Q&html=&folt=rel

are skipped (14 papers out of 2k+), because at the moment (ATM) we use a regex to extract the PDF URL from such links and it fails to match them.
Instead of the usual /scholar_url?url=<url-to-the.pdf> pattern, these links look like /scholar?cluster=14905208172666766997&..., and it is not obvious how to get the URL of an individual PDF (any one from the cluster).
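To illustrate the mismatch, here is a minimal sketch. The regex `SCHOLAR_URL_RE` is a hypothetical stand-in for the project's actual pattern (which is not shown here), assumed to capture only the `/scholar_url?url=<pdf-url>` form. The helper `cluster_id` is also hypothetical: it uses the standard `urllib.parse` module, rather than a regex, to recognize the `/scholar?cluster=...` form and pull out its cluster id.

```python
import re
from typing import Optional
from urllib.parse import parse_qs, urlparse

# Hypothetical stand-in for the extraction regex: it only matches the
# usual /scholar_url?url=<pdf-url> form and captures the url= value.
SCHOLAR_URL_RE = re.compile(r"/scholar_url\?url=([^&]+)")


def cluster_id(link: str) -> Optional[str]:
    """Return the cluster id of a /scholar?cluster=... link, or None otherwise."""
    parsed = urlparse(link)
    if parsed.path != "/scholar":
        return None
    # parse_qs maps each query key to a list of values.
    values = parse_qs(parsed.query).get("cluster")
    return values[0] if values else None


# The cluster link from this issue.
link = (
    "https://scholar.google.com/scholar?cluster=14905208172666766997"
    "&hl=en&oi=scholaralrt"
    "&hist=KBiQzPUAAAAJ:3103465405719670724:AAGBfm3tO_7Uk2dTXZseJcyJq0Kjaug97Q"
    "&html=&folt=rel"
)

print(SCHOLAR_URL_RE.search(link))  # None: the regex does not match cluster links
print(cluster_id(link))             # 14905208172666766997
```

The cluster id alone does not identify a PDF. It only points at a Scholar page that lists the versions of the paper, which is why such links cannot be resolved the same way as the usual ones.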

One option is to keep those links as-is, so that users have to pick the PDF from the Scholar page themselves.
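A minimal sketch of that fallback, assuming links are classified by URL path and query parameters (the function name `extract_paper_link` is hypothetical and not part of the project): the usual links resolve to the PDF URL from their `url=` parameter, cluster links are returned unchanged, and anything else yields `None` so the caller can skip it.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def extract_paper_link(link: str) -> Optional[str]:
    """Hypothetical helper: PDF URL for /scholar_url links, cluster links kept as-is."""
    parsed = urlparse(link)
    # parse_qs also percent-decodes the values, e.g. an encoded url= target.
    params = parse_qs(parsed.query)
    if parsed.path == "/scholar_url" and "url" in params:
        # Usual case: the target URL is carried in the url= query parameter.
        return params["url"][0]
    if parsed.path == "/scholar" and "cluster" in params:
        # Cluster case: no single PDF URL, so hand the Scholar page to the user.
        return link
    # Unrecognized shape: let the caller decide (e.g. skip it).
    return None


usual = "https://scholar.google.com/scholar_url?url=https://example.org/paper.pdf&hl=en"
cluster = "https://scholar.google.com/scholar?cluster=14905208172666766997&hl=en"
print(extract_paper_link(usual))    # https://example.org/paper.pdf
print(extract_paper_link(cluster))  # the cluster link, unchanged
```

With this approach the 14 cluster papers would no longer be dropped, at the cost of the user having to choose a version on the Scholar page.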
