Help retrieving abstract and keywords #197

Open
ZuaniaColon opened this issue Oct 13, 2024 · 0 comments

Hi, I am trying to retrieve all the metadata for the papers returned by a PubMed search. I was able to get most of the fields using the functions extract_from_esummary and linkout_urls in rentrez, but I need help getting the abstracts and keywords. I tried retrieving the abstracts with the pipeline you shared that uses the XML package. I got 41 abstracts, but I only have 35 papers. Can you help me? Here is the code:

library(rentrez)
library(XML)

# search (query is the search string, defined elsewhere)
PubMed_search <- entrez_search(db = "pubmed",
                               term = query,
                               retmax = 30000,
                               use_history = TRUE)

# summary
PubMed_search_summs <- entrez_summary(db = "pubmed",
                                      web_history = PubMed_search$web_history)

# extracting information into a list
df_ngs_records <- list(
  pubmed_id        = extract_from_esummary(PubMed_search_summs, "uid"),
  pmc_id           = extract_from_esummary(PubMed_search_summs, "articleids"),
  publication_type = extract_from_esummary(PubMed_search_summs, "pubtype"),
  date             = extract_from_esummary(PubMed_search_summs, "pubdate"),
  article_title    = extract_from_esummary(PubMed_search_summs, "title"),
  language         = extract_from_esummary(PubMed_search_summs, "lang"),
  authors          = extract_from_esummary(PubMed_search_summs, "authors"),
  journal          = extract_from_esummary(PubMed_search_summs, "fulljournalname"),
  journal_abbr     = extract_from_esummary(PubMed_search_summs, "source"),
  volume           = extract_from_esummary(PubMed_search_summs, "volume"),
  issue            = extract_from_esummary(PubMed_search_summs, "issue"),
  pages            = extract_from_esummary(PubMed_search_summs, "pages"),
  doi              = extract_from_esummary(PubMed_search_summs, "elocationid")
)

# urls
df_ngs_records$urls <- linkout_urls(entrez_link(dbfrom = "pubmed",
                                                id = df_ngs_records$pubmed_id,
                                                cmd = "llinks"))

# fetch
PubMed_fetch <- entrez_fetch(db = "pubmed",
                             id = PubMed_search$ids,
                             rettype = "xml",
                             parsed = TRUE)

# abstract
df_ngs_records$abstract <- xpathSApply(PubMed_fetch, "//Abstract/AbstractText", xmlValue)
length(df_ngs_records$abstract) # 41, but only 35 papers
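
A possible explanation for the 41 vs. 35 mismatch, offered as a sketch rather than a confirmed diagnosis: the XPath "//Abstract/AbstractText" returns one value per AbstractText node, not one per paper. PubMed records with structured abstracts (Background, Methods, Results, ...) carry several AbstractText nodes, and records without an abstract carry none, so the counts need not line up. The example below loops over each PubmedArticle node instead and collapses whatever it finds into a single string per record. The inline XML is a made-up stand-in for the parsed entrez_fetch() result, and collapse_values is a hypothetical helper name, not part of rentrez or XML.

```r
library(XML)

# Made-up sample standing in for the parsed entrez_fetch() result:
# record 1 has a structured abstract, record 2 a plain one, record 3 none.
xml_text <- '<PubmedArticleSet>
  <PubmedArticle><MedlineCitation><PMID>1</PMID>
    <Article><Abstract>
      <AbstractText Label="BACKGROUND">Background text.</AbstractText>
      <AbstractText Label="RESULTS">Results text.</AbstractText>
    </Abstract></Article>
    <KeywordList><Keyword>genomics</Keyword><Keyword>NGS</Keyword></KeywordList>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>2</PMID>
    <Article><Abstract>
      <AbstractText>Plain abstract.</AbstractText>
    </Abstract></Article>
  </MedlineCitation></PubmedArticle>
  <PubmedArticle><MedlineCitation><PMID>3</PMID>
    <Article></Article>
  </MedlineCitation></PubmedArticle>
</PubmedArticleSet>'
doc <- xmlParse(xml_text, asText = TRUE)

# One node per record, so every result below has one entry per paper.
articles <- getNodeSet(doc, "//PubmedArticle")

# Hypothetical helper: collect all matches of a relative XPath under one
# record and join them; return NA when the record has no match.
collapse_values <- function(node, path, sep) {
  values <- xpathSApply(node, path, xmlValue)
  if (length(values) == 0) NA_character_ else paste(values, collapse = sep)
}

pmids     <- vapply(articles, collapse_values, character(1),
                    path = "./MedlineCitation/PMID", sep = ";")
abstracts <- vapply(articles, collapse_values, character(1),
                    path = ".//Abstract/AbstractText", sep = " ")
keywords  <- vapply(articles, collapse_values, character(1),
                    path = ".//KeywordList/Keyword", sep = "; ")

result <- data.frame(pubmed_id = pmids, abstract = abstracts,
                     keywords = keywords, stringsAsFactors = FALSE)
```

To try this on the real data, replace doc with PubMed_fetch. Matching on pubmed_id rather than on position is safer when joining back to the esummary fields. Search results that are book records may appear under a different top-level element (PubmedBookArticle) and would need their own handling.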
