DOI and home page's URL #4

Open
feromes opened this issue Jun 14, 2022 · 6 comments

@feromes

feromes commented Jun 14, 2022

At CEM/FFLCH (Centro de Estudo da Metrópole) we are working on automating the handling of researchers' CVs. This project is a great help for that, but we noticed that the DOI and URL of some works are not being imported properly.

We're going to suggest an update to fix it as well.

Thanks a lot.

feromes added a commit to cem-usp/SLattes that referenced this issue Jun 14, 2022
@feromes feromes mentioned this issue Jun 14, 2022
@arademaker
Owner

Thank you for the PR, but:

  1. You left data from your researchers in the commit. It doesn't make sense to add data from particular resumes in this repo.

  2. I prefer to keep the readme as Org Mode and not to move to Markdown.

  3. It seems that the XML specification of the Lattes files changed, is that right? Can you confirm that? Can you point me to the current DTD or XML T

@feromes
Author

feromes commented Jun 14, 2022

Hello,

  1. Sorry, that was a mistake. Only the first commit is relevant; the others are just for our own use, to validate the results and for other plans we are considering.
  2. Sure. Again, this comes from my mistake of including all the commits in the PR.
  3. I am not sure, but I am looking into it. Tomorrow I'll get back to this and investigate.

I'll come back with a new PR containing only the changes to the lattes2mods.xsl file.

Sorry again for the mess.

@feromes
Author

feromes commented Jun 15, 2022

Sorry again,

I have now opened another PR, #6, which updates only lattes2mods.xsl.

Answering the question:

It seems that XML specification of the Lattes files changed, is that right? Can you confirm that? Can you point me to the current DTD or XML T

I was not able to confirm whether the specification has changed, but here is a link to the current version: https://memoria.cnpq.br/c/document_library/get_file?uuid=772309c0-fb72-4c6a-8c88-64b0ba46ae5d&groupId=313759

Thanks,

@arademaker
Owner

OK, I found at https://memoria.cnpq.br/web/portal-lattes/extracoes-de-dados that CNPq now makes the definition available as an XML Schema. I downloaded my CV and tested:

With the current DTD in this repo:

% xmllint --dtdvalid LMPLCurriculo.DTD --noout ~/Downloads/curriculo.xml
/Users/ar/Downloads/curriculo.xml:1: element DADOS-GERAIS: validity error : No declaration for attribute ORCID-ID of element DADOS-GERAIS
/Users/ar/Downloads/curriculo.xml:1: element DETALHAMENTO-DA-PATENTE: validity error : Element DETALHAMENTO-DA-PATENTE content does not follow the DTD, expecting (REGISTRO-OU-PATENTE)?, got (REGISTRO-OU-PATENTE HISTORICO-SITUACOES-PATENTE)
/Users/ar/Downloads/curriculo.xml:1: element REGISTRO-OU-PATENTE: validity error : No declaration for attribute NOME-DO-DEPOSITANTE of element REGISTRO-OU-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DESCRICAO-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DATA-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute STATUS-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element DETALHAMENTO-DA-PATENTE: validity error : Element DETALHAMENTO-DA-PATENTE content does not follow the DTD, expecting (REGISTRO-OU-PATENTE)?, got (REGISTRO-OU-PATENTE HISTORICO-SITUACOES-PATENTE)
/Users/ar/Downloads/curriculo.xml:1: element REGISTRO-OU-PATENTE: validity error : No declaration for attribute NOME-DO-DEPOSITANTE of element REGISTRO-OU-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DESCRICAO-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DATA-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute STATUS-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE
/Users/ar/Downloads/curriculo.xml:1: element DADOS-BASICOS-DE-ORIENTACOES-CONCLUIDAS-PARA-MESTRADO: validity error : Value "NAO_INFORMADO" for attribute TIPO of DADOS-BASICOS-DE-ORIENTACOES-CONCLUIDAS-PARA-MESTRADO is not among the enumerated set
Document /Users/ar/Downloads/curriculo.xml does not validate against LMPLCurriculo.DTD

Using the new XSD downloaded from the above link:

% xmllint --schema CurriculoLattes.xsd --noout ~/Downloads/curriculo.xml
/Users/ar/Downloads/curriculo.xml validates

But note that none of these changes were addressed by your PR. I will make specific comments in #6.
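To see at a glance what the old DTD is missing, the validity errors quoted above can be grouped by element. The following is only a minimal sketch, not part of this repository: it assumes the xmllint messages have been saved as text lines, and the function name `summarize_validity_errors` is hypothetical. It recognizes only the two "No declaration for ..." message forms that appear in the output above and ignores everything else.

```python
import re
from collections import defaultdict

# Message forms taken from the xmllint output quoted in this thread.
ATTR_RE = re.compile(r"No declaration for attribute (\S+) of element (\S+)")
ELEM_RE = re.compile(r"No declaration for element (\S+)")


def summarize_validity_errors(lines):
    """Return (undeclared elements, undeclared attributes grouped by element)."""
    elements = set()
    attributes = defaultdict(set)
    for line in lines:
        match = ATTR_RE.search(line)
        if match:
            attribute, element = match.groups()
            attributes[element].add(attribute)
            continue
        match = ELEM_RE.search(line)
        if match:
            elements.add(match.group(1))
    # Sets remove the repeated messages; sorting gives a stable report.
    return sorted(elements), {name: sorted(attrs) for name, attrs in attributes.items()}


# A few lines copied from the output above (file path shortened).
sample = [
    "curriculo.xml:1: element DADOS-GERAIS: validity error : No declaration for attribute ORCID-ID of element DADOS-GERAIS",
    "curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for element HISTORICO-SITUACOES-PATENTE",
    "curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DATA-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE",
    "curriculo.xml:1: element HISTORICO-SITUACOES-PATENTE: validity error : No declaration for attribute DATA-SITUACAO-PATENTE of element HISTORICO-SITUACOES-PATENTE",
]

missing_elements, missing_attributes = summarize_validity_errors(sample)
print("Undeclared elements:", missing_elements)
for element, attrs in missing_attributes.items():
    print(element, "->", ", ".join(attrs))
```

xmllint writes these messages to standard error, so redirecting that stream to a file (for example with `2> errors.txt`) and passing the file's lines to the function should produce the same kind of summary for a full CV.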

@arademaker arademaker mentioned this issue Jun 16, 2022
@arademaker
Owner

@feromes can you post here a link to the CV for which you found that the URL and DOI are not captured when it is processed by the lattes2mods.xsl transformation?
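Before running the transformation, it can help to check which elements of the source CV actually carry a DOI or a home page URL. The sketch below is only an illustration under stated assumptions: the attribute names `DOI` and `HOME-PAGE-DO-TRABALHO` and the element names in the sample are assumed and are not confirmed anywhere in this thread, and `list_doi_and_url` is a hypothetical helper. It does not depend on any particular element name, so it reports whatever element holds those attributes.

```python
import xml.etree.ElementTree as ET

# Assumed attribute names; adjust them if the real files use different ones.
DOI_ATTR = "DOI"
URL_ATTR = "HOME-PAGE-DO-TRABALHO"


def list_doi_and_url(root):
    """Return (tag, doi, url) for every element carrying either attribute."""
    rows = []
    for element in root.iter():
        if DOI_ATTR in element.attrib or URL_ATTR in element.attrib:
            rows.append((
                element.tag,
                element.attrib.get(DOI_ATTR, ""),
                element.attrib.get(URL_ATTR, ""),
            ))
    return rows


# Hypothetical fragment, made up for illustration only.
sample = """
<CURRICULO-VITAE>
  <ARTIGO-PUBLICADO>
    <DADOS-BASICOS-DO-ARTIGO TITULO-DO-ARTIGO="Example"
                             DOI="10.1000/example"
                             HOME-PAGE-DO-TRABALHO=""/>
  </ARTIGO-PUBLICADO>
</CURRICULO-VITAE>
"""

rows = list_doi_and_url(ET.fromstring(sample))
for tag, doi, url in rows:
    print(tag, "| DOI:", doi or "(empty)", "| URL:", url or "(empty)")
```

For a downloaded CV, `ET.parse("curriculo.xml").getroot()` can be passed in place of the sample. Comparing this list with the MODS output would show whether values are missing in the source or lost in the transformation.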

@arademaker
Owner

@feromes could you also confirm whether the CV you are trying to transform passes validation against the new XSD? See the README for how to run the validation; I have just updated the instructions. It is hard to get answers from CNPq, but it seems the XML files now follow this XSD specification. The site http://lmpl.cnpq.br/lmpl/, which I used to use as a reference, seems to be abandoned.
