EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #06076
[EP-tech] Antwort: Re: Antwort: Antwort: Re: Antwort: Re: fail to import PubMedID
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Antwort: Re: Antwort: Antwort: Re: Antwort: Re: fail to import PubMedID
- From: martin.braendle@id.uzh.ch
- Date: Tue, 8 Nov 2016 13:16:11 +0100
Hi Justin,
It looks like the endpoint you suggest returns HTML (entity-escaped XML embedded in a <pre> tag), and the DOCTYPE is HTML.
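Martin's diagnosis can be checked in code before a response is handed to the XML parser. The sketch below assumes the response is an HTTP::Response object (as returned by LWP::UserAgent); the helper name `looks_like_html` is hypothetical and not part of EPrints, and the 512-byte sniffing window is an arbitrary choice.

```perl
use strict;
use warnings;
use HTTP::Response;

# Hypothetical helper (not part of EPrints or LWP): returns 1 when a response
# looks like an HTML page rather than raw XML, for example XML that has been
# entity-escaped inside a <pre> element.
sub looks_like_html {
    my ($res) = @_;

    # Trust the declared media type first (content_type returns it
    # lower-cased, without parameters such as charset).
    return 1 if $res->content_type eq 'text/html';

    # Otherwise sniff the start of the body for an HTML DOCTYPE or <html> tag.
    my $head = substr( $res->content // '', 0, 512 );
    return $head =~ /^\s*(?:<!DOCTYPE\s+html|<html\b)/i ? 1 : 0;
}

# Example: a response shaped like the one described above.
my $res = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=UTF-8' ],
    "<!DOCTYPE html>\n<html><body><pre>&lt;PubmedArticleSet&gt;</pre></body></html>",
);
print looks_like_html($res) ? "HTML: do not pass to the XML parser\n" : "XML\n";
```

Checking the header first and sniffing the body second covers servers that label HTML pages with a generic or missing Content-Type.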
Best regards,
Martin
--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich
mail: martin.braendle@id.uzh.ch
phone: +41 44 63 56705
fax: +41 44 63 54505
http://www.zi.uzh.ch
From: Justin Bradley <jb4@ecs.soton.ac.uk>
To: eprints-tech@ecs.soton.ac.uk
Date: 08/11/2016 13:03
Subject: Re: [EP-tech] Antwort: Antwort: Re: Antwort: Re: fail to import PubMedID
Sent by: eprints-tech-bounces@ecs.soton.ac.uk
Thanks Martin.
I was just starting to look into this too, but I’ll use yours instead.
Just to double-check while we are looking: should we still be using the same endpoint, or should we move over to something more like:
https://www.ncbi.nlm.nih.gov/pubmed/?term=(26686599[PMID])&report=xml&format=xml
Regards,
Justin
- On 8 Nov 2016, at 11:52, martin.braendle@id.uzh.ch wrote:
I have published our version of the PubMedID Import plugin to
https://github.com/eprintsug/PubMedID-Import
It has been updated to cope with the https protocol that NCBI uses, and it also contains code that checks for duplicates in the EPrints repository. See also the attached phrase files (English and German).
Feel free to use whatever parts of this code you find useful for your implementation.
Best regards,
Martin
From: jens.vieler@id.uzh.ch
To: eprints-tech@ecs.soton.ac.uk
Date: 07/11/2016 16:05
Subject: [EP-tech] Antwort: Re: Antwort: Re: fail to import PubMedID
Sent by: eprints-tech-bounces@ecs.soton.ac.uk
...I think the problem is more general: XML::LibXML can't deal with https. The relevant code is in perl_lib/EPrints/XML/LibXML.pm (line 69), and 'XML::LibXML->new();' is the wrong parser for our needs.
What would you suggest? Changing Import/PubMedID.pm and bin/metadata_update from something like
EPrints::XML::parse_url( $url );
to something like
- using LWP to retrieve it
- then LibXML to parse it as XML
or creating a new, more general EPrints::XML module?
Workarounds or other quick-and-dirty fixes are also welcome.
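Jens' "LWP to retrieve, LibXML to parse" idea could look roughly like this. It is a minimal sketch, not the fix that was eventually adopted: `fetch_xml_dom` and `response_to_dom` are hypothetical names, and it assumes LWP::Protocol::https is installed so that LWP::UserAgent can fetch https URLs.

```perl
use strict;
use warnings;
use LWP::UserAgent;    # https URLs also need LWP::Protocol::https installed
use XML::LibXML;

# Hypothetical replacement for a parse_url-style call: fetch the document with
# LWP (which can speak https) and only then hand the bytes to XML::LibXML.
sub fetch_xml_dom {
    my ( $url, $ua ) = @_;
    $ua //= LWP::UserAgent->new( timeout => 30 );

    my $res = $ua->get($url);
    die "GET $url failed: " . $res->status_line . "\n" unless $res->is_success;

    return response_to_dom($res);
}

# Kept separate so the parsing step can be exercised without network access.
sub response_to_dom {
    my ($res) = @_;

    # charset => 'none' removes any Content-Encoding (e.g. gzip) but leaves the
    # body as bytes, so libxml2 honours the encoding in the XML declaration.
    my $bytes = $res->decoded_content( charset => 'none' );
    return XML::LibXML->load_xml( string => $bytes );
}
```

In a plugin, `my $dom = fetch_xml_dom( $url );` would stand in for `EPrints::XML::parse_url( $url );`. Whether downstream code accepts an XML::LibXML document directly depends on the EPrints XML backend configured, which this sketch does not check.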
Jens
--
Jens Vieler
Zentrale Informatik
Universität Zürich
Stampfenbachstrasse 73
CH-8006 Zürich
mail: jens.vieler@id.uzh.ch
phone: +41 44 63 56777
http://www.id.uzh.ch
From: Adam Field <Adam.Field@jisc.ac.uk>
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: 07.11.2016 14:39
Subject: Re: [EP-tech] Antwort: Re: fail to import PubMedID
Sent by: eprints-tech-bounces@ecs.soton.ac.uk
….on, incidentally, it’s this line:
https://github.com/eprints/eprints/blob/3.3/perl_lib/EPrints/Plugin/Import/PubMedID.pm#L58
SHERPA services analyst developer |
From: Adam Field <Adam.Field@jisc.ac.uk>
Date: Monday, 7 November 2016 13:32
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Antwort: Re: fail to import PubMedID
I can confirm this – I can also download the metadata via https using curl.
Jens’ suggestions are good. We should be able to respond to this kind of thing as a community – it’s a non-core, simple bug. I’m happy to offer advice, code review and testing if anyone wants to give it a stab. Alternatively, is there anyone out there who can offer me the same if I take a stab?
Best
From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of "jens.vieler@id.uzh.ch" <jens.vieler@id.uzh.ch>
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Monday, 7 November 2016 10:45
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] Antwort: Re: fail to import PubMedID
Dear Adam, Hiroshi, List
I've been seeing the same thing since this morning #-) ...they switched to https this weekend.
Fetching the https URL with wget works fine, but we cannot simply change the protocol in our script, because LibXML apparently can't handle https. So what about fetching the https URL outside the script and changing parse_url to parse_file on that local file? Or switching to LWP::Protocol::https?
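The "download first, then parse the local file" workaround can also be done inside the script. This is a sketch under stated assumptions: `download_to_tempfile` and `parse_local_file` are hypothetical helpers, LWP::Protocol::https is assumed to be installed, and XML::LibXML's `load_xml( location => ... )` is used here in place of the EPrints `parse_file` wrapper Jens mentions.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use LWP::UserAgent;    # needs LWP::Protocol::https for https URLs
use XML::LibXML;

# Hypothetical helper: the scripted equivalent of "wget the https URL first".
# The body is written to a temporary file that is deleted when the script exits.
sub download_to_tempfile {
    my ($url) = @_;
    my ( $fh, $path ) = tempfile( 'pubmedXXXXXX', TMPDIR => 1, SUFFIX => '.xml', UNLINK => 1 );
    close $fh;

    my $ua  = LWP::UserAgent->new( timeout => 30 );
    my $res = $ua->get( $url, ':content_file' => $path );    # stream body to $path
    die "GET $url failed: " . $res->status_line . "\n" unless $res->is_success;
    return $path;
}

# Parse the local copy: libxml2 reads a plain file here and never has to
# fetch anything over the network itself.
sub parse_local_file {
    my ($path) = @_;
    return XML::LibXML->load_xml( location => $path );
}
```

Usage would be `my $dom = parse_local_file( download_to_tempfile( $url ) );`. Compared with parsing the response in memory, the temporary file adds disk I/O but makes it easy to inspect what the server actually returned.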
Jens
From: Adam Field <Adam.Field@jisc.ac.uk>
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: 07.11.2016 11:30
Subject: Re: [EP-tech] fail to import PubMedID
Sent by: eprints-tech-bounces@ecs.soton.ac.uk
Visiting the URL, I get:
<eFetchResult>
<ERROR>WebEnv parameter is required</ERROR>
</eFetchResult>
If I add a dummy WebEnv parameter, I get:
<eFetchResult>
<ERROR>query_key parameter is required</ERROR>
</eFetchResult>
…it looks like the API the plugin is using has changed. It’s unlikely to be a local problem.
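Because eFetch reports problems like these inside an ordinary XML document, an import script could look for them explicitly rather than failing later on missing records. A minimal sketch, assuming the error always appears as an `<ERROR>` child of `<eFetchResult>` as in the responses quoted above; `efetch_error` is a hypothetical helper name.

```perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical helper: eFetch can answer with an <eFetchResult><ERROR> document
# instead of article records. Returns the error text, or undef if there is none.
sub efetch_error {
    my ($dom) = @_;
    my ($err) = $dom->findnodes('/eFetchResult/ERROR');
    return defined $err ? $err->textContent : undef;
}

# The first response quoted above.
my $dom = XML::LibXML->load_xml( string => <<'XML' );
<eFetchResult>
<ERROR>WebEnv parameter is required</ERROR>
</eFetchResult>
XML

if ( defined( my $msg = efetch_error($dom) ) ) {
    warn "PubMed eFetch error: $msg\n";
}
```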
From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Hiroshi Watabe <hwatabe@m.tohoku.ac.jp>
Organization: CYRIC
Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: Monday, 7 November 2016 01:27
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Subject: [EP-tech] fail to import PubMedID
Dear all,
It seems PubMed only accepts https now, and I can no longer import PubMed IDs. I got the following warning message:
Unhandled warning in Import::PubMedID: http error : Unknown IO error
I modified PubMedID.pm as follows, but without success:
27c27
< $self->{EFETCH_URL} = 'http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=full';
---
> $self->{EFETCH_URL} = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=full';
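How the plugin attaches each PubMed ID to EFETCH_URL is not shown in the thread. The sketch below assumes the ID is passed as eFetch's `id` parameter and uses the URI module to add it without disturbing the existing `db`, `retmode` and `rettype` parameters; `efetch_url_for` is a hypothetical name. Note that changing the scheme alone does not fix the import if the URL is still fetched by libxml2 itself.

```perl
use strict;
use warnings;
use URI;

# Base URL from the diff, already switched to https.
my $EFETCH_URL =
  'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&retmode=xml&rettype=full';

# Hypothetical helper: add a PubMed ID as eFetch's "id" parameter while keeping
# the existing db, retmode and rettype parameters intact.
sub efetch_url_for {
    my ($pmid) = @_;
    my $uri = URI->new($EFETCH_URL);
    $uri->query_form( $uri->query_form, id => $pmid );
    return $uri;
}

print efetch_url_for(26686599), "\n";
```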
The error message is as follows:
Unhandled exception in Import::PubMedID: Could not create file parser context for file
Could you help me?
Hiroshi
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/
Jisc is a registered charity (number 1149740) and a company limited by guarantee which is registered in England under Company No. 5747339, VAT No. GB 197 0632 86. Jisc’s registered office is: One Castlepark, Tower Hill, Bristol, BS2 0JA. T 0203 697 5800.
Jisc Services Limited is a wholly owned Jisc subsidiary and a company limited by guarantee which is registered in England under company number 2881024, VAT number GB 197 0632 86. The registered office is: One Castle Park, Tower Hill, Bristol BS2 0JA. T 0203 697 5800.
--
Justin Bradley
Strategy & Technical Lead
EPrints Services
University of Southampton