EPrints Technical Mailing List Archive
Message: #08591
Re: [EP-tech] Plan S - Persistent Identifiers
- To: <eprints-tech@ecs.soton.ac.uk>, James Kerwin <jkerwin2101@gmail.com>
- Subject: Re: [EP-tech] Plan S - Persistent Identifiers
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Wed, 28 Apr 2021 10:50:45 +0100
Hi James,
Fortunately (or unfortunately) I have had quite a few thoughts on
the matter. I have done my best to keep them to the point.
First, I don't think it is possible to account for the same item being in multiple repositories. As an individual institutional repository owner, you have no control over other institutional repositories, which may share authors on publications and have the right to make the same publication available on their own repositories. Coming from a Semantic Web background, I know that determining definitively whether two things with different unique identifiers are actually the same thing is a near-impossible problem. The best you can do is ensure the same unique identifier is never used for two different things, and avoid creating and using more unique identifiers than are absolutely necessary.
EPrints has always had a unique identifier in the form of a URI
(e.g. http://eprints.example.org/id/eprint/123). I would suggest
this is the most appropriate unique identifier to use as every
item in your repository will have one but not every item will
necessarily have a DOI or similar unique identifier. You could
configure your repository to use a DOI minting service (e.g. data
repositories often use DataCite), but this rather breaks the rule
of not creating more unique identifiers than are absolutely
necessary.
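To illustrate the point that every item has a repository URI while only some have a DOI, here is a minimal Python sketch. It assumes the `<base>/id/eprint/<eprintid>` pattern from the example URI above; the record dictionaries, field names, and the DOI value are hypothetical and are not an EPrints data structure.

```python
# Hypothetical sketch: builds EPrints-style URIs following the pattern in the
# example above (<base>/id/eprint/<eprintid>). Record fields are assumptions.

def eprint_uri(base: str, eprintid: int) -> str:
    """Return the repository URI for an eprint, e.g. http://host/id/eprint/123."""
    return f"{base.rstrip('/')}/id/eprint/{eprintid}"

def identifiers_for(record: dict, base: str) -> dict:
    """Every record gets a repository URI; a DOI is added only when one exists."""
    ids = {"uri": eprint_uri(base, record["eprintid"])}
    if record.get("doi"):
        ids["doi"] = record["doi"]
    return ids

records = [
    {"eprintid": 123, "doi": "10.1234/example.5678"},  # hypothetical DOI
    {"eprintid": 124},  # e.g. a thesis with no publisher DOI
]
for r in records:
    print(identifiers_for(r, "http://eprints.example.org"))
```

The design choice here mirrors the argument: the repository URI is always available, so it is the baseline identifier, and a publisher identifier is carried alongside it only when it exists.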
One potential problem I have noted with EPrints URIs is that
these were all originally http, but if you modify your HTTPS
configuration to ensure HTTPS is used everywhere, then these URIs
will likely also change to https, making them non-persistent,
which is another big no-no. For this reason, early on in EPrints
3.4 I introduced a configuration property, 'uri_url', so that you
could modify a repository's HTTPS configuration while keeping the
URIs as http. In the context of a unique identifier, you need to
treat the URI as a string of characters: if this string of
characters changes, then it is no longer the same unique
identifier, even though it still describes the same thing.
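The string-identity point can be shown with a small Python model of the idea behind 'uri_url': the identifier is built from a fixed base that is kept separate from the base URL the site is currently served from. The function and parameter names are hypothetical; this is a sketch of the concept, not EPrints' own (Perl) implementation or configuration syntax.

```python
from typing import Optional

def persistent_uri(eprintid: int, base_url: str, uri_url: Optional[str] = None) -> str:
    """Hypothetical model: use a fixed uri_url base when set, else the live base URL."""
    base = uri_url if uri_url is not None else base_url
    return f"{base.rstrip('/')}/id/eprint/{eprintid}"

# Before the HTTPS switch, both the site and the URIs use http.
before = persistent_uri(123, "http://eprints.example.org")

# After the switch, without a fixed base the identifier string changes...
after_unpinned = persistent_uri(123, "https://eprints.example.org")

# ...whereas a fixed http base keeps it identical.
after_pinned = persistent_uri(123, "https://eprints.example.org",
                              uri_url="http://eprints.example.org")

print(before == after_unpinned)  # False: a different string, so a different identifier
print(before == after_pinned)    # True: the identifier is unchanged
```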
I think you also identified another potential problem with the
structure of an EPrints URI: a change to the hostname of the
repository itself. Again, the uri_url option should allow you to
ensure URIs do not change. Unfortunately, this may confuse users
who wonder why the hostname in these URIs is different from the
hostname of the repository. Also, depending on what happens to
the old hostname's DNS registration, these URIs may become
unresolvable. However, there is no requirement for URIs, as with
any unique identifier, to be resolvable.
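Where the old hostname can be kept in DNS, one option (an assumption here, in the spirit of the redirect James mentions below) is a redirect that maps a URI carrying the old hostname onto the repository's current address. A minimal Python sketch of that mapping follows; both hostnames are hypothetical, and it assumes the new site still answers on the same `/id/eprint/<eprintid>` path.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical hostnames: the one fixed in persistent URIs and the live one.
PINNED_HOST = "eprints.example.org"
CURRENT_BASE = "https://repository.example.ac.uk"

_EPRINT_PATH = re.compile(r"^/id/eprint/(\d+)/?$")

def redirect_target(uri: str) -> Optional[str]:
    """Map a pinned EPrints URI to the current site, or None if it is not one of ours."""
    parts = urlsplit(uri)
    if parts.hostname != PINNED_HOST:
        return None
    match = _EPRINT_PATH.match(parts.path)
    if match is None:
        return None
    # The identifier string never changes; only where it is served from does.
    return f"{CURRENT_BASE}/id/eprint/{match.group(1)}"

print(redirect_target("http://eprints.example.org/id/eprint/123"))
```

Note that the redirect only helps resolution: the identifier itself stays the original string.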
If an item has a DOI provided by a journal, an ISBN provided by a book publisher, etc., then this would typically be more useful than an institutional repository's URI, as it would be used in a general context (i.e. you would expect a DOI or ISBN to appear in the citation for such an item). However, I think that to provide the best possible coverage there is a need for both forms of unique identifier: the one from the original publisher (unless that is the institutional repository itself, as would likely be the case for theses, etc.) and the one from the institutional repository. If you provide export formats, ingestible by third-party applications, that include both unique identifiers and therefore build a link between the two, it is possible to build a network of unique identifiers for a particular item. Then, when you get a journal article with authors from multiple institutions, it will be possible to see that a publication from institution A is the same publication as one from institution B.
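As a rough illustration of how such a network could be assembled, the Python sketch below groups repository URIs that share a DOI. The export records, hostnames, and field names are hypothetical and do not correspond to any particular EPrints export plugin; the only outside fact relied on is that DOI names are case-insensitive.

```python
from collections import defaultdict

# Hypothetical export records from two repositories; each carries the
# publisher's DOI (if any) alongside the repository's own URI.
exports = [
    {"uri": "http://eprints.inst-a.example/id/eprint/123", "doi": "10.1234/example.5678"},
    {"uri": "http://eprints.inst-b.example/id/eprint/987", "doi": "10.1234/EXAMPLE.5678"},
    {"uri": "http://eprints.inst-a.example/id/eprint/124", "doi": None},  # e.g. a thesis
]

def link_by_doi(records):
    """Group repository URIs that share a DOI, i.e. copies of the same publication."""
    groups = defaultdict(set)
    for rec in records:
        doi = rec.get("doi")
        if doi:
            # DOI names are case-insensitive, so normalise before comparing.
            groups[doi.lower()].add(rec["uri"])
    return dict(groups)

for doi, uris in link_by_doi(exports).items():
    print(doi, sorted(uris))
```

Items without a publisher identifier simply remain represented by their repository URI alone.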
Regards
David Newman
On 28/04/2021 10:02, James Kerwin via Eprints-tech wrote:
Hi All,
For once I have not broken anything, just looking for opinions and advice.
As part of Plan S we need to have persistent identifiers for scholarly publications. I have read this EPrints wiki:
https://wiki.eprints.org/w/Plan_S
At Liverpool we aren't 100% sure about this topic. DOI would be the obvious choice, but there are some on my team who reasonably point out that the same item could be in several repositories and end up having several separate DOIs associated with it. I'm not sure how much that matters.
Does anybody have any thoughts on this point? We spoke with my predecessor, Adam, who was really helpful. Unconvinced team members have suggested using handle.net, which I think is overkill and doesn't necessarily meet the needs of Plan S in itself.
Also, is the URL/EPrints ID for each item not a suitable persistent identifier? The wiki linked above does mention this. There's always the possibility that a repository URL could change in the future, but I would expect some sort of redirect to overcome this.
If there is a more suitable place for this type of discussion please send me there.
Thanks,
James