EPrints Technical Mailing List Archive

Message: #08212



Re: [EP-tech] DSpace Harvester and OAI_Bibliography.pm


You're right.

OAI plugins work this way:

for every archive record, for every plugin, output that plugin's metadata format. But OAI_Bibliography only produces metadata for items that have a bibliography, not for all records. So it should not be used as a generic OAI plugin, which is expected to return valid metadata for every item: the bibliography format only has valid metadata for items with a bibliography.
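A minimal Python model of the loop described above, to show why a format that only covers some items produces empty records for the rest. This is a sketch under stated assumptions: the record fields (`has_bibliography`, `references`) and the render functions are hypothetical stand-ins, not EPrints APIs.

```python
# Hypothetical model of the OAI export loop: for every record, every
# advertised metadata-format plugin is asked to render it.
# Field and function names are illustrative only, not EPrints APIs.

def render_dc(record):
    # A generic format produces metadata for every record.
    return {"title": record["title"]}

def render_bibliography(record):
    # A bibliography-style format only has content for records that
    # carry a bibliography; for all others it yields nothing.
    if record.get("has_bibliography"):
        return {"references": record["references"]}
    return None

PLUGINS = {"oai_dc": render_dc, "oai_bibl": render_bibliography}

def export_all(records):
    """Return {(record_id, prefix): metadata or None} for every pair."""
    out = {}
    for record in records:
        for prefix, render in PLUGINS.items():
            out[(record["id"], prefix)] = render(record)
    return out

if __name__ == "__main__":
    records = [
        {"id": 1, "title": "Paper A", "has_bibliography": True, "references": ["X", "Y"]},
        {"id": 2, "title": "Paper B"},
    ]
    for key, meta in export_all(records).items():
        print(key, meta)
```

Record 2 gets valid `oai_dc` metadata but nothing for `oai_bibl`, which is the empty-record symptom a harvester sees when it asks for that prefix.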

Since it is a plugin, you can disable it in the config:

$c->{plugins}{"Export::OAI_Bibliography"}{params}{disable} = 1;

I think this should be the default setting; it may be worth a pull request on the git repository here:

https://github.com/eprints/eprints/blob/3.3/lib/defaultcfg/cfg.d/plugins.pl

On 18/06/20 20:50, Tomasz Neugebauer wrote:
Hi Yuri, thank you for the detailed info. Yes, it looks like an issue with the DSpace harvester.

The issue did make me think about our oai_bibl metadata prefix, though: is the OAI_Bibliography.pm file doing anything useful if it advertises a metadata prefix in the OAI endpoint that returns empty records? If anyone has comments on this, that's great, but the harvester question is resolved AFAIK: it should be requesting the specific prefix oai_dc.

Tomasz


-----Original Message-----
From: eprints-tech-bounces@ecs.soton.ac.uk <eprints-tech-bounces@ecs.soton.ac.uk> On Behalf Of Yuri via Eprints-tech
Sent: June 18, 2020 4:51 AM
To: eprints-tech@ecs.soton.ac.uk
Subject: Re: [EP-tech] DSpace Harvester and OAI_Bibliography.pm

I would exclude this format/plugin from oai2 in:

https://github.com/eprints/eprints/blob/3.3/cgi/oai2#L559

or, as a weaker workaround, you can change the sort order here:

https://github.com/eprints/eprints/blob/3.3/cgi/oai2#L565
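Reordering is a weak fix, presumably because it only helps harvesters that happen to take the first advertised format. A minimal Python sketch of the idea (the oai2 script itself is Perl; that it orders formats by a plain string sort on metadataPrefix is an assumption, not verified against the linked line): a plain sort puts oai_bibl ahead of oai_dc because "b" sorts before "d", while a custom key can always list oai_dc first.

```python
# Illustrative only: models the ordering of advertised metadata prefixes.
# Assumption: the server orders formats by a plain string sort on the prefix.

ADVERTISED = ["oai_dc", "oai_bibl"]

def dc_first_key(prefix):
    # Tuples compare element by element and False < True, so "oai_dc"
    # sorts ahead of everything else; ties fall back to alphabetical order.
    return (prefix != "oai_dc", prefix)

def plain_order(prefixes):
    return sorted(prefixes)

def dc_first_order(prefixes):
    return sorted(prefixes, key=dc_first_key)

if __name__ == "__main__":
    print(plain_order(ADVERTISED))     # ['oai_bibl', 'oai_dc']
    print(dc_first_order(ADVERTISED))  # ['oai_dc', 'oai_bibl']
```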

I think this is an issue in DSpace: it should always use oai_dc as the default format instead of checking the schema, since the OAI specification requires oai_dc.
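To illustrate the harvester-side fix, here is a minimal Python sketch (DSpace itself is Java; this is not its code). `first_matching_schema` models the behaviour described in the 2016 report quoted in this thread, picking the first prefix whose schema is oai_dc.xsd, while `choose_prefix` prefers the literal `oai_dc` prefix, which OAI-PMH requires every repository to support. The function names and the trimmed sample response are hypothetical.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}
OAI_DC_SCHEMA = "http://www.openarchives.org/OAI/2.0/oai_dc.xsd"

def list_formats(xml_text):
    """Return [(prefix, schema)] from a ListMetadataFormats response."""
    root = ET.fromstring(xml_text)
    formats = []
    for fmt in root.iterfind(".//oai:metadataFormat", OAI_NS):
        prefix = fmt.findtext("oai:metadataPrefix", default="", namespaces=OAI_NS).strip()
        schema = fmt.findtext("oai:schema", default="", namespaces=OAI_NS).strip()
        formats.append((prefix, schema))
    return formats

def first_matching_schema(formats, schema=OAI_DC_SCHEMA):
    # Behaviour described in the report: take the first prefix whose
    # schema is oai_dc.xsd, which may well be oai_bibl.
    for prefix, s in formats:
        if s == schema:
            return prefix
    return None

def choose_prefix(formats, schema=OAI_DC_SCHEMA):
    # Prefer the mandatory oai_dc prefix; fall back to schema matching
    # only if a repository somehow does not advertise it.
    if any(prefix == "oai_dc" for prefix, _ in formats):
        return "oai_dc"
    return first_matching_schema(formats, schema)

SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_bibl</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>"""

if __name__ == "__main__":
    formats = list_formats(SAMPLE)
    print(first_matching_schema(formats))  # oai_bibl
    print(choose_prefix(formats))          # oai_dc
```

Matching on the prefix rather than the schema makes the choice independent of the order in which the repository advertises its formats.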


On 17/06/20 20:22, Tomasz Neugebauer via Eprints-tech wrote:
Hi everyone... the following issue with harvesting EPrints repositories
using the DSpace harvester was reported in 2016:

http://dspace.2283337.n4.nabble.com/Harvesting-EPrints-repository-from-DSpace-td4681086.html

"What happens in this case is that EPrints has more than one entry for
the supported metadata formats using OAI_DC (oai_bibl and oai_dc
prefixes):

.
<metadataFormat>
   <metadataPrefix>oai_bibl</metadataPrefix>
   <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
   <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
<metadataFormat>
   <metadataPrefix>oai_dc</metadataPrefix>
   <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
   <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
</metadataFormat>
.

DSpace's harvester is then selecting the first metadataPrefix, i.e.
oai_bibl, for which EPrints is returning records with no metadata."

Someone is having a similar issue now with EPrints repositories, so
I'm wondering, is this still an issue, or was there a fix/modification
added to EPrints for this?

I haven't tried the solution to remove OAI_Bibliography.pm from the
core files.

Tomasz


*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/