EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10314



RE: [EP-tech] OAI Harvester - Huge Author Lists


CAUTION: This e-mail originated outside the University of Southampton.

Hi James,

We had a very similar block of records in our OAI-PMH interface.

 

ExLibris/Primo were very quick to say it was a problem with our repository.

It took me a while to convince them that setting a slightly longer timeout on a harvest request would make it work again.

 

In our case, I could demonstrate that the repository reliably took ~65 seconds to generate the response. At the time, the harvester's timeout was 60 seconds.
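For anyone wanting to make the same kind of measurement, here is a minimal Python sketch, not the method actually used at the time. It times one full OAI-PMH response and compares it with a harvester timeout. The function names, the example URL and the 120-second client timeout are all assumptions for illustration; only the ~65 second and 60 second figures come from this message.

```python
import time
import urllib.request
from typing import Callable


def time_request(url: str, timeout: float = 120.0,
                 opener: Callable = urllib.request.urlopen) -> float:
    """Return the seconds taken to download the full response body.

    `timeout` is our own (generous) client timeout, chosen to be longer
    than the harvester's so that a slow page still completes.
    """
    start = time.monotonic()
    with opener(url, timeout=timeout) as resp:
        resp.read()  # read everything, as a harvester would
    return time.monotonic() - start


def exceeds_timeout(elapsed: float, harvester_timeout: float = 60.0) -> bool:
    """True when a response this slow would be cut off by the harvester."""
    return elapsed > harvester_timeout


if __name__ == "__main__":
    # Hypothetical URL: substitute the failing resumption-token URL.
    url = "https://repository.example.org/cgi/oai2?verb=Identify"
    elapsed = time_request(url)
    print(f"{elapsed:.1f}s, exceeds 60s timeout: {exceeds_timeout(elapsed)}")
```

Running it several times against the same URL shows whether the response time is reliably above the timeout or only occasionally so.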



To resolve this issue:

 

  • Your ‘other’ idea would be OK until something (e.g. a system update) makes all those records appear to have changed at the same time. Then you’d have to run through the review/live process all over again.

  • There may be other things you can 'fix' with a custom profile e.g. how Primo understands/displays open, restricted, requestable, oa-location or no-fulltext records.


Cheers,

John


 

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> On Behalf Of James Kerwin
Sent: 08 January 2026 13:00
To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] OAI Harvester - Huge Author Lists

 


Hi everyone,

 

Our harvest from EPrints to Primo via the OAI harvester keeps failing. We think we have pinned down the issue to a page under one of the resumption tokens that contains ~17 physics records, each with ~3000 authors. We think this makes the resumption-token URL take too long to load, which breaks the harvest.
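One way to confirm which page is slow is to walk the resumption tokens yourself and time each response. The Python below is a rough sketch under assumptions: the base URL is hypothetical, the harvest is assumed to use the oai_dc metadata prefix, and the helper names are made up for illustration. It records, for each page, the time taken, the number of records and the largest dc:creator count on any one record.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def fetch(url, timeout=300.0):
    """Download one OAI-PMH response with a deliberately long timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def summarise_page(xml_bytes):
    """Count records, find the largest author list, and read the next token."""
    root = ET.fromstring(xml_bytes)
    records = root.findall(f".//{OAI}record")
    creators = [len(r.findall(f".//{DC}creator")) for r in records]
    token_el = root.find(f".//{OAI}resumptionToken")
    token = (token_el.text or "").strip() if token_el is not None else ""
    return {
        "records": len(records),
        "max_creators": max(creators, default=0),
        "next_token": token or None,  # empty or missing token ends the list
    }


def walk(base_url, fetcher=fetch, max_pages=1000):
    """Follow resumption tokens from the first page, timing every request."""
    url = base_url + "?verb=ListRecords&metadataPrefix=oai_dc"
    pages = []
    for _ in range(max_pages):
        start = time.monotonic()
        info = summarise_page(fetcher(url))
        info["seconds"] = time.monotonic() - start
        info["url"] = url
        pages.append(info)
        if info["next_token"] is None:
            break
        query = urllib.parse.urlencode(
            {"verb": "ListRecords", "resumptionToken": info["next_token"]})
        url = base_url + "?" + query
    return pages


if __name__ == "__main__":
    # Hypothetical base URL: replace with your repository's OAI endpoint.
    for page in walk("https://repository.example.org/cgi/oai2"):
        print(f'{page["seconds"]:7.1f}s  records={page["records"]:4d}  '
              f'max_creators={page["max_creators"]:5d}  {page["url"]}')
```

A page that stands out on both time and max_creators would support the theory that the large author lists are what pushes the response past the harvester's limit.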

 

As a potential solution, we have decided to try creating a new export plugin that limits the number of authors being sent to the harvester. Ideally only our Primo instance would use this plugin.
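To illustrate the truncation idea only: a real EPrints export plugin would be written in Perl, so the Python below is just a sketch of the logic, applied to an oai_dc record that has already been parsed. The function name and the cap of 50 are arbitrary assumptions, and it assumes the dc:creator elements are direct children of the oai_dc:dc element.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
DC_CREATOR = f"{{{DC_NS}}}creator"


def cap_creators(dc_root, max_creators=50):
    """Remove dc:creator elements beyond the first `max_creators`.

    `dc_root` is the oai_dc:dc element of one record. Other elements
    (title, date, identifier, ...) are left untouched. Returns the number
    of creator elements removed.
    """
    creators = [el for el in list(dc_root) if el.tag == DC_CREATOR]
    removed = creators[max_creators:]
    for el in removed:
        dc_root.remove(el)
    return len(removed)


if __name__ == "__main__":
    # Build a small stand-in record with five authors and cap it at two.
    root = ET.Element("{http://www.openarchives.org/OAI/2.0/oai_dc/}dc")
    ET.SubElement(root, f"{{{DC_NS}}}title").text = "Example record"
    for name in ["A", "B", "C", "D", "E"]:
        ET.SubElement(root, DC_CREATOR).text = name
    dropped = cap_creators(root, max_creators=2)
    print(dropped, [el.text for el in root.findall(DC_CREATOR)])
```

Keeping the first authors in their original order preserves the lead-author information while shrinking the payload for the harvester.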

 

Does this sound like a reasonable approach? If anyone has experienced something similar, would you be willing to share your experience or solutions?

I have had another idea where we move the problem records into review, get a successful harvest, then gradually move them back into the live archive and let the harvester grab them, since they would then have a more recent last-modified date. This is less of a solution and more of a workaround for now.

Thanks,

James