EPrints Technical Mailing List Archive

Message: #10307


[EP-tech] OAI Harvester - Huge Author Lists


Hi everyone,

Our harvest from EPrints to Primo via the OAI harvester keeps failing. We think we have traced the problem to a single page, reached through one of the resumption tokens, that contains ~17 physics records with ~3000 authors each. We think this makes the resumption token URL take too long to load, which breaks the harvest.
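
One way to confirm which records are responsible is to save the response for the slow resumption token and count the creators in each record. The following is a minimal Python sketch of that check, not part of EPrints or Primo. It assumes the harvest uses the oai_dc metadata format, so that authors appear as dc:creator elements; the function name creator_counts is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard OAI-PMH 2.0 and Dublin Core namespaces.
OAI = "{http://www.openarchives.org/OAI/2.0/}"
DC = "{http://purl.org/dc/elements/1.1/}"


def creator_counts(list_records_xml):
    """Return (identifier, dc:creator count) pairs, largest count first.

    Assumption: the input is the text of a ListRecords response in oai_dc.
    """
    root = ET.fromstring(list_records_xml)
    counts = []
    for record in root.iter(OAI + "record"):
        identifier = record.findtext(
            OAI + "header/" + OAI + "identifier", default="(no identifier)"
        )
        # Count every dc:creator element anywhere inside this record.
        n = sum(1 for _ in record.iter(DC + "creator"))
        counts.append((identifier, n))
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```

Running this over the saved page should put the records with thousands of authors at the top of the list, which would support (or rule out) the author lists as the cause.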

As a potential solution, we have decided to try creating a new export plugin that limits the number of authors sent to the harvester. Ideally, only our Primo instance would use this plugin.
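
The truncation idea can be sketched as follows. This is only an illustration of the logic in Python; a real EPrints export plugin would be written in Perl, and this does not use any EPrints API. It assumes the record is an oai_dc:dc element with dc:creator as direct children; the function name limit_creators and the default cap of 50 are hypothetical.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# Keep the conventional prefixes when the record is serialised again.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def limit_creators(oai_dc_xml, max_creators=50):
    """Return the record with at most max_creators dc:creator elements.

    Assumption: dc:creator elements are direct children of the root
    oai_dc:dc element. The first max_creators are kept, in document order.
    """
    root = ET.fromstring(oai_dc_xml)
    creators = root.findall("{%s}creator" % DC_NS)
    for extra in creators[max_creators:]:
        root.remove(extra)
    return ET.tostring(root, encoding="unicode")
```

Keeping the first authors in document order preserves the lead authors; whether a cap of this kind is acceptable for discovery in Primo is a separate question.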

Does this sound like a reasonable approach? If anyone has experienced something similar, would you be willing to share your experience or solutions?

I have had another idea: move the problem records into review, get a successful harvest, then gradually move them back into the live archive and let the harvester pick them up, since they would then have a more recent last-modified date. This is less a solution than a way through the problem for now.

Thanks,
James