EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #09985
Re: [EP-tech] Bulk import plugin
- To: Will Hughes <w.p.hughes@reading.ac.uk>, "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Bulk import plugin
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Mon, 17 Feb 2025 10:47:12 +0000
Hi Will,
Import plugins really are something that would probably be best maintained outside the main codebase, as the transposition they provide changes on a different schedule to EPrints releases, whenever the specifications for BibTeX, EndNote, RIS, etc. change. That is something for our development team to consider.
Inevitably, for 20,000 items the import will take some time. It sounds like you did not do this in screen. Therefore, if you want this to keep running, you will need to press Ctrl+z and then type bg %1 on the command line before you can log out and still let the import continue. (It is probably %1, but the number is whatever appears inside the square brackets next to Stopped when you press Ctrl+z.)
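For future runs, an alternative to Ctrl+z and bg is to start the import detached from the terminal in the first place. The following is only a minimal sketch using the standard nohup utility: run_detached and DETACHED_PID are hypothetical names chosen for illustration, import.log is an arbitrary log file name, and USERNAME is a placeholder like EPRINTS_PATH and ARCHIVE_ID.

```shell
#!/bin/sh
# Sketch only: run a long command so that it ignores the hangup signal
# sent when the login session ends. run_detached is a hypothetical helper.
run_detached() {
    log_file=$1
    shift
    # Redirect stdin/stdout/stderr away from the terminal so nothing ties
    # the job to the session, then start it in the background.
    nohup "$@" < /dev/null > "$log_file" 2>&1 &
    DETACHED_PID=$!
}

# Example call (placeholders as used elsewhere in this thread):
# run_detached import.log EPRINTS_PATH/bin/import ARCHIVE_ID --user USERNAME eprint BibTeX metadata.bib
# echo "Import running as PID $DETACHED_PID; progress is written to import.log"
```

Running inside screen remains the more comfortable option if you want to reattach and watch the output later; the sketch above only keeps the process alive and logs what it prints.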
Regards
David Newman
CAUTION: This e-mail originated outside the University of Southampton.
David
Wow, that was a struggle but I got it working! Thanks for the pointers. I ended up using RIS format from EndNote as it seems more transparent to me and I'm used to it. The RIS.pm plugin provided in the installation was flaky and threw lots of errors. I've edited it a lot and it might be worth uploading it for future users as the existing one is probably better suited to an older version. Shall I send it through to you when I've tidied it up?
The next step is to find a tidy way to deal with default values for some fields, and changes to some required fields to make them optional so that I don't need them in the data.
I see what you mean about the amount of time it takes to ingest a batch of items this way. It is remarkably slow! But still much better than processing them by hand. Do I understand correctly that, if I set it to work and then log out of my user account on the server, it will just keep cooking in the background? That would be cool.
Best wishes
Will
____
From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Will Hughes <w.p.hughes@reading.ac.uk>
Sent: Saturday, February 15, 2025 8:09:59 PM
To: David R Newman <drn@ecs.soton.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Bulk import plugin
Excellent, thank you! I shall try this tomorrow and let you know.
Thanks again
Best wishes
Will
____
From: David R Newman <drn@ecs.soton.ac.uk>
Sent: Saturday, February 15, 2025 7:54:27 PM
To: Will Hughes <w.p.hughes@reading.ac.uk>; eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Bulk import plugin
Hi Will,
You can do a similar bulk import for EndNote or BibTeX by modifying the import command I provided in my earlier email:
EPRINTS_PATH/bin/import ARCHIVE_ID eprint BibTeX metadata.bib
EPRINTS_PATH/bin/import ARCHIVE_ID eprint EndNote metadata.enl
As I said before, you will need to add the skip_buffer setting to a config file and set a --user argument in the command to import as a specific user. It may be worth reviewing which BibTeX/EndNote attributes are supported by the versions you have of the following files (assuming you are running the EPrints 3.4.x series):
EPRINTS_PATH/flavours/pub_lib/plugins/EPrints/Plugin/Import/EndNote.pm
EPRINTS_PATH/flavours/pub_lib/plugins/EPrints/Plugin/Import/BibTeX.pm
under their "sub convert_input" functions.
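One way to review those functions without opening each file in an editor is to print just the body of convert_input. This is a rough sketch, not part of EPrints: show_convert_input is a hypothetical helper name, and it assumes the function starts with "sub convert_input" at the beginning of a line and ends at the next line that begins with "}", which should be checked against your copy of the plugin.

```shell
#!/bin/sh
# Sketch only: print a plugin's convert_input function so the supported
# attributes can be read off. Assumes the sub starts at column 0 and its
# closing brace is the next "}" at column 0.
show_convert_input() {
    sed -n '/^sub convert_input/,/^}/p' "$1"
}

# Example calls (EPRINTS_PATH is a placeholder, as in the thread):
# show_convert_input EPRINTS_PATH/flavours/pub_lib/plugins/EPrints/Plugin/Import/BibTeX.pm
# show_convert_input EPRINTS_PATH/flavours/pub_lib/plugins/EPrints/Plugin/Import/EndNote.pm
```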
Regards
David Newman
On 15/02/2025 7:37 pm, Will Hughes wrote:
Hi, David
Thanks for the quick response. No, I'm not moving them from one EPrints repository to another. I am moving them from an entirely different source. Currently, I have everything in EndNote and can export as BibTeX successfully. And this is just metadata, not a repository as such. I am providing a metadata database with URLs to the original papers and theses. This is a specialist subject database for a research association, not an institutional repository.
Sorry, I should have mentioned this in the original question!
Best wishes
Will
____
From: David R Newman <drn@ecs.soton.ac.uk>
Sent: Saturday, February 15, 2025 7:00:18 PM
To: eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>; Will Hughes <w.p.hughes@reading.ac.uk>
Subject: Re: [EP-tech] Bulk import plugin
Hi Will,
I am going to assume this is 20,000 records currently in an EPrints repository you want to transfer to a new/different EPrints repository. If that is not the case please let me know what format you currently have for these records you want to import.
Exporting the existing records from your old EPrints repository should entail carrying out an EPrint search from the admin menu (presumably for all items in the live archive) and then exporting the results as "EP3 XML with Files Embedded". If you have big files (e.g. videos), you can choose the plain EP3 XML export instead, as long as all the files you want to import are currently publicly accessible on the old EPrints repository.
Importing is most easily and efficiently done from the (SSH) command line of the new EPrints repository server. First, copy the export file generated above to that server. Next, run the following command to import the records (substituting EPRINTS_PATH, ARCHIVE_ID and OLD_ARCHIVE_ID as appropriate):
EPRINTS_PATH/bin/import ARCHIVE_ID --enable-file-imports --enable-web-imports eprint XML export_OLD_ARCHIVE_ID_XMLFiles.xml
However, these will be imported into the review buffer rather than the live archive, so you need to (temporarily) add the following to a configuration file in your new archive's cfg/cfg.d/ directory (e.g. z_skip_buffer.pl):
$c->{skip_buffer} = 1;
For more information about the import command see:
https://wiki.eprints.org/w/API:bin/import
In particular, you may want to set the user these records are imported as. I would advise creating a special user for this, as having 20,000 records under a user account you regularly use to manage deposits will make its Manage Deposits page less responsive, because it has to evaluate all 20,000 records to determine which to show on the first page.
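Putting the steps above together, the sequence might look roughly like the sketch below. The helper names write_skip_buffer_cfg and remove_skip_buffer_cfg are hypothetical, IMPORT_USER is a placeholder for the special import user, and the cfg.d path assumes the usual EPRINTS_PATH/archives/ARCHIVE_ID/cfg/cfg.d layout. Check the wiki page above for the exact form of the --user option on your version.

```shell
#!/bin/sh
# Sketch only: temporarily enable skip_buffer, run the import, then remove
# the setting again. Helper names are hypothetical.
write_skip_buffer_cfg() {
    cfg_dir=$1
    # Quoted heredoc so the shell does not try to expand $c.
    cat > "$cfg_dir/z_skip_buffer.pl" <<'EOF'
# Temporary setting for the bulk import: bypass the review buffer.
$c->{skip_buffer} = 1;
EOF
}

remove_skip_buffer_cfg() {
    rm -f "$1/z_skip_buffer.pl"
}

# Intended sequence (placeholders as used in this thread):
# write_skip_buffer_cfg EPRINTS_PATH/archives/ARCHIVE_ID/cfg/cfg.d
# EPRINTS_PATH/bin/import ARCHIVE_ID --user IMPORT_USER --enable-file-imports --enable-web-imports eprint XML export_OLD_ARCHIVE_ID_XMLFiles.xml
# remove_skip_buffer_cfg EPRINTS_PATH/archives/ARCHIVE_ID/cfg/cfg.d
```

Removing the file afterwards restores the normal behaviour, so that later deposits go through the review buffer again.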
Regards
David Newman
On 15/02/2025 6:13 pm, Will Hughes wrote:
Hi
With a new installation I am finding my way around the software. I am looking for the functionality to import records in bulk, straight to the repository.
I understand that there is or was a plugin for bulk import, but I cannot find it anywhere. What I want to do is to bring in 20,000 records in a way that makes them immediately live. Is there a plugin that can be fired up from the website, or is this a command line interface kind of thing?
Any suggestions welcome
Thanks
Best wishes
Will
Will Hughes
Emeritus Professor of Construction Management and Economics
School of the Built Environment
University of Reading, PO Box 219, Whiteknights
Reading, RG6 6DF, UK
*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List *** Archive: https://www.eprints.org/tech.php/ *** EPrints community wiki: https://wiki.eprints.org/