EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #10152
Re: [EP-tech] Export and import xml file with embeded from old eprints to new version
- To: Agung Prasetyo W. <prazetyo@gmail.com>
- Subject: Re: [EP-tech] Export and import xml file with embeded from old eprints to new version
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Wed, 18 Jun 2025 20:08:42 +0100
Hi Agung,
You don't want to produce thousands of individual XML files; you really just want one, or possibly several, since with embedded files I could imagine the files getting quite big. You could write a script to export all present eprints in batches (1-1000, 1001-2000, etc.). However, assuming you can cope with a large file (many gigabytes) and transfer it to your new server, then a single file should be fine. I have run the following command to export all live archive items (500 in my test case, which came to 2.6GB with embedded files):
EPRINTS_PATH/bin/export ARCHIVE_ID archive XMLFiles > EXPORT_FILENAME.xml
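If one file turns out to be too large to handle, the batching idea mentioned above could be sketched roughly as follows. This is only an illustration, not a tested migration script: it assumes bin/export accepts eprint IDs as trailing arguments (the form used elsewhere in this thread), and the helper names `id_batches` and `export_command`, the batch size and the output filenames are all made up for the example. It only builds and prints the commands; how bin/export treats IDs that do not exist in the archive dataset is not covered here.

```python
def id_batches(first_id, last_id, batch_size=1000):
    """Yield inclusive (start, end) eprint ID ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield start, end
        start = end + 1


def export_command(eprints_path, archive_id, start, end):
    """Build the argument list for one batch.

    Assumption: bin/export takes the eprint IDs as trailing arguments.
    """
    ids = [str(i) for i in range(start, end + 1)]
    return [f"{eprints_path}/bin/export", archive_id, "archive", "XMLFiles", *ids]


# Print one line per batch: output file (hypothetical name) and command prefix.
for start, end in id_batches(1, 2500):
    cmd = export_command("EPRINTS_PATH", "ARCHIVE_ID", start, end)
    print(f"export-{start}-{end}.xml:", " ".join(cmd[:4]), f"... ({len(cmd) - 4} IDs)")
```

Each argument list could then be passed to `subprocess.run(cmd, stdout=outfile, check=True)` with `outfile` opened for writing, so that every batch ends up in its own XML file.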
I then used the following command to import those same items I
just exported. There is only one XML import plugin, and it works
out whether files are embedded or whether it should try to
download them from the URLs in the XML (if exported using the
plain XML rather than the XMLFiles export plugin). The latter
will only work if you set the --enable-web-imports option.
EPRINTS_PATH/bin/import ARCHIVE_ID eprint XML EXPORT_FILENAME.xml --user 1
The --user 1 sets the owner of all the eprints to the user with ID 1. If you want to assign their ownership to their original users, you would already have needed to recreate the user records on the importing repository.
One issue with this export and import with embedded files is that
you need extra disk space for the export file as well as the space
the files you are about to import will take up. If you are running
on a VM, it makes sense to set up a large temporary disk, mount
that as, say, /import, and copy EXPORT_FILENAME.xml to
/import/EXPORT_FILENAME.xml. Then, when you are done, you can
unmount and destroy the disk in your VM management interface.
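As a rough sanity check before copying or importing, something like the sketch below could compare the size of the export file with the free space on a target filesystem. The 10% safety margin and the helper names `enough_space` and `can_hold` are arbitrary choices for this example, and using the export file size as an estimate of what the imported files will need is only an approximation.

```python
import os
import shutil


def enough_space(export_bytes, free_bytes, margin=1.1):
    """Return True if free_bytes covers export_bytes plus a safety margin."""
    return free_bytes >= export_bytes * margin


def can_hold(export_file, target_dir, margin=1.1):
    """Check whether the filesystem holding target_dir has room for export_file."""
    export_bytes = os.path.getsize(export_file)
    free_bytes = shutil.disk_usage(target_dir).free
    return enough_space(export_bytes, free_bytes, margin)
```

For example, `can_hold("EXPORT_FILENAME.xml", "/import")` would check the temporary disk, and the same call pointed at the repository's own storage directory would give a first estimate for the imported files.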
Regards
David Newman
[1] https://wiki.eprints.org/w/API:bin/import
Hi David,
If I want to export as many as 10,000 records with the embedded files option, where is the output .xml file written?
I tried using this command:
<EPRINTS_PATH>/bin/export archive_id archive XMLFiles 8076 8075
As for import, if I already have 10,000 .xml files, for example repo-8076.xml, repo-8075.xml, etc., how do I run your command?
<EPRINTS_PATH>/bin/import <ARCHIVE_ID> eprint XML eprints.xml --verbose
Regards,
Agung PW
On Wed, 18 Jun 2025 at 16:59, David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung,
Whether you want to import one eprint or thousands, you will need to use the <eprints> tags. This is because the import needs to work for both the single and the multiple case. If you had multiple eprints without any <eprints> tags, the XML would be invalid as there would be no root element. To save having to implement a different solution for single eprint import, this also requires the <eprints> tags, to indicate that a set of eprints (in this case a set of one) is to be imported.
If you want to import thousands of records with file data, you may be better off doing this from the command line:
<EPRINTS_PATH>/bin/import <ARCHIVE_ID> eprint XML eprints.xml --verbose
Regards
David Newman
On 18/06/2025 09:33, Agung Prasetyo W. wrote:
Hi,
Maybe if it's only 1 item, it's okay to add the <eprint> and </eprint> tags. But if the data I want to export is tens of thousands, of course this is a waste of time.
Regards
Agung PW
On Wed, 18 Jun 2025 at 15:07, David R Newman <drn@ecs.soton.ac.uk> wrote:
Hi Agung,
I have seen this when I have exported a single item and tried to reimport it, but I am not sure how you got this with a full archive export. There is no <eprints> tag enclosing the set of <eprint> records. Usually you can just edit the file to add the opening <eprints> tag at the top (after the <?xml ... line) and the closing tag right at the end, so it looks like:
<?xml version='1.0' encoding='utf-8'?>
<eprints xmlns='http://eprints.org/ep2/data/2.0'>
<eprint id='https://eprints.example.org/1'>
...
</eprint>
<eprint id='https://eprints.example.org/2'>
....
</eprint>
</eprints>
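For a file too large to edit comfortably by hand, the same wrapping could be done with a short script. The following is a minimal sketch rather than an official EPrints tool: the function name `wrap_in_eprints` is made up for this example, it assumes the XML declaration (if present) sits entirely on the first line, and it takes the namespace from the example above. It streams the file rather than loading it all into memory.

```python
import shutil

# Namespace as shown in the hand-edited example above.
EPRINTS_NS = "http://eprints.org/ep2/data/2.0"


def wrap_in_eprints(src, dst):
    """Copy text stream src to dst, adding an <eprints> root around its content.

    Assumes the XML declaration, if any, is wholly on the first line.
    """
    first = src.readline()
    if first.lstrip().startswith("<?xml"):
        # Keep the declaration first; the root element must come after it.
        end = first.index("?>") + 2
        dst.write(first[:end].strip() + "\n")
        first = first[end:]
    dst.write(f"<eprints xmlns='{EPRINTS_NS}'>\n")
    dst.write(first)
    shutil.copyfileobj(src, dst)  # stream the remainder in chunks
    dst.write("\n</eprints>\n")
```

It could be used with two files opened in text mode, for example `open("single.xml", encoding="utf-8")` as `src` and `open("wrapped.xml", "w", encoding="utf-8")` as `dst` (both filenames are placeholders). It should not be run on a file that already has the <eprints> root element.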
Regards
David Newman
On 18/06/2025 08:39, Alan.Stiles [He/Him/They] wrote:
It looks like the file you are trying to import isn’t correctly formatted, judging from the line:
Unexpected tag: expected <eprints> found <eprint>
If you export just one file from your existing system it will give you an example file to compare against. I’m sure there’s an entry in the wiki for it but I can’t find it at the moment.
Alan
*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List *** Archive: https://www.eprints.org/tech.php/ *** EPrints community wiki: https://wiki.eprints.org/