EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #10152



Re: [EP-tech] Export and import xml file with embeded from old eprints to new version

  • To: Agung Prasetyo W. <prazetyo@gmail.com>
  • Subject: Re: [EP-tech] Export and import xml file with embeded from old eprints to new version
  • From: David R Newman <drn@ecs.soton.ac.uk>
  • Date: Wed, 18 Jun 2025 20:08:42 +0100

Hi Agung,

You don't want to produce thousands of individual XML files; you really just want one, or possibly several, since with embedded files I could imagine the files getting quite big.  You could write a script to export all present eprints in batches (1-1000, 1001-2000, etc.).  However, assuming you can cope with a large file (many gigabytes) and transfer it to your new server, then a single file should be fine.  I have run the following command to export all live archive items (500 in my test case, which was 2.6GB with embedded files):

EPRINTS_PATH/bin/export ARCHIVE_ID archive XMLFiles > EXPORT_FILENAME.xml
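
If you do prefer batches, the following is a minimal Python sketch of the "1-1000, 1001-2000" idea. It only prints command lines; it does not run them. It assumes bin/export accepts a list of eprint IDs after the plugin name, as in the command quoted further down this thread. The install path, archive ID and output filenames are hypothetical, and IDs that do not exist in the live archive may need filtering out first.

```python
import shlex


def batch_ranges(first_id, last_id, batch_size):
    """Yield inclusive (start, end) eprint ID ranges covering first_id..last_id."""
    start = first_id
    while start <= last_id:
        end = min(start + batch_size - 1, last_id)
        yield start, end
        start = end + 1


def export_command(eprints_path, archive_id, start, end):
    """Build one shell command line exporting IDs start..end to its own file."""
    ids = " ".join(str(i) for i in range(start, end + 1))
    out = f"export_{start}-{end}.xml"  # hypothetical output filename
    return (
        f"{shlex.quote(eprints_path)}/bin/export {shlex.quote(archive_id)} "
        f"archive XMLFiles {ids} > {shlex.quote(out)}"
    )


if __name__ == "__main__":
    # Hypothetical install path and archive ID; small numbers keep the demo short.
    for start, end in batch_ranges(1, 25, 10):
        print(export_command("/opt/eprints3", "myarchive", start, end))
```

Each printed line can be reviewed and then run by hand or from a shell script, which keeps any one export file to a manageable size.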

I then use the following command to import the items I just exported.  There is only one XML import plugin, and it works out whether files are embedded or whether it should try to download them from the URLs in the XML (if exported using just the XML rather than the XMLFiles export plugin).  The latter will only work if you set the --enable-web-imports option [1].

EPRINTS_PATH/bin/import ARCHIVE_ID eprint XML EXPORT_FILENAME.xml --user 1

The --user 1 option sets the owner of all the eprints to the user with ID 1.  If you want to assign ownership to the original users, you need to have already recreated the user records on the importing repository.

One issue with this export and import with embedded files is that you need extra disk space for both the export file and the files you are about to import.  If you are running on a VM, it makes sense to set up a large temporary disk, mount it as, say, /import, and copy EXPORT_FILENAME.xml to /import/EXPORT_FILENAME.xml.  Then, when you are done, you can unmount and destroy the disk in your VM management interface.
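
As a rough way to check the disk space point above before starting, here is a small Python sketch. It assumes the embedded files are base64-encoded in the XML, so the decoded files take roughly three quarters of the export file's size; metadata overhead is ignored and the safety margin is an arbitrary choice. The function names and the "/" mount point are hypothetical.

```python
import os
import shutil


def estimated_decoded_bytes(export_size_bytes):
    """Rough size of the imported files, assuming base64 (4 bytes encode 3)."""
    return export_size_bytes * 3 // 4


def has_enough_space(export_size_bytes, free_bytes,
                     export_on_same_disk=False, safety_margin=1.2):
    """Return True if free_bytes covers the import, with some headroom.

    If the export file will sit on the same disk as the repository files,
    its own size is counted as well.
    """
    needed = estimated_decoded_bytes(export_size_bytes)
    if export_on_same_disk:
        needed += export_size_bytes
    return free_bytes >= needed * safety_margin


if __name__ == "__main__":
    export_file = "/import/EXPORT_FILENAME.xml"  # path used in the message above
    if os.path.exists(export_file):
        size = os.path.getsize(export_file)
        # Hypothetical: the repository's documents are stored on the "/" filesystem.
        free = shutil.disk_usage("/").free
        print("decoded files need roughly", estimated_decoded_bytes(size), "bytes")
        print("enough room:", has_enough_space(size, free))
```

With a separate /import disk, only the decoded files count against the repository's own disk, which is why the export file's size is optional in the check.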

Regards

David Newman

[1] https://wiki.eprints.org/w/API:bin/import


On 18/06/2025 15:01, Agung Prasetyo W. wrote:
CAUTION: This e-mail originated outside the University of Southampton.
Hi David,

if I want to export as many as 10,000 files with the embedded option file, then where is the output location of the .xml file?
I tried using this command <EPRINTS_PATH>/bin/export archive_id archive XMLFiles 8076 8075

As for import, if I already have 10,000 .xml files for example repo-8076.xml, repo-8075.xml etc, how do I run your command?
<EPRINTS_PATH>/bin/import <ARCHIVE_ID> eprint XML eprints.xml --verbose

Regards,
Agung PW

On Wed, 18 Jun 2025 at 16:59, David R Newman <drn@ecs.soton.ac.uk> wrote:

Hi Agung,

Whether you want to import one eprint or thousands, you will need to use the <eprints> tags.  This is because the import needs to work for both the single and the multiple case.  If you had multiple eprints without any <eprints> tags, the XML would be invalid as there would be no root element.  To save having to implement a different solution for single eprint import, the single case also requires the <eprints> tags, to indicate that a set of eprints (in this case a set of 1) is to be imported.

If you want to import thousands of records with file data, you may be better off doing this from the command line:

<EPRINTS_PATH>/bin/import <ARCHIVE_ID> eprint XML eprints.xml --verbose

Regards

David Newman

On 18/06/2025 09:33, Agung Prasetyo W. wrote:
Hi,

Maybe if it's only 1 item, it's okay to add the <eprint> and </eprint> tags. But if the data I want to export is tens of thousands, of course this is a waste of time.

Regards
Agung PW

On Wed, 18 Jun 2025 at 15:07, David R Newman <drn@ecs.soton.ac.uk> wrote:

Hi Agung,

I have seen this when I have exported a single item and tried to reimport it, but I am not sure how you got this with a full archive export.  There is no <eprints> tag enclosing the set of <eprint> records.  Usually you can just edit the file to add the opening <eprints> tag at the top (after the <?xml ... line) and the closing </eprints> tag right at the end, so it looks like:

<?xml version='1.0' encoding='utf-8'?>
<eprints xmlns='http://eprints.org/ep2/data/2.0'>
    <eprint id='https://eprints.example.org/1'>
        ...
    </eprint>
    <eprint id='https://eprints.example.org/2'>
        ....
    </eprint>
</eprints>
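
For a multi-gigabyte file, hand editing may be impractical, so the same fix can be sketched as a small streaming Python filter. This is only an illustration under stated assumptions: the XML declaration, if present, is on its own first line, the first element starts on the next non-blank line, and the namespace is the one shown in the example above. The function name is hypothetical.

```python
import io

EPRINTS_OPEN = "<eprints xmlns='http://eprints.org/ep2/data/2.0'>\n"
EPRINTS_CLOSE = "</eprints>\n"


def wrap_in_eprints(src, dst):
    """Copy text stream src to dst, adding an <eprints> root if it is missing.

    Returns True if the wrapper was added, False if the input already had one.
    Works line by line, so the whole file is never held in memory.
    """
    line = src.readline()
    if line.lstrip().startswith("<?xml"):
        dst.write(line)  # keep the XML declaration first
        line = src.readline()
    while line and not line.strip():
        line = src.readline()  # skip blank lines before the first element
    already_wrapped = line.lstrip().startswith("<eprints")
    if not already_wrapped:
        dst.write(EPRINTS_OPEN)
    dst.write(line)
    last = line
    for line in src:
        dst.write(line)
        last = line
    if not already_wrapped:
        if last and not last.endswith("\n"):
            dst.write("\n")
        dst.write(EPRINTS_CLOSE)
    return not already_wrapped


if __name__ == "__main__":
    demo = io.StringIO(
        "<?xml version='1.0' encoding='utf-8'?>\n"
        "<eprint id='https://eprints.example.org/1'>\n"
        "</eprint>\n"
    )
    out = io.StringIO()
    wrap_in_eprints(demo, out)
    print(out.getvalue(), end="")
```

For real files, the two streams would be opened with open(..., encoding="utf-8") for reading and writing respectively; writing to a new file rather than editing in place leaves the original export untouched.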

Regards

David Newman

On 18/06/2025 08:39, Alan.Stiles [He/Him/They] wrote:

It looks like the file you are trying to import isn’t correctly formatted, judging from the line

Unexpected tag: expected <eprints> found <eprint>

If you export just one item from your existing system, it will give you an example file to compare against.  I’m sure there’s an entry in the wiki for it but I can’t find it at the moment.

 

Alan

 

From: eprints-tech-request@ecs.soton.ac.uk <eprints-tech-request@ecs.soton.ac.uk> on behalf of Agung Prasetyo W. <prazetyo@gmail.com>
Date: Wednesday, 18 June 2025 at 08:03
To: Yuri <yurj@alfa.it>, eprints-tech@ecs.soton.ac.uk <eprints-tech@ecs.soton.ac.uk>
Subject: Re: [EP-tech] Export and import xml file with embeded from old eprints to new version

External email: if the sender or content looks suspicious, please click the Report Message icon, or forward it to report-phishing


I exported an XML file from EPrints 3.1.3 using "EP3 XML with Files Embedded". After getting the file, I imported it into my EPrints 3.4.3 with the "Test Without Importing" option, and then I got this error message:

 


 

I just tried 1 file. I feel like if I import a lot of files via the command line, it will definitely give an error too.

Need help.

 

Thank you

 

Best regards

Agung PW

 

 

On Wed, Jun 18, 2025, 13:26 Yuri <yurj@alfa.it> wrote:

Maybe you can post the steps you followed and the error?

Il 18/06/25 06:37, Agung Prasetyo W. ha scritto:


Hi,

 

I would like to ask if it is possible to import XML files exported as "EP3 XML with Files Embedded" from EPrints version 3.1.3 into EPrints 3.4.3? I have tried importing the XML file and got an error.

I have around 30,000 items, and I want to move them to the new version of EPrints. Please help.

 

Thank you

 

Best regards,

Agung PW



*** Options: https://wiki.eprints.org/w/Eprints-tech_Mailing_List
*** Archive: https://www.eprints.org/tech.php/
*** EPrints community wiki: https://wiki.eprints.org/
 
