EPrints Technical Mailing List Archive
Message: #04462
[EP-tech] Re: Bulk export/import
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Re: Bulk export/import
- From: Andrew Beeken <anbeeken@lincoln.ac.uk>
- Date: Wed, 8 Jul 2015 13:29:55 +0000
Just to clarify - export from the NEW repository and compare with the
import from the old one? I’d have to create some dummy data as the new
repo is empty.

One other solution might be to bring all the field definitions over and
see which one makes a difference. A big part of this exercise is
identifying which of the field definitions we’ve created are...
“delicate”. As in, which ones break our data and stop it from going into
a vanilla repository.

On 08/07/2015 14:04, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
John Salter" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
J.Salter@leeds.ac.uk> wrote:

>Hi Andrew,
>I suspect somewhere you've got a field that is multiple, that isn't
>represented properly in the XML.
>A multiple field should look a bit like this in the XML:
><fieldname>
>  <item>value1</item>
>  <item>value2</item>
></fieldname>
>
>A multiple compound field is a bit more involved. These may be of use:
>http://wiki.eprints.org/w/XML_Export_Format
>http://wiki.eprints.org/w/Import_From_URL
>
>I'd start by exporting a record from your archive in the EPrints XML
>format - and comparing that with the file you're trying to import.
>
>Cheers,
>John
>
>-----Original Message-----
>From: eprints-tech-bounces@ecs.soton.ac.uk
>[mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Andrew Beeken
>Sent: 07 July 2015 10:47
>To: eprints-tech@ecs.soton.ac.uk
>Subject: [EP-tech] Re: Bulk export/import
>
>Hello all,
>
>Okay, so I’ve fixed the mystery of the missing field definition but I’m
>now getting the following error:
>
>Unhandled exception in Import::XML: Can't use string (" ") as an ARRAY ref while "strict refs" in use at /usr/share/eprints3/perl_lib/EPrints/MetaField.pm line 2106.
> at /usr/lib/perl5/XML/LibXML.pm line 881.
> XML::LibXML::parse_fh('XML::LibXML=HASH(0x7f496679d7d0)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 99
> eval {...} called at /usr/lib/perl5/XML/LibXML/SAX.pm line 98
> XML::LibXML::SAX::_parse('XML::LibXML::SAX=HASH(0x7f496679e8a8)') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 54
> XML::LibXML::SAX::_parse_bytestream('XML::LibXML::SAX=HASH(0x7f496679e8a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2602
> XML::SAX::Base::parse('XML::LibXML::SAX=HASH(0x7f496679e8a8)', 'HASH(0x7f49667b41c0)') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2631
> XML::SAX::Base::parse_file('XML::LibXML::SAX=HASH(0x7f496679e8a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/EPrints/XML/L ...
>
>Any thoughts as to what could be throwing this?
>
>Andrew
>
>On 02/07/2015 15:41, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>Andrew Beeken" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>anbeeken@lincoln.ac.uk> wrote:
>
>>Ah, so that’s not a standard field - I came to the Uni after a lot of
>>this work was done so I’m still trying to figure out which fields are
>>bespoke and which are not. In a way I’m trying to avoid bringing bespoke
>>things across with me but I think it’s clear to me now that our data
>>simply will not fit a vanilla ePrints. Now I need to start a softly
>>softly approach...
>>
>>On 02/07/2015 15:20, "eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>>George Mamalakis" <eprints-tech-bounces@ecs.soton.ac.uk on behalf of
>>mamalos@eng.auth.gr> wrote:
>>
>>>It seems that on your previous system you had a field called "owner"
>>>that doesn't exist in your new EPrints installation.
>>>
>>>Try to see how you have defined this field and copy your configuration
>>>to your new EPrints installation. It'll probably be defined in:
>>>
>>>./archives/archname/cfg/cfg.d/eprint_fields.pl
>>>
>>>or in some other -custom- configuration file in
>>>./archives/archname/cfg/cfg.d/.
>>>
>>>If it's not present in eprint_fields.pl, grep in the configuration
>>>folder.
>>>
>>>Don't forget to reload the repository on your new installation after
>>>installing the field (you'll probably also need to add phrases for your
>>>new field, but that's another discussion).
>>>
>>>On 02/07/2015 05:03 pm, Andrew Beeken wrote:
>>>> Thanks for the advice here; I’m not looking at files for the moment
>>>> as we have far too many on the server and my little local Virtual
>>>> Box would crumple under their weight!
>>>>
>>>> Okay, so I did an export from the admin on our live box to EP3 XML
>>>> and tried to import this on my Virtual Box. Unfortunately this fails
>>>> at the first hurdle with the following errors:
>>>>
>>>> Invalid XML element: owner
>>>>
>>>> Unhandled exception in Import::XML: Can't use string (" ") as an ARRAY ref while "strict refs" in use at /usr/share/eprints3/perl_lib/EPrints/MetaField.pm line 2106.
>>>>  at /usr/lib/perl5/XML/LibXML.pm line 881.
>>>>  XML::LibXML::parse_fh('XML::LibXML=HASH(0x7fd6e81f35f0)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 99
>>>>  eval {...} called at /usr/lib/perl5/XML/LibXML/SAX.pm line 98
>>>>  XML::LibXML::SAX::_parse('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)') called at /usr/lib/perl5/XML/LibXML/SAX.pm line 54
>>>>  XML::LibXML::SAX::_parse_bytestream('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2602
>>>>  XML::SAX::Base::parse('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', 'HASH(0x7fd6e9e4f930)') called at /usr/share/eprints3/perl_lib/XML/SAX/Base.pm line 2631
>>>>  XML::SAX::Base::parse_file('XML::LibXML::SAX=HASH(0x7fd6e8cde0a8)', '*Fh::fh00001export_lirolem_XML.xml') called at /usr/share/eprints3/perl_lib/EPrints/XML/L …
>>>>
>>>> From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Adam Field
>>>> <af05v@ecs.soton.ac.uk>
>>>> Reply-To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
>>>> Date: Thursday, 2 July 2015 13:53
>>>> To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
>>>> Subject: [EP-tech] Re: Bulk export/import
>>>>
>>>> If you export without the files, you'll get paths to files in the
>>>> file object. Through cunning use of symbolic linked directories or
>>>> global find and replace in the XML file, you can put the files where
>>>> the XML import will find them. It's a bit hacky, but it works.
>>>>
>>>> --
>>>> Adam Field
>>>> Business Relationship Manager and Community Lead
>>>> EPrints Services
>>>>
>>>> On 2 Jul 2015, at 13:38, Andrew Beeken <anbeeken@lincoln.ac.uk> wrote:
>>>>
>>>> That seems like a bit of a round the houses approach. I’ll dig
>>>> through the source and see what I can find.
>>>>
>>>> On 02/07/2015 13:16, "eprints-tech-bounces@ecs.soton.ac.uk on behalf
>>>> of George Mamalakis" <eprints-tech-bounces@ecs.soton.ac.uk on behalf
>>>> of mamalos@eng.auth.gr> wrote:
>>>>
>>>> From its documentation (perldoc ./bin/export) it doesn't seem to
>>>> support anything like that. On the other hand, the documentation
>>>> mentions the option:
>>>>
>>>> '
>>>> dataset: The name of the dataset to export, such as "archive",
>>>> "subject" or "user".
>>>> '
>>>>
>>>> You could maybe "exploit" this option by moving some eprints from one
>>>> dataset to another and by exporting/importing each dataset separately
>>>> (and then moving the appropriate eprints where they really belong).
>>>>
>>>> Haven't checked the source code, though, so maybe there's another
>>>> solution hidden somewhere there...:)
>>>>
>>>> On 02/07/2015 02:56 pm, Andrew Beeken wrote:
>>>>
>>>> I wonder... Is it possible to export by type? I could perhaps export
>>>> each type separately...
>>>>
>>>> On 02/07/2015 12:18, "eprints-tech-bounces@ecs.soton.ac.uk on behalf
>>>> of George Mamalakis" <eprints-tech-bounces@ecs.soton.ac.uk on behalf
>>>> of mamalos@eng.auth.gr> wrote:
>>>>
>>>> Ian and Andrew,
>>>>
>>>> I think that one can import/export specific entries, if I'm not
>>>> mistaken, but I'm not exactly sure about the syntax. If it allows for
>>>> ranges, the 100,000 entries problem may be addressed by just
>>>> splitting the export/import process into more than one export/import
>>>> operation. I have used this syntax to select specific eprints, but my
>>>> syntax was something like the following:
>>>>
>>>> ./bin/export archid archive XML 114 115 116 117 > /tmp/export1
>>>>
>>>> which would be very unwieldy if it had to be used for thousands of
>>>> records (I assume the args would overflow! :)). Nonetheless, in the
>>>> worst case where ranges are not allowed, the former syntax could be
>>>> used successfully within a very carefully written script.
>>>>
>>>> On 01/07/2015 06:37 pm, Ian Stuart wrote:
>>>> On 01/07/15 15:25, Andrew Beeken wrote:
>>>>
>>>> Hello all!
>>>>
>>>> I’m currently looking at migrating our repository to a fresh install,
>>>> mainly because we have a bit of customisation to our live repo and I
>>>> want to see how this process would affect the integrity of the data.
>>>> Is there an easy way of importing all records from one repository,
>>>> say to an XML file and then importing to the new one?
>>>>
>>>> In general (and as George says) the XML-with-files export is the way
>>>> to go.
>>>>
>>>> I discovered it falls over with 100,000 records, so I just copied the
>>>> database & attached a new eprints to it :D
>>>>
>>>> --
>>>> George Mamalakis
>>>>
>>>> IT and Security Officer,
>>>> Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
>>>> PhD (Aristotle Univ. of Thessaloniki),
>>>> MSc (Imperial College of London)
>>>>
>>>> School of Electrical and Computer Engineering
>>>> Aristotle University of Thessaloniki
>>>>
>>>> phone number : +30 (2310) 994379
>>>>
>>>> *** Options:
>>>> http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>>>> *** Archive: http://www.eprints.org/tech.php/
>>>> *** EPrints community wiki: http://wiki.eprints.org/
>>>> *** EPrints developers Forum: http://forum.eprints.org/
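John's suggestion above (export a record and compare it with the file being imported) could be roughed out in a few lines of Python. This is only a sketch, not part of EPrints: the function names are made up for illustration, and it assumes the export's root element wraps one child element per record, with each record's children being its fields. The `<item>` element name is taken from John's example. It reports fields that are multiple in one file but plain text in the other, and fields that exist only in the old export (such as "owner" earlier in the thread).

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def field_shapes(xml_text):
    """Map each field name to the set of shapes it takes across all records.

    Assumption: the root element contains one child per record, and each
    record's direct children are its fields.
    """
    shapes = defaultdict(set)
    root = ET.fromstring(xml_text)
    for record in root:
        for field in record:
            children = [local_name(child.tag) for child in field]
            if not children:
                shape = "single"      # text only, no child elements
            elif all(name == "item" for name in children):
                shape = "multiple"    # <item>...</item> children only
            else:
                shape = "compound"    # any other nested structure
            shapes[local_name(field.tag)].add(shape)
    return dict(shapes)


def shape_mismatches(old_xml, new_xml):
    """Return fields whose shape differs, or that exist only in the old export."""
    old, new = field_shapes(old_xml), field_shapes(new_xml)
    report = {}
    for name, old_shapes in sorted(old.items()):
        new_shapes = new.get(name)
        if new_shapes is None:
            report[name] = "missing from new export"
        elif new_shapes != old_shapes:
            report[name] = f"{sorted(old_shapes)} in old, {sorted(new_shapes)} in new"
    return report
```

To try it, read each export as bytes (for example `open(path, "rb").read()`) and pass both to `shape_mismatches`. The whole file is parsed in memory, so for a very large export it may be more practical to run this on a handful of records exported by ID rather than on the full dump.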