EPrints Technical Mailing List Archive
Message: #06567
Re: [EP-tech] Dissecting the Documents folder
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] Dissecting the Documents folder
- From: Thomas Lauke <th.lauke@arcor.de>
- Date: Thu, 8 Jun 2017 15:24:44 +0200 (CEST)
Hi Andrew,

> Do I ... put it in the new <eprints_root>/archives/<myarchive>/documents folder?

I don't know what else may have to be done in your case, so here is the procedure that worked for me in the past:

- Unpack your documents to /tmp/disc0/00/..., for example (the thumbnails and indexcodes are not needed, if that matters).
- Replace the leading part of each <url> with the physical directory structure. I did this with the following substitutions (vi/vim ex syntax; in a sed script, drop the leading %):

    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/\4/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/0\1\/\2\/\3\/0\4/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/\3/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/\1\/\2\/0\3/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/\3/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/0\1\/\2\/0\3/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/\2/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9][0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/\1\/0\2/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9][0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/\2/
    %s/http:\/\/eprints.lincoln.ac.uk\/\([0-9]\)\/\([0-9]\)/\/tmp\/disc0\/00\/00\/00\/0\1\/0\2/

- Take care of spaces in the file paths: fortunately, none of our file names on our Linux system contained spaces, so I have NO experience with that :-)
- Remove all <rev_number> tags with `xmlstarlet ed -d "//_:rev_number" in.xml > /tmp/out.xml` to restart the change history.
- Check your import file with `~/Eprints/bin/import yourRepo --parse-only --force archive XML yourInput`.
- Start the final run with `~/Eprints/bin/import yourRepo --migration --force archive XML yourInput`.
- If anything fails, run `~/Eprints/bin/import yourRepo erase_eprints` and start again.

> Which part of the xml needs rewriting to tell the import
> where to look for the file?

None, thanks to the <url> modification described above.

The numbering follows the order of entries in your import file, so any gaps will disappear, but this may cause some confusion when comparing the old and new records.

Hth,
Thomas
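The ten substitutions in the reply all apply one rule: the eprint id is zero-padded to eight digits and split into four two-digit directories, and the document position is padded to two digits. Below is a minimal Python sketch of that rule, assuming document URLs of the form <host>/<eprintid>/<pos>/<filename>. The host and the /tmp/disc0 root are the example values from the reply, and the names OLD_HOST, DOCS_ROOT, split_id and url_to_path are hypothetical. Unlike the substitution lines, which cover ids of up to five digits, this regex accepts ids of up to eight digits and requires a slash after the position.

```python
import re

# Example values taken from the substitutions in the reply; adjust for your repository.
OLD_HOST = "http://eprints.lincoln.ac.uk"
DOCS_ROOT = "/tmp/disc0"

# Assumed URL shape: <host>/<eprintid>/<pos>/<filename>,
# with an id of up to 8 digits and a document position of up to 2 digits.
URL_RE = re.compile(re.escape(OLD_HOST) + r"/(\d{1,8})/(\d{1,2})/")


def split_id(eprintid):
    """Zero-pad the eprint id to 8 digits and split it into four 2-digit directories."""
    padded = f"{eprintid:08d}"
    return "/".join(padded[i:i + 2] for i in range(0, 8, 2))


def url_to_path(text):
    """Rewrite every matching document URL prefix in text to its on-disk path."""
    def repl(match):
        eprintid = int(match.group(1))
        pos = int(match.group(2))
        # The file name after the position directory is left untouched.
        return f"{DOCS_ROOT}/{split_id(eprintid)}/{pos:02d}/"

    return URL_RE.sub(repl, text)


if __name__ == "__main__":
    sample = "<url>http://eprints.lincoln.ac.uk/12345/1/paper.pdf</url>"
    print(url_to_path(sample))
    # prints <url>/tmp/disc0/00/01/23/45/01/paper.pdf</url>
```

If you read the whole export file into one string and pass it through url_to_path, you get the same rewrite as the ten substitutions in a single pass. Because the id is padded arithmetically, you don't need a separate pattern for each id and position length, and their ordering no longer matters.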