EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #06542
Re: [EP-tech] Dissecting the Documents folder
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Dissecting the Documents folder
- From: Andrew Beeken <anbeeken@lincoln.ac.uk>
- Date: Tue, 30 May 2017 08:29:39 +0000
Bump! Any thoughts?

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Andrew Beeken

Hello!

So, I’m getting to the stage where I need to implement this, and I’ve taken a look at a record to see how I’m going to need to rewrite the export XML to accommodate moving the documents. Take this example:

    <document id='http://eprints.lincoln.ac.uk/id/document/57053'>
      <docid>57053</docid>
      <rev_number>2</rev_number>
      <files>
        <file id='http://eprints.lincoln.ac.uk/id/file/321720'>
          <fileid>321720</fileid>
          <datasetid>document</datasetid>
          <objectid>57053</objectid>
          <filename>27039 Simon Burton_ Nowhere Men - a-n The Artists Information Company.pdf</filename>
          <mime_type>application/pdf</mime_type>
          <hash>939078bc226712a5e8d640ced5df31b3</hash>
          <hash_type>MD5</hash_type>
          <filesize>270630</filesize>
          <mtime>2017-05-10 09:42:34</mtime>
        </file>
      </files>
      <eprintid>27039</eprintid>
      <pos>1</pos>
      <placement>1</placement>
      <mime_type>application/pdf</mime_type>
      <format>application/pdf</format>
      <language>en</language>
      <security>staffonly</security>
      <main>27039 Simon Burton_ Nowhere Men - a-n The Artists Information Company.pdf</main>
      <content>whole_document</content>
      <stage>published</stage>
    </document>

Going by the EPrints logic, I’ve located the file in

    <eprints_root>/archives/<myarchive>/documents/disk0/00/02/70/39/01/<document>

That’s cool, I can find that. So… Do I pull the whole documents folder over to my new server and put it in the new <eprints_root>/archives/<myarchive>/documents folder? Or do I put it elsewhere? Which part of the XML needs rewriting to tell the import where to look for the file? When I’m setting the new site up I’m going to be using eprints.lincoln.ac.uk and faking the host for development purposes so that we can retain domain integrity on the move.

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Adam Field

No, the subdirectories will be the same; you’ll just need to replace the bit that leads up to the documents directory.

From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Andrew Beeken <anbeeken@lincoln.ac.uk>

Ah, that makes sense. So will I need to factor that in when I’m doing an XML rewrite?

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Adam Field

It’s the eprint id, padded with zeros and then broken up into pairs of digits to make directory names. That way there are never more than 100 directories in each directory.

-- Adam

From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Andrew Beeken <anbeeken@lincoln.ac.uk>

You know me all too well ;) So that should work? I was only wondering because the structure under the Documents folder seems rather ambiguous to me: 00, with 00, 01 and 02 and further numbers underneath that. Is there a method to that?

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Adam Field

Hi Andrew,

If it were anyone but you, I’d recommend doing a mysqldump and keeping all paths the same as the simplest way to migrate a repository. However, I’m sure you’ll counter that with “…but my repository is non-standard and I’m trying to make it standard”.

When I’ve had to monkey around with paths in XML files, I’ve usually done it with vim and find/replace commands. Move the documents directory to the new server into a temporary directory, then compare the path in the XML file to the path on disk. This will help you understand what string replacement you need to do. The command will be something like:

    :%s/\/usr\/share\/eprint3\/archives\/foo\/documents/\/home\/anbeeken\/migration\/documents/g

...it may take some time to run.

-- Adam

From: <eprints-tech-bounces@ecs.soton.ac.uk> on behalf of Andrew Beeken <anbeeken@lincoln.ac.uk>

Hi all,

I’m looking into the best options for migrating EPrints to a new server and investigating the possibility of pulling our 43.6GB worth of documents across so as not to embed them in XML and create large files. I know that I can bring the Documents folder over; however, I’m not sure how to interpret the folder structure to rewrite the URLs in the XML export. Any thoughts?

Andrew
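
For anyone following the thread, the two pieces of advice in it (how the numbered directories are derived from the eprint id, and replacing only the part of the path that leads up to the documents directory) can be sketched in Python. This is a minimal illustration, not EPrints code. The function names `eprint_dir` and `rewrite_prefix` are made up here. The 8-digit padding and the final two-digit directory taken from the document's <pos> value are inferred from the 27039 example in the thread, so treat them as assumptions. The two prefixes are the placeholder paths from the vim command.

```python
def eprint_dir(eprint_id: int, pos: int) -> str:
    """Build the numbered directory chain for one document.

    Assumption (inferred from the thread's example): the eprint id is
    zero-padded to 8 digits and split into pairs, and the last directory
    is the document's <pos> value padded to 2 digits.
    """
    padded = f"{eprint_id:08d}"
    pairs = [padded[i:i + 2] for i in range(0, len(padded), 2)]
    pairs.append(f"{pos:02d}")
    return "/".join(pairs)


def rewrite_prefix(text: str, old_prefix: str, new_prefix: str) -> str:
    """Swap the leading part of every path; the subdirectories are untouched."""
    return text.replace(old_prefix, new_prefix)


# Placeholder prefixes copied from the vim command in the thread.
OLD = "/usr/share/eprint3/archives/foo/documents"
NEW = "/home/anbeeken/migration/documents"

if __name__ == "__main__":
    subdirs = eprint_dir(27039, 1)
    print(subdirs)

    # Hypothetical path for illustration; any directories that sit between
    # the prefix and the numbered ones are left out here.
    sample = f"{OLD}/{subdirs}/example.pdf"
    print(rewrite_prefix(sample, OLD, NEW))
```

Because only the prefix changes, the same plain string replacement can be run over the whole export file in one pass, which is what the vim substitution does.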
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/