EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #04734
[EP-tech] Re: Best way to import local (or remote) files through EPrints' XML file
- To: <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Re: Best way to import local (or remote) files through EPrints' XML file
- From: "Andy Reid" <Andy.Reid@lshtm.ac.uk>
- Date: Tue, 22 Sep 2015 16:52:22 +0100
On reflection, that may be more confusing than I first thought. I should have explained that I had to deal with manuscripts that had several files per document (main body, cover page, tables, figures, etc.), and the file metadata was already in a database. If you are working from just one file per eprint, you obviously don't need the loop that builds up the series of files, but I thought the XML templates might be useful.

Andy

>>> "Andy Reid" <Andy.Reid@lshtm.ac.uk> 22 September 2015 16:18 >>>
Hi George,

Here's a chunk of PHP I put together recently to generate the documents/files section of an EPrints XML upload. It puts all the files into one document tag, which may or may not be what you want. I'm not sure how standard the mainfile configuration on our system is, but it seems to allow only one file per document to be downloaded. So either you need to enclose each file in a separate document tag, or, as I eventually did, zip all the files and upload that as one file. This version is the 'one document, many files' approach, but I can send you the Zip version as well if you like.

<?php
function eprints_xml_OAfiles($row){
  # $row is the metadata values for this record
  global $link;
  global $dataset;
  $docroot=$_SERVER['DOCUMENT_ROOT'];
  $filebase="$docroot/publications/administration/.... <where the files live > /";
  $pub_id = $row['pub_id'];
  # cast so the value is safe to interpolate into SQL (assumes a numeric ID)
  $oaPub_ID = (int)$row['oaPub_ID'];
  $PM = $row['pubmedid'];
  # note: the mysql_* functions are deprecated in PHP 5.5 and removed in PHP 7;
  # mysqli or PDO are the replacements
  $files_query = "SELECT `oaManuscript_ID`, `oaPub_ID`,
      `content_oaManuscript`, # Manuscript
      `file_oaManuscript`, `file_oaManuscript_mimetype`, `URL_oaManuscript`,
      `upload_oaManuscript`, `notes_oaManuscript`, `modified_oaManuscript`
    FROM oaManuscript2
    WHERE oaPub_ID = $oaPub_ID and upload_oaManuscript = 1 "; # ORDER BY surname";
  $files_result = mysql_query($files_query,$link) or die("Query failed: $files_query");
  $filesXML = "";
  $mainfile = "";
  $maintype = "";
  while ($f = mysql_fetch_array($files_result)) {
    # build file metadata and base64 data
    $record = print_r($f,TRUE);
    echo "<!-- $record -->"; # debugging output
    $filename = $f['file_oaManuscript'];
    $mimetype = $f['file_oaManuscript_mimetype'];
    if(FALSE === ($STUFF=file_get_contents($filebase.$filename))){die("\n\nfailed to get file: $filebase$filename");}
    $base64 = chunk_split(base64_encode($STUFF));
    $hash = md5($STUFF); # MD5 of the file contents, not of the base64 text
    $filesize = strlen($STUFF);
    $file_modified = $f['modified_oaManuscript'];
    # escape characters such as & before writing the name into XML
    $xmlname = htmlspecialchars($filename, ENT_QUOTES | ENT_XML1);
    if ($mainfile === "") { # assumption: the first file returned is the main file
      $mainfile = $xmlname;
      $maintype = $mimetype;
    }
    $filesXML .= "
      <file>
        <datasetid>document</datasetid>
        <filename>$xmlname</filename>
        <mime_type>$mimetype</mime_type>
        <hash>$hash</hash>
        <hash_type>MD5</hash_type>
        <filesize>$filesize</filesize>
        <mtime>$file_modified</mtime>
        <data encoding='base64'>";
    $filesXML .= $base64;
    #.=chunk_split(base64_encode(file_get_contents($fileURLbase.$filename)));
    $filesXML .= "</data>
      </file>";
  } # end while
  return <<<EOC
<documents>
  <document>
    <mime_type>$maintype</mime_type>
    <format>text</format>
    <language>en</language>
    <security>public</security>
    <license>cc_by</license>
    <main>$mainfile</main>
    <content>accepted</content>
    <files>
$filesXML
    </files>
  </document>
</documents>
EOC;
}
?>

>>> George Mamalakis <mamalos@eng.auth.gr> 22 September 2015 14:40 >>>
Hi everybody!

I'm very close to finishing my EPrints configuration and migration from DSpace. The main thing that remains to be done is the data migration part.
I've written a Python script that generates an EPrints XML file from a DSpace CSV file, which I'll upload to the EPrints wiki when it's done. To complete it, I need to add the files, and I don't know what syntax to use. I have a local folder whose subfolders contain all the DSpace files, and each subfolder is named after the record ID. My folder structure therefore looks like this:

/home/data/dspace/{record_id}

where {record_id} is the DSpace ID of the specific record. What are the minimum XML elements I need to add to my XML file for EPrints to import the files? And what would an example XML entry look like for this folder structure?

Thanks all in advance!

--
George Mamalakis

IT and Security Officer,
Electrical and Computer Engineer (Aristotle Univ. of Thessaloniki),
PhD (Aristotle Univ. of Thessaloniki),
MSc (Imperial College of London)

School of Electrical and Computer Engineering
Aristotle University of Thessaloniki

phone number : +30 (2310) 994379

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/
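George's question (which elements to add, and what an entry looks like for /home/data/dspace/{record_id}) can be sketched in Python, the language of his migration script. This is a hypothetical sketch, not Andy's code and not confirmed against an actual EPrints import: it takes the other route Andy mentions, one <document> per file, so each file is the main file of its own document. The element names are copied from Andy's template, but they are not verified to be the minimum required set, and the format, language and security values are copied from his example and will depend on the target repository's configuration. The function names file_element, document_element and documents_for_record are made up for illustration.

```python
import base64
import hashlib
import mimetypes
import os
import xml.etree.ElementTree as ET


def guess_mime(filename):
    """Guess a MIME type from the file extension, with a generic fallback."""
    return mimetypes.guess_type(filename)[0] or "application/octet-stream"


def file_element(filename, data):
    """Build a <file> element with the raw bytes embedded as base64.

    Element names follow Andy's PHP template (an assumption, not a spec).
    """
    f = ET.Element("file")
    ET.SubElement(f, "datasetid").text = "document"
    ET.SubElement(f, "filename").text = filename
    ET.SubElement(f, "mime_type").text = guess_mime(filename)
    # MD5 of the raw file contents, not of the base64 text.
    ET.SubElement(f, "hash").text = hashlib.md5(data).hexdigest()
    ET.SubElement(f, "hash_type").text = "MD5"
    ET.SubElement(f, "filesize").text = str(len(data))
    ET.SubElement(f, "data", encoding="base64").text = (
        base64.b64encode(data).decode("ascii"))
    return f


def document_element(filename, data):
    """Wrap a single file in its own <document>, making it the main file."""
    doc = ET.Element("document")
    ET.SubElement(doc, "mime_type").text = guess_mime(filename)
    # Assumption: these values are copied from Andy's example; the allowed
    # values depend on the target repository's configuration.
    ET.SubElement(doc, "format").text = "text"
    ET.SubElement(doc, "language").text = "en"
    ET.SubElement(doc, "security").text = "public"
    ET.SubElement(doc, "main").text = filename
    files = ET.SubElement(doc, "files")
    files.append(file_element(filename, data))
    return doc


def documents_for_record(base_dir, record_id):
    """One <document> per regular file in base_dir/record_id, sorted by name."""
    record_dir = os.path.join(base_dir, str(record_id))
    docs = ET.Element("documents")
    for name in sorted(os.listdir(record_dir)):
        path = os.path.join(record_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as fh:
                docs.append(document_element(name, fh.read()))
    return docs
```

For example, ET.tostring(documents_for_record("/home/data/dspace", "1234"), encoding="unicode") would return the <documents> element for a placeholder record 1234, ready to be placed inside that record's <eprint> element. Building the XML with ElementTree instead of string concatenation also takes care of escaping characters such as & in filenames.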
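Andy's other approach, zipping all of a record's files and uploading them as a single file, can be sketched the same way. This is a hypothetical Python illustration, not the Zip version Andy offered to send. The names zip_bytes, read_record_dir and zip_document are invented, the <file> element names are taken from Andy's template, and the other document fields in his template (format, language, security, etc.) would still need to be added to suit the repository.

```python
import base64
import hashlib
import io
import os
import xml.etree.ElementTree as ET
import zipfile


def zip_bytes(files):
    """Zip a {filename: bytes} mapping into an in-memory archive."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in sorted(files):
            zf.writestr(name, files[name])
    return buf.getvalue()


def read_record_dir(record_dir):
    """Load every regular file in record_dir as {filename: bytes}."""
    files = {}
    for name in os.listdir(record_dir):
        path = os.path.join(record_dir, name)
        if os.path.isfile(path):
            with open(path, "rb") as fh:
                files[name] = fh.read()
    return files


def zip_document(archive_name, files):
    """A single <document> whose only (and main) file is the zip archive.

    Assumption: element names follow Andy's template; format, language,
    security, etc. are left out and must be added per repository.
    """
    data = zip_bytes(files)
    doc = ET.Element("document")
    ET.SubElement(doc, "mime_type").text = "application/zip"
    ET.SubElement(doc, "main").text = archive_name
    f = ET.SubElement(ET.SubElement(doc, "files"), "file")
    ET.SubElement(f, "datasetid").text = "document"
    ET.SubElement(f, "filename").text = archive_name
    ET.SubElement(f, "mime_type").text = "application/zip"
    # Hash and size describe the zip archive itself.
    ET.SubElement(f, "hash").text = hashlib.md5(data).hexdigest()
    ET.SubElement(f, "hash_type").text = "MD5"
    ET.SubElement(f, "filesize").text = str(len(data))
    ET.SubElement(f, "data", encoding="base64").text = (
        base64.b64encode(data).decode("ascii"))
    return doc
```

A call such as zip_document("record_1234.zip", read_record_dir("/home/data/dspace/1234")) would produce one downloadable file per record, which sidesteps the one-file-per-document download limitation Andy describes, at the cost of users having to unzip the archive.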
- References:
- [EP-tech] Intermittent altmetric box?
- From: Meghan Jones <M.Jones3@brighton.ac.uk>
- [EP-tech] Best way to import local (or remote) files through EPrints' XML file
- From: George Mamalakis <mamalos@eng.auth.gr>
- [EP-tech] Re: Best way to import local (or remote) files through EPrints' XML file
- From: "Andy Reid" <Andy.Reid@lshtm.ac.uk>