EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #05933
Re: [EP-tech] Problem depositing larger documents via SWORD 2.0
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: Re: [EP-tech] Problem depositing larger documents via SWORD 2.0
- From: Andy Reid <Andy.REID@lshtm.ac.uk>
- Date: Thu, 15 Sep 2016 12:51:33 +0000
Hi Willem,

I’m not using eprints_wrapper as such, but a similar homemade process in PHP, using base64_encode and the PHP cURL library to push files to the SWORD 2.0 portal on eprints.

I just tested with a 5MB zip file and the encoding and upload took about 4s. I don’t know offhand the spec of the virtual server it is running on, but I think it has 2GB RAM, running SUSE Linux. Likewise I’m unsure of the spec at the eprints end, but it’s also a VM.

However, it crashed on a 26MB file. I tried again with 3 x 8MB files and it worked fine, in about 10s.

Not sure if this helps, but it does suggest that base64 processing is not a problem in itself, time-wise, with average hardware at either end.

The only obvious difference I can spot is that mine uses chunk_split to break up the base64 into lines, but how I arrived at that I can’t remember. Might be worth a try, works for me.

Andy

======================= Base64 encoding fragment ===========================

while ($f = mysql_fetch_array($files_result)) { # build file metadata and base64 data
    $filenum++;
    $filename     = $f['file_oaManuscript'];
    $filenamesafe = htmlspecialchars($filename); # Who puts ampersands in filenames!!
    $mimetype     = $f['file_oaManuscript_mimetype'];
    $maintype     = $mimetype;
    $mainfile     = $filenamesafe;

    if (FALSE === ($STUFF = file_get_contents($filebase.$filename))) {
        die("\n\nfailed to get file: $filebase$filename");
    }
    $base64        = chunk_split(base64_encode($STUFF));
    $hash          = md5($base64);
    $filesize      = strlen($STUFF);
    $file_modified = $f['modified_oaManuscript'];

    $filesXML = "
<file>
<datasetid>document</datasetid>
<filename>$filenamesafe</filename>
<mime_type>$mimetype</mime_type>
<hash>$hash</hash>
<hash_type>MD5</hash_type>
<filesize>$filesize </filesize>
<mtime>$file_modified</mtime>
<data encoding='base64'>";
    $filesXML .= $base64;
    $filesXML .= "</data>
</file>";

======================= CURL FRAGMENT ===========================

curl_setopt($ch, CURLOPT_URL, "http://researchonline.lshtm.ac.uk/id/contents");
curl_setopt($ch, CURLOPT_HEADER, 1);
$pkgheader = Array('X-Packaging: http://eprints.org/ep2/data/2.0',
                   'Content-Type: text/xml',
                   'Metadata-Relevant: true',
                   'X-Verbose: true',
                   'In-Progress: false'); # TRUE => user inbox; FALSE => review
curl_setopt($ch, CURLOPT_HTTPHEADER, $pkgheader);
$html_in = "http://pubdb.lshtm.ac.uk/publications/OAmgr/OAmgr_upload/eprints_xml.php?filter=oaPub_ID&value=$oaPub_ID"; # fetches eprints XML
$data="">
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, TRUE);
($result = curl_exec($ch)) || die("curl_exec failed: " . curl_error($ch));

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
On Behalf Of John Salter

Hi Willem,

I’ve had a quick look at the php code. It’s base64 encoding the file, and adding it to the EPrintsXML it generates in a <document> element. The encoding (and decoding at the other end) takes some time – and is probably not the correct process for larger files.

This is the process that I think *should* be used in this scenario: but I’m not sure if the EPrintsWrapper class can do this…

Others on this list have more SWORD experience than me – hopefully someone will be able to provide a bit more advice.

Cheers,
John

From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk]
On Behalf Of W. Struiksma

Hi all,

I'm currently having problems depositing larger documents (> 5 MB) via SWORD 2.0. I'm using a PHP script that uses EPrintsWrapper.php. In this script the EPrints XML (including document) is posted via cURL. The deposit takes a very long time (8 minutes for 26 MB) and the Apache process goes to 100% processor capacity. Has anyone experienced the same behaviour before? What can I do about it?

We use EPrints 3.3.13.
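
A note on the base64 step discussed above. Andy's fragment reads the whole file with file_get_contents and then holds the raw bytes, the base64 text and the growing XML string in memory at once. The thread does not say why the 26MB upload crashed, so treat the following as an assumption: if memory on the client side is the limit, the encoding can be done piece by piece instead. The sketch below is a minimal illustration of that idea, not part of Andy's script or of EPrintsWrapper. The function name base64_encode_file_chunked and its parameters are hypothetical. It writes the same 76-character, "\r\n"-terminated lines that chunk_split produces by default. Note that it computes the MD5 over the raw file bytes, whereas Andy's fragment hashes the base64 text; which of the two the server expects is not stated in the thread.

```php
<?php
// Hypothetical helper (not from the thread): base64-encodes a file piece by
// piece so that only one piece is held in memory at a time. The encoded text
// is written to the open stream $out in 76-character lines ending in "\r\n",
// which are the chunk_split() defaults.
function base64_encode_file_chunked($srcPath, $out, $linesPerRead = 1024)
{
    // 57 raw bytes encode to exactly 76 base64 characters, so reading a
    // multiple of 57 bytes keeps every piece aligned to whole lines and
    // leaves '=' padding only at the very end of the output.
    $readSize = 57 * $linesPerRead;

    $in = fopen($srcPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("failed to open file: $srcPath");
    }

    $md5   = hash_init('md5'); // incremental MD5 over the raw bytes (assumption)
    $bytes = 0;

    while (!feof($in)) {
        $raw = stream_get_contents($in, $readSize);
        if ($raw === false || $raw === '') {
            break; // nothing left to read
        }
        hash_update($md5, $raw);
        $bytes += strlen($raw);
        fwrite($out, chunk_split(base64_encode($raw)));
    }
    fclose($in);

    return array('filesize' => $bytes, 'md5' => hash_final($md5));
}
```

A possible use would be to open a temporary stream with tmpfile(), write the XML up to the opening <data encoding='base64'> element, call the helper with that stream, and then write the closing elements. This sketch covers only the encoding step; how the resulting payload is handed to cURL, and whether the server copes better with it, is left open here.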
- References:
- [EP-tech] Problem depositing larger documents via SWORD 2.0
- From: "W. Struiksma" <w.struiksma@rug.nl>
- Re: [EP-tech] Problem depositing larger documents via SWORD 2.0
- From: John Salter <J.Salter@leeds.ac.uk>