EPrints Technical Mailing List Archive


Message: #03346

[EP-tech] Injecting gigabyte-scale files into EPrints archive - impossible?

To: eprints-tech@ecs.soton.ac.uk
Subject: [EP-tech] Injecting gigabyte-scale files into EPrints archive - impossible?
From: Florian Heß <hess@ub.uni-heidelberg.de>
Date: Fri, 01 Aug 2014 11:25:53 +0200

Hello developers and users,

Once again, I'm sorry to have to consult you about a problem we've run into and couldn't solve ourselves.

We need to attach a big file, about 3 GB in size, to a document. We limited web uploads to 100 MB in the web server configuration so that we keep control of large file uploads. To get bigger files into the archive, we successfully use the following command:


/usr/bin/perl ~eprints/bin/toolbox $repo addFile \
   --document $docid --filename $filename < /path/to/existing/file

(By the way, is there a convenient way of getting the document ID? It is rather tedious to upload a placeholder file so that we can manually look up the doc ID with the Firebug extension; after running the command, we open the EPrint file dialog in the document metadata to switch the main file and delete the placeholder.)

I narrowed this method down to a line of code in EPrints::Toolbox::get_data() that I doubt scales to files of this size (given the memory our hardware has):


    join("", <STDIN>)

builds, in EPrints 3.3.10, a monstrous Perl scalar that is certainly expanded and moved around in memory again and again to make it fit. I wonder if there is a way I can move the file to the expected place myself and adjust the file record in the EPrints database. I tried this already, but in the end I still got the tiny placeholder file when downloading. I deleted the placeholder file in the console (rm), but then EPrints threw "couldn't read file contents". So somewhere things were still set up for the old file. The browser, though, displays the right filename in the modal dialog offering to save the file or open it with some program.
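
For comparison with the slurp above, here is a minimal shell sketch of a copy whose memory use is bounded by a fixed block size rather than by the file size. It only illustrates the general technique; it is not how the toolbox stores files, and stream_copy and the 1 MiB block size are assumptions made for the example.

```shell
#!/bin/sh
# stream_copy DEST: copy standard input to DEST block by block.
# Hypothetical helper for illustration, not part of EPrints.
stream_copy() {
    dest=$1
    # dd reads one block of at most 1 MiB, writes it out, and repeats,
    # so only a single block is held in memory at any time.
    dd of="$dest" bs=1048576 2>/dev/null
}
```

Called as `stream_copy /some/target < /path/to/existing/file` (the target path is a placeholder), this needs roughly one block of buffer space no matter how large the input is, whereas joining all of STDIN into one scalar needs at least the full file size.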

The toolbox command ran for more than two hours, gorging on swap space like there was no tomorrow, before we killed it. It used 2% CPU on average, and its status flag was "D" most of the time (man ps: "uninterruptible sleep (usually IO)"). It appeared to be swapping constantly.
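
One way to confirm a suspicion like this on Linux is to read the process's entry under /proc. The sketch below assumes a Linux /proc filesystem; proc_mem_state is a hypothetical helper name, and the VmSwap line only appears on kernels that report it.

```shell
#!/bin/sh
# proc_mem_state PID: print the scheduler state, resident memory and
# swapped-out memory of one process (Linux only).
# Hypothetical helper for illustration.
proc_mem_state() {
    pid=$1
    # State shows e.g. "D (disk sleep)"; VmRSS is memory currently in RAM;
    # VmSwap is the part of the process that has been swapped out.
    grep -E '^(State|VmRSS|VmSwap):' "/proc/$pid/status"
}
```

Running `proc_mem_state <pid>` every few seconds against the toolbox process would show whether VmSwap keeps growing while the state stays at "D".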

Today I tried the toolbox addDocument command, which doesn't seem to save me any work after all, as it just requires XML data. But with <url>file:///path/of/file/to/import</url>, it runs out of disk space again while "downloading" that URL into /tmp. I wish I could pass the path of a file to be copied directly. Isn't that possible somehow?
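
Whether a temporary copy will fit can at least be checked before starting the import. The following is a minimal sketch using only standard tools; fits_in_dir is a hypothetical helper, and it assumes the temporary copy is about as large as the source file.

```shell
#!/bin/sh
# fits_in_dir FILE DIR: succeed if the filesystem holding DIR has enough
# free space for one more copy of FILE.
# Hypothetical helper for illustration, not part of EPrints.
fits_in_dir() {
    file=$1
    dir=$2
    # File size in KiB, rounded up.
    size_kb=$(( ( $(wc -c < "$file") + 1023 ) / 1024 ))
    # df -P prints one line per filesystem in a fixed format and
    # -k reports 1024-byte units; column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$size_kb" -le "$avail_kb" ]
}
```

For example, `fits_in_dir /path/of/file/to/import /tmp || echo "not enough room in /tmp"` reports the problem up front instead of after a long, failed copy.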



Kind regards
Florian


--
UB Heidelberg (Altstadt)
Plöck 107-109, 69117 HD
Abt. Informationstechnik
http://www.ub.uni-heidelberg.de/
