EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #03351
[EP-tech] Re: Injecting gigabyte-scale files into EPrints archive - impossible?
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Injecting gigabyte-scale files into EPrints archive - impossible?
- From: Yuri <yurj@alfa.it>
- Date: Mon, 04 Aug 2014 10:13:09 +0200
The only option seems to be to enlarge /tmp :-)
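A minimal sketch of a gentler variant of that workaround: instead of enlarging /tmp, point the import's temporary file at a roomier filesystem. This assumes the spool file reported in the error further down is created with File::Temp's defaults, which honour the TMPDIR environment variable; the /data/bigtmp path is a placeholder.

    use strict;
    use warnings;

    # Run bin/import with a roomier temporary directory instead of a larger /tmp.
    # Assumption: the temp file is created via File::Temp defaults, which
    # consult $ENV{TMPDIR}; changes to %ENV propagate to the child process.
    $ENV{TMPDIR} = '/data/bigtmp';    # placeholder; must exist and be writable

    my ( $repo, $xmlfile ) = @ARGV;
    system( 'bin/import', $repo,
            '--enable-import-fields', '--enable-file-imports',
            'document', 'XML', $xmlfile ) == 0
        or die "bin/import exited with status $?";

Exporting TMPDIR in the shell before calling bin/import would have the same effect.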
On 01/08/2014 15:31, Florian Heß wrote:

On 01.08.2014 11:52, Yuri wrote:

> There's no official documentation about the toolbox; it should be documented better. Can't you just use import with these options?
>
> --enable-import-ids
> By default, import will generate a new eprintid or userid for each record. This option tells it to use the id specified in the imported data. This is generally used for importing into a new repository from an old one.
>
> --enable-file-imports
> Allow the imported data to import files from the local filesystem. This can obviously be seen as a security hole if you don't trust the data you are importing. This sets the "enable_file_imports" configuration option for this session only.
>
> ... after you've exported the eprints, modified the document section and re-imported it?
>
> Thanks, Yuri

I have already gone that way, I'm afraid. If the system didn't try to upload, it wouldn't complain about "not enough disk space left on device". So that nothing remains untried, I ran:

    bin/import $repo --enable-import-fields --enable-file-imports document XML $xmlfile

    Error! Unhandled exception in Import::XML: Can't write to '/tmp/E2FCKTjvNh': No space left on device at /usr/share/perl5/LWP/Protocol.pm line 115.
    at /usr/lib/perl5/XML/LibXML/SAX.pm line 80
    at .../eprints/bin/../perl_lib/EPrints/XML/LibXML.pm line 137

(The error message was in German here.)

I even dropped the "file://" prefix, hoping that would make the system perform a plain filesystem operation (as the docs quoted above imply), but it still uses LWP. It said "Download (0b)", and when I cp'd the file to where it was expected, it still "failed to get file contents".

I finally solved this by studying the sources and then manually inserting the values (FILEID, 0, "Storage::Local") into the files_copies_pluginid database table and (FILEID, 0, FILENAME) into files_copies_sourceid. It works now like a charm, but hacking the database should not be necessary; I promise I will use the API in the future. ;-)

> Another option is to use a Perl library for efficient file handling and change the code where it does join("", <STDIN>).

Still, a string is expected from get_data(), and this probably wouldn't be the only place to change. The function should return a reference to a scalar, something like \do{ local $/; scalar <STDIN> }, which I did not test, however. This is known as the file-slurping idiom in Perl. But such code is still dangerous: simply - i.e. erroneously - attach a never-ending stream to standard input and your system will have a hard time providing infinite memory.

Kind regards
Florian

On 01/08/2014 11:25, Florian Heß wrote:

Hello developers and users,

again I'm sorry I have to consult you about a problem we've run into and couldn't solve ourselves. We need to attach a big file, about 3 GB in size, to a document. We limited web uploads to 100 MB in the webserver configuration in order to keep control of large file uploads. To get bigger files into the archive we successfully use the following command:

    /usr/bin/perl ~eprints/bin/toolbox $repo addFile \
        --document $docid --filename $filename < /path/to/existing/file

(Besides, is there a convenient way of getting the document id? It is rather tedious to upload a placeholder file so that we can manually seek out and grab a doc id with the Firebug extension; after running the command, we open the EPrint file dialog in the document metadata to switch the main file and delete the placeholder.)
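On the document-id question, a minimal sketch of a command-line lookup, assuming the EPrints 3.3 API (EPrints->new->repository(), $repo->eprint(), $eprint->get_all_documents()); the repository and eprint ids are placeholders.

    use strict;
    use warnings;
    use EPrints;

    # Placeholder ids -- substitute your repository id and eprint id.
    my ( $repoid, $eprintid ) = ( 'myrepo', 4321 );

    my $repo = EPrints->new->repository( $repoid )
        or die "No such repository: $repoid";
    my $eprint = $repo->eprint( $eprintid )
        or die "No such eprint: $eprintid";

    # Print each attached document's id and main file, so the right value
    # for --document can be picked without hunting through the web UI.
    foreach my $doc ( $eprint->get_all_documents ) {
        printf "docid=%d  main=%s\n", $doc->id, ( $doc->value( 'main' ) || '' );
    }

    $repo->terminate;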
I narrowed this method down to a line of code in EPrints::Toolbox::get_data() that I doubt scales to these dimensions (given the memory of our hardware): join("", <STDIN>) builds, in EPrints 3.3.10, a monstrous Perl scalar that is perpetually expanded and moved around in memory as it grows.

I wonder whether there is a way for me to move the file to the expected place myself and adjust the file record in the EPrints database. I tried this already, but in the end I was still downloading the tiny placeholder file. I deleted that file in the console (rm), but then EPrints threw "couldn't read file contents", so somewhere things were still arranged for the old file. The browser does, however, display the right filename in the modal dialog offering to save the file or open it with some program.

The toolbox command was appallingly running for more than two hours and gorging swap space like there was no tomorrow, until we killed it. It consumed 2% CPU on average, and its status flag was "D" most of the time (man ps: "uninterruptible sleep (usually IO)"); it appeared to be constantly swapping.

Today I tried the toolbox addDocument command, which doesn't seem to save me any work after all; it just requires XML data. And with <url>file:///path/of/file/to/import</url> it runs out of disk space again while "downloading" that URL into /tmp. I wish I could simply pass the path of a file to be copied directly; isn't that possible somehow?

Kind regards
Florian
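And a minimal sketch of the API route promised above, assuming EPrints 3.3's $dataobj->add_stored_file( $filename, $filehandle, $filesize ), which hands an open filehandle to the storage layer rather than slurping the whole file into one scalar; the ids and path are placeholders, and the exact streaming behaviour depends on the storage plugin in use.

    use strict;
    use warnings;
    use File::Basename qw( basename );
    use EPrints;

    # Placeholder values -- substitute your repository id, document id and file path.
    my ( $repoid, $docid, $path ) = ( 'myrepo', 1234, '/path/to/existing/file' );

    my $repo = EPrints->new->repository( $repoid )
        or die "No such repository: $repoid";
    my $doc = $repo->dataset( 'document' )->dataobj( $docid )
        or die "No such document: $docid";

    open( my $fh, '<', $path ) or die "open $path: $!";
    binmode $fh;

    # add_stored_file() creates the file record and passes the filehandle to
    # the storage plugin (e.g. Storage::Local), so the large file is not held
    # in memory as a single Perl scalar.
    $doc->add_stored_file( basename( $path ), $fh, -s $path )
        or die "add_stored_file failed";
    close $fh;

    # Optionally make it the document's main file:
    $doc->set_value( 'main', basename( $path ) );
    $doc->commit;
    $repo->terminate;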
- References:
- [EP-tech] Injecting gigabyte-scale files into EPrints archive - impossible?
- From: Florian Heß <hess@ub.uni-heidelberg.de>
- [EP-tech] Re: Injecting gigabyte-scale files into EPrints archive - impossible?
- From: Yuri <yurj@alfa.it>
- [EP-tech] Re: Injecting gigabyte-scale files into EPrints archive - impossible?
- From: Florian Heß <hess@ub.uni-heidelberg.de>