EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #02087
[EP-tech] Re: Memory usage in 3.2, Sword 1.3 and epdata packages
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Memory usage in 3.2, Sword 1.3 and epdata packages
- From: Tim Brody <tdb2@ecs.soton.ac.uk>
- Date: Fri, 12 Jul 2013 10:51:12 +0100
Correct. In 3.2 the HTTP POST is all worked on in memory. In 3.3, XML data are streamed and written to disk as they arrive.

/Tim.

On Fri, 2013-07-12 at 08:26 +0100, Ian Stuart wrote:
> With no real knowledge, and certainly no investigation... I would
> suspect the problem is actually with how the base64 files are handled,
> rather than being an EPrints memory leak per se.
>
> From the SWORD importers I've written, the process seems to be to:
> 1) read in the deposit
> 2) unpack the deposit (zip into disk space, XML into memory)
> 3) create the eprint object
> 4) attach the files
> 5) write everything out
>
> So I would suspect that what's happening is that all your base64 files
> are created (in memory) from the XML (which is also in memory).
>
> On 12/07/13 03:57, Mark Gregson wrote:
> > We’re using SWORD with epdata packages to deposit documents and
> > multimedia into our repository (3.2). This works fine for small file
> > sizes, but CPU and memory usage increase quickly until, with a ~200MB
> > file, the httpd process consumes all available memory and dies. This
> > is on a RHEL5 64-bit box with 8GB memory and a separate DB server.
> >
> > Clearly, the epdata format is not the most appropriate for files of
> > this size, both because base64 encoding increases the file size and
> > because the document is embedded within the XML. Changing package
> > format may alleviate or resolve the problem, but as that is definitely
> > going to be a challenge in our environment, I’m hoping it will be
> > easier to deal with the issue within EPrints.
> >
> > Note, I’ve already ascertained that it is not related to libxml2’s
> > XML_PARSE_HUGE option being disabled; the failure occurs trying to run df.
> >
> > I’m about to start hunting for memory leaks and then doing additional
> > memory profiling. If anyone has any suggestions about likely locations
> > for memory leaks in the code, information about expected memory usage
> > for SWORD with epdata packages, data from previous profiling, etc., it
> > would be very valuable.
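The difference Tim describes (3.2 holding the whole POST in memory versus 3.3 streaming the XML and writing file data to disk as it arrives) can be illustrated with a minimal Python sketch. This is not EPrints code (EPrints is written in Perl); the class, the `extract` function and the element name `data` are hypothetical and do not come from the epdata schema. The point it shows: base64 inflates data by a factor of 4/3, so a ~200MB file becomes roughly 267MB of XML text, and decoding it in one go means holding that text plus the decoded bytes in memory. A SAX-style parser instead hands the text over in chunks, which can be decoded in complete 4-character groups and written straight to a temporary file.

```python
import base64
import io
import tempfile
import xml.sax


class StreamingBase64Handler(xml.sax.ContentHandler):
    """Decode the text of each <data> element straight to a temporary file.

    Hypothetical: the element name "data" is an illustrative assumption,
    not the element name used by EPrints epdata packages.
    """

    def __init__(self, element_name="data"):
        super().__init__()
        self.element_name = element_name
        self.files = []      # one temporary file per decoded element
        self._out = None     # file currently being written, if any
        self._pending = ""   # leftover base64 characters (fewer than 4)

    def startElement(self, name, attrs):
        if name == self.element_name:
            self._out = tempfile.TemporaryFile()  # binary mode ("w+b")
            self._pending = ""

    def characters(self, content):
        # The parser may deliver an element's text in many small chunks.
        if self._out is None:
            return
        # Strip the line breaks that encoders insert into base64 text.
        buf = self._pending + "".join(content.split())
        # Decode only complete 4-character groups; carry the rest forward.
        cut = len(buf) - len(buf) % 4
        if cut:
            self._out.write(base64.b64decode(buf[:cut]))
        self._pending = buf[cut:]

    def endElement(self, name):
        if name == self.element_name and self._out is not None:
            if self._pending:
                raise ValueError("truncated base64 data")
            self._out.seek(0)  # rewind so callers can read the decoded bytes
            self.files.append(self._out)
            self._out = None


def extract(stream, element_name="data"):
    """Parse an XML byte stream, returning one temp file per element."""
    handler = StreamingBase64Handler(element_name)
    parser = xml.sax.make_parser()
    parser.setContentHandler(handler)
    parser.parse(stream)  # reads the input incrementally, not all at once
    return handler.files


if __name__ == "__main__":
    sample = b"<doc><data>aGVsbG8=</data></doc>"
    (out,) = extract(io.BytesIO(sample))
    print(out.read())  # b'hello'
```

Peak memory here depends on the parser's chunk size rather than on the size of the embedded file, which is the property the 3.3 importer is described as having; the trade-off is that the data must be read back from disk when it is attached to the eprint.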
- References:
- [EP-tech] Memory usage in 3.2, Sword 1.3 and epdata packages
- From: Mark Gregson <mark.gregson@qut.edu.au>
- [EP-tech] Re: Memory usage in 3.2, Sword 1.3 and epdata packages
- From: Ian Stuart <Ian.Stuart@ed.ac.uk>