EPrints Technical Mailing List Archive


Message: #01034



[EP-tech] Poor performance due to cachemap, non-SQL joins


Hi,

In EPrints 3.0.5 (old, I know) I see very poor performance when a user ticks the checkbox to view their eprints in the live archive. Apparently, the IDs of all eprints in the archive are first inserted into one of the dynamically created cache tables. This means tens of thousands of individual INSERTs at a time, which seems like a great waste (the INSERTs are not even batched). Afterwards, only the user's own eprints are displayed (let's say one or two of them).
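
To make the cost concrete, here is a minimal sketch of the two approaches. It is not EPrints code: Python with an in-memory SQLite database stands in for Perl and the real database, and the table and column names (eprint, userid, eprint_status, the cache tables) as well as the function names are assumptions chosen for illustration. The first function caches every archive ID with one INSERT per row; the second lets the database filter by owner and fill the cache in a single INSERT ... SELECT.

```python
import sqlite3


def build_db(n_eprints=1000, n_users=50):
    """Create a hypothetical 'eprint' table; this is not the real EPrints schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE eprint ("
        "eprintid INTEGER PRIMARY KEY, userid INTEGER, eprint_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO eprint VALUES (?, ?, 'archive')",
        [(i, i % n_users) for i in range(1, n_eprints + 1)],
    )
    return conn


def fill_cache_row_by_row(conn, table):
    """Cache ALL archive IDs with one INSERT statement per row (the slow pattern)."""
    conn.execute(f"CREATE TABLE {table} (pos INTEGER PRIMARY KEY, eprintid INTEGER)")
    ids = [
        row[0]
        for row in conn.execute(
            "SELECT eprintid FROM eprint "
            "WHERE eprint_status = 'archive' ORDER BY eprintid"
        )
    ]
    for pos, eprintid in enumerate(ids, start=1):
        conn.execute(
            f"INSERT INTO {table} (pos, eprintid) VALUES (?, ?)", (pos, eprintid)
        )
    return len(ids)


def fill_cache_filtered(conn, table, userid):
    """Cache only one user's IDs with a single INSERT ... SELECT statement."""
    conn.execute(
        f"CREATE TABLE {table} "
        "(pos INTEGER PRIMARY KEY AUTOINCREMENT, eprintid INTEGER)"
    )
    conn.execute(
        f"INSERT INTO {table} (eprintid) "
        "SELECT eprintid FROM eprint "
        "WHERE eprint_status = 'archive' AND userid = ? ORDER BY eprintid",
        (userid,),
    )
    # Count the cached rows explicitly rather than relying on cursor.rowcount.
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


if __name__ == "__main__":
    conn = build_db()
    print("rows cached, row by row:", fill_cache_row_by_row(conn, "cache_all"))
    print("rows cached, filtered:  ", fill_cache_filtered(conn, "cache_own", 7))
```

With the filter applied in SQL, the cache table holds only the rows that will actually be displayed, and the database does the work in one statement instead of one round trip per eprint.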

I also noticed that joins (as in "database joins") are performed in Perl code, by sequentially scanning huge arrays, rather than at the SQL level. This contributes greatly to the sluggishness of generate_views (2-3 days in an installation with 70000 eprints).
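
The difference between the two join strategies can be sketched as follows. Again this is only an illustration under stated assumptions, not the actual generate_views logic: Python and SQLite replace Perl and the real database, and the eprint and eprint_subject tables, the index, and the function names are hypothetical. The first function fetches both tables and matches rows by scanning one list for every entry of the other; the second expresses the same match as a SQL JOIN so the database can use its index.

```python
import sqlite3


def build_db(n_eprints=200):
    """Create two hypothetical tables linked by eprintid (not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE eprint (eprintid INTEGER PRIMARY KEY, title TEXT);
        CREATE TABLE eprint_subject (eprintid INTEGER, subject TEXT);
        CREATE INDEX eprint_subject_idx ON eprint_subject (subject, eprintid);
        """
    )
    conn.executemany(
        "INSERT INTO eprint VALUES (?, ?)",
        [(i, f"Paper {i}") for i in range(1, n_eprints + 1)],
    )
    conn.executemany(
        "INSERT INTO eprint_subject VALUES (?, ?)",
        [(i, "QA" if i % 4 == 0 else "Z") for i in range(1, n_eprints + 1)],
    )
    return conn


def titles_by_subject_in_app(conn, subject):
    """Join in application code: scan the link list once per eprint, O(n * m)."""
    eprints = conn.execute(
        "SELECT eprintid, title FROM eprint ORDER BY eprintid"
    ).fetchall()
    links = conn.execute("SELECT eprintid, subject FROM eprint_subject").fetchall()
    result = []
    for eprintid, title in eprints:
        for link_id, link_subject in links:  # sequential scan of the whole list
            if link_id == eprintid and link_subject == subject:
                result.append((eprintid, title))
                break
    return result


def titles_by_subject_in_sql(conn, subject):
    """Join in the database: one query, and the index narrows the search."""
    return conn.execute(
        "SELECT e.eprintid, e.title "
        "FROM eprint AS e "
        "JOIN eprint_subject AS s ON s.eprintid = e.eprintid "
        "WHERE s.subject = ? "
        "ORDER BY e.eprintid",
        (subject,),
    ).fetchall()


if __name__ == "__main__":
    conn = build_db()
    in_app = titles_by_subject_in_app(conn, "QA")
    in_sql = titles_by_subject_in_sql(conn, "QA")
    print(len(in_app), len(in_sql), in_app == in_sql)
```

Both functions return the same rows; the application-side version simply does work proportional to the product of the two table sizes, which is the kind of cost that adds up at 70000 eprints.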

I suppose that these issues are known. But I searched trac.eprints.org and haven't found any conclusive answer as to whether they still exist in the current version. I'm trying to make a stronger case for an upgrade...

Regards,
Jan Ploski