EPrints Technical Mailing List Archive
Message: #01681
[EP-tech] Re: Eprints critical block on cache insert
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Eprints critical block on cache insert
- From: Tim Brody <tdb2@ecs.soton.ac.uk>
- Date: Fri, 08 Mar 2013 16:11:14 +0000
The reason EPrints uses table joins is to make use of the database optimiser. Back in v2 it used to do what you said, but that is unscalable for many trivial cases, e.g.

q: date > 1900 and author=(Smith, John)

A naive execution of this would end up creating a temp table with an entry for every record in the database (~1 second per 1000 records), then join that against a much smaller set of {Smith, John}. That led v2 instances to spend an age on what looked like a simple query: because one term matched the entire set, it was very inefficient (as well as hammering IO on the temp tables).

What the database optimiser is supposed to do is optimise this out by doing the smaller set first (Smith, John), then matching that against the larger set (>1900).

So EPrints now has a reasonable solution for the general case, but it can get out of hand when given a complex query. The answer to that lies in using Xapian, Solr or a similar tool that has been designed to deal with general textual searching.

All the best,
Tim.

On Fri, 2013-03-08 at 16:52 +0100, Jan Ploski wrote:
> Better yet, instead of producing monster (sub)queries EPrints should use
> multiple SELECT INTOs with temporary tables to store intermediate
> results to obtain acceptable performance in the general case.
>
> Tim Brody wrote:
> > Hi,
> >
> > These are normally coming out of a very big search query, where you get
> > a load of table joins for each term.
> >
> > Perhaps the internal search should have a sanity check to throw an error
> > (or truncate) if too many terms are sent at it.
> >
> > /Tim.
> >
> > On Fri, 2013-03-08 at 13:56 +0100, Paolo Tealdi wrote:
> >> On 03/08/2013 01:05 PM, Lee Paton wrote:
> >>
> >> Hi Lee.
> >>
> >> No. I changed the global parameter again, setting it to 0, and MySQL
> >> has been working like a charm since the change. Considering how often
> >> the freezes occurred, I think it would already have happened at least
> >> once. I checked the search a few times and the system answers in
> >> roughly 8-9 seconds.
> >>
> >> Best regards,
> >> Paolo Tealdi
> >>
> >>> Hi Paolo
> >>>
> >>> I've been experiencing the same problem and your solution has solved
> >>> it on our dev system.
> >>>
> >>> Have you run into any issues on your server since you made the change?
> >>>
> >>> Thanks
> >>>
> >>> Lee
> >>>
> >>> Lee Paton
> >>> Information Services
> >>> Cardiff University
> >>> 40-42 Park Place
> >>> Cardiff
> >>> CF10 3BB
> >>>
> >>> From: Paolo Tealdi<paolo.tealdi@polito.it>
> >>> To: "<eprints-tech@ecs.soton.ac.uk>"<eprints-tech@ecs.soton.ac.uk>
> >>> Date: 12/02/2013 08:38
> >>> Subject: [EP-tech] Eprints critical block on cache insert
> >>> Sent by: eprints-tech-bounces@ecs.soton.ac.uk
> >>> ------------------------------------------------------------
> >>>
> >>> Dear all,
> >>>
> >>> I'm debugging some mysterious MySQL hangs that sometimes happen on
> >>> our server (2-3 times a week). They happen during the night, and when
> >>> I can check the server I find the EPrints server almost swapped out,
> >>> with a VERY large number of httpd processes, all the MySQL connections
> >>> used and typically a huge cache insert (the attachment is the last
> >>> insert found) apparently blocking the MySQL server.
> >>> Doing a SHOW FULL PROCESSLIST\G I found the oldest active thread:
> >>>
> >>> Id: 961782
> >>> User: eprints
> >>> Host: giasone.polito.it:35530
> >>> db: eprints3310
> >>> Command: Query
> >>> Time: 50077
> >>> State: statistics
> >>> Info: ... the select in attachment ...
> >>>
> >>> Note that the process is in the "statistics" state.
> >>>
> >>> Googling, I found that if a SELECT involves a large number of tables,
> >>> the MySQL query optimizer can apparently block indefinitely while
> >>> analyzing the query, stuck in the "statistics" state.
> >>> It seems that setting the optimizer_search_depth variable to a low
> >>> value makes this problem disappear.
> >>>
> >>> The default is
> >>>
> >>> optimizer_search_depth = 62
> >>>
> >>> I set it to
> >>>
> >>> set global optimizer_search_depth = 5;
> >>>
> >>> and that seems to resolve the blocking issue.
> >>> A side effect seems to be a slight general speedup when you're doing
> >>> advanced searches (those queries can be "important").
> >>>
> >>> Has anybody else run into this problem?
> >>>
> >>> Best regards,
> >>> Paolo Tealdi
> >>>
> >>> --
> >>> Ing. Paolo Tealdi Area IT - Politecnico Torino
> >>> Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906799
> >>> Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY
> >>> Skype : tealdi.paolo
> >>> Please consider your environmental responsibility before printing this e-mail
> >>> [attachment "select_bloccata_07022013.sql" deleted by Lee Paton/scolgp/CardiffUniversity]
> >>> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
> >>> *** Archive: http://www.eprints.org/tech.php/
> >>> *** EPrints community wiki: http://wiki.eprints.org/
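
To illustrate the point made at the top of this message about doing the smaller set first, here is a toy Python sketch. It is not EPrints or MySQL code: the `Record` class, the synthetic data and both function names are assumptions invented for the example. It only compares how many intermediate rows each evaluation order has to hold for the query `date > 1900 and author=(Smith, John)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical stand-in for an eprint row; not the real EPrints schema.
    eprintid: int
    year: int
    author: str


def build_records(n):
    """Synthetic data: every record is dated after 1900, 1 in 500 is by Smith."""
    return [
        Record(
            eprintid=i,
            year=1950 + (i % 60),
            author="Smith, John" if i % 500 == 0 else f"Author {i}",
        )
        for i in range(n)
    ]


def broad_term_first(records):
    """Materialise the match set of every term, then intersect them."""
    by_date = {r.eprintid for r in records if r.year > 1900}  # nearly all rows
    by_author = {r.eprintid for r in records if r.author == "Smith, John"}
    # Second value: number of intermediate rows that had to be stored.
    return by_date & by_author, len(by_date) + len(by_author)


def selective_term_first(records):
    """Narrow by the selective term, then test the broad term on the survivors."""
    by_author = [r for r in records if r.author == "Smith, John"]
    matches = {r.eprintid for r in by_author if r.year > 1900}
    return matches, len(by_author)


records = build_records(10_000)
ids_a, stored_a = broad_term_first(records)
ids_b, stored_b = selective_term_first(records)
print(ids_a == ids_b, stored_a, stored_b)  # True 10020 20
```

Both orders return the same 20 records, but the first holds 10,020 intermediate rows against 20 for the second. A real optimiser makes this choice from index statistics rather than by scanning, so the sketch shows only why the order matters, not how the database decides it.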