EPrints Technical Mailing List Archive
Message: #01685
< Previous (by date) | Next (by date) > | < Previous (in thread) | Next (in thread) > | Messages - Most Recent First | Threads - Most Recent First
[EP-tech] Re: Eprints critical block on cache insert
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Eprints critical block on cache insert
- From: Jan Ploski <jpl@plosquare.com>
- Date: Fri, 08 Mar 2013 19:32:24 +0100
I'm not sure we're talking about the same thing: mysql> create temporary table tmp_eprintid(eprintid int(11)) engine=memory; Query OK, 0 rows affected (0.00 sec) mysql> insert into tmp_eprintid select eprintid from eprint; Query OK, 76880 rows affected (0.05 sec) Records: 76880 Duplicates: 0 Warnings: 0As you can see, raw insert performace is nowhere close to as bad as you describe. If, on the other hand, you do 76880 inserts (which I've seen EPrints 3.0.x do for the cache tables), well...
Tim Brody wrote:
The reason EPrints uses table joins is to make use of the database optimiser. Back in v2 it used to do what you said, but that is unscalable for many trivial cases e.g. q: date> 1900 and author=(Smith, John) A naive execution of this would end up creating a temp table with an entry for every record in the database (~1 second per 1000 records), then join that against a much smaller set of {Smith, John}. That led v2 instances to spend an age doing what looked like a simple query, but because of some term matching the entire set it was very inefficient (as well as hammering IO on the temp tables). What the database optimiser is supposed to do is optimise this out by doing the smaller set first (Smith, John) then matching that against the larger set (>1900). So EPrints now has a reasonable solution for the general case but can get out of hand when given a complex query. The answer to that lies in using Xapian, Solr or similar tool that has been designed to deal with general textual searching. All the best, Tim. On Fri, 2013-03-08 at 16:52 +0100, Jan Ploski wrote:Better yet, instead of producing monster (sub)queries EPrints should use multiple SELECT INTOs with temporary tables to store intermediate results to obtain acceptable performance in general case. Tim Brody wrote:Hi, These are normally coming out of a very big search query, where you get a load of table joins for each term. Perhaps the internal search should have a sanity check to throw an error (or truncate) if too many terms are sent at it. /Tim. On Fri, 2013-03-08 at 13:56 +0100, Paolo Tealdi wrote:On 03/08/2013 01:05 PM, Lee Paton wrote: Hi Lee. No. I changed again the global parameter setting it to 0 and mysql is working like a charme since the changing. Considering the freezing frequency, i think that it should already have happened at least one time. I controlled the search a few times and the system answers more or less in 8-9 seconds. Best regards, Paolo TealdiHi Paolo I've been experiencing the same problem and your solution has solved it on our dev system Have you run into any issues on your server since you made the change? Thanks Lee Lee Paton Information Services Cardiff University 40-42 Park Place Cardiff CF10 3BB From: Paolo Tealdi<paolo.tealdi@polito.it> To: "<eprints-tech@ecs.soton.ac.uk>"<eprints-tech@ecs.soton.ac.uk> Date: 12/02/2013 08:38 Subject: [EP-tech] Eprints critical block on cache insert Sent by: eprints-tech-bounces@ecs.soton.ac.uk --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Dear all, i'm debugging some misterious mysql hanging that sometimes happen on our server (2-3 times a week). They happen during the night and when i can check the server i find the eprint server almost swaped out with a VERY big number of httpd process, all the mysql connection used and tipically a huge cache insert (the attachment is the last insert found) apparently blocking the mysql server. Doing a SHOW FULL PROCESSLIST\G i found the older active thread Id: 961782 User: eprints Host: giasone.polito.it:35530 db: eprints3310 Command: Query Time: 50077 State: statistics Info: ... the select in attachment ... You can notice that the process state is in "statistics" status. Googling i found that it seems that, if in a select there are involved a big number of tables, mysql query optimizer can block itself indefinitely analyzing that transaction, in the statistic state. It seems that setting optimizer_search_depth variable to a low value this problem disappear. The default is optimizer_search_depth = 62 i put it to set global optimizer_search_depth = 5; and seems to resolve the blocking issue. A side effect seems to be a slight speedup in general when you're doing advanced searches (those query can be "important") . Anybody has found this problem ? Best regards, Paolo Tealdi -- Ing. Paolo Tealdi Area IT - Politecnico Torino Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906799 Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY Skype : tealdi.paolo Please consider your environmental responsibility before printing this e-mail [attachment "select_bloccata_07022013.sql" deleted by Lee Paton/scolgp/CardiffUniversity] *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech *** Archive: http://www.eprints.org/tech.php/ *** EPrints community wiki: http://wiki.eprints.org/ *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech *** Archive: http://www.eprints.org/tech.php/ *** EPrints community wiki: http://wiki.eprints.org/*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech *** Archive: http://www.eprints.org/tech.php/ *** EPrints community wiki: http://wiki.eprints.org*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech *** Archive: http://www.eprints.org/tech.php/ *** EPrints community wiki: http://wiki.eprints.org/*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech *** Archive: http://www.eprints.org/tech.php/ *** EPrints community wiki: http://wiki.eprints.org/
-- Dipl.-Inform. (FH) Jan Ploski IT-Beratung und Software-Entwicklung Buergereschstr. 79 - 26123 Oldenburg - Germany Phone/Fax: +49 441 3407490 / +49 441 3407491 E-Mail: jpl@plosquare.com URL: http://www.plosquare.com
- References:
- [EP-tech] Eprints critical block on cache insert
- From: Paolo Tealdi <paolo.tealdi@polito.it>
- [EP-tech] Re: Eprints critical block on cache insert
- From: Lee Paton <PatonL@cardiff.ac.uk>
- [EP-tech] Re: Eprints critical block on cache insert
- From: Paolo Tealdi <paolo.tealdi@polito.it>
- [EP-tech] Re: Eprints critical block on cache insert
- From: Tim Brody <tdb2@ecs.soton.ac.uk>
- [EP-tech] Re: Eprints critical block on cache insert
- From: Jan Ploski <jpl@plosquare.com>
- [EP-tech] Re: Eprints critical block on cache insert
- From: Tim Brody <tdb2@ecs.soton.ac.uk>
- [EP-tech] Eprints critical block on cache insert
- Prev by Date: [EP-tech] Re: ePrints SWORD API
- Next by Date: [EP-tech] Re: ePrints SWORD API
- Previous by thread: [EP-tech] Re: Eprints critical block on cache insert
- Next by thread: [EP-tech] "Event Tracking" in Google Analytics
- Index(es):