EPrints Technical Mailing List Archive

See the EPrints wiki for instructions on how to join this mailing list and related information.

Message: #01685


< Previous (by date) | Next (by date) > | < Previous (in thread) | Next (in thread) > | Messages - Most Recent First | Threads - Most Recent First

[EP-tech] Re: Eprints critical block on cache insert


I'm not sure we're talking about the same thing:

mysql> create temporary table tmp_eprintid(eprintid int(11)) engine=memory;
Query OK, 0 rows affected (0.00 sec)

mysql> insert into tmp_eprintid select eprintid from eprint;
Query OK, 76880 rows affected (0.05 sec)
Records: 76880  Duplicates: 0  Warnings: 0

As you can see, raw insert performace is nowhere close to as bad as you describe. If, on the other hand, you do 76880 inserts (which I've seen EPrints 3.0.x do for the cache tables), well...

Tim Brody wrote:
The reason EPrints uses table joins is to make use of the database
optimiser. Back in v2 it used to do what you said, but that is
unscalable for many trivial cases e.g.

q: date>  1900 and author=(Smith, John)

A naive execution of this would end up creating a temp table with an
entry for every record in the database (~1 second per 1000 records),
then join that against a much smaller set of {Smith, John}. That led v2
instances to spend an age doing what looked like a simple query, but
because of some term matching the entire set it was very inefficient (as
well as hammering IO on the temp tables).

What the database optimiser is supposed to do is optimise this out by
doing the smaller set first (Smith, John) then matching that against the
larger set (>1900).

So EPrints now has a reasonable solution for the general case but can
get out of hand when given a complex query. The answer to that lies in
using Xapian, Solr or similar tool that has been designed to deal with
general textual searching.

All the best,
Tim.

On Fri, 2013-03-08 at 16:52 +0100, Jan Ploski wrote:
Better yet, instead of producing monster (sub)queries EPrints should use
multiple SELECT INTOs with temporary tables to store intermediate
results to obtain acceptable performance in general case.

Tim Brody wrote:
Hi,

These are normally coming out of a very big search query, where you get
a load of table joins for each term.

Perhaps the internal search should have a sanity check to throw an error
(or truncate) if too many terms are sent at it.

/Tim.

On Fri, 2013-03-08 at 13:56 +0100, Paolo Tealdi wrote:
On 03/08/2013 01:05 PM, Lee Paton wrote:

Hi Lee.

No. I changed again the global parameter setting it to 0 and mysql is working like a charme since the changing. Considering the freezing frequency, i think that
it should already have happened at least one time. I controlled the search a few times and the system answers more or less in 8-9 seconds.


Best regards,
Paolo Tealdi



Hi Paolo

I've been experiencing the same problem and your solution has solved it on our dev system

Have you run into any issues on your server since you made the change?

Thanks

Lee

Lee Paton
Information Services
Cardiff University
40-42 Park Place
Cardiff
CF10 3BB



From: Paolo Tealdi<paolo.tealdi@polito.it>
To: "<eprints-tech@ecs.soton.ac.uk>"<eprints-tech@ecs.soton.ac.uk>
Date: 12/02/2013 08:38
Subject: [EP-tech]  Eprints critical block on cache insert
Sent by: eprints-tech-bounces@ecs.soton.ac.uk
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



Dear all,

i'm debugging some misterious mysql hanging that sometimes happen on our server (2-3 times a week). They happen during the night and when i can check the server i find the eprint server almost swaped out with a
VERY big number of httpd process, all the mysql connection used  and tipically a huge cache insert (the attachment is the last insert found) apparently blocking the mysql server.
Doing a SHOW FULL PROCESSLIST\G i found the older active thread

        Id: 961782
      User: eprints
      Host: giasone.polito.it:35530
        db: eprints3310
Command: Query
      Time: 50077
     State: statistics
      Info:  ... the select in attachment ...

You can notice that the process state is in "statistics" status.

Googling i found that it seems that, if in a select there are involved a big number of tables, mysql query optimizer can block itself indefinitely analyzing that transaction, in the statistic state.
It seems that setting  optimizer_search_depth variable  to a low value this problem disappear.

The default is

optimizer_search_depth = 62

i put it to

set global optimizer_search_depth = 5;

and seems to resolve the blocking issue.
A side effect seems to be a slight speedup in general when you're doing advanced searches (those query can be "important") .

Anybody has found this problem ?

Best regards,
Paolo Tealdi

--
Ing. Paolo Tealdi         Area IT - Politecnico Torino
Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906799
Indirizzo/Address : C.so Duca degli Abruzzi,  24 - 10129 Torino - ITALY
Skype : tealdi.paolo
Please consider your environmental responsibility before printing this e-mail
[attachment "select_bloccata_07022013.sql" deleted by Lee Paton/scolgp/CardiffUniversity] *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/



*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/






*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/



*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/


--
Dipl.-Inform. (FH) Jan Ploski
IT-Beratung und Software-Entwicklung
Buergereschstr. 79 - 26123 Oldenburg - Germany
Phone/Fax: +49 441 3407490 / +49 441 3407491
E-Mail: jpl@plosquare.com
URL: http://www.plosquare.com