EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #02710
< Previous (by date) | Next (by date) > | < Previous (in thread) | Next (in thread) > | Messages - Most Recent First | Threads - Most Recent First
[EP-tech] Re: Simple search - length of query
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Simple search - length of query
- From: Jan Ploski <jpl@plosquare.com>
- Date: Wed, 05 Mar 2014 14:51:19 +0100
I ran it on 5.1.72 with optimizer_search_depth=0 (although I also tried with default - no problems). The query takes 1s on a cold db here. My cardinalities look different, of course, but not extraordinarily so (35,416801,5418424,80872,677303). John Salter wrote: > Hi Jan/Seb, > It looks like my indexes are the same as yours (see below). > Which version of MySQL are you running? What is your ' optimizer_search_depth' set to? > > Seb, > When I took a snapshot of the process', the query was in the 'statistics' stage - for a long time - so I'll see if tweaking the optimizer_search_depth helps! > Longer term we'll be switching to Xapian :o) > > Cheers, > John > > > mysql> SHOW INDEXES FROM eprint__rindex\G > *************************** 1. row *************************** > Table: eprint__rindex > Non_unique: 0 > Key_name: PRIMARY > Seq_in_index: 1 > Column_name: field > Collation: A > Cardinality: 49 > Sub_part: NULL > Packed: NULL > Null: > Index_type: BTREE > Comment: > *************************** 2. row *************************** > Table: eprint__rindex > Non_unique: 0 > Key_name: PRIMARY > Seq_in_index: 2 > Column_name: word > Collation: A > Cardinality: 1238267 > Sub_part: NULL > Packed: NULL > Null: > Index_type: BTREE > Comment: > *************************** 3. row *************************** > Table: eprint__rindex > Non_unique: 0 > Key_name: PRIMARY > Seq_in_index: 3 > Column_name: eprintid > Collation: A > Cardinality: 14859215 > Sub_part: NULL > Packed: NULL > Null: > Index_type: BTREE > Comment: > *************************** 4. row *************************** > Table: eprint__rindex > Non_unique: 1 > Key_name: eprint__rindex_eprintid_2 > Seq_in_index: 1 > Column_name: eprintid > Collation: A > Cardinality: 15994 > Sub_part: NULL > Packed: NULL > Null: > Index_type: BTREE > Comment: > *************************** 5. row *************************** > Table: eprint__rindex > Non_unique: 1 > Key_name: eprint__rindex_eprintid_2 > Seq_in_index: 2 > Column_name: field > Collation: A > Cardinality: 154783 > Sub_part: NULL > Packed: NULL > Null: > Index_type: BTREE > Comment: > 5 rows in set (0.00 sec) > > -----Original Message----- > From: eprints-tech-bounces@ecs.soton.ac.uk [mailto:eprints-tech-bounces@ecs.soton.ac.uk] On Behalf Of Jan Ploski > Sent: 05 March 2014 12:20 > To: eprints-tech@ecs.soton.ac.uk > Cc: John Salter > Subject: [EP-tech] Re: Simple search - length of query > > Could it be that you're missing some essential indexes on eprint__rindex? > > Your example query is very hard on my 3.0.5 database while it returns > quickly on 3.3.12. The difference is that the eprint__rindex primary key > in 3.3.12 consists of (field, word, eprintid) and there is also a > non-unique index on (eprintid, field). In the 3.0.5 database I only have > a non-unique index on eprintid. > > John Salter wrote: >> Hi, >> Yesterday we had a bit of an issue with our repository when someone pasted a full citation into the simple search box. >> This produced an impressive SQL query that locked things up and made users unhappy... >> >> Is there a way to sanitise what a 'simple' search might try to handle? e.g. would restricting it to a certain number of words be acceptable? >> Would the Xapian search handle a request like the one below any better? >> >> Details below/attached if you're interested! >> Cheers, >> John >> >> GET /cgi/search/simple?full=%E2%80%98Families%2C+Domesticity+and+Intimacy%3A+Changing+Relationships+in+Changi >> ng+Times%E2%80%99%2C+in+Richardson%2C+D%2C+and+Robinson%2C+V.+%28eds%29+Introducing+Women%27s+Studies%2C+third+edition.+Basingstoke%3A+Palgrave%2C+2008+pp.+1 >> 25-143.+&_action_search=Search&_order=bytitle&basic_srchtype=ALL&_satisfyall=ALL >> >> searchexp created in cache table: >> 0|1|-date/creators_name/title|archive|-|full:abstract/creators_name/date/documents/title:ALL:IN:▒Families, Domesticity and Intimacy%3A Changing Relationships in Changing Times▒, in Richardson, D, and Robinson, V. (eds) Introducing Women's Studies, third edition. Basingstoke%3A Palgrave, 2008 pp. 125-143. |-|eprint_status:eprint_status:ANY:EQ:archive|metadata_visibility:metadata_visibility:ANY:EQ:show >> >> The SQL generated by search is attached (get ready for this - it's a thing of beauty ;o) - you can see why it took a while to run! >> >> >> >> >> *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech >> *** Archive: http://www.eprints.org/tech.php/ >> *** EPrints community wiki: http://wiki.eprints.org/ >> *** EPrints developers Forum: http://forum.eprints.org/ > *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech > *** Archive: http://www.eprints.org/tech.php/ > *** EPrints community wiki: http://wiki.eprints.org/ > *** EPrints developers Forum: http://forum.eprints.org/ > > *** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech > *** Archive: http://www.eprints.org/tech.php/ > *** EPrints community wiki: http://wiki.eprints.org/ > *** EPrints developers Forum: http://forum.eprints.org/
- References:
- [EP-tech] Simple search - length of query
- From: John Salter <J.Salter@leeds.ac.uk>
- [EP-tech] Re: Simple search - length of query
- From: Jan Ploski <jpl@plosquare.com>
- [EP-tech] Re: Simple search - length of query
- From: John Salter <J.Salter@leeds.ac.uk>
- [EP-tech] Simple search - length of query
- Prev by Date: [EP-tech] Spam filtering on request emails
- Next by Date: [EP-tech] Re: CentOS 5 installation?
- Previous by thread: [EP-tech] Re: Simple search - length of query
- Next by thread: [EP-tech] Spam filtering on request emails
- Index(es):