EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #05267
[EP-tech] Re: Xapian indexing
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Xapian indexing
- From: Lessard Josée <josee.lessard@cirad.fr>
- Date: Mon, 14 Dec 2015 14:34:26 +0100
Hello David,

Many thanks for your solution. I added "text_index => 1" to the fields whose type is "namedset", in:

/opt/www/eprints-3.3.12/archives/agritrop/cfg/cfg.d/eprint_field.pl

All fields are now indexed automatically.

Thank you for your help.
Happy Holidays,
Josée

On 09/12/2015 19:34, David R Newman wrote:
Hi Josée,

This turns out to have a really simple answer, though it took a rather long way round to discover it. By default, namedset fields have text_index set to 0. Therefore, if only namedset fields are changed, the EPrint will not be queued for re-indexing, even though the field in question will be re-indexed if you change a non-namedset field at the same time.

The solution is to add:

    text_index => 1

to the namedset field you want to be indexed.

I suspect namedset fields are not indexed by default because it is not the value you see in the select box that would be added to the index but the underlying value in the namedset file, which is often not the same. Also, a search on such a short term is likely to return quite a few results where the value matches in another indexed field. Therefore, I think text_index is turned off by default because a free-text search on a namedset value is unlikely to return the set of results you are expecting. In some cases it may be appropriate, at which point you should set text_index to 1 for this field.

Regards
David Newman

On Wed, 2015-12-09 at 17:12 +0000, David R Newman wrote:

Hi Josée,

I am currently looking into this issue as well, as I have identified a situation where a small percentage of EPrints cannot be found when you search for them individually by title. I have a script for automating this test on multiple EPrints at once, which I can make available.

On the specific issue you describe, I can replicate the same behaviour on a 3.3.14 version of EPrints. I have yet to dig into what prevents the EPrint from being put in the indexer queue, but I do not think it will be too difficult to figure out. I found that if I subsequently change another, non-namedset field, it will schedule both that field and the namedset field I had previously changed for re-indexing.

I am not certain whether your issue relates to the problem I mentioned initially, as I think the problem does not depend on Xapian: it is not until the indexer runs the indexing task later that it is known whether the EPrint will be indexed using Xapian or just to the database.

Regards
David Newman

On Wed, 2015-12-02 at 07:55 +0100, Lessard Josée wrote:

Hello,

We use Xapian for our simple search. The Xapian indexing is correct when a reference is validated in the archive (eprint_status: buffer => archive).

But if the correction is made on a "namedset" field, the document indexing is not launched! If the modification is made on a "text" type field, indexing is launched.

Have you ever had this problem reported? How can we make sure re-indexing is launched when a field of any type is modified?

Sorry for my English.

Sincerely,
Josée Lessard

eprint_search_simple.pl

    $c->{search}->{simple} = {
        search_fields => [
            {
                id => 'q',
                meta_fields => [
                    'documents',
                    'eprintid',
                    'title',
                    'abstract',
                    'date',
                    'type',
                    'statut_indexation',
                    'indexeur',
                    ...
                ]
            },
        ],
        preamble_phrase => 'cgi/search:preamble',
        title_phrase => 'cgi/search:simple_search',
        citation => 'result',
        page_size => 20,
        order_methods => {
            'byyear' => '-date/creators_name/title',
            'byyearoldest' => 'date/creators_name/title',
            'byname' => 'creators_name/-date/title',
            'bytitle' => 'title/creators_name/-date',
            'bytype' => 'type/-date/title',
            'byti' => '-full_text_status/-date/title',
        },
        default_order => 'byyear',
        show_zero_results => 1,
    };

/opt/www/eprints-3.3.12/archives/agritrop/cfg/namedsets/statut_indexation

    a_classer
    a_indexer
    a_indexer_indexeur
    en_cours_d_indexation
    a_indexer_electronique
    a_indexer_papier
    document_a_numeriser
    notice_indexee

__________________________________

EPrints correction, result:

    title "Publications et travaux du SAR 1996"
    eprint_status "archive"
    statut_indexation "en_cours_d_indexation"

Xapian indexing:

    * title:1996
    * title:du
    * title:et
    * title:publications
    * title:sar
    * title:travaux
    * statut_indexation:notice_indexee
    * lastmod:20150909

--
Josée Lessard
Documentaliste
Cirad-Dgdrs-Délégation à l'information scientifique et technique
TA 183/05 - Avenue Agropolis - 34398 Montpellier Cedex 5
(Tél: +33 4 67 61 57 37)

*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/

--
Josée Lessard
Documentaliste
Cirad-Dgdrs-Délégation à l'information scientifique et technique
TA 183/05 - Avenue Agropolis - 34398 Montpellier Cedex 5
(Tél: +33 4 67 61 57 37)
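
For readers landing on this thread, the fix it describes might look roughly like the fragment below. This is only a sketch: the field name and the namedset file name come from the messages, but the surrounding layout of the field list, and the assumption that the field's set_name matches the namedset file name, are illustrative and not taken from the actual agritrop configuration.

```perl
# Illustrative fragment of an EPrints field configuration file
# (e.g. the eprint_field.pl mentioned in the thread; layout assumed).
$c->{fields}->{eprint} = [
    # ... other field definitions left unchanged ...
    {
        name       => 'statut_indexation',
        type       => 'namedset',
        # Assumed: the set is the namedsets/statut_indexation file from the thread.
        set_name   => 'statut_indexation',
        # namedset fields default to text_index => 0, so a change to this field
        # alone does not queue the EPrint for re-indexing. Setting it to 1
        # makes such a change trigger indexing.
        text_index => 1,
    },
];
```

As David notes, what gets indexed is the underlying namedset value (for example notice_indexee), not the label shown in the select box. Records edited before the change will presumably keep their old index entries until they are modified again or re-indexed, for instance with the epadmin reindex command.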
- References:
- [EP-tech] Xapian indexing
- From: Lessard Josée <josee.lessard@cirad.fr>
- [EP-tech] Re: Xapian indexing
- From: David R Newman <drn@ecs.soton.ac.uk>
- [EP-tech] Re: Xapian indexing
- From: David R Newman <drn@ecs.soton.ac.uk>