EPrints Technical Mailing List Archive
Message: #00997
[EP-tech] Re: Full text indexing document in Xapian search
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Full text indexing document in Xapian search
- From: Paolo Tealdi <paolo.tealdi@polito.it>
- Date: Fri, 31 Aug 2012 12:10:48 +0200
On 08/30/2012 03:02 PM, Tim Brody wrote:
> On Thu, 2012-08-30 at 14:12 +0200, Paolo Tealdi wrote:
>> Dear all,
>> I'm upgrading from 3.2.4 to 3.3.10 and evaluating the new features of
>> version 3.3.10. I've installed Xapian search and I think that simple
>> search is now quicker than in 3.2.4. Nevertheless, I think that the
>> fulltext index is not present in Xapian search. Am I right? How can I
>> decide the list of fields indexed in simple search (Xapian in my case)?
>
> Xapian should search all fields, including the documents, if EPrints can
> convert the document to plain text. The indexing code is in
> lib/cfg.d/search_xapian.pl.
>
> There isn't much help for you debugging what has gone wrong with
> indexing. Best I can suggest is adding this just above
> "replace_document_by_term":
>
>     my $i = $doc->termlist_begin;
>     print "$i, " while ++$i ne $doc->termlist_end;
>     print "\n";
>
> Then:
>
>     ./bin/epadmin reindex [archiveid] eprint [eprintid]
>
> for an eprint that isn't matching. This will show you exactly what's
> getting indexed for a given eprint.
Hi Tim,
thank you for your answer. I debugged that file as you suggested. As you said, the Xapian search indexes all fields, including documents.

I noticed that the Xapian search doesn't use the same word separators as the normal indexing program. This means that the two indexes can potentially contain many different words for the same text (probably not a big problem for English, but it is for Italian, for instance). Do you think it would be possible to avoid this problem? I searched the Xapian documentation and didn't find anything on splitting words ...
I partially resolved it with this (brutal) line:

    $buffer =~ s/$EPrints::Index::FREETEXT_SEPERATOR_REGEXP/ /g;

placed before the "index_text" line in lib/cfg.d/search_xapian.pl.

Best regards,
Paolo Tealdi

-- 
Ing. Paolo Tealdi
Area IT - Politecnico Torino
Telefono/Phone : +39-011-0906714 , FAX : +39-011-0906799
Indirizzo/Address : C.so Duca degli Abruzzi, 24 - 10129 Torino - ITALY
Skype : tealdi.paolo
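
For readers who want to try the idea outside EPrints, the separator substitution described in the message can be sketched in isolation. This is only an illustration under stated assumptions: $SEPARATOR_REGEXP is a hypothetical stand-in for $EPrints::Index::FREETEXT_SEPERATOR_REGEXP (the real pattern is defined by EPrints and is not reproduced here), normalise_separators is a made-up helper name, and the apostrophe example is a guess at the kind of Italian text that triggers the mismatch.

```perl
use strict;
use warnings;

# ASSUMPTION: a small stand-in character class. The real
# $EPrints::Index::FREETEXT_SEPERATOR_REGEXP is defined by EPrints
# and may cover a different set of characters.
my $SEPARATOR_REGEXP = qr/['\/_-]/;

# Hypothetical helper: replace every separator character with a space,
# so that a whitespace-oriented tokenizer sees the same word boundaries
# as the regexp-based one.
sub normalise_separators {
    my ($buffer) = @_;
    $buffer =~ s/$SEPARATOR_REGEXP/ /g;
    return $buffer;
}

# Elided Italian articles and a slash-separated range.
my $text = "l'acqua e dell'anno 2012/2013";
print normalise_separators($text), "\n";
# prints: l acqua e dell anno 2012 2013
```

In the real configuration the substitution would be applied to the text buffer just before it is handed to "index_text", as in the line quoted in the message.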
- References:
- [EP-tech] Full text indexing document in Xapian search
- From: Paolo Tealdi <paolo.tealdi@polito.it>
- [EP-tech] Re: Full text indexing document in Xapian search
- From: Tim Brody <tdb2@ecs.soton.ac.uk>