EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #07196
Re: [EP-tech] A specific eprint doesn't get indexed ,
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] A specific eprint doesn't get indexed ,
- From: David R Newman <drn@ecs.soton.ac.uk>
- Date: Sat, 3 Mar 2018 00:53:20 +0000
Hi Avi,

I have seen this issue quite often as well. I have tracked it down to a problem indexing PDF documents where an extracted word contains non-ASCII characters. If the whole word consists of non-ASCII characters, what gets indexed is effectively the empty string. If more than one word consists entirely of non-ASCII characters, indexing fails with the error you see below, because the empty string cannot be indexed twice for the same EPrint and field (i.e. documents).

This is because the eprint__rindex table has three fields that make up its primary key: field, word and eprintid. As the middle one is not set, you see documents--91 rather than something like documents-word-91 in your error message.

As far as I can tell, this only stops the one badly encoded word from being indexed rather than preventing all indexing for the whole EPrint. I tested this by writing a script to completely de-index an EPrint and then running a reindex: I could see the records disappear from the eprint__rindex table and then reappear after the reindex.

I am going to see if I can get the encoding issue sorted out, as this is likely to be problematic for people who are indexing publications with non-Latin alphabets. However, based on past experience, this is never straightforward.

Regards

On 02/03/2018 10:53, Stenger, Avischai wrote:
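
The duplicate-key failure described in this message can be reproduced in miniature. The following is only a sketch under stated assumptions, not EPrints code: it uses an in-memory SQLite table in place of the real database, the column types are guesses, and ascii_only() is a hypothetical stand-in for whatever step in the text extraction loses the non-ASCII characters. Only the table name and the three key columns (field, word, eprintid) come from the message itself.

```python
import re
import sqlite3


def ascii_only(word: str) -> str:
    """Hypothetical lossy step: drop every non-ASCII character from a word."""
    return re.sub(r"[^\x00-\x7f]", "", word)


def index_words(conn, field, eprintid, words):
    """Insert one row per word; return the words rejected by the primary key."""
    rejected = []
    for word in words:
        key = ascii_only(word)  # an all-non-ASCII word collapses to ""
        try:
            conn.execute(
                "INSERT INTO eprint__rindex (field, word, eprintid) "
                "VALUES (?, ?, ?)",
                (field, key, eprintid),
            )
        except sqlite3.IntegrityError:
            # Second empty string for the same field and eprintid:
            # the composite primary key refuses the row.
            rejected.append(word)
    return rejected


conn = sqlite3.connect(":memory:")
# Assumed schema: only the three key column names are taken from the message.
conn.execute(
    "CREATE TABLE eprint__rindex ("
    " field TEXT NOT NULL,"
    " word TEXT NOT NULL,"
    " eprintid INTEGER NOT NULL,"
    " PRIMARY KEY (field, word, eprintid))"
)

# One ASCII word and two words made up entirely of non-ASCII characters.
rejected = index_words(conn, "documents", 91, ["repository", "λόγος", "κόσμος"])

rows = conn.execute(
    "SELECT field, word, eprintid FROM eprint__rindex ORDER BY word"
).fetchall()

# Joining the key parts with hyphens shows why the error reads documents--91.
duplicate_label = "-".join(["documents", ascii_only("κόσμος"), "91"])
```

In this sketch the first all-non-ASCII word is stored as an empty string, the second one is rejected, and the ordinary word is indexed normally, which matches the observation that only the badly encoded word is lost rather than the whole EPrint.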
- References:
- [EP-tech] A specific eprint doesn't get indexed ,
- From: "Stenger, Avischai" <avischai.stenger@ulb.tu-darmstadt.de>