EPrints Technical Mailing List Archive
Message: #02079
[EP-tech] advanced search doesn't work with utf-8 characters
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] advanced search doesn't work with utf-8 characters
- From: Dobrica Pavlinusic <dpavlin@rot13.org>
- Date: Fri, 5 Jul 2013 15:43:27 +0200
I have a problem with UTF-8 characters in advanced search. None of the queries that contain UTF-8 characters (in Croatian we have a few of them: šđčćž) produce any results.

I have read through the wiki and this mailing list and figured out that $EPrints::Index::FREETEXT_CHAR_MAPPING might be to blame. I added mappings for our characters, but it didn't help. (It would be nice to have full support for all characters without needing to edit the EPrints source.)

Digging around in the EPrints source code, I noticed that my queries are split on UTF-8 characters. If I uncomment the line in EPrints::Search containing $self->get_conditions->describe, I can see the following behaviour:

1. Search query: "Agić" (UTF-8 character as the last character)

    AND(
      =($archive.metadata_visibility,"show") ... eprint,
      =($archive.eprint_status,"archive") ... eprint,
      index($archive.creators_name,"agi") ... eprint__rindex
    )

   As you can see, the UTF-8 character gets dropped, and this query doesn't produce any results. I did check the eprint__rindex table, and I do have "agić" in there.

2. Search query: "Bolanča" (UTF-8 character as the next-to-last character)

    AND(
      =($archive.metadata_visibility,"show") ... eprint,
      =($archive.eprint_status,"archive") ... eprint,
      AND(
        grep($archive.creators_name,"%[bolan]%[a]%-%") ... eprint__index_grep,
        AndSubQuery(
          index($archive.creators_name,"bolan") ... eprint__rindex,
          index($archive.creators_name,"a") ... eprint__rindex
        )
      )
    )

   This is even worse, because the search query is split into two queries at the UTF-8 character.

I have spent the last three days inserting warn statements here and there in the source code, trying to find out where this splitting happens, but I have hit a brick wall. I would appreciate any information or pointers on how to resolve this problem.

-- 
Dobrica Pavlinusic  2share!2flame  dpavlin@rot13.org
Unix addict. Internet consultant.  http://www.rot13.org/~dpavlin
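The truncation ("Agić" becoming "agi") and splitting ("Bolanča" becoming "bolan" and "a") shown above are consistent with a word tokenizer that only recognises ASCII letters, for example because the query reaches it as undecoded UTF-8 bytes rather than as character data. The following Python sketch is a hypothetical illustration of that failure mode only; it is not EPrints' own Perl tokenizer, and the function names are made up for the example.

```python
import re

# str pattern: \w is Unicode-aware by default in Python 3,
# so letters such as š, đ, č, ć and ž count as word characters.
WORD = re.compile(r"\w+")

# bytes pattern: \w matches only ASCII [A-Za-z0-9_], so each byte of a
# multi-byte UTF-8 sequence acts like a word separator.
BYTE_WORD = re.compile(rb"\w+")


def tokenize_decoded(query: str) -> list[str]:
    """Hypothetical tokenizer working on a properly decoded string."""
    return WORD.findall(query.lower())


def tokenize_raw_utf8(query: str) -> list[str]:
    """Hypothetical tokenizer working on raw UTF-8 bytes (never decoded)."""
    raw = query.lower().encode("utf-8")
    # Every match is pure ASCII here, so decoding as ASCII is safe.
    return [w.decode("ascii") for w in BYTE_WORD.findall(raw)]


for q in ["Agić", "Bolanča"]:
    print(q, tokenize_raw_utf8(q), tokenize_decoded(q))
```

With the byte-oriented version, "Agić" yields only "agi" and "Bolanča" yields "bolan" and "a", matching the index() and grep() conditions in the describe() output; the decoded version keeps each name as a single word. If EPrints behaves the same way, the place to look would be wherever the search value is (or is not) decoded before being split into words, rather than the character mapping table.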