EPrints Technical Mailing List Archive
Message: #02081
[EP-tech] Re: advanced search doesn't work with utf-8 characters
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Re: advanced search doesn't work with utf-8 characters
- From: Tommy Ingulfsen <tommy@library.caltech.edu>
- Date: Mon, 8 Jul 2013 16:23:28 +0000
I think you may have come across the same problem that is described in this thread:

http://www.eprints.org/tech.php/thread-17424.html

Maybe you can try Tim's patch and see if that works for you?

tommy

On 7/5/13 6:43 AM, "Dobrica Pavlinusic" <dpavlin@rot13.org> wrote:

>I have a problem with utf-8 characters in advanced search. None of the
>queries that contain utf-8 characters (in Croatia we have a few of them:
>šđčćž) produce any results.
>
>I have read through the wiki and this mailing list and figured out that
>$EPrints::Index::FREETEXT_CHAR_MAPPING might be to blame. I added a
>mapping for our characters, but it didn't help (it would be nice to have
>full support for all characters without needing to edit the eprints
>source).
>
>Digging through the eprints source code, I noticed that my queries are
>split on utf-8 characters. If I uncomment the line in EPrints::Search
>with $self->get_conditions->describe, I can see the following behaviour:
>
>1. search query: "Agić" (utf-8 as last char)
>
>AND(
>  =($archive.metadata_visibility,"show") ... eprint,
>  =($archive.eprint_status,"archive") ... eprint,
>  index($archive.creators_name,"agi") ... eprint__rindex
>)
>
>As you can see, the utf-8 character gets dropped, so the query doesn't
>produce any results. I checked the eprint__rindex table and I do have
>"agić" in there.
>
>2. search query: "Bolanča" (utf-8 is next-to-last char)
>
>AND(
>  =($archive.metadata_visibility,"show") ... eprint,
>  =($archive.eprint_status,"archive") ... eprint,
>  AND(
>    grep($archive.creators_name,"%[bolan]%[a]%-%") ... eprint__index_grep,
>    AndSubQuery(
>      index($archive.creators_name,"bolan") ... eprint__rindex,
>      index($archive.creators_name,"a") ... eprint__rindex
>    )
>  )
>)
>
>This is even worse, because the search query is split into two terms at
>the utf-8 character.
>
>I spent the last three days inserting warns here and there in the source
>code in an effort to find out where this splitting is happening, but I
>have hit a brick wall with this problem.
>
>I would appreciate any info or pointers on how to resolve this problem.
>
>--
>Dobrica Pavlinusic               2share!2flame
>dpavlin@rot13.org
>Unix addict. Internet consultant.
>http://www.rot13.org/~dpavlin
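The two query dumps quoted above are consistent with a tokenizer that treats every non-ASCII character as a word separator. The sketch below is only an illustration of that symptom in Python, not the EPrints code (which is Perl) and not Tim's patch: the patterns, the function names and the `FOLD` table are all hypothetical. It shows an ASCII-only split reproducing the "agi" and "bolan" + "a" terms, and two possible remedies: a Unicode-aware word pattern, or folding the accented letters to ASCII before splitting (which only helps if the index was built with the same folding).

```python
import re
import unicodedata

# Hypothetical ASCII-only word pattern: any character outside [a-z0-9]
# is treated as a separator, which is the behaviour seen in the dumps.
ASCII_WORD = re.compile(r"[a-z0-9]+")

# Unicode-aware pattern: on a Python str, \w also matches letters like ć.
UNICODE_WORD = re.compile(r"\w+")

# Hypothetical folding table for the Croatian letters from the message.
FOLD = str.maketrans({
    "\u0161": "s",  # š
    "\u0111": "d",  # đ
    "\u010d": "c",  # č
    "\u0107": "c",  # ć
    "\u017e": "z",  # ž
})


def normalise(query: str) -> str:
    """Lower-case the query and compose accents into single code points."""
    return unicodedata.normalize("NFC", query).lower()


def ascii_terms(query: str) -> list[str]:
    """Split on anything that is not an ASCII letter or digit."""
    return ASCII_WORD.findall(normalise(query))


def unicode_terms(query: str) -> list[str]:
    """Split on non-word characters, keeping non-ASCII letters."""
    return UNICODE_WORD.findall(normalise(query))


def folded_terms(query: str) -> list[str]:
    """Fold accented letters to ASCII first, then split as ASCII."""
    return ASCII_WORD.findall(normalise(query).translate(FOLD))


if __name__ == "__main__":
    for q in ("Agi\u0107", "Bolan\u010da"):
        print(q, ascii_terms(q), unicode_terms(q), folded_terms(q))
```

If the real cause is of this kind, the index and the query must go through the same term extraction; the message notes that "agić" is present in eprint__rindex, so a query term of "agi" or a folded "agic" would still miss it.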