EPrints Technical Mailing List Archive
Message: #01491
[EP-tech] Re: international character search problem
- To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
- Subject: [EP-tech] Re: international character search problem
- From: Tommy Ingulfsen <tommy@library.caltech.edu>
- Date: Thu, 24 Jan 2013 17:20:55 +0000
Hi and thanks for putting up a patch on Git so quickly. I'm sorry to say that I ran into another problem when I patched our server with the new perl_lib/EPrints/MetaField/Name.pm. Previously, the regular expression that splits up initials was located after the test for whether we're doing a simple search (as opposed to an advanced search) - this is the new version of the code I'm talking about:

    # split up initials
    $v2 =~ s/([\p{Uppercase}])/ $1/g;

    # name searches are case sensitive
    $v2 = "\L$v2";

    if( $search_mode eq "simple" )
    {
        return EPrints::Search::Condition->new( $indexmode, $dataset, $self, $v2 );
    }

Now, if I do a simple search for e.g. "James", the splitting up of initials above causes a search for " James" to be performed, which doesn't work so well. I'm not entirely sure what the intention of all of the code is, so I don't have a fix for this myself yet.

There was another, unrelated, issue I came across while debugging. In the table eprint__rindex, I noticed that some of the non-ASCII characters in creators_name are stored correctly - e.g. "zenginoğlu". But then there are some authors whose names don't come through right. For example, when I entered a new paper written by "Magó", the creators_name is stored as "mago" in eprint__rindex.word. Another example I found is "Eötvös", which is stored as "eoetvoes". I haven't looked into this one in detail myself yet, so I don't have any pointers as to what the cause may be.

Anyway, the first search issue is more pressing for us, so if anyone on the list has any ideas for a robust solution that would be great.

Regards
Tommy, Caltech

On 1/17/13 4:38 AM, "Tim Brody" <tdb2@ecs.soton.ac.uk> wrote:

>On Thu, 17 Jan 2013 00:46:37 +0000, Tommy Ingulfsen
><tommy@library.caltech.edu> wrote:
>> I may have found a bug in EPrints 3.3.10. One of the authors in our
>> repository is Anıl Zenginoğlu (if the name doesn't come out right in
>> email, his homepage is http://www.tapir.caltech.edu/~anil/). Searching
>> for the surname works fine with the simple search, but with the advanced
>> search we don't get any results. I believe the problem is with line 230
>> in perl_lib/EPrints/MetaField/Name.pm:
>>
>> # remove not a-z characters (except ,)
>> $v2 =~ s/[^a-z,]/ /ig;
>>
>> That code splits up "zenginoğlu" to "zengino lu". A possible solution
>> may be
>>
>> use utf8;
>> …
>> $v2 =~ s/[^\p{L},]/ /ig;
>> …
>>
>> Maybe someone with a strong encodings-fu can comment?
>
>Hi,
>
>I've written a fix here:
>https://github.com/eprints/eprints/issues/13
>
>--
>All the best,
>Tim.
>*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
>*** Archive: http://www.eprints.org/tech.php/
>*** EPrints community wiki: http://wiki.eprints.org/
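
The two regular-expression behaviours discussed in this thread can be reproduced outside EPrints with a short standalone script. What follows is a minimal sketch, not code taken from Name.pm: the sub names (strip_ascii_only, strip_unicode_letters, split_initials, split_initials_trimmed) are hypothetical, and the trimming step at the end is only one conceivable workaround, offered as an assumption rather than as the fix that went into EPrints.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;    # string literals in this file are UTF-8 encoded
binmode( STDOUT, ":encoding(UTF-8)" );

# All sub names below are hypothetical; none of them exist in EPrints.

# The pattern quoted from line 230: anything that is not a-z or a comma
# is replaced by a space, which also hits non-ASCII letters.
sub strip_ascii_only
{
	my( $v ) = @_;
	$v =~ s/[^a-z,]/ /ig;
	return $v;
}

# The alternative proposed in the thread: keep every Unicode letter.
sub strip_unicode_letters
{
	my( $v ) = @_;
	$v =~ s/[^\p{L},]/ /ig;
	return $v;
}

# The two statements quoted from the patched file: put a space in front
# of every uppercase letter, then lowercase the whole string.
sub split_initials
{
	my( $v ) = @_;
	$v =~ s/([\p{Uppercase}])/ $1/g;
	return "\L$v";
}

# Assumption, not the upstream fix: strip leading and trailing
# whitespace after the split so "James" does not become " james".
sub split_initials_trimmed
{
	my( $v ) = @_;
	$v = split_initials( $v );
	$v =~ s/^\s+|\s+$//g;
	return $v;
}

printf "ascii-only strip : '%s'\n", strip_ascii_only( "zenginoğlu" );      # 'zengino lu'
printf "unicode strip    : '%s'\n", strip_unicode_letters( "zenginoğlu" ); # 'zenginoğlu'
printf "split initials   : '%s'\n", split_initials( "James" );             # ' james'
printf "split and trimmed: '%s'\n", split_initials_trimmed( "James" );     # 'james'
```

The quoted output shows the ASCII-only pattern breaking the surname at "ğ", the \p{L} pattern leaving it intact, and the leading space that the initials split puts in front of "james". Whether trimming is safe for names entered as bare initials (for example "JR", which becomes "j r") would still need checking against how the index stores them.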