EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #05424
[EP-tech] Antwort: Searching fails when database field contains Å (utf8 %c3%85)
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Antwort: Searching fails when database field contains Å (utf8 %c3%85)
- From: martin.braendle@id.uzh.ch
- Date: Thu, 18 Feb 2016 10:11:45 +0100
Hi,
We can reproduce the behavior:
Advanced search (which goes to the SQL index): Ågren, ågren, "Ågren" and "ågren" all fail.
Quick search (which goes to the Xapian index): both creators_name:ågren and creators_name:Ågren find results (creators_name is the field name we use for authors).
perl_lib/EPrints/Index/Tokenizer.pm contains a translation list that maps Unicode characters to ASCII, and Å is missing from it. Maybe this is the clue?
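To illustrate why a missing entry in such a list could matter, here is a minimal Python sketch of an ASCII-folding step. The table `FOLD`, the variant `FOLD_FIXED` and the function `normalise` are hypothetical stand-ins, not the actual contents of Tokenizer.pm, and the order used (fold first, then lowercase) is an assumption:

```python
# Hypothetical ASCII-folding table, loosely modelled on the idea of the
# translation list in perl_lib/EPrints/Index/Tokenizer.pm. These entries are
# illustrative only; they are NOT copied from that file.
FOLD = {
    "å": "a",
    "ä": "a",
    "Ä": "A",
    "ö": "o",
    "Ö": "O",
    # "Å" is deliberately absent, mirroring the gap noted above.
}

# The same table with the missing upper-case entry added.
FOLD_FIXED = {**FOLD, "Å": "A"}


def normalise(term: str, table: dict = FOLD) -> str:
    """Fold characters found in `table` to ASCII, then lowercase.

    The fold-then-lowercase order is an assumption made for this sketch.
    """
    folded = "".join(table.get(ch, ch) for ch in term)
    return folded.lower()


if __name__ == "__main__":
    for word in ["Ågren", "ågren", "Mårten", "MÅRTEN"]:
        print(f"{word!r}: {normalise(word)!r}, fixed: {normalise(word, FOLD_FIXED)!r}")
```

With Å absent, "Ågren" normalises to "ågren" while "ågren" normalises to "agren", so the two spellings no longer produce the same token; with the extra entry both fold to "agren". Whether EPrints actually applies its list in this order is not established here.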
Best regards,
Martin
--
Dr. Martin Brändle
Zentrale Informatik
Universität Zürich
Stampfenbachstr. 73
CH-8006 Zürich
On 17/02/2016 17:20:34, Christer Enkvist wrote:
From: Christer Enkvist <christer.enkvist@slu.se>
To: "eprints-tech@ecs.soton.ac.uk" <eprints-tech@ecs.soton.ac.uk>
Date: 17/02/2016 17:20
Subject: [EP-tech] Searching fails when database field contains Å (utf8 %c3%85)
Sent by: eprints-tech-bounces@ecs.soton.ac.uk
Hello all!
I have encountered a weird UTF-8-related problem when querying names in the advanced search. If an author's name contains Å (UTF-8 %c3%85, A with ring above), as in Ångström, the query fails. I have not seen the problem with any other character: there is no problem with “å” (a with ring above, %c3%a5) or with any other non-A-Z letter such as ä, Ä, ö or Ö. The problem occurs when the database entry itself contains an Å, typically when it is the first letter of the name, as in Ångström, or follows the hyphen in a hyphenated name such as Per-Åke.
Furthermore, if the query term contains an “Å”, the search fails. A few examples:
Mårten -- works
mårten -- works
MåRTEN -- works
MÅRTEN -- fails
mÅrten -- fails
The query field is (normally) case-insensitive, so it shouldn’t matter whether I write “ångström” or “Ångström”. In this case, however, whether a search hits or misses depends on whether the database entry and/or the query term contains an Å; it seems EPrints cannot handle “Å”. The name is always displayed correctly and is correctly written to the database; the only problem is the advanced search.
I should add that querying the database directly with SQL works without any problems (including all upper/lower-case combinations). Any ideas what may be wrong in EPrints and where to start looking?
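One hypothesis worth checking, not confirmed anywhere in this message, concerns the bytes themselves. If UTF-8 input were decoded as Latin-1 somewhere along the way, the second byte of Å (0x85) would become U+0085 NEXT LINE, which Unicode classes as whitespace, whereas the second bytes of å (0xA5), ä (0xA4), Ä (0x84), ö (0xB6) and Ö (0x96) do not map to whitespace. The Python sketch below only demonstrates that byte arithmetic; `misread` and `has_whitespace` are hypothetical helpers, and Python's `str.isspace` merely stands in for whatever whitespace test a tokenizer might use:

```python
# Hypothesis sketch (not confirmed in this thread): what happens if UTF-8
# bytes are mistakenly decoded as Latin-1 before splitting on whitespace.
LETTERS = ["Å", "å", "Ä", "ä", "Ö", "ö"]


def misread(text: str) -> str:
    """Encode as UTF-8, then wrongly decode those bytes as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def has_whitespace(text: str) -> bool:
    """True if any character is whitespace according to str.isspace()."""
    return any(ch.isspace() for ch in text)


if __name__ == "__main__":
    for letter in LETTERS:
        raw = letter.encode("utf-8")
        bad = misread(letter)
        # Show the UTF-8 bytes, the mis-decoded characters, and the whitespace test.
        print(letter, raw.hex(), ascii(bad), has_whitespace(bad))
    # A whitespace split of the mis-decoded name breaks it at the Å.
    print(ascii(misread("Ångström").split()))
```

Under this assumption only upper-case Å would break a term apart, which would fit the pattern reported above; whether EPrints actually mis-decodes its input at any point would still need to be verified.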
Regards,
Christer
Christer Enkvist, Ph D
System Administrator/System Librarian
Division of Scholarly Communication
Swedish University of Agricultural Sciences
Uppsala, Sweden
Telephone: 018-671042
*** Options: http://mailman.ecs.soton.ac.uk/mailman/listinfo/eprints-tech
*** Archive: http://www.eprints.org/tech.php/
*** EPrints community wiki: http://wiki.eprints.org/
*** EPrints developers Forum: http://forum.eprints.org/
- References:
- [EP-tech] Searching fails when database field contains Å (utf8 %c3%85)
- From: Christer Enkvist <christer.enkvist@slu.se>
