EPrints Technical Mailing List Archive
Message: #04298
[EP-tech] Re: Normalize characters for correct sorting
- To: eprints-tech@ecs.soton.ac.uk
- Subject: [EP-tech] Re: Normalize characters for correct sorting
- From: Ian Stuart <Ian.Stuart@ed.ac.uk>
- Date: Tue, 09 Jun 2015 11:34:51 +0100
Ah - OK... yes, I had a similar problem a few years ago.

It looks like http://search.cpan.org/~kiz/MathML-Entities-Approximate-0.20/lib/MathML/Entities/Approximate.pm should be updated, and it could be used by the Tokenizer :)
On 09/06/15 09:59, pgasinos pgs wrote:
Hi Ian,

I probably didn't make clear what the real problem is. In English you don't have the same vowel with and without an accent. In Greek the accent is only a matter of correct spelling, so it is the same letter and has to be normalized to be sorted correctly. Tokenizer.pm (/perl_lib/EPrints/Index/Tokenizer.pm) does the same for indexing.

Kostas

2015-06-09 10:57 GMT+03:00 Ian Stuart <Ian.Stuart@ed.ac.uk>:

I suspect this is a Perl problem rather than an EPrints problem. I would expect Perl to sort by Unicode value (so 0386 before 0391).

On 09/06/15 08:40, pgasinos pgs wrote:
> Is there any configuration file in EPrints where someone can normalize
> UTF-8 characters so that they sort correctly in non-English languages?
> For example, the Unicode characters
> Ά (U+0386) GREEK CAPITAL LETTER ALPHA WITH TONOS and
> Α (U+0391) GREEK CAPITAL LETTER ALPHA
> are the same letter and have to be sorted together, not in separate lists.
> The vowels are even more complicated. All of the following are the same
> letter and have to be in the same list:
> υ GREEK SMALL LETTER UPSILON
> ύ GREEK SMALL LETTER UPSILON WITH TONOS
> ϋ GREEK SMALL LETTER UPSILON WITH DIALYTIKA
> ΰ GREEK SMALL LETTER UPSILON WITH DIALYTIKA AND TONOS
--
Ian Stuart.
Developer: ORI, RJ-Broker, and OpenDepot.org
Bibliographics and Multimedia Service Delivery team,
EDINA,
The University of Edinburgh.

http://edina.ac.uk/

This email was sent via the University of Edinburgh. The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
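The sorting problem discussed in this thread can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not EPrints code; the function name `sort_key` and the sample words are made up for illustration. The idea is to decompose each string to Unicode NFD, drop the combining marks (tonos, dialytika), and sort on the result, so that accented and unaccented forms of a letter fall in the same place. In Perl, the `NFD` function from `Unicode::Normalize` could presumably play the same role as `unicodedata.normalize` here.

```python
import unicodedata


def sort_key(text):
    """Return an accent-insensitive, case-insensitive key for sorting.

    Hypothetical helper for illustration only; not part of EPrints.
    """
    # NFD splits a precomposed letter such as U+0386 into its base
    # letter (U+0391) followed by combining marks (U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop every combining mark, keeping only the base letters.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Fold case so that capital and small letters sort together.
    return base.casefold()


# Sample words, chosen only to show the effect.
capitals = ["Άρθρο", "Αγορά", "Βιβλίο"]
smalls = ["ωδή", "ύπνος", "υγεία"]

# A plain sort compares code points, so U+0386 (alpha with tonos)
# comes before every word starting with U+0391 (plain alpha).
by_code_point = sorted(capitals)

# Sorting on the normalized key interleaves the two forms of alpha.
by_letter = sorted(capitals, key=sort_key)
smalls_by_letter = sorted(smalls, key=sort_key)

if __name__ == "__main__":
    print(by_code_point)
    print(by_letter)
    print(smalls_by_letter)
```

With this key, all four upsilon variants listed in the original question map to the same base letter. Stripping marks is a simplification: language-aware collation (for example the Unicode Collation Algorithm, available in Perl through `Unicode::Collate`) handles more cases, but the sketch shows why sorting by raw code point separates letters that a Greek reader treats as one.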
- References:
- [EP-tech] Normalize characters for correct sorting
- From: pgasinos pgs <pgasinos@gmail.com>
- [EP-tech] Re: Normalize characters for correct sorting
- From: Ian Stuart <Ian.Stuart@ed.ac.uk>
- [EP-tech] Re: Normalize characters for correct sorting
- From: pgasinos pgs <pgasinos@gmail.com>