EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #10236
Re: [EP-tech] Alphabetically sort names with special characters
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] Alphabetically sort names with special characters
- From: Andrew M <eprints-tech@unitedgames.co.uk>
- Date: Wed, 10 Sep 2025 07:50:43 +0100
Quoting Andrew M <eprints-tech@unitedgames.co.uk>:

Since the script was getting butchered in email form, I've put it online here:
https://www.andrewjamesmehta.com/files/eprints/UnicodeSortExample.pm

However, the main part was:

    use English qw( -no_match_vars );   # provides @ARG as an alias for @_
    use Unicode::Collate;

    sub unicode_sort {
        my $self = shift;
        # Level 1 compares base letters only, so case and diacritics are ignored.
        my @configuration_to_ignore_case_and_diacritics = (level => 1);
        return Unicode::Collate->new(@configuration_to_ignore_case_and_diacritics)->sort(@ARG);
    }

This is described in the Perl Unicode cookbook:
https://perldoc.perl.org/perlunicook#%E2%84%9E-36:-Case-and-accent-insensitive-Unicode-sort

This is Perl, and not EPrints of course, so the next stage is to figure out where such improved sorts need to be used in EPrints, or whether EPrints already has an option for them.
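In an application the names are usually fields of larger records rather than a plain list of strings. The following is a minimal sketch of how the same level-1 collator might be applied in that case, using the `cmp` method of Unicode::Collate inside an ordinary `sort` block. The record layout (`name`, `id`) is purely hypothetical and is not taken from EPrints.

```perl
use strict;
use warnings;
use utf8;                              # string literals below are UTF-8
use open qw( :std :encoding(UTF-8) );  # emit UTF-8 on STDOUT/STDERR
use Unicode::Collate;

# Level 1 compares base letters only, ignoring case and diacritics.
my $collator = Unicode::Collate->new( level => 1 );

# Hypothetical records; the field names are assumptions for illustration.
my @records = (
    { id => 1, name => 'Church, B' },
    { id => 2, name => 'Lee, K' },
    { id => 3, name => 'Ågren, R' },
    { id => 4, name => 'Çınar, D' },
);

# Compare the name fields with the collator instead of the built-in cmp.
my @sorted = sort { $collator->cmp( $a->{name}, $b->{name} ) } @records;

print "$_->{name}\n" for @sorted;
```

With the default collation table this should list Ågren, Church, Çınar, Lee. Note that at level 1 two names differing only in accents or case compare as equal, so their relative order is not determined by the collator.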
There was no need for the "our" before $a and $b in that code example. Apologies. I was messing around with different things and left that in.

Quoting Andrew M <eprints-tech@unitedgames.co.uk>:

Was intrigued by this, and had a moment of spare time, so I wrote a short script that attempts three different sorts: default sort, default Unicode case-folding case-insensitive sort, ...and since the second made no difference, I hit the online cookbook...
https://perldoc.perl.org/perlunicook#%E2%84%9E-36:-Case-and-accent-insensitive-Unicode-sort
and learned about the default Unicode case-and-accent-insensitive sort.

So now we know how to do the correct kind of sort in Perl... next we'd need to know where in the EPrints codebase to apply the fix. Where are you seeing the wrong order appearing? In what context do you want the order to be changed?

Of course there may also be a simple EPrints option that switches to more correct ordering, so I probably should have checked the EPrints wiki before looking up the Perl solution.

Attempting to copy and paste the short experimental script I just wrote - hope it doesn't get butchered in email form:

====================
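The script itself is not reproduced in this archived message. As a stand-in, here is a minimal sketch (not the script from the thread) of the three sorts described above, applied to the four names from the original question. The variable names and the `show` helper are illustrative assumptions; `fc` needs Perl 5.16 or later.

```perl
use strict;
use warnings;
use utf8;                              # string literals below are UTF-8
use feature 'fc';                      # Unicode case folding (Perl 5.16+)
use open qw( :std :encoding(UTF-8) );  # emit UTF-8 on STDOUT/STDERR
use Unicode::Collate;

my @names = ( 'Church, B', 'Lee, K', 'Ågren, R', 'Çınar, D' );

# 1. Default sort: plain code point comparison.
my @by_codepoint = sort @names;

# 2. Case-insensitive sort via Unicode case folding.
my @by_foldcase = sort { fc($a) cmp fc($b) } @names;

# 3. Case- and accent-insensitive sort via the Unicode Collation Algorithm.
my @by_collation = Unicode::Collate->new( level => 1 )->sort(@names);

# Hypothetical helper that prints one labelled result per line.
sub show {
    my ( $label, @list ) = @_;
    print "$label: ", join( ' | ', @list ), "\n";
}

show( 'code point', @by_codepoint );
show( 'fold case ', @by_foldcase );
show( 'collation ', @by_collation );
```

Case folding only lowercases the strings, so the accented letters still compare by code point and the second result matches the first. Only the collator moves Ågren and Çınar to the positions a reader would expect.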
Quoting Will Hughes <w.p.hughes@reading.ac.uk>:

Hi

Hopefully a quick question with an easy answer: How do we get alphabetic sorting to list accented characters at an appropriate point in an alphabetic list? The default behaviour seems to use Unicode values or something, as accented characters appear at the end of the alphabet. For example, when I see this kind of sequence from EPrints:

* Church, B
* Lee, K
* Ågren, R
* Çınar, D

I feel that it should (probably) be:

* Ågren, R
* Church, B
* Çınar, D
* Lee, K

Is there a simple setting to implement sorting in a way that respects accented characters? (And will these characters reproduce accurately after emailing? Image attached just in case.)

Best wishes
Will

Will Hughes
Emeritus Professor of Construction Management and Economics
School of the Built Environment
University of Reading, PO Box 219, Whiteknights
Reading, RG6 6DF, UK
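The order reported in the question is what a plain code point comparison would produce. A small sketch, assuming the names are stored with precomposed (NFC) characters, prints the code point of each name's first letter to show why Å and Ç land after every unaccented capital:

```perl
use strict;
use warnings;
use utf8;                              # string literals below are UTF-8
use open qw( :std :encoding(UTF-8) );  # emit UTF-8 on STDOUT/STDERR

# Assumes precomposed characters; a decomposed "A" + combining ring
# would start with U+0041 instead.
my @names = ( 'Church, B', 'Lee, K', 'Ågren, R', 'Çınar, D' );

# ord() returns the code point of the first character of its argument.
my @code_points = map { sprintf( 'U+%04X', ord($_) ) } @names;

for my $i ( 0 .. $#names ) {
    print "$code_points[$i]  $names[$i]\n";
}
```

C (U+0043) and L (U+004C) sit in the basic Latin range, while Å (U+00C5) and Ç (U+00C7) sit in the Latin-1 Supplement block, so a byte-wise or code point sort places them last.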
- Follow-Ups:
- Re: [EP-tech] Alphabetically sort names with special characters
- From: Andrew M <eprints-tech@unitedgames.co.uk>
- Re: [EP-tech] Alphabetically sort names with special characters
- References:
- [EP-tech] Alphabetically sort names with special characters
- From: Will Hughes <w.p.hughes@reading.ac.uk>
- Re: [EP-tech] Alphabetically sort names with special characters
- From: Andrew M <eprints-tech@unitedgames.co.uk>
- Re: [EP-tech] Alphabetically sort names with special characters
- From: Andrew M <eprints-tech@unitedgames.co.uk>
- [EP-tech] Alphabetically sort names with special characters