EPrints Technical Mailing List Archive
See the EPrints wiki for instructions on how to join this mailing list and related information.
Message: #10235
Re: [EP-tech] Alphabetically sort names with special characters
- To: eprints-tech@ecs.soton.ac.uk
- Subject: Re: [EP-tech] Alphabetically sort names with special characters
- From: Andrew M <eprints-tech@unitedgames.co.uk>
- Date: Tue, 09 Sep 2025 23:12:51 +0100
CAUTION: This e-mail originated outside the University of Southampton.

I was intrigued by this and had a moment of spare time, so I wrote a short script that tries three different sorts:

* the default sort;
* the default Unicode case-folding (case-insensitive) sort;
* ...and, since the second made no difference, I turned to the online cookbook (https://perldoc.perl.org/perlunicook#%E2%84%9E-36:-Case-and-accent-insensitive-Unicode-sort) and learned about the standard Unicode case- and accent-insensitive sort.

So now we know how to do the right kind of sort in Perl. Next, we'd need to know where in the EPrints codebase to apply the fix. Where are you seeing the wrong order? In what context do you want the order changed?

Of course, there may also be a simple EPrints option that switches to a more correct ordering, so I probably should have checked the EPrints wiki before looking up the Perl solution.

I'm attempting to copy and paste the short experimental script I just wrote; I hope it doesn't get butchered in email form:

====================

#!/usr/bin/env perl
package UnicodeSortExample;

# Used throughout:
use strict;
use warnings;
use v5.16; # Enables the 'fc' keyword, as well as unicode_strings, say and other useful features.
           # See https://perldoc.perl.org/feature#FEATURE-BUNDLES
use utf8;
use English; # Allows use of $ARG instead of $_. See https://perldoc.perl.org/perlvar
use Unicode::Collate; # Allows sorting in Unicode. Included in core Perl since Perl 5.8.

# Global Encoding Settings:
my $encoding_layer;
SET_ENCODING_LAYER_AT_COMPILE_TIME: BEGIN {
    my $encoding_to_use = 'UTF-8'; # Change this to the desired encoding value.
    $encoding_layer = ":encoding($encoding_to_use)"; # This is what actually gets used for the layer.
};
use open ':std', "$encoding_layer"; # The :std effect is global.
binmode STDIN,  $encoding_layer;
binmode STDOUT, $encoding_layer;
binmode STDERR, $encoding_layer;
$ENV{'PERL_UNICODE'} = 'AS';
# A = Expect @ARGV values to be UTF-8 strings.
# S = Shortcut for I+O+E: standard input, output and error will be UTF-8.
# perl reads PERL_UNICODE only at start-up, so this assignment affects
# child processes started after this point, not the running interpreter.

=pod Pod Documentation for UnicodeSortExample.pm

=encoding utf8

=cut

=pod FILENAME, VERSION

=head2 FILENAME

UnicodeSortExample.pm - Experimenting with alphabetical sorting.

=head2 VERSION

This is Version v1.0.0.
=cut

our $VERSION = 'v1.0.0';

=pod SYNOPSIS, DESCRIPTION

=head2 SYNOPSIS

    # Run file at the command line:
    perl ./UnicodeSortExample.pm

=head2 DESCRIPTION

Modulino for experiments with alphabetical sorting.

=cut

UnicodeSortExample->start() unless caller;

=pod SUBROUTINES

=head2 SUBROUTINES

=cut

=head3 UnicodeSortExample->start()

Sets input (a hardcoded unsorted list).
Does processing (calls a variety of sort methods and saves the results to a series of lists).
Displays output (describes what each list is, and then displays it).

=cut

sub start {
    # Initial Values:
    my $self = shift;
    my @unsorted_list = (
        'Lee, K',
        'Church, B',
        'Çınar, D',
        'Ågren, R',
    );

    # Processing:
    my @default_sort      = $self->default_sort(@unsorted_list);
    my @case_folding_sort = $self->case_folding_sort(@unsorted_list);
    my @unicode_sort      = $self->unicode_sort(@unsorted_list);

    # Output:
    say '';
    say 'Unordered Input:';
    say '';
    say "* $ARG\n" for @unsorted_list;
    say 'Applying default alphabetical sort:';
    say '';
    say "* $ARG\n" for @default_sort;
    say 'Applying case folded alphabetical sort:';
    say '';
    say "* $ARG\n" for @case_folding_sort;
    say 'Applying Unicode sort:';
    say '';
    say "* $ARG\n" for @unicode_sort;
}

=head3 $self->default_sort(@unordered_list);

Takes a list, and returns it sorted, according to the standard alphabetical sort described at:
L<https://perldoc.perl.org/functions/sort>

=cut

sub default_sort {
    my $self = shift;
    return (sort {our $a cmp our $b} @ARG);
}

=head3 $self->case_folding_sort(@unordered_list);

Takes a list, and returns it sorted, according to the standard case-insensitive
alphabetical sort described at:
L<https://perldoc.perl.org/functions/sort>

=cut

sub case_folding_sort {
    my $self = shift;
    # fc applies full Unicode case folding, so the comparison is case-insensitive
    # across all of Unicode. Accents are not folded, so Å and Ç still sort after Z.
    return (sort {fc(our $a) cmp fc(our $b)} @ARG);
};

=head3 $self->unicode_sort(@unordered_list);

Takes a list, and returns it sorted, according to the standard Unicode case- and accent-insensitive alphabetical sort described at:
L<https://perldoc.perl.org/perlunicook#%E2%84%9E-37:-Unicode-locale-collation>
and elaboration can be found at:
L<https://perldoc.perl.org/Unicode::Collate> or L<Unicode::Collate>.
=cut

sub unicode_sort {
    my $self = shift;
    my @configuration_to_ignore_case_and_diacritics = (level => 1);
    return Unicode::Collate->new(@configuration_to_ignore_case_and_diacritics)->sort(@ARG);
}

=head2 AUTHOR (en-GB)

Andrew Mehta

=cut

=head2 COPYRIGHT AND LICENSE (en-GB)

Copyright ©2025, Andrew Mehta.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl 5.42.0.

For more details, see the full text of the licenses via L<perlartistic> and L<perlgpl>.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.

=cut

1;
__END__

====================

Quoting Will Hughes <w.p.hughes@reading.ac.uk>:
Hi

Hopefully a quick question with an easy answer: how do we get alphabetic sorting to list accented characters at an appropriate point in an alphabetic list? The default behaviour seems to use Unicode values or something, as accented characters appear at the end of the alphabet.

For example, when I see this kind of sequence from EPrints:

* Church, B
* Lee, K
* Ågren, R
* Çınar, D

I feel that it should (probably) be:

* Ågren, R
* Church, B
* Çınar, D
* Lee, K

Is there a simple setting to implement sorting in a way that respects accented characters? (And will these characters reproduce accurately after emailing? Image attached just in case.)

Best wishes
Will

Will Hughes
Emeritus Professor of Construction Management and Economics
School of the Built Environment
University of Reading, PO Box 219, Whiteknights
Reading, RG6 6DF, UK
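As a quick check, the sketch below runs the four names from the question through Perl's plain `cmp` and through `Unicode::Collate` at level 1, the same setting the script's `unicode_sort` uses. It also shows `getSortKey`, which may help if the ordering has to be done somewhere other than Perl's `sort` (for example, a stored sort value); where EPrints actually performs its ordering is not established in this thread, so that use is an assumption, not an EPrints feature.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use utf8;               # the name literals below are UTF-8 source text
use Unicode::Collate;   # core module since Perl 5.8
binmode STDOUT, ':encoding(UTF-8)';

# The four names from the question, in the order EPrints showed them.
my @names = ('Church, B', 'Lee, K', 'Ågren, R', 'Çınar, D');

# Plain 'cmp' compares code points, so Å (U+00C5) and Ç (U+00C7)
# sort after every unaccented ASCII letter.
my @by_codepoint = sort { $a cmp $b } @names;

# Level 1 of the Unicode Collation Algorithm compares base letters only,
# ignoring case and accents (same configuration as unicode_sort above).
my $collator = Unicode::Collate->new(level => 1);
my @collated = $collator->sort(@names);

# Alternative: precompute a binary sort key per name. Comparing two keys
# with 'cmp' gives the same result as $collator->cmp on the original strings.
# (Storing such keys elsewhere is a hypothetical use, not an EPrints option.)
my %key_for = map { ( $_ => $collator->getSortKey($_) ) } @names;
my @by_key  = sort { $key_for{$a} cmp $key_for{$b} } @names;

print 'Code point order: ', join(' | ', @by_codepoint), "\n";
print 'Collated order:   ', join(' | ', @collated),     "\n";
print 'Sort-key order:   ', join(' | ', @by_key),       "\n";
```

With the module's standard collation table, the second and third lines should both read Ågren, Church, Çınar, Lee (the order suggested in the question), while the first reproduces the order EPrints showed. Level 1 follows the language-neutral default Unicode ordering; some languages, such as Swedish, place Å after Z, which is the case the locale-collation recipe linked in the script's POD (Unicode::Collate::Locale) is meant to handle.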
- Follow-Ups:
- Re: [EP-tech] Alphabetically sort names with special characters
- From: Andrew M <eprints-tech@unitedgames.co.uk>
- Re: [EP-tech] Alphabetically sort names with special characters
- References:
- [EP-tech] Alphabetically sort names with special characters
- From: Will Hughes <w.p.hughes@reading.ac.uk>
- [EP-tech] Alphabetically sort names with special characters