EPrints Technical Mailing List Archive

Message: #00385


[EP-tech] URI::Escape error in URI::_query


URI::_query cannot handle wide characters (code points above 255, which covers most non-ASCII Unicode) and emits the following warning:

Use of uninitialized value within %URI::Escape::escapes in substitution iterator at /usr/local/eprints/perl_lib/URI/_query.pm line 16, <$fh> line 287.
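
For illustration, here is a minimal sketch of why the lookup comes back undefined. It assumes that %URI::Escape::escapes holds one entry per byte value (0..255); the local %escapes table and the simplified character class are stand-ins for the real table and $URI::uric, not copies of the library code.

```perl
use strict;
use warnings;

# Stand-in for %URI::Escape::escapes (assumed: one entry per byte value).
my %escapes = map { chr($_) => sprintf("%%%02X", $_) } 0 .. 255;

# A wide character: U+263A has a code point above 255.
my $wide = "\x{263A}";

# No table entry exists for it, so the substitution would interpolate undef.
my $has_entry = exists $escapes{$wide} ? 1 : 0;

# After encoding to UTF-8 octets, every character is a byte with an entry.
my $octets = $wide;
utf8::encode($octets);

# Simplified "safe" set; the real code uses $URI::uric instead.
(my $escaped = $octets) =~ s/([^A-Za-z0-9\-_.~])/$escapes{$1}/g;

print "entry for wide char: $has_entry\n";
print "escaped octets: $escaped\n";
```

Under these assumptions the wide character has no table entry, while its encoded octets escape cleanly.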

I added a trivial patch to that file, based on the difference between URI::Escape's uri_escape and uri_escape_utf8 functions (see below).  My question is: is it reasonable for the query() method to assume all inputs are octet strings rather than character strings?  And if so, how do we police all entry points to ensure data sanity?

===================================================================
--- perl_lib/URI/_query.pm	(revision 4369)
+++ perl_lib/URI/_query.pm	(working copy)
@@ -13,6 +13,7 @@
 	my $q = shift;
 	$$self = $1;
 	if (defined $q) {
+ 	    utf8::encode($q);
 	    $q =~ s/([^$URI::uric])/$URI::Escape::escapes{$1}/go;
 	    $$self .= "?$q";
 	}
===================================================================
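
To make the octets-versus-characters question concrete, here is a rough sketch of what happens when an encode-then-escape step receives input that is already UTF-8 octets. The escape_query helper is hypothetical and only mirrors the shape of the patched code; its character class is a simplified stand-in for $URI::uric.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched logic: encode, then percent-escape.
sub escape_query {
    my ($q) = @_;          # work on a copy; the caller's string is untouched
    utf8::encode($q);      # character string -> UTF-8 octets
    $q =~ s/([^A-Za-z0-9\-_.~])/sprintf("%%%02X", ord($1))/ge;
    return $q;
}

# A character string: "e acute" as the single code point U+00E9.
my $chars = "caf\x{E9}";

# The same text already encoded as UTF-8 octets by some earlier layer.
my $octets = $chars;
utf8::encode($octets);

my $from_chars  = escape_query($chars);    # encoded once
my $from_octets = escape_query($octets);   # encoded twice

print "from characters: $from_chars\n";
print "from octets:     $from_octets\n";
```

If callers are not consistent about which kind of string they pass, the second case yields a double-encoded query, which is the data sanity concern raised above.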

Cheers
-- 
Matthew Kerwin | Web Developer | TILS | Digital Repository Team | Level 2, I Block, Kelvin Grove | ph 3138 3910 | matthew.kerwin@qut.edu.au | CRICOS No 00213J