From: Jakub Narebski
Date: Fri, 3 Feb 2012 12:44:54 +0000 (+0100)
Subject: gitweb: Allow UTF-8 encoded CGI query parameters and path_info
X-Git-Tag: v1.7.9.2~24^2
X-Git-Url: https://git.tokkee.org/?a=commitdiff_plain;h=84d9e2d50ca9fbcf34e31cb74797fc182187c7b5;p=git.git

Gitweb forgot to decode CGI query parameters from UTF-8.  This results
in a bug where one cannot search for a string containing characters
outside US-ASCII.  For example, searching for "Michał Kiedrowicz"
(containing the letter 'ł', LATIN SMALL LETTER L WITH STROKE, Unicode
codepoint U+0142, represented by the bytes 0xc5 0x82 in UTF-8 and
percent-encoded as %C5%82) results in the following incorrect data in
the search field:

	MichaÅ\202 Kiedrowicz

This is caused by CGI by default treating the bytes '0xc5 0x82' as two
characters in Perl's legacy latin-1 (iso-8859-1) encoding, because the
's' query parameter is not explicitly decoded as a UTF-8 encoded string.

The solution used here follows the "Using Unicode in a Perl CGI script"
article at http://www.lemoda.net/cgi/perl-unicode/index.html:

	use CGI qw(:standard);
	use Encode 'decode_utf8';
	my $value = param('input');
	$value = decode_utf8($value);

Decoding UTF-8 is done when filling the %input_params hash and the
$path_info variable; the former requires moving from explicit $cgi->param(