Topic: Apache 2.4 - FCGI - Perl - Broken Encoding |
j_d
Joined: 25 Oct 2018 Posts: 2
|
Posted: Thu 25 Oct '18 14:25 Post subject: Apache 2.4 - FCGI - Perl - Broken Encoding |
|
|
My setup:
Apache 2.4.29 running on a Linux machine, with fcgid loaded, and an index.pl which is running. The script itself uses:
Code: |
use utf8;
binmode(STDOUT,':utf8');
|
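For reference, here is a minimal sketch of what the `:utf8` layer is expected to do on an ordinary filehandle. The in-memory handle and the helper name `bytes_written_with_utf8_layer` are made up for illustration and are not part of index.pl.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are characters, not bytes

# Hypothetical helper: write $text through a :utf8 layer into an
# in-memory buffer and return the resulting bytes as a hex string.
sub bytes_written_with_utf8_layer {
    my ($text) = @_;
    my $buffer = '';
    open(my $fh, '>:utf8', \$buffer)
        or die "cannot open in-memory handle: $!";
    print {$fh} $text;
    close($fh);
    return unpack('H*', $buffer);
}

# 'ß' (U+00DF) should come out as the two UTF-8 bytes C3 9F.
print bytes_written_with_utf8_layer('weiß'), "\n";    # prints 776569c39f
```

If the bytes that reach the browser differ from this, the layer is probably not being applied to the handle the script actually prints to.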
I also have a MySQL server running, version 5.7.24. At some personal cost I've not only managed to get rid of most traces of latin1 in my database, but actually replaced them with the proper MySQL UTF-8 encoding:
Code: |
mysql> SHOW variables LIKE 'character%';
+--------------------------+----------------------------+
| Variable_name | Value |
+--------------------------+----------------------------+
| character_set_client | utf8mb4 |
| character_set_connection | utf8mb4 |
| character_set_database | utf8mb4 |
| character_set_filesystem | binary |
| character_set_results | utf8mb4 |
| character_set_server | utf8mb4 |
| character_set_system | utf8 |
| character_sets_dir | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
|
In my Perl script I'm using the DBI module in order to connect to my database, and because I need proper UTF-8 support in that one (lc('ẞ') needs to equal 'ß') I'm passing the option "mysql_enable_utf8 => 1" to the connection method.
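Roughly, the connection looks like the sketch below. Only `mysql_enable_utf8 => 1` is taken from my actual script. The database name, host, credentials, `RaiseError` and the helper names `connect_args` and `connect_db` are placeholders for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build the argument list for DBI->connect.
# The DSN layout follows the usual DBD::mysql form; the values are assumed.
sub connect_args {
    my ($database, $user, $password) = @_;
    return (
        "DBI:mysql:database=$database;host=localhost",
        $user,
        $password,
        {
            RaiseError        => 1,    # assumption: die on DB errors
            mysql_enable_utf8 => 1,    # the option mentioned above
        },
    );
}

# Hypothetical helper: DBI is loaded lazily, so the file compiles
# even on a machine where DBI is not installed.
sub connect_db {
    require DBI;
    return DBI->connect(connect_args(@_));
}
```

Usage would be something like `my $dbh = connect_db('mydb', 'myuser', 'secret');`, with all three values being made up here.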
Let's first see if my script returns the proper encoding headers:
Code: |
$ perl index.pl
Content-Type: text/html; charset=UTF-8
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body>weiß</body>
</html>
|
'weiß' is selected from the database here - so that looks good. Now let's look at the output in the browser:
Well, that was unexpected. For some reason my browser displays mojibake even though I make sure to tell the browser that my character set is UTF-8 (and it's actually recognised correctly, I just checked).
Hey, maybe Apache is doing something weird with my response? Let's find out what it does with characters that aren't in western 8-bit encodings, like ... Japanese?
Code: |
$ perl index.pl
Content-Type: text/html; charset=UTF-8
<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
</head>
<body><span id="文字化け">weiß</span></body>
</html>
|
OK, nothing too unexpected on my UTF-8 console. Let's have a look at it in the browser:
Whoa! Just by introducing non-ISO-8859-1 characters to my HTML output, my browser suddenly displays my German umlauts correctly. And since I cannot reproduce the issue with my script on the console, couldn't it be Apache itself that's looking at the response, saying "if there are only ISO-8859-1 characters in it, I'm going to assume ISO-8859-1 encoding and re-encode it even if it doesn't fit at all"?
It should be noted that "�" is pretty much the kind of mojibake you get when you try to print a UTF-8 character on a device that expects ISO-8859-1 characters. I already added "AddDefaultCharset utf-8" to my /etc/apache2/apache2.conf, but that didn't fix the issue. Now I could just add Japanese characters to my body at all times in order to stop Apache from trying to act smart and failing miserably - but c'mon, that can't seriously be the solution?! |
|
|
j_d
Joined: 25 Oct 2018 Posts: 2
|
Posted: Tue 30 Oct '18 12:04 Post subject: |
|
|
Well, I found out what the exact problem was. Because the XS part of the FCGI module uses tied handles, as this post suggests, binmode had no effect whatsoever. As such the code would:
- check if the internal UTF-8 flag of the value it was supposed to print was set
- try to "downgrade" the string to ISO-8859-1
- and put a warning in my logs if the downgrade failed.
The error message in particular was:
Code: |
Use of wide characters in FCGI::Stream::PRINT is deprecated and will stop wprking in a future version of FCGI (sic!)
|
That led me to this commit. The offending code:
Code: | if (DO_UTF8(ST(n)) && !sv_utf8_downgrade(ST(n), 1) && ckWARN_d(WARN_UTF8)) |
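This code checks whether the internal UTF-8 flag is set, tries to downgrade the string to ISO-8859-1, and emits the warning if it can't, which happens when the string contains characters that do not belong to ISO-8859-1 (like Japanese characters). That explains what I saw: 'weiß' alone downgrades just fine, so it went out as ISO-8859-1 bytes under a UTF-8 header, while the page with Japanese in it could not be downgraded and went out as UTF-8.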
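The same behaviour can be reproduced in plain Perl. This is only a sketch that imitates the XS logic with `utf8::downgrade`, not the FCGI code itself, and the helper name `bytes_after_downgrade` is made up.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are characters, not bytes

# Hypothetical helper: imitate "try to downgrade, then write the buffer"
# and return the bytes that would go out, as a hex string.
sub bytes_after_downgrade {
    my ($text) = @_;
    my $copy = $text;
    # Second argument true (FAIL_OK): return false instead of dying
    # when the string contains characters above 0xFF.
    utf8::downgrade($copy, 1);
    # If the downgrade failed, the string is still character data;
    # its internal UTF-8 representation is what ends up on the wire.
    utf8::encode($copy) if utf8::is_utf8($copy);
    return unpack('H*', $copy);
}

print bytes_after_downgrade('weiß'), "\n";       # prints 776569df
print bytes_after_downgrade('weiß 文'), "\n";    # prints 776569c39f20e69687
```

In the first case 'ß' becomes the single ISO-8859-1 byte DF, which is not valid UTF-8 on its own. In the second case the downgrade fails and 'ß' stays as C3 9F.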
My theory: UTF-8 came out in 1993, and Perl overslept its arrival until 2000. Then version 5.6 came out and added some, but not proper, support for UTF-8. Two years later version 5.8 came out, with some actual UTF-8 support. The FCGI module was written in 2003, and its author didn't care about different encodings, which ended up preventing people from doing sane things. In 2010 the warning was modified instead of being removed entirely.
What a mess.
My solution now is to channel all output through my own printing routine, where the UTF-8 flag is cleared, the string is printed, and the flag is set again if need be:
Code: |
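use Encode qw(is_utf8 _utf8_on _utf8_off);

# Takes a reference to the string to print, e.g. my_print(\$html).
sub my_print($)
{
my($string) = @_;
my $is_utf8 = is_utf8(${$string});
# Clear the flag so FCGI sees plain bytes and skips the downgrade.
_utf8_off(${$string}) if($is_utf8);
print ${$string};
# Restore the flag so the caller's string is left as it was.
_utf8_on(${$string}) if($is_utf8);
}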
|
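To show why clearing the flag does the trick, here is a small sketch. The helper name `hex_with_flag_cleared` is made up, and it works on a copy so the original string is left alone.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are characters, not bytes
use Encode qw(is_utf8 _utf8_off);

# Hypothetical helper: clear the UTF-8 flag on a copy of the string
# and return the bytes that remain, as a hex string.
sub hex_with_flag_cleared {
    my ($text) = @_;
    my $copy = $text;
    # With the flag off, Perl treats the internal UTF-8 buffer as bytes.
    _utf8_off($copy) if is_utf8($copy);
    return unpack('H*', $copy);
}

print hex_with_flag_cleared('weiß'), "\n";    # prints 776569c39f
```

With the flag off, the string is just the UTF-8 bytes, so there is nothing left for the FCGI print code to downgrade and 'ß' goes out as C3 9F.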
And no, I'm not interested in suggestions to encode my strings instead of setting and clearing flags. The thread can be closed. I'm done. |
|