A simplified example of my app:
I have an HTML form generated by a PHP script that reads a text from a database and fills an input of the form with it. I can edit the text there; on form submit it is sent to a PHP script via a jQuery AJAX call. The PHP script saves the text in the database, reads the saved value back, and returns it in the JSON response of the AJAX call.
The character encoding of the HTML page is ISO-8859-1.
Let’s say the HTML form looks like this:
<form id="my_form">
<input type="text" id="txtId" name="txt" value="" />
<input type="submit" name="btn" value="save">
</form>
On form submit this AJAX call is made:
$.ajax({
    type: "POST",
    url: "my_script.php",
    data: $("#my_form").serialize(),
    success: function (jsonObj) {
        if (!jsonObj) {
            return;
        }
        if ("txt" in jsonObj) {
            // the key must be quoted; a bare txt would be an undefined variable
            $("#txtId").val(jsonObj["txt"]);
        }
        return false;
    },
    error: showError,
    dataType: "json"
});
In PHP, after saving the text in the database and reading the saved text back, I add it to an associative array, which I encode as JSON and send as the response of the AJAX call:
$item['txt'] = $value; //$value is the text saved in the database
header("Content-type: application/json");
echo json_encode($item);
At first I submitted the form with exactly the text that came from PHP at page load. It looked fine, but when I reloaded the page, a weird text filled my form input. I had the text a×b and now the × had turned into two junk characters (an Ã followed by a second stray character).
I submitted the form again with the text a×b and inspected the AJAX response with Firebug. In both the Console and Net tabs of Firebug, everything looked fine under the Post tab of the call, but under the Response tab I got "txt":"a\u00d7b" instead of "txt":"a×b".
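A side note on that response: \u00d7 is the JSON escape sequence for ×, which json_encode emits for non-ASCII characters by default, so the escape by itself does not necessarily indicate corruption. A minimal sketch, assuming the response body is exactly the string seen in Firebug (it is hard-coded here for illustration):

```javascript
// Hypothetical response body, copied from the Firebug Response tab.
const responseBody = '{"txt":"a\\u00d7b"}';

// JSON.parse decodes the \u00d7 escape into the character U+00D7 (the × sign).
const parsed = JSON.parse(responseBody);

console.log(parsed.txt === "a\u00d7b"); // true
console.log(parsed.txt.length);         // 3
```

This would explain why the input looked fine right after the AJAX call and only broke on page reload, when the stored bytes were printed into the ISO-8859-1 page without passing through a JSON parser.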
It looked like the text got re-encoded somewhere on its way back to the form. In ISO-8859-1 (and Windows-1252) the × character is the single byte 0xD7, while in UTF-8 it is the two bytes 0xC3 0x97, so I had to discover where the encoding of the text changed.
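To make the difference between the two encodings concrete, here is a small sketch that lists the bytes of a×b in UTF-8 and in ISO-8859-1, and shows what happens when the UTF-8 bytes are read one byte per character. The variable names are only illustrative.

```javascript
// "a×b", written with an escape so the encoding of this file does not matter.
const sample = "a\u00d7b";

// UTF-8 bytes: × (U+00D7) takes two bytes, 0xC3 0x97.
const utf8Bytes = Array.from(new TextEncoder().encode(sample));

// ISO-8859-1 bytes: every character up to U+00FF is one byte equal to its code point.
const latin1Bytes = Array.from(sample, (ch) => ch.charCodeAt(0));

// Reading the UTF-8 bytes as if each byte were one ISO-8859-1 character.
const misread = String.fromCharCode(...utf8Bytes);

const hex = (bytes) => bytes.map((b) => b.toString(16).padStart(2, "0")).join(" ");
console.log(hex(utf8Bytes));   // 61 c3 97 62
console.log(hex(latin1Bytes)); // 61 d7 62
console.log(misread.length);   // 4 (the single × became two characters)
```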
I submitted the correct text again and outputted the text saved in the database without json_encode-ing it:
echo $item['txt'];
In the Response tab of the AJAX call, the text looked fine under the Console tab of Firebug, but under the Net tab it appeared garbled, with two junk characters in place of the ×.
Again I submitted the correct text and outputted the value that came via AJAX:
echo $_POST['txt'];
And again I obtained the correct text under the Console tab and the incorrect one under the Net tab in Firebug, which meant that the encoding broke before the text got to the server.
Then, under Headers tab of the call, I noticed among the Request Headers: Content-Type application/x-www-form-urlencoded; charset=UTF-8 and I thought maybe setting the character encoding of the jQuery.ajax call to ISO-8859-1 instead of UTF-8 would solve my problem:
$.ajax({
    type: "POST",
    url: "my_script.php",
    data: $("#my_form").serialize(),
    contentType: "application/x-www-form-urlencoded;charset=ISO-8859-1",
    success: function (jsonObj) {
        if (!jsonObj) {
            return;
        }
        if ("txt" in jsonObj) {
            // the key must be quoted; a bare txt would be an undefined variable
            $("#txtId").val(jsonObj["txt"]);
        }
        return false;
    },
    error: showError,
    dataType: "json"
});
But the result remained the same; moreover, the Content-Type header did not change either. After a lot of thinking and testing, I came to these conclusions:
1. When the data parameter of the jQuery.ajax call is not empty and the type parameter is set to “POST”, the request is sent as UTF-8 regardless of the contentType setting. For a different encoding to take effect, I would have to append what I would normally put in the data parameter to the query string of the url, and either omit the data parameter or set it to an empty string.
2. Explicitly setting the character encoding of the AJAX request to ISO-8859-1 did not help with my problem at all.
3. The jQuery serialize function “messes up” special characters on a page that is not UTF-8 encoded, because it uses the JavaScript function encodeURIComponent, which always encodes characters as UTF-8. So make sure to UTF-8-decode the text in the server script when using jQuery serialize or the JavaScript encodeURIComponent function in an AJAX call.
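The third conclusion can be checked in a few lines. The sketch below shows what encodeURIComponent produces for a×b, and then mimics on the JavaScript side what a server-side UTF-8 decode does to the received bytes. The helper utf8BytesToLatin1Bytes is a hypothetical name made up for this illustration; it is not part of jQuery or PHP.

```javascript
// encodeURIComponent always percent-encodes the UTF-8 bytes of a character.
const encoded = "txt=" + encodeURIComponent("a\u00d7b");
console.log(encoded); // txt=a%C3%97b

// Hypothetical helper mimicking a UTF-8 to ISO-8859-1 conversion on the server:
// interpret the input as UTF-8 and emit one byte per character,
// using "?" (0x3F) for characters that do not exist in ISO-8859-1.
function utf8BytesToLatin1Bytes(bytes) {
  const decoded = new TextDecoder("utf-8").decode(Uint8Array.from(bytes));
  return Array.from(decoded, (ch) => {
    const code = ch.codePointAt(0);
    return code <= 0xff ? code : 0x3f;
  });
}

// Bytes the server sees after URL-decoding the request body.
const received = [0x61, 0xc3, 0x97, 0x62];
console.log(utf8BytesToLatin1Bytes(received).join(" ")); // 97 215 98
```

The value 215 is 0xD7, the single ISO-8859-1 byte for ×. On the PHP side this is the job of utf8_decode; note that it is deprecated as of PHP 8.2, where mb_convert_encoding($str, 'ISO-8859-1', 'UTF-8') is the usual replacement.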
So I left the JavaScript code as it initially was (without specifying the contentType parameter to the jQuery.ajax call) and, in the PHP code, decoded the string before saving it in the database:
$txt = utf8_decode($_POST['txt']);
Now the correct text is saved in the database, but another problem arises: the text in the response of the AJAX call is null. But why?
The PHP function json_encode only works with UTF-8 encoded strings, that’s why. So I should have my own JSON-maker function:
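function make_json($item) {
    $arr = array(); // initialize so an empty input still yields "{}"
    foreach ($item as $key => $value) {
        if (is_array($value)) {
            // nested arrays become nested JSON objects
            $arr[] = '"'.$key.'":'.make_json($value);
        } else {
            // escape backslashes and double quotes; other bytes pass through unchanged
            $arr[] = '"'.$key.'":"'.str_replace(array("\\", "\""), array("\\\\", "\\\""), $value).'"';
        }
    }
    return '{'.implode(",", $arr)."}";
}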
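The reason json_encode could not cope with the decoded text can be checked directly: the ISO-8859-1 byte sequence for a×b is not well-formed UTF-8. The sketch below uses a hypothetical helper, isValidUtf8, which relies on the fact that decodeURIComponent throws a URIError when percent-encoded bytes do not form valid UTF-8.

```javascript
// Returns true when the byte sequence is well-formed UTF-8.
// decodeURIComponent throws a URIError on percent-encoded bytes that are not valid UTF-8.
function isValidUtf8(bytes) {
  const percentEncoded = bytes
    .map((b) => "%" + b.toString(16).padStart(2, "0"))
    .join("");
  try {
    decodeURIComponent(percentEncoded);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(isValidUtf8([0x61, 0xc3, 0x97, 0x62])); // true  (a×b as UTF-8)
console.log(isValidUtf8([0x61, 0xd7, 0x62]));       // false (a×b as ISO-8859-1)
```

In the second case 0xD7 announces a two-byte sequence, but the byte that follows (0x62, the letter b) is not a continuation byte.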
And the code in the PHP script becomes:
header("Content-type: application/json");
echo make_json($item);
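One caveat with a hand-written encoder like this: JSON requires not only backslashes and double quotes to be escaped inside strings, but also control characters below U+0020 (a newline, for example). A text containing them would produce a response that the client cannot parse. The sketch below illustrates the fuller escaping rule in JavaScript; escapeJsonString is a hypothetical name, and the same logic would have to be ported to the PHP function.

```javascript
// Escapes a string for use between double quotes in a JSON document.
function escapeJsonString(value) {
  return value.replace(/[\\"\u0000-\u001f]/g, function (ch) {
    if (ch === "\\" || ch === '"') {
      return "\\" + ch; // backslash and quote get a leading backslash
    }
    // control characters become \u00XX escapes
    return "\\u" + ("0000" + ch.charCodeAt(0).toString(16)).slice(-4);
  });
}

// Round trip: the escaped text parses back to the original.
const original = 'a"b\\c\nd';
const roundTripped = JSON.parse('"' + escapeJsonString(original) + '"');
console.log(roundTripped === original); // true
```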
But now I get this weird result: a�b (diamond shaped character with question mark inside instead of special characters). This time in the Net tab of Firebug everything looks fine, while the diamond shaped characters appear in the Console tab and in the HTML page.
I solved it by explicitly setting (in PHP) the character encoding of the response of the AJAX call to ISO-8859-1 (thanks to this post):
header("Content-type: application/json; charset=ISO-8859-1");
echo make_json($item);
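The diamond is U+FFFD, the Unicode replacement character, which a UTF-8 decoder substitutes for bytes it cannot interpret. A rough sketch of what likely happened on the client, assuming the response carried the ISO-8859-1 byte 0xD7 and was decoded as UTF-8 because no charset was declared (the byte values are illustrative):

```javascript
// Bytes of the text a×b as sent by the ISO-8859-1 PHP script.
const responseBytes = Uint8Array.of(0x61, 0xd7, 0x62);

// Decoded as UTF-8: the lone 0xD7 is invalid and is replaced by U+FFFD.
const asUtf8 = new TextDecoder("utf-8").decode(responseBytes);

// Decoded as ISO-8859-1: each byte maps to the code point with the same value.
const asLatin1 = String.fromCharCode(...responseBytes);

console.log(asUtf8 === "a\ufffdb");   // true
console.log(asLatin1 === "a\u00d7b"); // true
```

Declaring charset=ISO-8859-1 in the Content-type header tells the browser to take the second path.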