The restaurant website updates were looking good, including the little ñ in jalapeño and the é in sauté. That is, everything looked good until it was deployed to production.
Doh! How do we fix these strange characters showing up in our web page?
First, Some Basic Definitions
- Code point: A numerical ID of a character in a character set.
- Character set: More precisely, a coded character set: a set of character symbols and their associated code points.
- Character encoding: The method of mapping code points to the bytes used to store or transmit them. You can dig deeper into character encoding on Wikipedia.
- Collation: A set of rules for comparing characters in a character set. For example, if we want A = a, then we use a case-insensitive collation, such as the one MySQL defines as utf8_general_ci (ci = case-insensitive).
- UTF-8: Unicode Transformation Format, 8-bit. A character encoding that uses one to four 8-bit units (bytes) to store each character. It can represent every character in Unicode, which covers most of the writing systems in use around the globe.
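To make these definitions concrete, here is a small sketch in Python (used purely for illustration; the function name describe is hypothetical). It looks up a character’s code point and the bytes UTF-8 uses to store it.

```python
def describe(char: str) -> tuple:
    """Return the Unicode code point of a character and its UTF-8 bytes."""
    code_point = ord(char)           # numerical ID in the Unicode character set
    encoded = char.encode("utf-8")   # the byte sequence UTF-8 stores for it
    return code_point, encoded


if __name__ == "__main__":
    for ch in ["a", "ñ", "€", "😀"]:
        code_point, encoded = describe(ch)
        print(f"{ch!r}: U+{code_point:04X} is stored in {len(encoded)} byte(s)")
```

Plain ASCII letters take one byte, while accented letters, currency symbols and emoji take two, three and four bytes respectively.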
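Using UTF-8 will solve most character encoding issues you may come across. The previously widespread ISO-8859-1 (aka latin1) encoding shares its code points with the first 256 code points of Unicode, but only the ASCII range (the first 128) is stored as the same bytes in UTF-8. Characters such as ñ and é take one byte in ISO-8859-1 and two bytes in UTF-8, so text written in one encoding and read as the other shows up as strange characters.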
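A quick way to see where the strange characters come from is to encode text one way and decode it another. The Python sketch below does exactly that; the helper name misread is hypothetical, and the words are the ones from the restaurant example.

```python
def misread(text: str, written_as: str, read_as: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    raw = text.encode(written_as)
    # errors="replace" keeps the demo from raising on bytes invalid in read_as
    return raw.decode(read_as, errors="replace")


if __name__ == "__main__":
    # UTF-8 bytes read as ISO-8859-1: each accented letter becomes two characters
    print(misread("jalapeño", "utf-8", "iso-8859-1"))
    # ISO-8859-1 bytes read as UTF-8: the lone byte is invalid and gets replaced
    print(misread("sauté", "iso-8859-1", "utf-8"))
    # Plain ASCII text is identical in both encodings
    print(misread("salsa", "utf-8", "iso-8859-1"))
```

The first case produces the familiar two-character garbage in place of ñ, the second produces the replacement character, and the third comes through untouched because ASCII is common to both encodings.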
Browser Uses the Wrong Character Set
The easiest way to check if your browser is using the wrong character set is to test with the W3C Validator. Enter your website address, then scroll to the bottom of the results page to see your character encoding. If it shows UTF-8, skip to the section Wrong Character Set Used in the Database.
If the W3C Validator shows a character set other than UTF-8, or if you can’t use the validator because you’re developing locally or working on a password-protected website, you’ll have to focus on two things. First is the HTML code, second is the HTTP headers.
Fix the HTML
View the source code and look for one of two different attributes of a meta tag. The attribute differs based on which version of HTML your page references. Learn how to tell which version of HTML you’re using here.
For HTML5, ensure this tag references UTF-8:
<meta charset="UTF-8">
If you’re maintaining a page in HTML 4, find the meta tag that looks like this and ensure it says UTF-8:
<meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
If your page is missing the appropriate meta tag, add it. Test to see if this worked. If not, your webserver could be sending the wrong content type header. We solve this issue next.
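If you would rather not read through the page source by hand, the check can be scripted. This is a minimal sketch using Python’s standard html.parser module; the class and function names are hypothetical, and it only reports the first charset it finds in a meta tag.

```python
from html.parser import HTMLParser


class MetaCharsetFinder(HTMLParser):
    """Record the charset declared by a <meta> tag (HTML5 or HTML 4 style)."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        if tag != "meta" or self.charset is not None:
            return
        attributes = {name: (value or "") for name, value in attrs}
        if "charset" in attributes:
            # HTML5 style: <meta charset="UTF-8">
            self.charset = attributes["charset"].strip().lower()
        elif attributes.get("http-equiv", "").lower() == "content-type":
            # HTML 4 style: content="text/html;charset=UTF-8"
            for part in attributes.get("content", "").split(";"):
                key, _, value = part.strip().partition("=")
                if key.lower() == "charset":
                    self.charset = value.strip().lower()


def declared_charset(html: str):
    """Return the lowercased charset declared in the markup, or None."""
    finder = MetaCharsetFinder()
    finder.feed(html)
    return finder.charset
```

Calling declared_charset on your page’s markup returns a lowercased name such as utf-8, or None when no meta tag declares a charset.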
Fix the HTTP headers
First, let’s check the headers. Below are two methods of checking HTTP headers.
You can type the following at the command line if you have curl installed:
curl -I http://www.my-website.com/my-page
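You can also check using the Chrome browser: open DevTools, select the Network tab, reload the page, click the page request, and look for Content-Type under Response Headers.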
If the Content-Type charset is missing, the above HTML tags are enough to fix the issue of the browser using the wrong character set to render the page. If the charset is missing and your characters are still broken, scroll down to the section Wrong Character Set Used in the Database.
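The header check can also be scripted. The sketch below uses only Python’s standard library; the function names are hypothetical, and fetch_charset assumes the server answers HEAD requests, which is what curl -I sends.

```python
from email.message import Message
from urllib.request import Request, urlopen


def charset_from_content_type(header_value: str):
    """Return the charset parameter of a Content-Type header value, or None."""
    message = Message()
    message["Content-Type"] = header_value
    return message.get_content_charset()  # lowercased charset, None when absent


def fetch_charset(url: str):
    """Send a HEAD request and report the charset the server declares."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        # response.headers is an email.message.Message subclass
        return response.headers.get_content_charset()
```

For example, fetch_charset("http://www.my-website.com/my-page") would return utf-8, another charset name, or None if the header carries no charset at all.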
If the Content-Type charset is set to something other than UTF-8, you’ll need to change this. Here are several ways to resolve this.
Option 1: Update your Webserver Configuration
Most shared hosts support .htaccess files. This file lives in your document root (ex: public_html). If it doesn’t exist, create it and add this line:
AddDefaultCharset UTF-8
If you happen to be using NGINX on a VPS, add this line to the server block of your nginx.conf or similar configuration file:
charset utf-8;
Option 2: Update your PHP Configuration
If you happen to be using PHP for your website, you can change your configuration to send the proper character set HTTP header. PHP versions starting with 5.6 automatically send the UTF-8 charset HTTP header.
The directive to use is default_charset. This can be set in a configuration file or in your PHP code.
Add the following to the main php.ini file or a user-based php.ini file in your document root.
default_charset = "UTF-8"
Or add this to your PHP code.
<?php ini_set('default_charset', 'utf-8'); ?>
Wrong Character Set Used in the Database
If your content is being stored in the database, this is another area to check for compatibility.
I had my problem while using MySQL so I’ll be referencing MySQL specific features here. These ideas translate to other DBMS.
Bad Storage Encoding
First, ensure the data isn’t corrupt by selecting the data in a quality SQL client. If the characters look broken in your table, then there’s a chance your table or column isn’t set to store the character set that was inserted.
An easy way of checking the character set is with the following query:
show create table my_table;
The character set is listed at the bottom of the definition as DEFAULT CHARSET=xxx.
We can change the character set of the table and individual columns after the table is created and populated. Changing the character set won’t fix characters that are already broken, but it will prevent the problem on future inserts and updates.
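Rows that are already broken need to be repaired separately. One common kind of damage is UTF-8 text that was read as ISO-8859-1 somewhere along the way, and that kind can sometimes be reversed. The Python sketch below is an illustration under that assumption, not a general repair tool; the function name is hypothetical, and because MySQL’s latin1 is closer to Windows-1252, you may need to pass "cp1252" instead.

```python
def repair_mojibake(broken: str, wrongly_read_as: str = "iso-8859-1") -> str:
    """Undo one round of UTF-8 bytes having been decoded with the wrong encoding.

    Returns the input unchanged when it does not fit that pattern.
    """
    try:
        raw = broken.encode(wrongly_read_as)  # recover the original bytes
        return raw.decode("utf-8")            # read them as the UTF-8 they were
    except (UnicodeEncodeError, UnicodeDecodeError):
        return broken


if __name__ == "__main__":
    print(repair_mojibake("jalapeÃ±o"))
    print(repair_mojibake("jalapeño"))  # already correct, left alone
```

Test anything like this on a copy of the data first, since text that merely resembles this pattern would be changed too.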
Bad Transfer Encoding
If you’ve reached this point, the problem is likely caused by incompatible encoding in transit from the server to the client.
The following queries pinpoint areas that could cause character encoding issues:
show GLOBAL VARIABLES LIKE 'character_set_client';
show GLOBAL VARIABLES LIKE 'character_set_connection';
show GLOBAL VARIABLES LIKE 'character_set_results';
Or more succinctly …
show GLOBAL VARIABLES LIKE 'character_set_%';
If any of these show anything other than your expected character set (ex: utf8), you’ve found your problem. This is where I found my problem for a client hosted at InMotion Hosting.
My local machine is set to use utf8, which is why characters look fine locally but not in production.
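When comparing a local machine with production, it can help to let a script point out the differences. Here is a small Python sketch that takes the (Variable_name, Value) pairs the queries return and lists the connection variables that differ from what you expect. The function name and the sample rows are hypothetical, typed in by hand rather than taken from a real server.

```python
def find_charset_mismatches(rows, expected="utf8"):
    """Return {variable: value} for connection variables that differ from expected.

    rows: (Variable_name, Value) pairs as returned by the SHOW VARIABLES queries.
    """
    watched = {
        "character_set_client",
        "character_set_connection",
        "character_set_results",
    }
    return {
        name: value
        for name, value in rows
        if name in watched and value.lower() != expected.lower()
    }


if __name__ == "__main__":
    # Hypothetical production output, for illustration only
    production = [
        ("character_set_client", "latin1"),
        ("character_set_connection", "latin1"),
        ("character_set_database", "utf8"),
        ("character_set_results", "latin1"),
    ]
    print(find_charset_mismatches(production))
```

An empty result means the three connection variables already match the expected character set.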
You can set these variables independently, but MySQL makes this easier for us by allowing the shortcut:
set names 'utf8';
This setting is only valid for the duration of the connection and so has to be sent with each new connection.
PHP allows us to set this when creating a new PDO object.
$db = new PDO(
    'mysql:host=myhost;dbname=mydb',
    'login',
    'password',
    [PDO::MYSQL_ATTR_INIT_COMMAND => "SET NAMES 'utf8'"]
);
If you’re using mysqli instead, call this after connecting:
mysqli_set_charset($link, 'utf8');
I hope you found this article helpful. If so, please share! Have a suggestion to make this article better? Let me know in a comment below.