Strange characters in the generated HTML files

04/05/2005

A couple of days ago I had problems generating the HTML files, because strange characters appeared in them. These characters were outside the usual character range, and neither MS Internet Explorer nor Firefox could display the generated pages correctly.

Test Scenario

After some tests I was able to narrow the problem down considerably. In the end, the test scenario consisted only of an index template that included a module template. The characters appeared in the output files exactly at the position of the MTInclude tag. I also noticed that the characters were not inserted every time: sometimes they appeared and caused problems, sometimes they were absent.

I soon found out that the problems apparently had to do with editing the templates in an external editor. When I saved the templates from within Movable Type, everything was fine. However, editing templates that were correctly linked to an external file with an external editor caused problems.

Solution

The cause of these problems was the so-called BOM (byte order mark), which some editors insert at the beginning of a saved text file. I had never heard of the BOM, so I had to do some searching on the Internet.

Short explanation: Unicode is a standard that assigns a number (a code point) to every character. Simply put, it is a successor to older standards such as ASCII and the so-called ANSI code pages, extended to cover practically every character imaginable. Unicode text can be stored in different encodings, for example UTF-8, UTF-16 or UTF-32. A file in any of these encodings may begin with a special marker: the character U+FEFF, encoded as a byte sequence that is not meant to be read as ordinary text. This marker is the BOM. If it is present, a program reading the file can tell which encoding was used (and, for UTF-16 and UTF-32, which byte order).
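To make this concrete, here is a minimal Python sketch (an illustration only, not part of Movable Type) that lists the BOM byte sequences, as exposed by Python's codecs module, and checks whether a file's bytes begin with one. The function name detect_bom is hypothetical.

```python
import codecs

# BOM byte sequences as exposed by Python's codecs module.
# UTF-32 LE is checked before UTF-16 LE because its mark (FF FE 00 00)
# starts with the UTF-16 LE mark (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def detect_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a BOM, else (None, 0)."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding, len(bom)
    return None, 0


if __name__ == "__main__":
    # Print each BOM as hex bytes, e.g. the UTF-8 BOM is EF BB BF.
    for bom, encoding in BOMS:
        print(f"{encoding:10} {bom.hex(' ').upper()}")
```

For example, detect_bom applied to the bytes of a file saved by an editor that writes a UTF-8 BOM reports ("utf-8", 3): three bytes that are not part of the text itself.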

Some editors never write a BOM, others write one optionally. However, the NOTEPAD of Windows Server 2003, which I used for some small changes, always writes a BOM when saving a file as UTF-8.

The scenario that led to the problems was as follows.

In Movable Type I created a module template. Because this template contained some umlauts, Movable Type saved the file in UTF-8 encoding.

This module template was included by other templates. As long as I modified the template only through the user interface, there were no problems.

However, when I modified the template with NOTEPAD, a BOM was inserted. Movable Type read that template when rebuilding the site, but it did not recognize the initial bytes as a BOM. It simply copied the template, BOM included, into the generated HTML file.

So a BOM made its way into the middle of an HTML file, where it caused the problems.
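The effect can be reproduced with a small Python simulation under simplifying assumptions: the include is modelled as a plain byte copy, the template contents and the marker <!-- include --> are made up, and the last step assumes a browser interprets the page as ISO-8859-1. It only illustrates the mechanism and is not Movable Type's actual code.

```python
import codecs

# Made-up template contents. The module template was saved by an editor
# that writes a UTF-8 BOM (as NOTEPAD does).
INCLUDE_MARKER = b"<!-- include -->"
index_bytes = b"<html><body>\n<h1>My Weblog</h1>\n" + INCLUDE_MARKER + b"\n</body></html>\n"
module_bytes = codecs.BOM_UTF8 + "<p>Umlaute: äöü</p>".encode("utf-8")


def naive_include(outer: bytes, module: bytes) -> bytes:
    """Paste the module's bytes verbatim at the include marker, BOM and all."""
    return outer.replace(INCLUDE_MARKER, module)


page = naive_include(index_bytes, module_bytes)

if __name__ == "__main__":
    # The BOM now sits in the middle of the page, not at offset 0.
    print("BOM found at byte offset", page.find(codecs.BOM_UTF8))
    # Assumption: the browser interprets the page as ISO-8859-1 (latin-1),
    # so the three BOM bytes show up as visible garbage characters.
    print(page.decode("latin-1"))
```

Under that assumption the three BOM bytes appear as "ï»¿" right where the include was placed, which matches the symptom of strange characters exactly at the MTInclude position.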

The solution is, of course, simple: avoid any editor that unconditionally writes a BOM. In EditPlus, for example, writing the BOM can be switched on and off.
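If a template has already been saved with a BOM, the mark can also be removed afterwards. Below is a minimal Python sketch, assuming the templates are UTF-8 files on disk; the function name strip_utf8_bom, the script name and the file pattern in the usage comment are hypothetical.

```python
import codecs
import sys
from pathlib import Path


def strip_utf8_bom(path: Path) -> bool:
    """Remove a leading UTF-8 BOM from the file in place.

    Returns True if a BOM was removed, False if the file had none.
    """
    data = path.read_bytes()
    if data.startswith(codecs.BOM_UTF8):
        # Rewrite the file without its first three bytes (EF BB BF).
        path.write_bytes(data[len(codecs.BOM_UTF8):])
        return True
    return False


if __name__ == "__main__":
    # Hypothetical usage: python strip_bom.py templates/*.tmpl
    for name in sys.argv[1:]:
        if strip_utf8_bom(Path(name)):
            print(f"removed BOM: {name}")
```

Files without a BOM are left untouched, so running the script repeatedly is harmless.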

Improvement Suggestion

I consider it a bug that Movable Type does not recognize a BOM at the beginning of a text file. When generating the pages, it should realize that the included file is UTF-8 encoded and treat the BOM as an encoding marker instead of copying it into the output. I have therefore reported the problem to Six Apart.
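The suggested behaviour can be sketched in Python. Movable Type itself is written in Perl, so this only illustrates the idea: when an included template is read, a leading UTF-8 BOM is consumed as an encoding marker rather than copied into the page. The helper names read_template and include and the marker <!-- include --> are hypothetical, and only the UTF-8 case described here is handled.

```python
import codecs

INCLUDE_MARKER = "<!-- include -->"


def read_template(data: bytes) -> str:
    """Decode an included UTF-8 template, consuming a leading BOM if present."""
    # The "utf-8-sig" codec skips a leading UTF-8 BOM; input without a BOM
    # decodes exactly as with plain "utf-8".
    return data.decode("utf-8-sig")


def include(outer: str, module_bytes: bytes) -> str:
    """Insert the decoded module template at the include marker."""
    return outer.replace(INCLUDE_MARKER, read_template(module_bytes))


if __name__ == "__main__":
    module = codecs.BOM_UTF8 + "<p>Umlaute: äöü</p>".encode("utf-8")
    page = include("<body>\n" + INCLUDE_MARKER + "\n</body>", module)
    print("\ufeff" in page)  # False: the BOM did not reach the page
```

Because the page is assembled from decoded text, it can then be encoded once as a whole, so a BOM from an included file can no longer end up in the middle of the output.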

mgs | 04/05/2005

Feedback is welcome!

What do you think about this entry? Was it interesting or boring? I would like to hear your comments. If the text was helpful, please consider linking to http://www.movable-type-weblog.com/.
