Umlauts in Filenames

03/21/2005

Movable Type uses an entry's title to create the individual archive filename. However, umlauts are not handled very well. It would be a useful feature if the two-character substitutes (for example "ae" instead of "ä") were used when naming the file.

The dirifyplus Plugin

In "Filename with Underscore or Dash?" I described the dirifyplus plugin and showed how it can be used to create Google-friendly filenames. The plugin has one small disadvantage: umlauts are not supported.

The source code for the plugin is available. I do not know Perl at all, but as a software developer I should certainly be able to solve this task.

Extending the Plugin

Open the file »dirifyplus.pl« with an editor. At the very beginning you will see the following lines.

my $t = $_[1];
my $a = substr($t, 0, 1);
my $b = substr($t, 1, 1);
my $c = substr($t, 2, 1);

my $s = $_[0];

If you have some programming knowledge, you will probably recognize that the two variables $s and $t are initialized with the function's arguments. Evidently $s contains the string that is to be converted into the filename, and $t contains the argument describing the conversion. After these assignments the variable $s is modified step by step (not shown here). This is therefore a good place to insert our own umlaut replacements.

If you scroll down you will see plenty of examples of how Perl does string replacements, so there is no need to look for a Perl manual.

Insert the following lines immediately after the lines shown above.
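As a rough illustration of this argument handling, here is a minimal sketch. The subroutine name show_args, the variable names for the three option characters, and the sample values 'My Entry Title' and 'xyz' are made up for this example and are not taken from dirifyplus.pl. Only the pattern of reading $_[0] and $_[1] and splitting the second argument with substr mirrors the plugin.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of dirifyplus.pl: it only mirrors how
# the plugin picks its two arguments out of @_.
sub show_args {
    my $t      = $_[1];              # second argument: the option string
    my $first  = substr($t, 0, 1);   # first character of the option string
    my $second = substr($t, 1, 1);   # second character
    my $third  = substr($t, 2, 1);   # third character

    my $s = $_[0];                   # first argument: the text to convert

    return ($s, $first, $second, $third);
}

# Sample values are assumptions chosen for illustration only.
my @parts = show_args('My Entry Title', 'xyz');
print join('|', @parts), "\n";   # prints "My Entry Title|x|y|z"
```

Whatever code follows such assignments can change $s freely, which is why new replacements can be added right after them.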

$s =~ s!ä!ae!g; ## ä --> ae
$s =~ s!ö!oe!g; ## ö --> oe
$s =~ s!ü!ue!g; ## ü --> ue

$s =~ s!Ä!Ae!g; ## Ä --> Ae
$s =~ s!Ö!Oe!g; ## Ö --> Oe
$s =~ s!Ü!Ue!g; ## Ü --> Ue

$s =~ s!ß!ss!g; ## ß --> ss

With those seven substitutions, all umlauts and the ß are replaced by their two-letter substitutes.

Save the modified file under its original name, »dirifyplus.pl«. Make sure to save it as a UTF-8 file. I recommend saving it without the BOM (byte order mark). See "Strange characters in the generated HTML files" for details about the BOM.
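To try the substitutions outside of Movable Type, a small standalone script along the following lines may help. It is only a sketch: the subroutine name replace_umlauts and the sample title are assumptions made for this example, and the plugin itself applies the substitutions directly to $s rather than through a separate subroutine. The script also uses "use utf8" so that the non-ASCII literals are read as characters, which may differ from how the plugin sees its input inside Movable Type.

```perl
use strict;
use warnings;
use utf8;                               # the source literals below are UTF-8
binmode(STDOUT, ':encoding(UTF-8)');    # encode output as UTF-8

# Hypothetical wrapper around the seven substitutions from the article.
sub replace_umlauts {
    my ($s) = @_;

    $s =~ s!ä!ae!g;   # ä --> ae
    $s =~ s!ö!oe!g;   # ö --> oe
    $s =~ s!ü!ue!g;   # ü --> ue

    $s =~ s!Ä!Ae!g;   # Ä --> Ae
    $s =~ s!Ö!Oe!g;   # Ö --> Oe
    $s =~ s!Ü!Ue!g;   # Ü --> Ue

    $s =~ s!ß!ss!g;   # ß --> ss

    return $s;
}

# Sample title, made up for this example.
print replace_umlauts('Über große Bäume und Sträucher'), "\n";
# prints "Ueber grosse Baeume und Straeucher"
```

The sample title contains "ä" twice. Both occurrences are replaced because each substitution carries the g modifier.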
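If you are unsure whether your editor wrote a BOM, a quick check like the following can tell. This is a sketch and not part of the plugin: the subroutine name has_utf8_bom and the script name check_bom.pl are made up for this example. It relies only on the fact that a UTF-8 BOM is the three bytes 0xEF 0xBB 0xBF at the very start of a file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the file starts with the three-byte
# UTF-8 BOM (0xEF 0xBB 0xBF), otherwise 0.
sub has_utf8_bom {
    my ($path) = @_;

    # Read raw bytes so that no encoding layer alters the data.
    open(my $fh, '<:raw', $path) or die "Cannot open $path: $!";
    my $count = read($fh, my $head, 3);
    close($fh);

    return (defined $count && $count == 3 && $head eq "\xEF\xBB\xBF") ? 1 : 0;
}

# Usage (script name is an assumption):
#   perl check_bom.pl dirifyplus.pl
if (@ARGV) {
    my $file = $ARGV[0];
    print has_utf8_bom($file)
        ? "$file starts with a BOM\n"
        : "$file has no BOM\n";
}
```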

mgs | 03/21/2005



Comment

Christian | May 17, 2005 11:47 AM

There's a bug in your code:

$s =~ s!ä!ae!;

should be

$s =~ s!ä!ae!g;.

The “g” modifier (g for global) makes the regex replace _every_ occurrence of your search string. The way you wrote it, only the first “ä” will be replaced.

Another option is to simply use Lingua::DE::ASCII (http://search.cpan.org/~bigj/Lingua-DE-ASCII-0.11/ASCII.pm).

Comment

mgs [TypeKey Profile Page] | May 17, 2005 01:19 PM

Thanks a lot for that comment. I modified the code in this article.

I should really start learning Perl.

Michael G. Schneider