Sat, 26 Jul 2008
Dumping UTF-8 Data
The other day I wrote Perl6::Str, and
a small script that I called utf8-dump helped a lot during
debugging:
$ echo Überhacker | utf8-dump
\N{LATIN CAPITAL LETTER U WITH DIAERESIS}berhacker
It replaces every non-ASCII character with its Unicode name, in a form
that can be pasted into a Perl 5 double-quoted string, provided that
use charnames qw(:full) is in effect.
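To illustrate the round trip, here is a minimal sketch that pastes the output from the example above into a double-quoted string. The variable name $str is only for illustration.

```perl
use strict;
use warnings;
use charnames qw(:full);

# Paste the utf8-dump output into a double-quoted string literal.
my $str = "\N{LATIN CAPITAL LETTER U WITH DIAERESIS}berhacker";

# The \N{...} escape stands for a single character, U+00DC.
printf "%d characters, first code point U+%04X\n", length($str), ord($str);
# prints: 10 characters, first code point U+00DC
```

The string holds decoded characters rather than UTF-8 bytes, so it needs to be encoded again before being written to a file or terminal.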
And this is how the script looks:
#!/usr/bin/perl
use strict;
use warnings;
use charnames ();
use Encode qw(decode_utf8);

while (<>) {
    $_ = decode_utf8($_);
    s{([^\0-\177])}{N_escape($1)}eg;
    print;
}

sub N_escape {
    return '\N{' . charnames::viacode(ord($_[0])) . '}';
}
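One case the script does not cover: charnames::viacode returns undef for code points that have no name, such as private-use characters, which produces a warning and an empty \N{}. A possible variant of N_escape, sketched here under the hypothetical name N_escape_safe (not part of the original script), falls back to a \x{...} escape in that case.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical variant of N_escape: fall back to a \x{...} escape
# when charnames::viacode() knows no name for the code point.
sub N_escape_safe {
    my $cp   = ord($_[0]);
    my $name = charnames::viacode($cp);
    return defined $name
        ? '\N{' . $name . '}'
        : sprintf('\x{%X}', $cp);
}

print N_escape_safe("\x{DC}"),   "\n";   # named character
print N_escape_safe("\x{E000}"), "\n";   # private use, no name
```

Both forms are valid in Perl 5 double-quoted strings, so the output can still be pasted back into source code.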