Sat, 26 Jul 2008
Dumping UTF-8 Data
The other day I wrote Perl6::Str, and a small script that I called utf8-dump helped a lot during debugging:
$ echo Überhacker | utf8-dump
\N{LATIN CAPITAL LETTER U WITH DIAERESIS}berhacker
It replaces every non-ASCII character with its Unicode name, in a form that can be used in Perl 5 double-quoted strings if use charnames qw(:full) is loaded first.
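To illustrate that last point, here is a minimal sketch (not part of utf8-dump itself) that pastes the dumped form back into a double-quoted string. The variable names are only illustrative, and the comparison against a numeric escape assumes that U+00DC is the code point for this character name.

```perl
use strict;
use warnings;
use charnames qw(:full);   # enables \N{...} lookups by full Unicode name

# The escaped form printed by utf8-dump, pasted into a double-quoted string.
my $escaped = "\N{LATIN CAPITAL LETTER U WITH DIAERESIS}berhacker";

# The same string written with a numeric escape, for comparison.
my $numeric = "\x{DC}berhacker";

# Encode as UTF-8 on output so the terminal shows the real character.
binmode STDOUT, ':encoding(UTF-8)';
print $escaped eq $numeric ? "same string\n" : "different strings\n";
print "$escaped\n";
```

Both spellings produce the same ten-character string, so the dump can be copied straight into a test file or a bug report without worrying about the file's encoding.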
And this is what the script looks like:
#!/usr/bin/perl
use strict;
use warnings;
use charnames ();
use Encode qw(decode_utf8);

while (<>) {
    $_ = decode_utf8($_);
    # replace every character outside the ASCII range 0..127
    s{([^\0-\177])}{N_escape($1)}eg;
    print;
}

# return \N{NAME} if the character has a name, \x{HEX} otherwise
sub N_escape {
    my $n = charnames::viacode(ord($_[0]));
    return defined($n) ? "\\N{$n}" : sprintf('\x{%x}', ord($_[0]));
}
(Update 2010-04-19:) Added \x{...} escapes for characters for which viacode returns no name.