Provides functionality to transliterate all UTF-8 characters to ASCII in pure PostgreSQL.
Example:
SELECT system.transliterate_to_ascii('Příliš žluťoučký kůň úpěl ďábelské ódy'); -- will print "Prilis zlutoucky kun upel dabelske ody"
SELECT system.transliterate_to_ascii('stößt'); -- will print "stosst"
SELECT system.transliterate_to_ascii('鍖椾喊'); -- will print "Chen Zhan Han"
SELECT system.transliterate_to_ascii('Питер. Лето. Любов'); -- will print "Piter. Leto. Liubov"
SELECT system.transliterate_to_ascii('10°C'); -- will print "10degC"First, import dist/transliterate-structure.sql file, which create schema system with table transliterate_to_ascii_rules and function transliterate_to_ascii.
Second, import dist/transliterate-data.sql file with rules to transliterate all UTF-8 characters to ASCII.
Optional third, import dist/transliterate-webalize.sql file with DB function webalize() that provide converting string for nice URLs
psql dbname < dist/transliterate-structure.sql
psql dbname < dist/transliterate-data.sql
psql dbname < dist/transliterate-webalize.sqlNow, you can use function system.transliterate_to_ascii in your SQL queries or in PL/pgSQL functions, and you will always get pure ASCII string.
-- in query
SELECT system.transliterate_to_ascii('Příliš žluťoučký kůň úpěl ďábelské ódy');
-- in Pl/pgSQL
CREATE FUNCTION lower_unaccent(in_string character varying)
RETURNS character varying AS
$BODY$
RETURN lower(system.transliterate_to_ascii(in_string));
$BODY$
LANGUAGE plpgsql IMMUTABLE;In system.transliterate_to_ascii_rules table are all transliterations form UTF-8 chars to ASCII chars. You can update existing rules, if you're not satisfied with the original one.
Existing rules are taken from the great Perl library Text-Unidecode by Sean M. Burke. You can use PHP script bin/build-sql to generate actual rules dist/transliterate-data.sql from library source.
Copy all lib/Text/Unidecode/*.pm files to data/Unidecode and run bin/build-sql. You will get new rules definition and log in data directory.
Database function system.transliterate_to_ascii is written in PL/pgSQL and just quick replace all UTF-8 characters in a given string by rules from system.transliterate_to_ascii_rules table.
This mechanism is great for example for creating URL slugs from strings. Just import into DB also file dist/transliterate-webalize.sql.
Then create URL slugs like this:
SELECT system.webalize('Příliš žluťoučký kůň úpěl ďábelské ódy'); -- will print "prilis-zlutoucky-kun-upel-dabelske-ody"
SELECT system.webalize('stößt'); -- will print "stosst"
SELECT system.webalize('鍖椾喊'); -- will print "chen-zhan-han"
SELECT system.webalize('Питер. Лето. Любов'); -- will print "piter-leto-liubov"
SELECT system.webalize('10°C'); -- will print "10-c"
SELECT system.webalize('@utonomous'); -- will print "autonomous"The easiest way to get Perl library source is in Debian like Linux system with cpan command.
cpan Text::UnidecodeSources are placed in /usr/share/perl5/Text directory. Just copy *.pm from /usr/share/perl5/Text/Unidecode directory to data/Unidecode and run bin/build-sql (you will need PHP > 8.0 installed on the system).