From 40c66067fb5eef002be164b35aa3637403fc1e48 Mon Sep 17 00:00:00 2001
From: Corinna Vinschen
Date: Wed, 13 May 2009 15:11:39 +0000
Subject: [PATCH] * pathnames.sgml (pathnames-unusual): Talk about using UTF-8
 in C locale. * setup2.sgml (setup-locale-problems): Ditto.

---
 winsup/doc/ChangeLog      |  6 ++++++
 winsup/doc/pathnames.sgml |  5 +++++
 winsup/doc/setup2.sgml    | 11 +++++++----
 3 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/winsup/doc/ChangeLog b/winsup/doc/ChangeLog
index 12050a5ba..86ca7d75b 100644
--- a/winsup/doc/ChangeLog
+++ b/winsup/doc/ChangeLog
@@ -1,3 +1,9 @@
+2009-05-13 Corinna Vinschen
+
+ * pathnames.sgml (pathnames-unusual): Talk about using UTF-8 in C
+ locale.
+ * setup2.sgml (setup-locale-problems): Ditto.
+
 2009-05-06 Corinna Vinschen
 * faq-setup.xml: Fix entry explaing how the homedir is evaluated
diff --git a/winsup/doc/pathnames.sgml b/winsup/doc/pathnames.sgml
index ad1468462..0a9766c6a 100644
--- a/winsup/doc/pathnames.sgml
+++ b/winsup/doc/pathnames.sgml
@@ -368,6 +368,11 @@
 filename because the question mark will not translate back to the original
 Chinese character, but to a simple question mark instead. This in turn
 results in strange "File not found" messages.
+In the default "C" locale, Cygwin creates filenames using
+the UTF-8 charset. This will always result in some valid filename by
+default, but again might impose problems when switching to a non-"C"
+or non-"UTF-8" charset.
+
 To avoid this scenario altogether, always use UTF-8 as the character set.
diff --git a/winsup/doc/setup2.sgml b/winsup/doc/setup2.sgml
index a1175939b..3ed1f2ad2 100644
--- a/winsup/doc/setup2.sgml
+++ b/winsup/doc/setup2.sgml
@@ -317,12 +317,15 @@
 variable hasn't been set before starting this process, Cygwin has to make
 an educated guess which charset to use to convert the environment itself.
 The only reproducible way to do that in the absence of LC_ALL, LC_CTYPE, or LANG,
-is to use the current Windows ANSI codepage.
+is to use the "C" locale. The default conversion in the "C" locale
+used by Cygwin internally is UTF-8. So, in the absence of any
+internationalization environment variable, the environment will be converted
+to UTF-8.
 As long as the environment only contains ASCII characters, this is
-no problem. But if it contains native characters, and you're planning
-to use, say, UTF-8, the environment will result in invalid characters in
-the UTF-8 charset. This would be especially a problem in variables like
+no problem at all. But if it contains native characters, and you're planning
+to use, say, GBK, the environment will result in invalid characters in
+the GBK charset. This would be especially a problem in variables like
 PATH.
 Per POSIX, the name of an environment variable should only