Description
Describe the bug
Downloaded and built the C++ SDK, but the FileSystemUtilsTest fails.
This is because DirectoryTreeTest::SetUp() tries to create a directory in the user's home folder, but the character conversion inside Aws::FileSystem::CreateDirectoryIfNotExists() goes wrong.
This conversion happens in StringUtils::ToWString(), which assumes that its char* input is UTF-8, cf. MultiByteToWideChar(CP_UTF8, ...
The problem is that Aws::Environment::GetEnv() uses _dupenv_s(), which returns characters in the system code page. My name contains an 'ö', which is 0xF6 in Windows-1252, or 0x00F6 in UTF-16, but 0xC3 0xB6 in UTF-8.
The Aws::String returned from Aws::FileSystem::GetHomeDirectory() contains a single 0xF6 byte for that 'ö', which matches my Windows-1252 code page.
Naturally, an Aws::String containing Windows-1252 bytes cannot be interpreted as UTF-8 without errors.
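To make the mismatch concrete, here is a minimal, portable sketch of a structural UTF-8 check. `IsValidUtf8` is a hypothetical helper written for this report, not an SDK function, and it is deliberately simplified: it only checks lead and continuation byte ranges and does not reject every overlong or surrogate encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the SDK): returns true if every byte
// sequence in `s` has the shape of a UTF-8 encoded code point.
// Simplified: overlong 3/4-byte forms and surrogates are not rejected.
bool IsValidUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size()) {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;  // number of continuation bytes expected

        if (lead < 0x80) {
            extra = 0;          // plain ASCII
        } else if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;          // 2-byte sequence, e.g. 0xC3 0xB6 for 'ö'
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;          // 3-byte sequence
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;          // 4-byte sequence
        } else {
            return false;       // 0x80..0xC1 and 0xF5..0xFF never start a sequence
        }

        if (i + extra >= s.size() + (extra == 0 ? 1 : 0)) {
            return false;       // sequence is truncated
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if (c < 0x80 || c > 0xBF) {
                return false;   // not a continuation byte
            }
        }
        i += extra + 1;
    }
    return true;
}
```

With this check, a name stored as Windows-1252 (`"J\xF6rg"`) is rejected because 0xF6 cannot start a UTF-8 sequence, while the UTF-8 form (`"J\xC3\xB6rg"`) passes. The name used here is only an illustration.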
Regression Issue
- Select this option if this issue appears to be a regression.
Expected Behavior
All tests should run on a clean build of the C++ sdk.
Current Behavior
Several tests involving the user name (e.g. the home directory) fail on a Windows system with the Windows-1252 code page if the user name contains non-ASCII characters.
Reproduction Steps
auto homeDirectory = Aws::FileSystem::GetHomeDirectory();
ASSERT_FALSE(homeDirectory.empty());
auto dir1 = Aws::FileSystem::Join(homeDirectory, "dir1");
bool dir1Created = Aws::FileSystem::CreateDirectoryIfNotExists(dir1.c_str());
ASSERT_TRUE(dir1Created); // will fail if user has umlauts in their name and uses win_1252 code page
Possible Solution
The solution depends on what Aws::String is expected to hold:
- if Aws::String is expected to always hold UTF-8 characters, then Aws::Environment::GetEnv() needs an additional step to verify that the current code page is UTF-8 (GetACP() == CP_UTF8) or convert the retrieved string into UTF-8 before returning it
- if Aws::String is expected to always contain the "native" character set, then StringUtils::ToWString() must not blindly assume UTF-8 but use the actual code page, e.g. MultiByteToWideChar(GetACP(), ...
- otherwise, Aws::String needs to know its encoding and I assume lots of places have to be adapted...
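A rough sketch of the first option, assuming the SDK wants UTF-8 everywhere: convert the environment value from the active code page to UTF-16 and then to UTF-8. `AnsiToUtf8` is a hypothetical helper name, not an existing SDK function, and where it would be called from (for example at the end of GetEnv()) is my assumption. Outside Windows the sketch passes the input through unchanged.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper (not part of the SDK): re-encodes a string from the
// active Windows code page (CP_ACP) to UTF-8. On other platforms the
// input is returned as-is.
std::string AnsiToUtf8(const std::string& input)
{
#ifdef _WIN32
    // Nothing to do for empty input or if the process already runs with UTF-8.
    if (input.empty() || GetACP() == CP_UTF8) {
        return input;
    }
    const int inputLen = static_cast<int>(input.size());

    // Step 1: active code page -> UTF-16.
    const int wideLen = MultiByteToWideChar(CP_ACP, 0, input.data(), inputLen, nullptr, 0);
    if (wideLen <= 0) {
        return std::string();  // conversion failed
    }
    std::wstring wide(static_cast<std::size_t>(wideLen), L'\0');
    MultiByteToWideChar(CP_ACP, 0, input.data(), inputLen, &wide[0], wideLen);

    // Step 2: UTF-16 -> UTF-8.
    const int utf8Len = WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                                            nullptr, 0, nullptr, nullptr);
    if (utf8Len <= 0) {
        return std::string();  // conversion failed
    }
    std::string utf8(static_cast<std::size_t>(utf8Len), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wide.data(), wideLen,
                        &utf8[0], utf8Len, nullptr, nullptr);
    return utf8;
#else
    return input;
#endif
}
```

On my machine this should turn the single 0xF6 byte into 0xC3 0xB6, which ToWString() can then decode. Reading the variable as UTF-16 in the first place (e.g. via _wdupenv_s()) and converting once would avoid the round trip through the code page, but that is a larger change.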
Additional Information/Context
No response
AWS CPP SDK version used
1.11.626
Compiler and Version used
MSVC 2022 (64-Bit), Version 17.12.3
Operating System and version
Windows 11 Pro, 10.0.26100