
Aws::Environment::GetEnv() ignores the Windows system code page, which makes SDK tests fail for German users #3520

@maximilian-loehr-at-ipolog-ai

Description

Describe the bug

I downloaded and built the C++ SDK, but FileSystemUtilsTest fails.

This is because DirectoryTreeTest::SetUp() tries to create a directory in the user's home folder, and the character conversion in Aws::FileSystem::CreateDirectoryIfNotExists() goes wrong.
The conversion happens in StringUtils::ToWString(), which assumes that its char* input is UTF-8, cf. MultiByteToWideChar(CP_UTF8, ....

The problem is that Aws::Environment::GetEnv() uses _dupenv_s(), which returns characters in the system code page. My name contains an 'ö', which is 0xF6 in Windows-1252 and 0x00F6 in UTF-16, but 0xC3 0xB6 in UTF-8.
The Aws::String returned from Aws::FileSystem::GetHomeDirectory() contains a single 0xF6 byte for that 'ö', which matches my Windows-1252 code page.
An Aws::String holding Windows-1252 bytes cannot be interpreted as UTF-8 without errors: a lone 0xF6 byte is never a valid UTF-8 sequence.
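
To make the mismatch concrete, here is a minimal, portable sketch (not SDK code) of a structural UTF-8 check. The helper name LooksLikeUtf8 and the sample user name "Jörg" are made up for illustration, and the check is deliberately simplified: it does not reject every overlong or surrogate encoding.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of the AWS SDK.
// Returns true if `s` is structurally valid UTF-8: every lead byte is
// followed by the expected number of continuation bytes (10xxxxxx).
bool LooksLikeUtf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char lead = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0; // number of continuation bytes expected

        if (lead < 0x80)                       extra = 0; // ASCII
        else if (lead >= 0xC2 && lead <= 0xDF) extra = 1; // 2-byte sequence
        else if (lead >= 0xE0 && lead <= 0xEF) extra = 2; // 3-byte sequence
        else if (lead >= 0xF0 && lead <= 0xF4) extra = 3; // 4-byte sequence
        else return false; // stray continuation or invalid lead, e.g. 0xF6

        // Not enough bytes left for the announced sequence.
        if (s.size() - i - 1 < extra) return false;

        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false; // not a continuation byte
        }
        i += extra + 1;
    }
    return true;
}
```

With this helper, LooksLikeUtf8("J\xF6" "rg") returns false because 0xF6 can never start a UTF-8 sequence, while LooksLikeUtf8("J\xC3\xB6" "rg") returns true.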

Regression Issue

  • Select this option if this issue appears to be a regression.

Expected Behavior

All tests should pass on a clean build of the C++ SDK.

Current Behavior

Several tests involving the user name (e.g. via the home directory) fail on a Windows system with the Windows-1252 code page if the user name contains non-ASCII characters.

Reproduction Steps

auto homeDirectory = Aws::FileSystem::GetHomeDirectory();
ASSERT_FALSE(homeDirectory.empty());

auto dir1 = Aws::FileSystem::Join(homeDirectory, "dir1");
bool dir1Created = Aws::FileSystem::CreateDirectoryIfNotExists(dir1.c_str());
ASSERT_TRUE(dir1Created); // will fail if user has umlauts in their name and uses win_1252 code page

Possible Solution

The solution depends on what Aws::String is expected to hold:

  • if Aws::String is expected to always hold UTF-8, then Aws::Environment::GetEnv() needs an additional step that either verifies that the active code page is UTF-8 (GetACP() == CP_UTF8) or converts the retrieved string to UTF-8 before returning it
  • if Aws::String is expected to always hold the "native" character set, then StringUtils::ToWString() must not blindly assume UTF-8 but use the actual code page, e.g. MultiByteToWideChar( GetACP(), ...
  • otherwise, Aws::String needs to know its encoding, and I assume many places would have to be adapted
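
The first option could look roughly like the sketch below. It is a portable illustration that hard-codes Windows-1252 so it can run anywhere; inside the SDK on Windows, one would more likely convert with MultiByteToWideChar(CP_ACP, ...) followed by WideCharToMultiByte(CP_UTF8, ...). The names Windows1252ToUtf8, AppendUtf8 and kCp1252High are hypothetical and not part of the SDK.

```cpp
#include <cstdint>
#include <string>

namespace
{
// Windows-1252 bytes 0x80..0x9F differ from Latin-1. The five unassigned
// bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) are passed through as the matching
// C1 control code; that choice is an assumption of this sketch.
const std::uint16_t kCp1252High[32] = {
    0x20AC, 0x0081, 0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
    0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0x008D, 0x017D, 0x008F,
    0x0090, 0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
    0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0x009D, 0x017E, 0x0178};

// Appends one code point (at most U+FFFF, enough for Windows-1252) as UTF-8.
void AppendUtf8(std::string& out, std::uint32_t cp)
{
    if (cp < 0x80)
    {
        out += static_cast<char>(cp);
    }
    else if (cp < 0x800)
    {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    else
    {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}
} // namespace

// Converts a Windows-1252 encoded string to UTF-8, byte by byte.
std::string Windows1252ToUtf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size());
    for (const char c : in)
    {
        const unsigned char b = static_cast<unsigned char>(c);
        // Outside 0x80..0x9F the byte value equals the Unicode code point.
        const std::uint32_t cp =
            (b >= 0x80 && b <= 0x9F) ? kCp1252High[b - 0x80] : b;
        AppendUtf8(out, cp);
    }
    return out;
}
```

Applied to a value returned by _dupenv_s() on a Windows-1252 system, Windows1252ToUtf8("J\xF6" "rg") yields the bytes 'J', 0xC3, 0xB6, 'r', 'g', which StringUtils::ToWString() could then decode as UTF-8. A real fix would need to respect whatever GetACP() reports instead of assuming code page 1252.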

Additional Information/Context

No response

AWS CPP SDK version used

1.11.626

Compiler and Version used

MSVC 2022 (64-Bit), Version 17.12.3

Operating System and version

Windows 11 Pro, 10.0.26100


Labels

bug, needs-triage
