
working-tree-encoding=UTF-16 checks out UTF-16BE #1995

Closed
@alegrigoriev

Description

  • I was not able to find an open or closed issue matching what I'm seeing

Setup

  • Which version of Git for Windows are you using? Is it 32-bit or 64-bit?
    64 bit 2.19.1.windows.1
$ git --version --build-options

git version 2.19.1.windows.1
cpu: x86_64
built from commit: 11a3092e18f2201acd53e45aaa006f1601b6c02a
sizeof-long: 4
sizeof-size_t: 8
  • Which version of Windows are you running? Vista, 7, 8, 10? Is it 32-bit or 64-bit?
    Windows 10.1803.17134 x64
$ cmd.exe /c ver

Microsoft Windows [Version 10.0.17134.472]
  • What options did you set as part of the installation? Or did you choose the
    defaults?
    Default
# One of the following:
> type "C:\Program Files\Git\etc\install-options.txt"
> type "C:\Program Files (x86)\Git\etc\install-options.txt"
> type "%USERPROFILE%\AppData\Local\Programs\Git\etc\install-options.txt"
$ cat /etc/install-options.txt

Editor Option: Nano
Custom Editor Path:
Path Option: Cmd
SSH Option: OpenSSH
CURL Option: OpenSSL
CRLF Option: CRLFAlways
Bash Terminal Option: MinTTY
Performance Tweaks FSCache: Enabled
Use Credential Manager: Enabled
Enable Symlinks: Disabled
Enable Builtin Rebase: Disabled
Enable Builtin Stash: Disabled
  • Any other interesting things about your environment that might be related
    to the issue you're seeing?

Don't think so.

Details

  • Which terminal/shell are you running Git from? e.g Bash/CMD/PowerShell/other

Bash

Edit your .gitattributes file to assign the "working-tree-encoding=UTF-16" attribute to an existing text file, then do a forced checkout of that file. Inspect the checked-out file in a binary editor (for example, open it as binary in Visual Studio).
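The steps above can be sketched as a script run in a scratch repository. This is only an illustration: the repository location, the file name sample.txt, its content, and the committer identity are made up and do not come from the report. The file is deleted before the checkout so that Git is certain to rewrite it.

```shell
#!/bin/sh
# Sketch of the reproduction. The function name, file name, and content are
# hypothetical and chosen only for illustration.
repro_bom() {
    repo=$(mktemp -d) || return 1
    (
        cd "$repo" || exit 1
        git init -q .
        # Commit a plain UTF-8 text file first, without any encoding attribute.
        printf 'hello\n' > sample.txt
        git add sample.txt
        git -c user.name=repro -c user.email=repro@example.invalid \
            commit -q -m 'Add sample.txt as UTF-8'
        # Assign the attribute to the already committed file.
        echo 'sample.txt text working-tree-encoding=UTF-16' > .gitattributes
        # Remove the file, then check it out again so it is re-encoded.
        rm sample.txt
        git checkout -- sample.txt
        # Print the first two bytes as hex:
        # "fffe" is a UTF-16LE BOM, "feff" is a UTF-16BE BOM.
        head -c 2 sample.txt | od -An -tx1 | tr -d ' \n'
    )
}

repro_bom
```

Which of the two values is printed depends on the Git build and the iconv it uses, which is the point of this report.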
  • What did you expect to occur after running these commands?

The file should be written as UTF-16LE with BOM.

  • What actually happened instead?

The file is written as UTF-16BE with BOM. This makes the "working-tree-encoding" attribute pretty much useless, although it could be very valuable for supporting UTF-16/UCS-2 files under Windows.
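The byte order can also be checked from Bash instead of a binary editor. The helper below is a hypothetical sketch, not part of Git: it reads the first three bytes of a file and reports which byte-order mark they form. It does not try to distinguish UTF-32.

```shell
#!/bin/sh
# Hypothetical helper: report which byte-order mark a file starts with.
bom_of() {
    # Hex of the first three bytes, with spaces and newlines removed.
    sig=$(head -c 3 "$1" | od -An -tx1 | tr -d ' \n')
    case "$sig" in
        fffe*)  echo "UTF-16LE" ;;  # FF FE: little-endian BOM
        feff*)  echo "UTF-16BE" ;;  # FE FF: big-endian BOM
        efbbbf) echo "UTF-8" ;;     # EF BB BF: UTF-8 BOM
        *)      echo "none" ;;      # no recognized BOM
    esac
}
```

Calling `bom_of` on the checked-out file would print UTF-16BE for the behavior reported here and UTF-16LE for the expected behavior.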

Not all tools under Windows understand UTF-16BE even with BOM. MSVC CRT doesn't. Visual Studio doesn't recognize those files as text (perhaps because it's using MSVC CRT to open them).

More information: The problem seems to be a general one, caused by the libiconv developers' decision to always produce UTF-16BE+BOM for UTF-16 without taking BYTE_ORDER into account. The iconv supplied with the Git for Windows package exhibits the same behavior. The existing precompiled builds of iconv/libiconv/libgettext for Windows (supplied by Michele Locati at https://mlocati.github.io/articles/gettext-iconv-windows.html) also exhibit the same behavior.

Nevertheless, the iconv installed with CentOS 7.4 produces UTF-16LE+BOM, and so does Git 2.20 built from source on that system. This means there may be a patch that forces libiconv to the desired behavior of producing UTF-16LE on little-endian machines.
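The difference between the two iconv builds can be seen directly on the command line. The first command below converts the letter "A" to plain UTF-16, so its output depends on the iconv in use. The second part is a sketch of a manual workaround outside Git, with a hypothetical helper name: it writes the little-endian BOM by hand and then asks iconv for UTF-16LE, which has an explicit byte order and adds no BOM of its own.

```shell
#!/bin/sh
# Plain UTF-16: byte order is chosen by the iconv implementation.
# Per the report, libiconv gives "fe ff 00 41" and the CentOS iconv
# gives "ff fe 41 00".
printf 'A' | iconv -f UTF-8 -t UTF-16 | od -An -tx1

# Hypothetical helper: force UTF-16LE with a BOM regardless of iconv build.
to_utf16le_bom() {
    printf '\377\376'            # BOM bytes FF FE, written as octal escapes
    iconv -f UTF-8 -t UTF-16LE   # explicit byte order, no BOM emitted
}

# Bytes produced: ff fe 41 00
printf 'A' | to_utf16le_bom | od -An -tx1
```

This only shows the encoder behavior; it does not change what Git writes on checkout.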

  • If the problem was occurring with a specific repository, can you provide the
    URL to that repository to help us with testing?

Not specific to a repository
