Files that have Unicode chars only after the 1024th byte get their Unicode mangled

I had a file whose first 1024 bytes contained only ASCII, with the first non-ASCII (Unicode) character appearing after the 1024th byte. An example of such a file is:

http://exercism.io/submissions/1e341848768141cf8eba94c6af6e55a7

Submitting this file mangles the Unicode character.

In contrast, this file has a Unicode character within the first 1024 bytes, so it comes through intact (even the Unicode that appears after the first 1024 bytes is fine):

http://exercism.io/submissions/dce4e3ddf0294034ad987ce7b86cdb38

(These are just example submissions for Hello World, but this also affected my submission for a real exercise, Counter in xgo.)

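I tracked this down to `readFileAsUTF8String` in `api/iteration.go`. It uses the `DetermineEncoding` function (https://godoc.org/golang.org/x/net/html/charset#DetermineEncoding) to determine the encoding, and that function examines only the first 1024 bytes of the content, so a non-ASCII character that first appears later has no influence on the result.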
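To make the failure mode concrete, here is a minimal stdlib-only sketch of what I believe is happening. It does not call the real `DetermineEncoding`; `sniffLimit`, `prefixHasNonASCII` and `decodeLatin1` are hypothetical names for illustration, and the Latin-1 decoder is only a simplified stand-in for whatever single-byte legacy encoding the detector falls back to when the examined prefix is pure ASCII (that fallback is my assumption about the behaviour, not something verified here).

```go
package main

import (
	"fmt"
	"strings"
)

// sniffLimit mirrors the 1024-byte window described above.
const sniffLimit = 1024

// prefixHasNonASCII reports whether any byte within the first sniffLimit
// bytes has its high bit set. Hypothetical helper, not part of any library.
func prefixHasNonASCII(content []byte) bool {
	if len(content) > sniffLimit {
		content = content[:sniffLimit]
	}
	for _, b := range content {
		if b >= 0x80 {
			return true
		}
	}
	return false
}

// decodeLatin1 maps every byte to the code point with the same value.
// It stands in for a single-byte legacy decoder (an assumption).
func decodeLatin1(content []byte) string {
	runes := make([]rune, len(content))
	for i, b := range content {
		runes[i] = rune(b)
	}
	return string(runes)
}

func main() {
	// 1024 ASCII bytes, then "é" (0xC3 0xA9 in UTF-8).
	late := []byte(strings.Repeat("a", 1024) + "é")
	// The same content with "é" moved to the front.
	early := []byte("é" + strings.Repeat("a", 1024))

	fmt.Println(prefixHasNonASCII(late))  // false: the window sees only ASCII
	fmt.Println(prefixHasNonASCII(early)) // true: the window sees the UTF-8 bytes

	// Decoding the UTF-8 bytes one byte at a time splits "é" into two characters.
	fmt.Println(decodeLatin1(late)[1024:]) // Ã©
}
```

If that reading is right, it would explain why the second example file survives: the detector sees valid multi-byte UTF-8 inside its window and treats the whole file as UTF-8.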

I'm not really sure what the right solution is here. I know that function was created in https://github.com/exercism/cli/pull/182 to solve https://github.com/exercism/exercism.io/issues/2303, so there is obviously a legitimate reason behind all this. I guess we now need to figure out how to handle this case as well. I don't yet have a good solution, so I'll file this first and sleep on it for a bit.
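For what it's worth, one direction that might be worth considering (a rough sketch, not a tested fix) is to check whether the entire content is already valid UTF-8 before running any detection, and only fall back to the existing logic when it is not. The names `decodeFunc` and `readAsUTF8` below are hypothetical, and `fallback` stands in for the current detection-based conversion, whose exact signature I am not reproducing here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// decodeFunc is a hypothetical stand-in for the existing
// detection-based conversion.
type decodeFunc func(content []byte) (string, error)

// readAsUTF8 returns content unchanged when the whole byte slice is valid
// UTF-8, and defers to fallback otherwise. Hypothetical name and shape.
func readAsUTF8(content []byte, fallback decodeFunc) (string, error) {
	if utf8.Valid(content) {
		return string(content), nil
	}
	return fallback(content)
}

func main() {
	// Placeholder fallback that only reports that it was called.
	fallback := func(content []byte) (string, error) {
		return "", fmt.Errorf("fallback detection needed for %d bytes", len(content))
	}

	s, err := readAsUTF8([]byte("héllo"), fallback)
	fmt.Println(s, err)

	// A lone 0xE9 byte is not valid UTF-8, so the fallback runs.
	_, err = readAsUTF8([]byte{0xE9}, fallback)
	fmt.Println(err)
}
```

The obvious caveats: a leading UTF-8 byte order mark would pass through untouched, and any file in another encoding that happens to be valid UTF-8 would skip detection entirely, so this would need checking against the cases from the original issue.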


Files that have Unicode chars only after the 1024th byte get their Unicode mangled #309
