dgraph live RDF parser does not support \u escape sequences in facets #4157

@adg

Dgraph's RDF parser used in dgraph live does not appear to understand \uXXXX escape sequences inside facets.

The dgraph documentation says that it uses the RDF N-Quad spec, which specifies support for the \uXXXX escape sequences, but dgraph's implementation does not appear to respect it.

I tried adding these test cases to the chunker package's TestLex, and the second one fails:

diff --git a/chunker/rdf_parser_test.go b/chunker/rdf_parser_test.go
index f2c45df5..7c733bbd 100644
--- a/chunker/rdf_parser_test.go
+++ b/chunker/rdf_parser_test.go
@@ -503,6 +503,28 @@ var testNQuads = []struct {
                },
                expectedErr: false,
        },
+       {
+               input: `<alice> <lives> "wonderland" (friend="hatter").`,
+               nq: api.NQuad{
+                       Subject:     "alice",
+                       Predicate:   "lives",
+                       ObjectId:    "",
+                       ObjectValue: &api.Value{Val: &api.Value_DefaultVal{DefaultVal: `wonderland`}},
+                       Facets:      []*api.Facet{{Key: "friend", Value: []byte("hatter"), Tokens: []string{"\001hatter"}}},
+               },
+               expectedErr: false,
+       },
+       {
+               input: `<alice> <lives> "wonderland" (friend="hatter \u0045") .`,
+               nq: api.NQuad{
+                       Subject:     "alice",
+                       Predicate:   "lives",
+                       ObjectId:    "",
+                       ObjectValue: &api.Value{Val: &api.Value_DefaultVal{DefaultVal: `wonderland`}},
+                       Facets:      []*api.Facet{{Key: "friend", Value: []byte("hatter E"), Tokens: []string{"\001hatter E"}}},
+               },
+               expectedErr: false,
+       },
        {
                input:       `<alice> <lives> "\u004 wonderland" .`,
                expectedErr: true, // should have 4 hex values after \u

The failure:

--- FAIL: TestLex (0.00s)
    rdf_parser_test.go:1008: 
        	Error Trace:	rdf_parser_test.go:1008
        	Error:      	Received unexpected error:
        	            	while lexing <alice> <lives> "wonderland" (friend="hatter \u0045"). at line 1 column 37: Not a valid escape char: 'u'
        	            	github.com/dgraph-io/dgraph/lex.(*Lexer).ValidateResult
        	            		/Users/adg/t/dgraph/dgraph/lex/lexer.go:200
        	            	github.com/dgraph-io/dgraph/chunker.ParseRDF
        	            		/Users/adg/t/dgraph/dgraph/chunker/rdf_parser.go:84
        	            	github.com/dgraph-io/dgraph/chunker.TestLex
        	            		/Users/adg/t/dgraph/dgraph/chunker/rdf_parser_test.go:1000
        	            	testing.tRunner
        	            		/Users/adg/go/src/testing/testing.go:909
        	            	runtime.goexit
        	            		/Users/adg/go/src/runtime/asm_amd64.s:1357
        	Test:       	TestLex
        	Messages:   	Got error for input: "<alice> <lives> \"wonderland\" (friend=\"hatter \\u0045\")."
FAIL

The same failure shows up in dgraph live. Passing it an RDF line of the form

<alice> <lives> "wonderland" (friend="hatter \u0045") .

fails with the error above.
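
As a rough illustration of the expected behavior, here is a minimal sketch of decoding the N-Quads Unicode escapes (`\uXXXX` and `\UXXXXXXXX`) in a facet value. `unescapeUnicode` is a hypothetical helper written for this example. It is not Dgraph's lexer code, and it assumes the value has already been extracted from between the quotes.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// unescapeUnicode is a hypothetical helper (not part of Dgraph). It expands
// \uXXXX and \UXXXXXXXX escapes and copies every other byte unchanged,
// including other backslash escapes such as \" or \\.
func unescapeUnicode(s string) (string, error) {
	var b strings.Builder
	for i := 0; i < len(s); i++ {
		// Plain byte, or a lone trailing backslash: copy as-is.
		if s[i] != '\\' || i+1 >= len(s) {
			b.WriteByte(s[i])
			continue
		}
		var n int // number of hex digits the escape requires
		switch s[i+1] {
		case 'u':
			n = 4
		case 'U':
			n = 8
		default:
			// Not a Unicode escape: keep the backslash and the next byte.
			b.WriteByte(s[i])
			b.WriteByte(s[i+1])
			i++
			continue
		}
		start := i + 2
		if start+n > len(s) {
			return "", fmt.Errorf("truncated escape at offset %d: need %d hex digits", i, n)
		}
		v, err := strconv.ParseUint(s[start:start+n], 16, 32)
		if err != nil {
			return "", fmt.Errorf("invalid escape at offset %d: %v", i, err)
		}
		b.WriteRune(rune(v))
		i = start + n - 1 // the loop increment moves past the last hex digit
	}
	return b.String(), nil
}

func main() {
	got, err := unescapeUnicode(`hatter \u0045`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%q\n", got) // prints "hatter E"
}
```

With this sketch, the facet value from the failing test case decodes to `hatter E`, while an input with too few hex digits, such as `\u004 wonderland`, returns an error. That matches what the existing test for object literals expects.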

Labels

area/parsing: Issues related to the parser or lexer.
priority/P0: Critical issue that requires immediate attention.
status/accepted: We accept to investigate/work on it.
