Dgraph exports incorrect data in JSON and RDF formats. #3610

@danielmai

Description

If you suspect this could be a bug, follow the template.

What version of Dgraph are you using?

master dbd7540

Have you tried reproducing the issue with latest release?

Yes. This issue does not happen in v1.0.15.

Steps to reproduce the issue (command/config used to run Dgraph).

  • Run 1 Dgraph Zero and bulk load the 21-million movie data set (21million.rdf.gz and 21million.schema).
  • Run 1 Dgraph Alpha with the bulk loaded data.
  • Run a JSON export:
curl localhost:8080/admin/export?format=json
  • Run an RDF export:
curl localhost:8080/admin/export

Then use the live loader or the bulk loader to re-import the exported files into Dgraph. Both imports fail because the exports contain invalid triples.

Actual result

Trying to load the JSON export shows this error:

2019/06/27 15:44:46 Expected JSON map start. Found: ,

The beginning of the JSON export has a single line with just a comma:

$ zcat g01.json.gz | head
[
,
  {"uid":"0x45","wpt_description@en":"Tite Kubo"},
  {"uid":"0x45","rottentomatoes_id@fi":"Tite Kubo"},
  {"uid":"0x45","produced_by@zh":"久保带人"},
  {"uid":"0x45","produced_by@hu":"Kubo Tite"},
  {"uid":"0x45","produced_by@ca":"Tite Kubo"},
  {"uid":"0x45","produced_by@ko":"쿠보 타이토"},
  {"uid":"0x45","produced_by@pt":"Tite Kubo"},
  {"uid":"0x45","produced_by@no":"Tite Kubo"},
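
A quick way to locate lines like the stray comma above is to check that every line between the brackets parses as a JSON object. This is only a rough sketch: the function name is made up for illustration, and it assumes the one-object-per-line layout seen in the head output rather than any documented export format.

```python
import json


def find_bad_json_export_lines(lines):
    """Return (line_number, text) pairs for lines that are not a JSON object.

    Assumption: "[" opens the file, "]" closes it, and each line in between
    holds one object, optionally followed by a comma.
    """
    bad = []
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if text in ("", "[", "]"):
            continue
        candidate = text.rstrip(",")  # drop the separator after the object
        try:
            value = json.loads(candidate)
        except ValueError:
            # A line holding only "," becomes an empty string and lands here.
            bad.append((number, text))
            continue
        if not isinstance(value, dict):
            bad.append((number, text))
    return bad


sample = [
    "[",
    ",",
    '  {"uid":"0x45","wpt_description@en":"Tite Kubo"},',
    '  {"uid":"0x45","produced_by@pt":"Tite Kubo"}',
    "]",
]
print(find_bad_json_export_lines(sample))  # prints [(2, ',')]
```

For the real file, the same function could be fed `gzip.open("g01.json.gz", "rt", encoding="utf-8")` instead of the in-memory sample.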

Irrespective of the line with just the comma, counting the records in the RDF and JSON exports shows that the JSON export is missing triples.
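
One possible way to make that count comparison, sketched under assumptions: each JSON object contributes one triple per non-uid key (or one per list element), each RDF line ending in " ." is one triple, and facets are ignored. The function names and the tiny sample lines are illustrative only, not taken from the actual exports.

```python
import json


def count_json_triples(lines):
    """Count triples in a line-per-object JSON export (assumed layout)."""
    total = 0
    for raw in lines:
        text = raw.strip().rstrip(",")
        if not text.startswith("{"):
            continue  # skips "[", "]", blank lines and stray commas
        record = json.loads(text)
        for key, value in record.items():
            if key == "uid":
                continue
            # A list value is assumed to carry one triple per element.
            total += len(value) if isinstance(value, list) else 1
    return total


def count_rdf_triples(lines):
    """Count lines that end with the RDF statement terminator ' .'."""
    return sum(1 for raw in lines if raw.strip().endswith(" ."))


# Illustrative input only.
json_sample = [
    "[",
    '  {"uid":"0x45","produced_by@pt":"Tite Kubo"},',
    '  {"uid":"0x20572","name":[{"uid":"0x279e38"}]}',
    "]",
]
rdf_sample = [
    '<0x45> <produced_by> "Tite Kubo"@pt .',
    "<0x20572> <name> <0x279e38> .",
    "<0x13b174> <written_by> <0x1226fc> .",
]
print(count_json_triples(json_sample), count_rdf_triples(rdf_sample))
```

A difference between the two numbers only hints at missing data; it does not say which triples were dropped.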

And the JSON export has triples that don't make sense based on the initial data set, like these:

Schema:

name:string @index(hash,term,trigram,fulltext) @lang . 
cinematography:[uid] . 

Exported data, where name holds a uid and cinematography holds a language-tagged string:

  {"uid":"0x20572","name":[{"uid":"0x279e38"}]},
  ...
  {"uid":"0x7ef","cinematography@en":"You'll never laugh as long and as loud again as long as you live! The laughs come so fast and so furious you'll wish it would end before you collapse!"},
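
The mismatch between schema and exported values can be checked mechanically. The sketch below hard-codes a simplified view of the two schema entries quoted above instead of parsing a real schema file, and the helper names are hypothetical.

```python
import json

# Simplified view of the two schema entries quoted above (an assumption for
# this sketch): True means the predicate holds uid edges, False means scalars.
IS_UID_PREDICATE = {"name": False, "cinematography": True}


def looks_like_uid_value(value):
    """True if value is a non-empty set of {"uid": ...} objects."""
    items = value if isinstance(value, list) else [value]
    return bool(items) and all(isinstance(i, dict) and "uid" in i for i in items)


def schema_mismatches(record, is_uid_predicate=IS_UID_PREDICATE):
    """Return the keys whose value kind disagrees with the schema."""
    problems = []
    for key, value in record.items():
        if key == "uid":
            continue
        predicate = key.split("@", 1)[0]  # drop a language tag such as "@en"
        expected_uid = is_uid_predicate.get(predicate)
        if expected_uid is None:
            continue  # predicate is not in the simplified schema
        if expected_uid != looks_like_uid_value(value):
            problems.append(key)
    return problems


bad_name = json.loads('{"uid":"0x20572","name":[{"uid":"0x279e38"}]}')
bad_cine = json.loads('{"uid":"0x7ef","cinematography@en":"some tagline text"}')
print(schema_mismatches(bad_name))  # prints ['name']
print(schema_mismatches(bad_cine))  # prints ['cinematography@en']
```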

The RDF export has lines like these:

<0x3f522> <name<0x1e16e4>\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00> <0x4f48d> .
...
<0x13b174> <written_by\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00> <0x1226fc> .
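
Lines like these can be flagged with a simple shape check on each triple. The pattern below is a rough assumption about what a well-formed exported line looks like (hex uid subject, a predicate without angle brackets, backslashes, whitespace or control characters); it is not the full RDF grammar that Dgraph accepts.

```python
import re

# Assumed shape: <0xHEX> <predicate> object .
# The predicate may not contain "<", ">", a backslash, whitespace or control
# characters, so both real NUL bytes and a literal "\x00" are rejected.
TRIPLE_SHAPE = re.compile(r"^<0x[0-9a-fA-F]+> <[^<>\\\x00-\x20]+> .+ \.$")


def suspicious_rdf_lines(lines):
    """Return (line_number, text) pairs for lines that fail the shape check."""
    flagged = []
    for number, raw in enumerate(lines, start=1):
        text = raw.rstrip("\n")
        if text.strip() and not TRIPLE_SHAPE.match(text):
            flagged.append((number, text))
    return flagged


sample = [
    r"<0x3f522> <name<0x1e16e4>\x00\x00\x00\x00> <0x4f48d> .",
    r"<0x13b174> <written_by\x00\x00\x00\x00> <0x1226fc> .",
    "<0x13b174> <written_by> <0x1226fc> .",
]
print([number for number, _ in suspicious_rdf_lines(sample)])  # prints [1, 2]
```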

Expected behaviour

The export data should be valid inputs to import back into Dgraph.

Metadata

Assignees

No one assigned

    Labels

    kind/bug: Something is broken.
    status/accepted: We accept to investigate/work on it.
