Description
When creating or reading literals with XSD's gYear as datatype, the values are stored as Python datetime.date objects. While I can understand the choice for this approach, it nevertheless changes the semantics of the literals and incorrectly increases their precision by specifying a month and a day.
To reproduce:
> import rdflib
RDFLib Version: 5.0.0
> gyear = rdflib.Literal("1970", datatype=rdflib.XSD.gYear)
> gyear.toPython()
datetime.date(1970, 1, 1)
> str(gyear)
'1970-01-01'
The above representation causes several problems. Firstly, when decoupling the value from the Literal class, e.g. using gyear.toPython(), str(gyear), or gyear.value, it is no longer possible to discern that the value is of datatype gYear and that the month and day components are meaningless. Secondly, when serializing the graph, the month and day components are included in the literal, which therefore violates the definition of the value space:
gYear uses the date/timeSevenPropertyModel, with ·month·, ·day·, ·hour·, ·minute·, and ·second· required to be absent [1].
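To make the violation concrete, here is a minimal sketch that checks strings against the lexical pattern for gYear as I read it from the XSD 1.1 specification [1]. It uses only Python's standard library and is independent of RDFLib; the helper name is_valid_gyear is hypothetical.

```python
import re

# Lexical pattern for xsd:gYear as I read it from XSD 1.1 Part 2:
# an optionally negative year of at least four digits, plus an optional timezone.
GYEAR_PATTERN = re.compile(
    r"-?([1-9][0-9]{3,}|0[0-9]{3})"
    r"(Z|(\+|-)((0[0-9]|1[0-3]):[0-5][0-9]|14:00))?"
)


def is_valid_gyear(lexical: str) -> bool:
    """Return True if `lexical` is a well-formed xsd:gYear string (hypothetical helper)."""
    return GYEAR_PATTERN.fullmatch(lexical) is not None


print(is_valid_gyear("1970"))        # True
print(is_valid_gyear("1970-01-01"))  # False: month and day must be absent
```

Under this pattern, the string currently produced by str(gyear) is not a valid gYear literal.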
I assume the serialization problem is an oversight; it can easily be remedied by emitting only the year when the datatype is gYear. The other issue is a bit more challenging, since I assume you want to keep the convenience of Python's datetime module. Perhaps it would be an idea to return a datetime.date object only when toPython() is called, and to return just the year when str(object) or object.value is used.
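A rough sketch of the behaviour I have in mind, written as a standalone class rather than as a patch to RDFLib. The class GYearValue and its method names are hypothetical, and the sketch assumes a plain positive four-digit year without a timezone suffix.

```python
import datetime


class GYearValue:
    """Hypothetical wrapper that keeps only the year and converts to a date on request."""

    def __init__(self, lexical: str):
        # Assumption: a plain positive year such as "1970", with no timezone suffix.
        self.year = int(lexical)

    @property
    def value(self) -> int:
        # The native value is the year alone, so no precision is invented.
        return self.year

    def to_python(self) -> datetime.date:
        # Explicit opt-in conversion; the month and day are filler values.
        return datetime.date(self.year, 1, 1)

    def __str__(self) -> str:
        # Serialize using only the year, zero-padded to four digits.
        return f"{self.year:04d}"


gyear = GYearValue("1970")
print(str(gyear))         # 1970
print(gyear.value)        # 1970
print(gyear.to_python())  # 1970-01-01
```

This way the datetime convenience stays available, but only callers who explicitly ask for a date receive the invented month and day.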
[1] https://www.w3.org/TR/2012/REC-xmlschema11-2-20120405/datatypes.html#gYear