
Conversation


@gcxRun (Author) commented Jan 1, 2016

No description provided.

@cowtowncoder (Member)

What specific problem is fixed here? A unit test would show the intended behavior improvement.

But further, I don't think EMBEDDED_OBJECT is acceptable here, as it cannot be uniquely identified as an Object Id -- embedded objects can be any opaque things; by default they are just used for binary values, but they can carry various other kinds of data whose type is known to the processing code. Object Id, on the other hand, is metadata.

@cowtowncoder (Member)

Put another way: I don't doubt there could be a problem to fix, but without a way to verify it, I can't quite see how exactly this should work. The current code tries to keep track of it with the _hasNativeId/_objectId combination; perhaps that does not always work. But I'd need to understand how the fix should handle cases not handled yet.

@gcxRun (Author) commented Jan 1, 2016

I am bridging Jackson and the Mongo Java driver 3.0 Codec framework. For this, I am implementing a JsonParser by wrapping a Mongo Bson reader. Things work quite well so far, but when I use the polymorphic facility of Jackson, the TokenBuffer comes into action, and as it does not store any token for ObjectId, I can't have both polymorphism and ObjectId support.
You can have a look at https://github.com/gcxRun/jackson-mongo-codec , but I thought the test case would be better.
I agree with you that EMBEDDED_OBJECT does not capture the original intent (ObjectId), but there is a mismatch between JSON and BSON here.
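
For reference, a minimal sketch (hypothetical types, not taken from the linked repo) of the kind of polymorphic mapping involved: with @JsonTypeInfo, Jackson may buffer already-read properties into a TokenBuffer until it sees the type id, so anything the BSON-backed parser exposed as a native object id has to survive that buffering.

    import com.fasterxml.jackson.annotation.JsonSubTypes;
    import com.fasterxml.jackson.annotation.JsonTypeInfo;
    import org.bson.types.ObjectId;

    @JsonTypeInfo(use = JsonTypeInfo.Id.NAME, include = JsonTypeInfo.As.PROPERTY, property = "type")
    @JsonSubTypes(@JsonSubTypes.Type(value = Dog.class, name = "dog"))
    abstract class Animal {
        public ObjectId _id;   // read as a native BSON ObjectId by the wrapping parser
    }

    class Dog extends Animal {
        public String name;
    }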

@cowtowncoder (Member)

Right, but what I do not quite understand is why the existing facility that TokenBuffer has for handling Object Ids does not or cannot work for the BSON use case. Use of EMBEDDED_OBJECT does not appear quite safe for other uses.
So the test case included does indeed show that your change does what you set out to do, but it does not quite recreate the problem. Put another way, it seems to verify that the added work-around works the way a work-around needs to, but that does not quite help in understanding the bigger picture.

@gcxRun (Author) commented Jan 2, 2016

It looks to me that the TokenBuffer should behave on ObjectId as it behaves for embedded objects. It indeed misses a token to disambiguate the two. My assumption is that when I write a FIELD and then an ObjectId to the generator, I expect to be able to parse a FIELD token and then get another token (EMBEDDED?) telling me that what comes next is an ObjectId.
I am looking for some kind of symmetry with embedded objects.
In my real use case, the TokenBuffer is associated with an ObjectCodec (ObjectMapper) and I am invoking writeObjectId on the TokenBuffer from there.

@gcxRun (Author) commented Jan 2, 2016

The following sequence works.

    TokenBuffer buf = new TokenBuffer(null, true);

    Object id = new Object();
    buf.writeObjectId(id);
    // the field name must come after, so the object id gets attached to its segment as a side effect
    buf.writeFieldName("_id");

    JsonParser p = buf.asParser();

    assertToken(JsonToken.FIELD_NAME, p.nextToken());
    assertEquals("_id", p.getCurrentName());
    assertEquals(id, p.getObjectId());

It looks like the contract of the JsonParser is to attach an ObjectId to a field, but not to consider it an independent token (like an embedded object). That looks odd. At least in my case I do not control writing the FIELD to the JsonGenerator: that is done by Jackson, and I only get control back through my custom JsonSerializer, so that exact sequence cannot be reproduced.
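
For context, a minimal sketch (assumed class name, not code from this PR) of the kind of custom serializer in play here: it only receives the generator, so all it can do is ask whether native object ids are supported before writing one.

    import java.io.IOException;
    import com.fasterxml.jackson.core.JsonGenerator;
    import com.fasterxml.jackson.databind.JsonSerializer;
    import com.fasterxml.jackson.databind.SerializerProvider;
    import org.bson.types.ObjectId;

    public class ObjectIdSerializer extends JsonSerializer<ObjectId> {
        @Override
        public void serialize(ObjectId value, JsonGenerator gen, SerializerProvider provider)
                throws IOException {
            if (gen.canWriteObjectId()) {
                // backend (or TokenBuffer) claims native object id support
                gen.writeObjectId(value);
            } else {
                // plain JSON fallback: write the hex representation as a string
                gen.writeString(value.toHexString());
            }
        }
    }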

@cowtowncoder (Member)

@gcxRun You may be misunderstanding the concept of embedded object here: it is very different from that of Object Id, and the two are not related in any way.

EMBEDDED_OBJECT is meant to be part of the data, but something that has no direct counterpart in JSON. It is sort of an opaque container, either for something that has a more specific type but can be serialized as JSON, or for something that is only used with other backend formats.

Object Id, however, is a piece of metadata attached to values. Or rather, it is for some data formats: currently YAML has this concept, and I am guessing BSON possibly does as well. For data formats that do not have native Object Ids, an actual data entry (like a String value) is used instead.
Of these two kinds of Object Ids, "as regular data" values are embedded just like regular data, and TokenBuffer does not really know or care. The explicit handling of object ids is strictly for native object ids.

For TokenBuffer, then, support is needed for as-metadata object ids, which are meant to be attached to other tokens. They are not represented as value tokens themselves, and use of EMBEDDED_OBJECT is not appropriate.
The way associated Object Ids are handled is neither very pretty nor stable, and I am not claiming it is optimal or would work for your case. But there is a reason why it does not use an inline token: it tries to work like the source parser's functionality, to reproduce reading.
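
To illustrate the contrast (a sketch, not code from this PR): a native object id is metadata that gets attached to a nearby token and produces no token of its own, while an embedded object is itself a value token in the buffered stream.

    TokenBuffer gen = new TokenBuffer(null, true);   // true: native object/type ids enabled
    Object id = new Object();

    gen.writeStartObject();
    gen.writeObjectId(id);                  // metadata: attached to the next token written
    gen.writeFieldName("payload");
    gen.writeObject(new byte[] { 1, 2 });   // data: buffered as VALUE_EMBEDDED_OBJECT (no codec set)
    gen.writeEndObject();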

My problem here is that I only see your work-around, not the problem you are trying to solve.
And the work-around itself looks dangerous, something that could introduce problems for the YAML format module, which is currently the only component to make use of this functionality. That means none of the existing jackson-databind unit tests would necessarily fail, but ones for the YAML format would.

It would be good to understand more about the original problem you have with BSON.

@gcxRun (Author) commented Jan 4, 2016

My goal is to bridge Jackson and the Mongo driver v3. This is close to the mapping approach offered by Jongo or MongoJack.
They rely on bson4jackson, which for parsing takes a raw BSON buffer from Mongo and parses the bytes while implementing a Jackson parser. For serialization, it implements a JsonGenerator and writes to a raw BSON buffer sent back to Mongo.
The v3 driver provides an extensible codec framework and allows leveraging the internal Mongo BSON parser and writer. It makes room for a more direct adaptation to Jackson, which is what I want to try.

I understand better now what you mean by ObjectId. Looking at the BSON spec, ObjectId is just a specific type that does not exist in JSON, pretty much like date. It is not an attribute attached to another value.
You can have more than one ObjectId per object, have arrays of ObjectId, etc.
I made an attempt and stopped using the Jackson ObjectId; I am now treating the bson:ObjectId as an embedded object, pretty much like the bson:datetime type.
I have a different issue with the TokenBuffer now. My custom serializer cannot use writeObject when the TokenBuffer is created with a non-null _objectCodec: this leads to an infinite loop/stack overflow. If I remove and restore the _objectCodec around the writeObject call, then things "work like a charm". But it is odd, and interestingly there is a comment from you dated 28-May-2014 right on the implementation detail this relies on.
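
A sketch of that save/restore work-around (here `gen` is the TokenBuffer handed to the serializer and `objectId` the value being written): with the codec detached, TokenBuffer.writeObject() buffers the value as VALUE_EMBEDDED_OBJECT instead of delegating back to the ObjectMapper and recursing.

    // temporarily detach the codec so writeObject() buffers instead of delegating
    ObjectCodec codec = gen.getCodec();
    gen.setCodec(null);
    try {
        gen.writeObject(objectId);
    } finally {
        gen.setCodec(codec);
    }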

So let's forget about the ObjectId mismatch; embedded object is probably a more natural fit. But why does the TokenBuffer rely on the _objectCodec? As the TokenBuffer caches tokens from another parser, the delegation to the custom codec has already happened, hasn't it?

@cowtowncoder (Member)

@gcxRun Ah, thank you for the explanation -- I can see why you were thinking of that approach, since in your case such ids are embedded opaque objects, not what jackson-core models as "object ids". They just share a name but have different semantics. So we are in agreement here.

As to ObjectCodec: in general, I think use of writeObject() should be minimized, as it is just a short-cut to let developers avoid passing both a generator and an ObjectMapper along. In most cases one should operate on only one anyway; and if both are used, that should be the ObjectMapper (or ObjectWriter).
I think this may actually be a similar naming problem: there is an overload here between "simple" POJOs, which should be serialized (by ObjectMapper, using JsonSerializers etc.), and "embedded objects", which are to be carried along as objects of some unknown type and can only be handled by a format-specific generator, like a BSON generator.
writeObject(), as is, is for handling the former type, and that is done essentially by delegating the call back to ObjectMapper.writeValue(...).

As to _objectCodec being passed: it may well be wrong. It is a bit difficult to reason about what it ought to be; if I remember correctly, it is passed mostly so that configuration settings can be carried along, but there isn't anything big it should be used for. At best it would try to carry along the codec that the source had.

So... I guess it would be good to try another approach here too. TokenBuffer should have a way to append VALUE_EMBEDDED_OBJECTs, even if JsonGenerator does not. Although perhaps JsonGenerator should have such a write method, even if the default implementation would have to throw UnsupportedOperationException?
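
One possible shape for such a method (a sketch only, not necessarily the eventual API): a base implementation in JsonGenerator that fails for backends without native support, which format-specific generators and TokenBuffer would then override.

    public void writeEmbeddedObject(Object object) throws IOException {
        if (object == null) {
            writeNull();
            return;
        }
        throw new UnsupportedOperationException("No native support for writing embedded objects");
    }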

@gcxRun (Author) commented Jan 5, 2016

I gave a try to adding a writeEmbeddedObject method to JsonGenerator (along with its canWriteEmbeddedObject companion). TokenBuffer can then explicitly push embedded object values, and its parser then emits embedded object tokens. I no longer have to pop/push the ObjectCodec from my custom serializer.
Do you want me to suggest a PR with that design in core, and a new one for databind?
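
A hedged sketch of the intended round-trip under that design (writeEmbeddedObject is the proposed addition, not an existing core method), mirroring the earlier snippet:

    TokenBuffer buf = new TokenBuffer(null, false);
    Object id = new Object();   // stand-in for a bson ObjectId

    buf.writeStartObject();
    buf.writeFieldName("_id");
    buf.writeEmbeddedObject(id);   // proposed: push the value as an embedded-object token
    buf.writeEndObject();

    JsonParser p = buf.asParser();
    assertToken(JsonToken.START_OBJECT, p.nextToken());
    assertToken(JsonToken.FIELD_NAME, p.nextToken());
    assertToken(JsonToken.VALUE_EMBEDDED_OBJECT, p.nextToken());
    assertEquals(id, p.getEmbeddedObject());
    assertToken(JsonToken.END_OBJECT, p.nextToken());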

@cowtowncoder (Member)

@gcxRun Another alternative would be that if you are sure you are dealing with a TokenBuffer, just cast it and call the method directly. At this point that'd be the best way to get things working. I am not against the addition of writeEmbeddedObject(), but it's very close to the 2.7.0 release and I am not sure this can get into it. For 2.8 we'd have more time to add the write method(s) and think them through well.

@gcxRun (Author) commented Jan 5, 2016

I can't be sure I am dealing with a TokenBuffer; I've seen it used only when polymorphism kicks in and Jackson starts parsing fields before knowing the class to construct. I could check and then cast, of course, but that is close to the saving and restoring of the ObjectCodec I am doing right now: the work-around works only as long as the implementation does not change.
I'll look at the 2.7/2.8 timeline and will bug you again with PRs then.

@cowtowncoder (Member)

@gcxRun Sounds good, let's do that then. I will file a jackson-core issue anyway as a reminder, and link it back to here.
