[PROPOSAL] Khronos Avatar Extensions - Phase 1 #2512
Conversation
- Updated skeleton_biped
- Updated mapping
- README updates
- Fixing contributors list as well as some logical errors in the extension schemas
In the Summary section where VRM Consortium is used, add " (VRMC)" so usage later in the document is clear.
Where are the schemas and example files? They are missing from this PR.
I was under the impression that the Khronos avatar extensions were going to be a superset of the features provided by the VRMC extensions, but most of the good stuff from VRMC is missing here, such as spring bones and constraints.
This extension defines a 1-to-1 mapping between avatars and scenes. Most glTF implementations will only load a single scene per file. This definition explicitly prohibits having 2 avatars in a scene, and glTF does not provide a mechanism for one scene to be used in another scene. Now, in practice, people using glTF for interchange of 3D models will typically only have one avatar per file, and load multiple glTF files for multiple avatars - but I was under the impression that Khronos wants to allow single glTF files to be used as a last-mile delivery format, where the whole scene, including potentially multiple avatars, is all represented in one file. So, the question that needs answering: Is it a stated goal of KHR_avatar to only allow one avatar per glTF scene/file, and is this goal isolated from the goals of KHR_interactivity and such?
Also, see #1542 - support for multiple scenes in a glTF file is extremely rare, and folks like @donmccurdy have mentioned that this could be removed if there was another compatibility breakage. Even if that won't ever happen, I would recommend avoiding building atop this "feature". Khronos's own implementations do not handle multiple scenes per file correctly - such as the Blender importer, which imports multiple scenes as collections, even though Blender itself has a multiple scenes feature.
> Expression types include:
>
> - **Emotions**: `happy`, `angry`, `surprised`, etc.
> - **Visemes**: `aa`, `ee`, `th`, `oo`, etc.
It's not sufficient to just define "etc". The extension should define a large list of interoperable names, or else the extension does not do much to further interoperability.
Expressions are allowed to define anything in the categories of morph shapes, joints, or textures, so we put the VRM defaults here. You are correct. https://github.com/vrm-c/vrm-specification/blob/master/specification/VRMC_vrm-1.0/expressions.md#lip-sync-procedural
I personally don't want to define the possible visemes. For example, in Godot Engine, we decided to use Unified Expressions. https://docs.vrcft.io/docs/tutorial-avatars/tutorial-avatars-extras/unified-blendshapes
This is the same problem we're currently facing across the industry. We don't have standards or prevalent shared vocabularies. Once we have wider adoption, I truly believe that those that use these extensions can come together to establish those vocabularies.
For now though, establishing it without getting feedback from groups that'd use it would end in frustration. I'd much rather try to establish something that is flexible, interoperable, AND can be used when the community comes together to form those vocabularies.
For now though, the expression and joint mapping extensions are meant to provide mechanisms to map creator expressions to endpoint expressions. This extension is more to denote what an expression 'is' (animation/channel-wise). Once you have the creator/producing pipeline's concept of what the relative expressions are, mapping them to an endpoint's desired/expected set becomes easier.
If anyone is interested in further discussion on viseme blend shape naming standardization, discuss here: meshula/LabRCSF#5
> ### 3. Region Metadata for Accessibility
>
> ```json
> "tags": ["left_hand"]
> ```
Inconsistent casing.
What, exactly, is the list of allowed strings? If the answer is "anything!" then it's not very useful for interoperability, because implementations won't know which strings to listen for. What if one app uses leftHand and another uses left_hand and another uses handLeft? Then none of those will be able to read each other's data when reading a glTF file exported by the other.
I would prefer to use the rules of JSON-LD for defining schema-driven metadata, but we could also look at https://www.w3.org/WAI/standards-guidelines/aria/
See also JSON Schema.
My answer here would be the same as for your question/comment on the expression joint readme.
> | Property | Type   | Description                                                                 |
> |----------|--------|-----------------------------------------------------------------------------|
> | `joints` | object | Mapping from canonical biped joint names to node indices in the glTF file. |
The "canonical biped joint names" need to be explicitly defined in full, for every valid name.
Fallback Skeleton Mappings
Should we fall back to any known skeleton mappings when the skeleton_mapping extension is used? Is VRM's Humanoid something the community feels okay with ratifying, or is the Unity origin something that makes that less desirable? Should we fall back to the OpenXR Skeleton definition?
This is the open question of fallback definition contention mentioned earlier in the readme.
I think canonical here was likely bad verbiage on my end. Going to remove it for now, because I don't believe these extensions should define sets of fixed-name joints quite yet.
> These standard rigs are typically defined by the consuming platform, runtime, or service provider. Each standard rig:
> - Defines a fixed joint name vocabulary and hierarchy.
The KHR avatar extensions should be what defines the fixed name vocabulary, to enable interoperability.
What do you do if a glTF file has "hand_left" and the KHR_avatar_skeleton_mapping says this maps to "handLeft" but then another platform expects "LeftHand" and another expects "HandL"?
We have a vigorous debate about the fallback skeleton versus vendor-specific skeletons. See the readme.
I don't think I agree with this. I think that the vocabularies are by nature going to be fragmented by runtime, and there's little-to-no chance that we'd be able to (in the first phase of these extensions) convince international communities and members of industry to align on a single vocabulary.
I think longer term, yes, we will absolutely be able to define a vocabulary with help from the greater community. I think that comes after initial adoption and engagement with the greater community overall.
The existing fragmentation of vocabularies is precisely why I insist that Khronos should define it, in cooperation with international communities and the industry of course.
At a minimum, I suggest performing a case study to take a look at the names used in various engines and applications. In the meantime, I've been using the same bone names as RPM for my own character models, but it's arbitrary, and it would be nice to have an industry standard set of names once and for all. Also, ideally, allow excluding an explicit skeleton name map when the names match, so we don't need "Hips": "Hips" or similar if the model author chooses to name the bones in a way that already matches the standard.
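For illustration, a hedged sketch of that suggestion, using the key/value mapping shape shown later in this thread (the omission rule itself is hypothetical, not part of the current draft):

```json
{
  "joints": {
    "myRig_hips": "hips",
    "myRig_head": "head"
  }
}
```

Under the suggested rule, a bone the author already named `hips` would need no entry at all.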
> *in cooperation with international communities and the industry of course.*
I think this will happen, just not with the timing of the phase 1 extensions. We have to get them to come to the table for those conversations to happen, and I believe establishing these extensions creates that opportunity.
Bluntly, even with a case study in hand right now, we'd still need extensive collaborations across the board to get to something everyone is happy with. It's going to take time, and I believe we (Khronos and the greater community) are going to make the effort to make it happen. It just won't be in time to align with these initial extensions.
As for your recommendation, that's perfectly reasonable. I believe we can make that assertion/recommendation for runtimes to adopt that behavior in the current state of the extension, without a standard in place.
> ## Overview
>
> The `KHR_avatar_virtual_joints` extension introduces *virtual joints*—custom transform nodes that exist relative to the avatar’s skeletal hierarchy but are **not part of the skinned joint structure**. These virtual transforms serve as semantic attachment or control points for systems like look-at targeting, item equipping, IK hints, and seating positions.
For seating, I would love to see the KHR avatar extensions include/adopt/reference OMI_seat, which defines control points for chairs or other seats, allowing for avatars to sit on those objects. We could perhaps bring this in as KHR_avatar_seat to keep with the KHR_avatar_* naming convention (though it's not part of an avatar itself, just something in the world that an avatar interacts with - though I should also mention, it can have interaction with other categories of extensions, like being used by OMI_vehicle as a pilot seat). I am the author of OMI_seat, I would happily grant Khronos all rights to copy the extension text.
I disagree; isn't it possible for OMI_seat to define a series of KHR_avatar_virtual_joints that are used internally to solve for seating?
Yes, it would be possible to define it that way, saying that for example a virtual joint named "SittingSeatKnee" or some similar name on an avatar should be used to aim towards the seat's knee control point.
I'll take a look and bring it to an upcoming working group meeting to discuss! Thanks!
This is a point of contention, as I want to be able to export an entire scene with, let's say, 15 avatars.
For openness and collaboration, I and others encouraged the early publishing of this draft pull request despite its incompleteness. I hope you understand that reaching for the better design is best done early and together, rather than holding back over superficial errors. Edited: There are parts of the story that are personal information. It is also not cool to ask for more interaction and then complain about the work's lower quality, which is the tradeoff for rapid response. Note that some of the contributors to the Khronos Group aren't paid and/or are volunteering time and effort.
Yeah, frankly I just quickly installed VSCode on my personal computer to crunch this out while recovering from a recent procedure. You're not wrong; I should have been more diligent with formatting. Let me get to that in the next day, apologies.
I agree with @fire's comment above about it being useful for some use cases to have multiple characters in a single glTF scene. There could be 1, 2, 15, 317, or any other amount. These characters don't necessarily need to all be human-controlled "avatars", they could be NPCs in a scene. Such NPCs have similar requirements, like retargeting, as mentioned. Or, they could be characters that human players switch between. In either case, "character" seems fitting to me.
- Added Nick Burkard to the list of contributors
- Changed verbiage where needed to ensure there's no implication of a vocabulary definition as part of the phase 1 set of extensions
- Resolved as many formatting concerns as possible
I think a discussion around scene usage is probably due, especially given your concerns around it initially. I was a bit hesitant around its usage; I suspect this will be something that we need to get takes from the wider community on. I agree I can see the use-case here, but if it's going to cause problems for adoption or tooling it might end up being a non-starter.
Frankly, I've been out of practice with git for far too long, and have forgotten that git commits for a PR are more...eternal than my current active workflow.
I believe we are currently defining feature sets that can be specified with a basis. Features like spring bones or toon materials need more discussion.
extensions/2.0/Khronos/KHR_avatar_expression_procedural/README.md (outdated; conversation resolved)
> ## Overview
>
> The `KHR_avatar_mesh_annotation` extension enables arbitrary per-mesh metadata annotations for avatar models. This provides a generalized way for creators and tools to semantically tag portions of geometry for gameplay, rendering, accessibility, customization, or runtime logic.
I have yet to grasp the intention of the extension, in contrast with KHR_avatar_mesh_annotation_rendering, which has a clear behavioral purpose. We probably need more real-world examples.
I can add more examples here in the near future. I've seen specific runtime needs across numerous implementations where things like mesh annotations would have assisted. It's not a top-priority extension by any means, but it does assist where runtime logic would benefit from mesh annotations/tagging.
Top-of-the-head example around accessories: there may be experiences or sub-instances within an experience that change how the user/character/avatar interacts with or perceives the experience. More specifically, say you have a character or avatar in a desert experience where the developers, for some reason, adjust shader parameters based on whether the character has sunglasses. With a metadata label denoting what a submesh or given mesh is (in this case, sunglasses), it removes a few steps for avatar creators and developers to enable scenarios like this.
That being said, this of course leads to a "Not every experience shares a vocabulary" scenario, but at least this helps start that conversation in terms of asset labeling.
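To make the sunglasses scenario concrete, here is a hedged sketch of how such a tag might appear on a mesh primitive, using the draft's `tags`/`customData` properties (the tag values and the `customData` payload are purely illustrative):

```json
{
  "meshes": [
    {
      "name": "sunglasses",
      "primitives": [
        {
          "attributes": { "POSITION": 0 },
          "indices": 1,
          "extensions": {
            "KHR_avatar_mesh_annotation": {
              "tags": ["sunglasses", "accessory"],
              "customData": { "tint": "desertGlare" }
            }
          }
        }
      ]
    }
  ]
}
```

A runtime could then look up the `sunglasses` tag and adjust its shader parameters without any knowledge of the asset's mesh naming conventions.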
| "source": "myRig_leftFoot", | ||
| "targets": [ | ||
| { "joint": "leftFoot", "weight": 0.8 }, | ||
| { "joint": "leftToeBase", "weight": 0.2 } | ||
| ] |
I'm afraid such one-to-many mapping might not be supported by the humanoid skeleton system in Unity. I feel like it needs a component like constraints or custom scripts, and it probably overcomplicates the implementation. I want to check the demand volume before having this.
I agree, we should only have one-to-one mapping.
As an example use case, digitigrade legs on a furry character have 3 segments, but humanoid rigs have legs with 2 segments. There is a simple solution commonly used for this case: having a separate set of bones with no mesh attached, use those for the humanoid skeleton, and then use constraints to copy the transforms of the humanoid bones to the real bones, with the upper and lower leg segments both copying the humanoid thigh, where the humanoid thigh's bone length is the sum of the upper and lower leg segments. You might look at this and think that it would be nice to be able to semantically map both leg segments as the humanoid thigh, but in practice that doesn't add value. IK systems expect to work with 2-segmented humanoid legs, and full body tracking systems use real human legs as their input data, which are 2-segmented. I don't know if more complexity in the specification could improve upon what already works as a general solution. Adding the fake bones as glTF nodes and adding the constraints would only be a few hundred bytes of JSON in the glTF, and keeps implementations simple.
@0b5vr Agree that we should identify the demand volume. Let me add it to the open questions in the next day or so.
I don't think we should necessarily limit ourselves based on whether engines support this natively or not at the time of publishing. Agreed that it then leads to custom scripts/constraints, but if we provide utility that then leads to adoption, we hopefully can generate enough interest to get native support for it added.
Right now, there are several platforms where getting a custom/bespoke avatar to adhere to platform expectations typically requires switching to the rig hierarchy of that platform. This is also the case in many smaller experiences. If a given character/avatar was designed with a reduced rig in mind, this then means that the creator has to redo a large amount of work for platform compatibility.
In the case of one-to-many, spine and neck joint mappings immediately come to mind. If a creator has 3 spine bones initially and a platform expects 5, this extension can provide the mapping to the expected 5 and enable distribution of the animation values across the creator's 3 bones (creating a smoother animation for the reduced rig set).
Perhaps part of this extension should also indicate what portion of the joint movement is desired (translation, rotation, scale). That level of granularity would likely assist in a variety of scenarios.
@aaronfranke; while that's definitely a solution, it sounds like it's orthogonal to what's being proposed here; both could exist. If anything, it sounds like the extension could actually assist with your scenario, as you could then leverage it with the scripts you're using for the retargeting for the initial mapping step (Unless I'm missing something).
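To make the spine case concrete, a sketch using the `source`/`targets` shape from the excerpt above (joint names illustrative): one creator spine bone receiving a weighted share of two of the platform's expected bones.

```json
{
  "source": "myRig_spine1",
  "targets": [
    { "joint": "spine", "weight": 0.7 },
    { "joint": "chest", "weight": 0.3 }
  ]
}
```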
I agree with the opinion that, from the perspective of glTF portability, this extension should support only 1:1 mapping.
If my understanding is correct, glTF places importance on portability, meaning that it is desirable to obtain the same results in any environment. In other words, it is preferable that any platform can handle the specification smoothly and easily.
When considering the glTF specification, I saw many discussions from the perspective of whether major platforms and engines would be able to process it without problems, and in some cases, even for features that seemed to have high demand, adoption was postponed if there were engines for which processing seemed difficult.
If it is an extension, those portability requirements may be somewhat relaxed, but if we aim for a standard KHR extension, it would be desirable to aim for the same level of portability.
> I don't think we should necessarily limit ourselves based on whether engines support this natively or not at the time of publishing. Agreed that it then leads to custom scripts/constraints; but if we provide utility that then leads to adoption, we hopefully can generate enough interest to get native support for it added.
For the above reasons, I have a different view. If there are definitions that are not easy to process, support for the extension itself may not progress and its spread may be hindered, and I think that self-imposed limitations are in fact an important element for glTF. The fact that processing expected on a particular platform may no longer be reproducible in glTF, or that creators may have to redo their work, is, to some extent, an unavoidable sacrifice for standardization. Striking the right balance is the key element.
If we want to generate interest, I think there is the option of defining it as an EXT or vendor extension with a lower degree of standardization. The KHR extension would support only 1:1 mapping, the EXT/vendor extension would extend the KHR extension to also support one-to-many mapping, and on platforms that do not support the EXT/vendor extension, the KHR extension would be the fallback. This scenario would keep the problem small while providing engines, etc., an opportunity to try one-to-many mapping.
I think the discussion would go more smoothly if the rules, or at least clearer guidelines or policies, on what this extension suite does and does not cover were made more concrete. This includes not only the 1:1 mapping but also the discussion around auto.
Personally, since these are standard KHR extension suite that many engines are likely to implement and use for various purposes, I think anything that is hard to implement or becomes complex on certain platforms (especially major ones), or is likely to cause performance issues in real-time runtimes, should be considered out of scope. If needed, we could consider a separate, lower-standard extension.
I think it's going smoothly, all things considered. Honestly, the discussion around this proposal is what I'd expect the back-and-forth on extensions such as this to be.
I believe the extensions are pretty self-contained in terms of what they do and don't do, and the discussion up until now has improved the overall documentation and proposal as you've all asked for more clarity (which I appreciate). What other items are not clear? Do you have any other examples that I can try to address?
In the case of 'auto', it currently is not present in the extensions (and hasn't been in any iteration).
In the case of 1-to-1 mapping versus 1-to-many, I've removed the latter for now as it's more appropriate to introduce it in a later phase (as part of the longer-term character extensions), with additional accompanying extensions for remapping/retargeting and informing motion systems. You're correct that there are potential performance issues in rare edge cases (like mapping every joint to every other joint). Do you see other performance concerns, and can you go into more detail around the difficulties of implementing support for this?
> | Property           | Type   | Description                                                                                              |
> |--------------------|--------|----------------------------------------------------------------------------------------------------------|
> | `renderVisibility` | string | Controls camera-based visibility. Enum: `"always"`, `"firstPersonOnly"`, `"thirdPersonOnly"`, `"never"` |
auto exists in the VRM FirstPerson spec. Its behavior is to remove vertices that have a weight associated with the humanoid head at runtime.
https://github.com/vrm-c/vrm-specification/blob/master/specification/VRMC_vrm-1.0/firstPerson.md
There are several challenges to introducing auto in the KHR_avatar spec:
- We must associate the skeleton extensions with the mesh annotations extension to define which bone is the head.
- We must recommend or define how the runtime should hide the head polygons.
- Having a separate mesh before exporting would be more efficient than hiding polygons at runtime, and this might be the recommended way as a standard.
However, it's true that many VRM avatars already depend on auto.
I believe that VRChat users also don't specify first-person mesh annotations by themselves. There is a component called VRCHeadChop that specifies which mesh should be hidden in addition to the head in the first-person view.
https://creators.vrchat.com/avatars/avatar-dynamics/vrc-headchop/
I'm a little afraid of introducing Auto, as it introduces platform-to-platform differentiation (which perhaps is fine, but needs to at least be noted), even with recommendations. There's also the challenge of runtimes then meeting creator expectations. That being said, I can see how this makes the user workflow easier.
Totally understand that they don't denote it themselves for VRChat; that being said, I think adding a component for this is functionally the same as them annotating the mesh (albeit at a higher level, given it then happens in-engine).
Do we feel as though VRM could continue to have Auto as part of a VRM-specific extension on top of this one, or do we absolutely need to have it in this extension?
I checked the spec of auto in VRM FirstPerson, and I think it would be better not to include it in this extension. The implementation seems a bit too complex.
If my understanding is correct, there are issues such as:
- Heavy per‑vertex processing, which raises performance concerns in runtime environments like viewers.
- Potential complexity from mesh splitting, which could complicate other extensions as well as core glTF Mesh/Node processing.
- Cross‑extension dependencies (even within the KHR_avatar extension suite). Extensions without dependencies and higher independence are simpler and preferable.
Since this avatar/character extension suite seems to be aiming for standard KHR extensions, I personally think it's better to define a low-complexity and easy-to-implement specification. As a result, the extensions are more likely to be supported across many environments and see broader adoption.
If auto is desired, I personally feel it would be more appropriate as a vendor extension, maybe one that extends this one.
| { "target": "spine", "weight": 0.5 }, | ||
| { "target": "chest", "weight": 0.5 } | ||
| ], | ||
| "JawJoint": [{ "target": "jaw", "weight": 1.0 }] |
One of the VRMC members points out that the jaw should no longer be included in the humanoid skeletons since it should rather be controlled by expressions.
I think that depends on the implementation/use-case. I think it's totally fair to have a bone in the skeletal hierarchy (and an understanding as to how it maps) even with an expression that potentially powers it.
Thank you for the important proposal. I've long wanted a standardized and reusable humanoid skeletal definition in glTF, so this is a very interesting proposal for me. Since the proposal covers a lot and the discussion has already grown quite long, I may have missed some parts of the conversation, but I'd like to start by sharing a few of my thoughts. Apologies if any of this is already outdated.

This may tie into the topic of Avatar vs Character, but I'm wondering whether functionality that is not specific to avatars could be actively separated from the Avatar extension and defined as its own general-purpose extensions. Those could then serve as foundational extensions, with the Avatar extension built on top of them. If we first focus discussion and review on those foundational extensions, then move on to the Avatar-specific ones afterward, I believe we could progress through the specification process more smoothly.

While the proposal is divided into phases, I personally feel that with so many extensions being proposed at once for Phase 1, it has become difficult to keep the discussion focused. I understand the intent was to gather broad feedback by sharing the proposal early, but for those who join later, the volume of information makes it quite hard to catch up with the ongoing discussion. Would it be possible to treat the general-purpose foundational extensions as a sort of “Phase 0”?
While I generally agree with the idea that we should consider each component step by step, there seem to be cases where we should think about the final picture that multiple extensions achieve together, to avoid overlooking features essential to our purpose, like the mesh annotation "auto" discussion.
Absolutely no worries; we really have just started this conversation! Feel free to jump in with your concerns, even if they echo the concerns that have already been stated by others. Hearing back from the community will help us make informed decisions/changes to the spec as it evolves!
I agree with this conceptually, but we'd need to have consensus that the extensions themselves are general purpose enough to be separated from the avatar extension set. Right now I'm not entirely sure what could be considered general purpose enough (other than perhaps the KHR_avatar_mesh_annotation extension). What portions of these would you consider as a general-use phase 0 set of extensions?
+1 to this. Given the composition, it might become harder to separate out than initially thought.
| "jointBindPoses": [ | ||
| { | ||
| "joint": 0, | ||
| "matrix": [ |
There's likely a better way to represent this via an additional skin object with a different skeleton and InverseBindMatrices reference for the A/T/Custom Pose.
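For instance (a sketch of that alternative, not part of the current proposal): core glTF already allows multiple `skins` entries over the same joints, so an extension could point at a second skin whose `inverseBindMatrices` accessor encodes the alternate pose:

```json
"skins": [
  { "name": "bindPose", "inverseBindMatrices": 4, "joints": [1, 2, 3] },
  { "name": "tPose", "inverseBindMatrices": 5, "joints": [1, 2, 3] }
]
```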
Realized that the formatting done on some of these areas prettified it in a way that is inconsistent with other extensions in the Khronos repo (new lines around the allOf fields). Fixed this.
> | Property           | Type   | Description                                                                                              |
> |--------------------|--------|----------------------------------------------------------------------------------------------------------|
> | `renderVisibility` | string | Controls camera-based visibility. Enum: `"always"`, `"firstPersonOnly"`, `"thirdPersonOnly"`, `"never"` |
What use cases is `never` intended for? For example, is it invisible by default, but shown by switching cosmetics? If that’s the use, then it isn’t camera-based, right?
I also wonder whether `always` is even necessary. The default behavior without this `renderVisibility` setting is always visible, so `always` doesn’t add any new information.
For example, wouldn’t it be more convenient if this property handled only which camera types should not display it? And if, separate from this property, we allowed a simple visible/invisible setting on the Mesh, the intended use would be clearer. One advantage of doing it this way is that you could express a combination like initially invisible and hidden in FirstPersonCamera as well. With the current spec, that isn’t possible.
Proposal (I think we can find better names):

```ts
visibility?: boolean;
cameraBasedVisibility?: 'invisibleInThirdPerson' | 'invisibleInFirstPerson'; // either one
```
Agreed on us needing to structure this better and have better field names. I also question always/never and appreciate you bringing it up, as I've gone back-and-forth on them.
I think having the split as you suggest makes sense (it makes me think we'd then potentially also want a mesh primitive default visibility extension like the node visibility extension, but we'd then need to figure out how this ladders into it).
Having the default visibility as part of this then makes me think that cameraBasedVisibility would need an option for when something is visible in both (but has an initial invisible default visibility setting).
Edit: To elaborate on that last point: I think we'd just need "Both" if it's not split, as then I could see this getting used purely for mesh primitive default visibility.
Functionally, one-to-many compositions like this describe how to achieve a particular destination expression, and the way it’s written right now could technically achieve that, but it is convoluted. It should be reversed:
`<target expression> : { <source expression, weight>, <source expression, weight> }`
It also doesn’t make sense to require them to add up to 1.0, as blendshapes don’t really behave that way (and if anything, the weights should reflect how much they contribute to the expression and the max value they should be).
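As a sketch of the reversed shape (the property and expression names here are hypothetical), with each weight read as a maximum contribution rather than a share of 1.0:

```json
"expressionMappings": {
  "happy": {
    "smileLeft": 0.8,
    "smileRight": 0.8,
    "cheekPuff": 0.25
  }
}
```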
I've made changes to make it clearer, and fixed up the schema (gave it the `glTF.` prefix and added proper detail, as it was really a placeholder before).
Following up with fixing up the schemas elsewhere as well.
Realized some of the verbiage on this has led to some misunderstandings when reading the proposal. I've added some context above the example schema as well as rephrased the first section to make it less confusing; sorry for the churn!
> | Property     | Type     | Description                                                            |
> |--------------|----------|------------------------------------------------------------------------|
> | `tags`       | string[] | List of free-form labels applicable to this primitive                  |
> | `customData` | object   | Optional free-form object for runtime-specific annotations (optional)  |
Do the “tags” and “customData” properties essentially need to be separate? Would it be acceptable to combine them into a single property, where items that do not require associated data are set to an empty object? I feel that having fewer properties would be cleaner.
Example:

```json
"annotations": {
  "touchable": {},
  "foo": {
    "color": { "r": 0, "g": 0, "b": 0 }
  }
}
```
It might be. That being said, we're delaying this extension and figuring out if we want to use EXT_structural_metadata (and if not, we'll be bundling this with a future extension set proposal around metadata/LODs/renderviews).
Hi all,

Apologies for the lack of movement here. We've been doing scenario analysis around the current set of extensions to determine next steps for them. We've categorized the current set of extensions into Keep/Amend/Delay/Add.

Keep:

Amend:

Delay:

Add: We're not sure when we'll add these, but we identified them as follow-ups for expressions.

The above changes will happen in the near-ish future (our next meeting is on 12/15, so we'll be iterating on the amendments needed after that point).
```json
{
  "extensions": {
    "KHR_character": {
      "sceneIndex": 0
```
Have we already discussed the property naming `sceneIndex`? If not, I would prefer `scene` instead, following the property with the same name in the core spec.
We haven't! I actually think we might want to revisit using the scene index altogether due to it being not well supported across the board (and instead use a root node index concept). If we decide against that, we can change it to just being `scene`.
| "expression": "smile", | ||
| "animation": 0, | ||
| "extensions": { | ||
| "KHR_character_expressions_morphtarget": { |
| "KHR_character_expressions_morphtarget": { | |
| "KHR_character_expression_morphtarget": { |
Thanks for catching this! Will update/fix
| "expression": "frown", | ||
| "animation": 1, | ||
| "extensions": { | ||
| "KHR_character_expressions_morphtarget": { |
| "KHR_character_expressions_morphtarget": { | |
| "KHR_character_expression_morphtarget": { |
> This extension **does not animate morph targets directly**. It provides metadata only.
>
> All morph target expressions should be driven using standard glTF animation channels, targeting the `weights` path on the corresponding node:
Would it also support `/nodes/i/weights` and `/nodes/i/weights/j` if the implementation supports KHR_animation_pointer?
I thought it would be easier to describe morph animations using `/nodes/i/weights/j` if the node has many morphs and the animation wants to control only a few of them.
You bring up a really good point; I think we likely need to change this extension to explicitly depend on KHR_animation_pointer.
While in theory we can get around this by providing guidance on how to utilize the zeroed-out animation frames for other expression morph targets, it would end up wasting a huge amount of space animation-wise. Thanks for catching this!
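For reference, a channel animating a single morph weight through KHR_animation_pointer would look roughly like this (accessor indices illustrative):

```json
"animations": [
  {
    "channels": [
      {
        "sampler": 0,
        "target": {
          "path": "pointer",
          "extensions": {
            "KHR_animation_pointer": { "pointer": "/nodes/0/weights/2" }
          }
        }
      }
    ],
    "samplers": [
      { "input": 7, "output": 8, "interpolation": "LINEAR" }
    ]
  }
]
```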
| "myRig_hips": "hips", | ||
| "myRig_head": "head", | ||
| "myRig_leftFoot": "leftFoot", | ||
| "myRig_rightFoot": "rightFoot", | ||
| "myRig_leftHand": "leftHand", | ||
| "myRig_rightHand": "rightHand" |
The key-value direction is opposite between KHR_character_skeleton_mapping and KHR_character_expression_mapping.
Also, we might want to use an object on the value side instead of a literal value to add extra properties in the future...? Low confidence.
I leave a schema idea below:
"skeletalRigMappings": {
"vrmHumanoid": {
"hips": {
"node": 0 // if we are going to delay KHR_character_skeleton_biped, this is going to be a node index instead of a node name I believe
},
"spine": {
"node": 1, // we might make it optional for the sake of futureproof...? as how /animations/{}/channels/{}/target/node does
"extensions": { // an example that adds extra information to the mapping, not confident enough to give a more practical example
"KHR_character_skeleton_mapping_something": {
"something": [
{ "node": 1, "weight": 0.5 },
{ "node": 2, "weight": 0.5 }
]
}
}
}
}
}There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. I updated the expressions mapping and neglected to update this, but that definitely led to a weird pattern. I think reversing it to match the expressions mapping extension is likely the immediate next step.
I don't think delaying KHR_character_skeleton_biped means this has to use the node index instead; though perhaps it'd be better to have the index as well due to there being a potential for duplicate node names. Definitely something to talk about.
About Given the VRM1.0 spec compatibility, I noticed that
However, the spec also states that it can animate properties defined in
Would we assume that it is valid to mutate extension properties not mentioned in the Asset Object Model documentation if an implementation supports it?
This is a much harder one; we should discuss it more during the next TSG meeting. Personally, I think this lies in the realm of "Yes, it's valid, but your mileage may vary" given the example would be using the `extras` objects (which are already caveated as being on a per-application basis). I'd assert that we'd likely want to recommend gracefully degrading if such a thing appears and can't be interpreted by the application.
Addressing the issues 0b5vr pointed out
Delaying while we identify whether we want to utilize and recommend EXT_structural_metadata for this purpose. If not, we'd likely iterate on extensions with an upcoming LOD/Renderview extension set proposal.
KhronosGroup#2512 (comment) Delaying and will add to the above-mentioned LOD/Renderview extension set proposal we'll be iterating on in the future.
New person joining in here: This effort has come to my attention and I wanted to add some thoughts on behalf of non-developers, designers, artists, and learning experience professionals.

Lack of specific examples and lack of standardization, as highlighted by @aaronfranke and others on this proposal, hold back forward momentum in the entire industry. As a beginner avatar maker, it is frustrating to make a product that, only after development, I discover doesn't work as designed on a platform. Tutorials, instructions, and deep technical pools don't help when what is needed is a standardized, cross-platform, clear set of expectations. I realize that this entire proposal essentially agrees with this; I'm preaching to the choir. But a milquetoast approach of holding off on specifics when Khronos is properly authorized to set a standard is, in my opinion, a laissez-faire method of trying to not offend or cut off anyone. But we know where this road leads. As the phrase goes, "One size fits all means that it essentially fits none".

Also, a word for our users, those that "wear" our avatars. Two decades of experiences in XR rarely (yes, sometimes, but rarely!) return impactful experiences within a scene or with a view. But users do remember their avatars. They take selfies. They role play. They escape their physical boundaries. A user chooses an avatar, and then the avatar shapes the user. I don't think it can be overstated how important avatars are to their users. The future of all XR experiences is hindered if a user can't easily choose to be a book, a dragon, or a Pharaoh. Clear standardization will grease the tracks for more avatar creators, which opens more XR doors for everyone.
@hobbs-Hobbler Totally understand your frustration. It's why we're iterating on this and future sets of extensions, given the lack of standards (even in terms of basics on how to convey them). We're currently working with other contributors in the Metaverse Standards Forum on the topic, focusing first on the ongoing translation framework work going on there (for now skeletal, followed by expressions; link here to Meshula's RCSF). The hope is that this will ladder up into some form of standardization that we agree on across SDOs/organizations/companies/communities, which we can incorporate into our extension sets. Re: examples; this diff is currently in the proposal stage and being iterated on. Once we make the changes being discussed in recent conversation, I expect that we'll then be producing example assets for reference.

I'd recommend you watch the recent Metaverse Standards Forum Characters Town Hall that we assisted in organizing. I talk about the philosophy behind our decisions there (and you'll hear the perspective of other groups around character/avatar standards as well).
About: I believe that the glTF spec does not describe how we are going to play two or more animations at once. If node rotation animations are mutually exclusive, we have to prepare two joint nodes for each rotation axis of the eye movement. Like, when we have a model with a single bone for each eye, if we set 1 to both. I personally am okay with defining it as is, but does this match your intention?
I noticed that bone rotation animation might be mutually exclusive if we want to apply two or more at once. Left a comment on the spec PR: KhronosGroup/glTF#2512
…KHR_animation_pointer Based on 0b5vr's feedback, it became more obvious that we should have this extension rely on KHR_animation_pointer so we could animate explicit weights indices rather than the whole property. I've updated the README to reflect this.
Adding IPose pose type, as well as modifying the bindpose extension to enable multiple poseType/bindpose definitions
After discussing it with the TSG; switching to the root node index makes more sense here given scenes aren't really well-supported across the board. Also added new contributors to the extension
…th KHR_character_expression_mapping It was pointed out that KHR_character_skeleton_mapping should align more with KHR_character_expression_mapping in terms of the keys being the target names, and the values being the source joint names; so I've updated it to do so.
I should also leave the link here; I published my testbed for checking compatibility between VRM and KHR_character. It includes a conversion script from VRM1.0 to the current KHR_character draft spec. The main motivation is to support the development and discussion of the KHR_character extension by applying it to real-world models. https://github.com/0b5vr/khr-character-testbed
From the Governance Team of the Hubs Foundation: The Governance Team of the Hubs Foundation has discussed this. We know that most avatars are designed using different conventions on how to name bones than the convention we currently use. We concluded: if there are standard names for bones and how they connect, it would be a priority for us to implement. If there aren't, we would only implement sets of bone names and connections once they had a large pool of avatars our users could draw on. Our current needs are modest, and almost any reference rig would work for us. Of the reference rigs we've looked at, the Reference Canonical Skeleton Framework Architecture appears to us to have the best balance of clarity and extensibility.
Pull Request Draft: KHR Character and Avatar Extension Set – Phase 1
Summary
This PR introduces the initial suite of KHR_character and KHR_character_avatar extensions, a collaborative effort between the Khronos 3D Formats Working Group and the VRM Consortium (VRMC). These extensions aim to provide a structured and interoperable foundation for representing character and avatar models in the glTF ecosystem across platforms, runtimes, and tooling pipelines.
Motivation
Characters and Avatars have emerged as a core content primitive in real-time applications including gaming, virtual reality, social communication, streaming, and telepresence. Yet, there has been a lack of standards around them to express key character/avatar-specific behaviors such as:
To address this, the Khronos 3D Formats Working Group and VRMC have collaboratively designed a modular set of extensions for avatar assets. With these, we aim to provide creators and developers with a standard representing an expectation in data and functionality per-avatar that can be used across platforms, like building blocks.
Characters
A 3D model representing a potentially interactable, controllable, and/or generally animatable entity.
Examples include user avatars, characters in animated entertainment, NPCs in games controlled by behavior systems, virtual agents embodying a character, etc.
Avatars
A type of Character which is embodied and controlled by a user, representing that user’s identity.
Examples include user-driven characters in third-and-first person experiences, VR scenarios with full-body embodied avatars, 2D telecommunication scenarios with embodied characters representing other users, etc.
Phase 1 Scope
The extensions in this PR represent the Phase 1 Extensions (and more), as outlined in the Khronos Avatar Extensions Working Document.
Core Extensions
KHR_character – Root-level flag denoting a character model glTF asset.

Update 12/12 - The below was delayed to a future extension set proposal:

KHR_character_avatar – Extension built on top of KHR_character denoting a model is intended for use as an avatar.

Expression Extensions
Expressions in this context describe face-localized animations used to drive small and/or larger movements across the face and/or down-chain meshes needed for reasonable conveyance of emotion/intent.
For examples of relevant types of expressions, you can reference concepts such as:
Expression Extensions - Core
All rely on standard glTF animations, and target different control domains:

KHR_character_expression_morphtargets
KHR_character_expression_joint
KHR_character_expression_texture

Character expressions conform to 0-to-1 float values on top-level as ‘drivers’ in a similar way to how morph targets are typically used.
Any key-framed animation (whether joint, morph target, or animation_pointer) still relies on the 0.0-to-1.0 float property; which then interpolates between the N keyframes (e.g. For N keyframes, treating the 0-index keyframe as 0.0, and the Nth keyframe as the 1.0 state. For 1 keyframe, it interpolates between the target model property at rest and that keyframe).
Regardless of the above, we still want to respect creator-defined default values for the morph targets. This may result in some clipping/mesh weirdness when being driven in an experience, but it’s the trade that an avatar creator makes.
For properties represented in the animations not covered by the animation expression extension type: for each expression-mapped animation, the runtime checks what extensions are present to inform which channels are expected to be animated or not.

As an example, if an animation contains a weight channel, but there’s no *expression_morphtargets extension, the expectation is that it won’t animate that channel.
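Assembled from the diff excerpts reviewed above, a single expression entry looks roughly like this (the surrounding container property is not shown in this PR description, so treat it as an assumption):

```json
{
  "expression": "smile",
  "animation": 0,
  "extensions": {
    "KHR_character_expression_morphtarget": {}
  }
}
```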
Expression Extensions - Metadata-adjacent
KHR_character_expression_procedural – Provides context whether an expression expects to be driven procedurally or not (and then, if so, what the creator's preferred method of handling it is)
KHR_character_expression_mapping – Common expression vocabulary normalization

Skeleton/Rig Extensions
KHR_character_bindpose – Declares original skeletal bindpose metadata
KHR_character_skeleton_mapping – Maps joints between arbitrary rigs (1:1)

Update 12/12 - The below was delayed to a future extension set proposal:

KHR_character_skeleton_biped – Declarative semantic labeling of a bipedal skeleton

General Extensions
The below are added as part of this as they provide value to characters and avatars, but are not tied enough to be in the direct namespace. We are still presenting them as part of phase 1 due to our belief they are a net-positive add.
Mesh Annotation Extensions
Update 12/12 - The below were delayed to a future extension set proposal
KHR_mesh_annotation – General-purpose semantic tags per mesh primitive
KHR_mesh_annotation_renderview – Describes render-time visibility for first and third person view modes

Virtual Transform Extension
KHR_virtual_transform – Runtime targets for attach points, look-at control, etc.

Design Principles/Philosophies
Modular and layered: Aiming for extensions in similar categories to be built on each other where it makes sense, and independent of one another where it doesn’t. Layering on top of the baseplate extensions (KHR_character for character specific and KHR_character_avatar for avatar-specific functionality) where possible when the functionality makes sense only in the context of a character and/or avatar.
Aiming to be compatible with VRM and other avatar ecosystems, with an overall goal in this phase to not unnaturally force existing avatar systems to conform to vocabularies and hierarchies.
Enabling a self-describing character/avatar - extensions indicate what data types are contained and need to be driven to animate/embody/power the character upon being loaded. Additionally, mapping extensions and other proposed metadata extensions are used to assist in enabling the character’s general compatibility with a loaded-into runtime.
Recurring patterns - For example, with the expressions extensions the goal was to create a recurring expectation as to how to access the animation channels/fields. With these extensions (and future ones), all expression channels utilize glTF’s animation model (weights, rotation, translation, scale, and KHR_animation_pointer for other properties).

Enabling VRM to adopt these extensions out of the gate. We have a longer-term convergence plan, and VRM utilizing the proposed extensions here as needed is part of it.
Future Work
Phase 2 will contain more sets of functionality that the community will help inform! Right now, here are some topic areas we're thinking of:
In parallel to Phase 1 and Phase 2, we’re currently engaging with the Metaverse Standards Forum and AOUSD in order to start conversations around potential standards for rigs/expressions (around vocabularies). These are, by nature, longer conversations.
Because of that, we’re going to continue to make progress on this set of extensions (in phase 1 and 2), and keep in mind the design that it should be able to support any future standards around vocabularies and rigs.
Open Questions to the Community
We would love feedback on several topic areas!
Expression-Based Mesh Activation
Should activation/deactivation of nodes or submeshes based on expressions (e.g., switching between open/closed eyes or toggling glasses) be included in the expression extension family, or separated into a visibility/scene control extension?
Ideally, we'd then be utilizing the in-progress EXT_node_visibility proposal.

Fallback Skeleton Mappings
**TL;DR - We're going to be working with AOUSD and the Metaverse Standards Forum to foster discussion on industry rig/expression vocabularies. Because of this, we don't believe we should enforce this in this phase. Discussion Link 1, Discussion Link 2**
Should we fall back to any known skeleton mappings when the skeleton_mapping extension is used on the DCC/Content Tooling side? Is VRM's Humanoid something the community feels okay with ratifying, or is the Unity origin something that makes that less desirable? Should we fall back to the OpenXR Skeleton definition?
VRM 1.0 Humanoid
OpenXR Skeleton
LookAt Implementation: Separate or Virtual Transform?
Should a standardized LookAt construct (like VRM’s LookAt extension) be:

Both options are currently viable.
Avatar vs Character Namespaces
With feedback from the community, we’ve made the decision to transition to having KHR_character be the base namespace, with KHR_character_avatar denoting a model is an avatar!
Are we missing anything obvious for our V1 set of Extensions proposed here?
We'd love to hear back as to whether we should consider adding more extensions overall to this proposal; or if any changes are needed!
References
License
All extensions in this PR are licensed under the Khronos Group glTF Extension License.