Skip to content

Conversation

coldhex
Copy link

@coldhex coldhex commented Jun 8, 2025

  1. Use barycentric coordinates to interpolate depth values. Linux, Mesa and AMD Radeon RX 6600 with Vulkan driver currently has quite poor interpolation precision, see video and tests below. This PR replaces GPU interpolation with a manual one. Also note that current Xemu w-buffer interpolation uses gl_FragCoord.w which can't reproduce all w-values, e.g. 1.0f/16777046.0f equals 1.0f/16777047.0f in 32-bit floats and so inversion can't reproduce both 16777046.0f and 16777047.0f. Also, depth value differences are used in interpolation, which has the desired property that a triangle with the same z-value on all vertices will result in exactly that same z-value when interpolated. At least the game Shenmue II sky rendering relies on this. Fixes Shenmue II: Flickering sky #2049. Fixes Super Monkey Ball Deluxe: cutscenes not rendering #472.

  2. Computes polygon depth bias slope for both z-buffering and w-buffering. These are computed by taking the max and abs of partial derivatives of either of the functions z=z(x,y) or w=w(x,y), where x,y,z,w are screen-space coordinates. This matches Xbox hardware for z-buffering where the partial derivatives are constants over any fixed triangle. However, for w-buffering the partial derivatives vary over any fixed triangle, but Xbox appears to compute just a single depth slope at the first visible pixel (where "first" means something like first in top-left order) and uses that over the whole triangle. The way the PR computes the partial derivatives is by using the chain-rule, e.g. dw/dx = -w^2 * d(1/w)/dx. This is useful since 1/w is linear in screen-space and therefore d(1/w)/dx is constant over any fixed triangle. However, finding out the w-value for the first visible pixel of a triangle is difficult in OpenGL/Vulkan and is not done here. Instead we calculate depth slope per-pixel. Fixes The Chronicles of Riddick: Z fighting flickering #2041.

  3. Since the depth buffer calculation modifications above require modifying geometry shader code and outputs, this PR also fixes some flat shading and polygon line mode bugs since keeping them as is with the new modifications would have been at least as much work.

Relevant additions to nxdk_pgraph_tests are here:

I tested using Radeon RX 6600. The following video shows the depth precision issue (which is different from the depth slope z-fighting issue and is also fixed by PR) with Xemu 0.8.67 and Chronicles of Riddick (where depth value inaccuracy causes problems with shadow volume calculations):

xemu0867-radeon-riddick-depth.mp4

Radeon OpenGL is a little better than Vulkan. (Btw, this inaccuracy issue doesn't occur with Intel UHD 770.) I reproduced one of the triangles in an nxdk_pgraph_test 24-bit integer W-buffering test. The following is with Vulkan:

WBuf24D_Riddick_V1_ZB0_ZS0

OpenGL and Xemu 0.8.67 produce depth values closer to Xbox HW values, but not close enough as the video above shows. That was Radeon, but I also tested Xemu 0.8.67 on Intel UHD 770 and it matches Xbox HW in this test, so Intel's hardware interpolation seems to be of better quality than AMDs.

Regarding the new depth slope implementation for W-buffering, here are tests for the case of 24-bit integer W-buffer. These use slope factor 65536.0, which is much larger than games would typically use (e.g. Riddick uses 1.0), but it makes it possible to see how the implementation in PR doesn't produce exactly the correct values. Note how Xbox HW appears to compute the slope offset value only once at first visible pixel. The quad is exactly the same among ClipF tests and exactly the same among ClipW tests, only window clipping changes (on the red line):

WBuf24D_ClipF-150-032_V1_ZB0_ZS1
WBuf24D_ClipF-150-128_V1_ZB0_ZS1
WBuf24D_ClipF-150-224_V1_ZB0_ZS1
WBuf24D_ClipW-159-000_V1_ZB0_ZS1
WBuf24D_ClipW-261-000_V1_ZB0_ZS1
WBuf24D_ClipW-363-000_V1_ZB0_ZS1

(Also note in ClipW tests how the last W-value 0x30CB5 doesn't change. This is because Xbox draws the quad as two triangles and the last W-value belongs to the second triangle which doesn't need clipping and therefore its depth slope is unaffected.)

To further show curious Xbox HW behaviour, the following tests draw the same triangle in 1-pixel offsets. TriH tests draws the triangles from left to right with 1-pixel vertical drops. TriV tests draw from top to bottom with 1-pixel shifts to right. The cycling with period 4 is due to the 1-pixel shifts only.

WBuf24D_TriH_V1_ZB0_ZS0
WBuf24D_TriH_V1_ZB0_ZS1
WBuf24D_TriV_V1_ZB0_ZS0
WBuf24D_TriV_V1_ZB0_ZS1

From this, I'm guessing NV2A rounds the first pixel to some point in a 4x4 pixel block and samples the depth slope once for the whole triangle. Sampling once in a such a manner is rather difficult to implement in OpenGL/Vulkan graphics pipeline so PR doesn't even try and implements slope calculation per-pixel instead.

For Z-buffering, depth slope calculation in this PR is about the same as Xemu 0.8.67. However, depth offset improves since, for whatever reason, Mesa adds the offset doubled:

ZBuf24D_FloorQuad_V1_ZB0_ZS0
ZBuf24D_FloorQuad_V1_ZB0_ZS1
ZBuf24D_FloorQuad_V1_ZB1_ZS0

Depth offset for lines is fixed in this PR. This depends on if V_PGRAPH_SETUPRASTER_POFFSETLINEENABLE is set. Xemu 0.8.67 applies the offset if any POFFSET is enabled, so PR improves on that, e.g.:

WBuf24D_LineStrip_V1_ZB0_ZS0
WBuf24D_LineStrip_V1_ZB1_ZS0

For the flat shading and polygon line mode changes, here are test cases where PR makes improvements: (I include some test cases where also Xemu 0.8.67 matches HW)

ProgLM_TriFan_Flat_First
ProgLM_TriFan_Flat_Last
ProgLM_Tri_Flat_First
ProgLM_Tri_Flat_Last
ProgLM_TriStrip_Flat_First
ProgLM_TriStrip_Flat_Last
ProgLM_TriStrip_Smooth_First
ProgLM_TriStrip_Smooth_Last
Prog_TriFan_Flat_First
Prog_TriFan_Flat_Last
Prog_Tri_Flat_First
Prog_Tri_Flat_Last
Prog_TriStrip_Flat_First
Prog_TriStrip_Flat_Last

@Triticum0
Copy link
Collaborator

Triticum0 commented Jun 9, 2025

Should fix. #2204 #2191 #2228 #2038 . Also fixed madden series but never created an issue for it. Thought this would solve the remaining z-fighting issues but seem to be unrelated. #1020

@Triticum0
Copy link
Collaborator

Triticum0 commented Jun 9, 2025

Also a couple games regressed from your previous pr, #2213. Also marvel vs capcom, metal slug 5 where also effected. In my extensive testing I never came across any issues unfortunately before pr was merged but hopefully you know now. Let me know if you want me to open an issues for it. Lastly keep up the great work.

@coldhex
Copy link
Author

coldhex commented Jun 9, 2025

@Triticum0 Yes, a screenshot of Metal Slug 5 would be nice since the Beat Down regression doesn't happen with my hardware.

@Triticum0
Copy link
Collaborator

Triticum0 commented Jun 9, 2025

Here you go. My guess it nivida exclusive I need a build renderdoc debuh build of lastest realese. once I do I see what I can do.

Metal slug 5
xemu-2025-06-09-22-20-17

MvsC 2
xemu-2025-06-09-22-22-56

Edit: Tested with your suggestion and seem to fix the issue.

@mborgerson
Copy link
Member

Very cool, thanks! It may take me a few days to get time to review.

@Triticum0
Copy link
Collaborator

@coldhex If you have time might be worth look at the remain z-fighting issuses I know matt has Alias so you could start there or outrun 2 as z-fighting easily noticeable and could be related to this pr #1020

@coldhex
Copy link
Author

coldhex commented Jun 12, 2025

Thanks for the screenshots. Metal Slug 5 doesn't have such artifacts on Radeon (or Intel UHD 770). (On Beat Down issue, I commented that I now noticed there is an artifact present on Radeon, but less than on GeForce.) Looking at MS5 in renderdoc, it too renders the "Mission ..." text using rectangles at half-pixel coordinates and nearest-neighbor texture atlases, like Beat Down. So they have the same cause.

OutRun 2 has z-fighting on distant buildings even on Xbox HW. Seems to be caused by the usual low precision of z-buffering at distances far from the camera. This PR may (or may not) reduce the z-fighting to be closer to HW, but in general PR (or any Xemu version) does not produce exact HW z-buffer values. There may be small differences. Additionally, different floating-point rounding mode in NV2A and in modern GPUs cause differences in vertex program output already. In one nxdk pgraph test (one of texture shadow comparator), there is a +3 difference on vertex program z-coordinate output caused only by floating-point rounding mode (round-towards-zero vs round-to-nearest). To obtain exactly the same z-buffer values floating-point arithmetic would have to be exactly the same as in NV2A and also exactly the same interpolation algorithm and order of calculations would need to be followed (whatever those are.)

@antoniodesousa
Copy link

I tested Metal Slug 5 and Beat Down on my AMD RX 6600 GPU and I don't have those lines.

@mborgerson mborgerson force-pushed the pr/depth-precision branch from 5016691 to 7678284 Compare June 23, 2025 11:15
Copy link
Member

@mborgerson mborgerson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a fun review--thanks! Once again, thanks also for providing test cases and comparisons against Xbox/current xemu. I took the liberty of rebasing the patch set onto current master and fixing the conflicts. Left a couple of questions in review before merge

" EndPrimitive();\n";
} else if (polygon_mode == POLY_MODE_LINE) {
need_linez = true;
/* FIXME: input here is lines and not triangles so we cannot
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you think we should handle this?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think n-polygon in line mode should be changed to use a triangle fan instead of a line strip (or line loop in OpenGL.)

if (polygon_mode == POLY_MODE_LINE) {
return VK_PRIMITIVE_TOPOLOGY_LINE_STRIP; // FIXME
} else if (polygon_mode == POLY_MODE_FILL) {
return VK_PRIMITIVE_TOPOLOGY_TRIANGLE_FAN;
}

There isn't any built-in way for a geometry shader to detect the last triangle in a triangle fan, but one possibility is to add a new boolean vertex attribute such that it is true only for the last vertex of a triangle fan. The geometry shader could then detect the last triangle using that boolean and complete the boundary line loop.

@@ -793,12 +789,10 @@ static void create_pipeline(PGRAPHState *pg)
VkPipelineRasterizationProvokingVertexStateCreateInfoEXT provoking_state;

if (r->provoking_vertex_extension_enabled) {
// TODO: remove use of provoking vertex extension since we just want
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to do it now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Setting last vertex convention is actually necessary, because Vulkan default is first vertex (unlike OpenGL where default is last vertex.) I fixed the comment.

I also added a new fixme comment to the geometry shader. I'll explain this in more detail here. Suppose a triangle strip's vertex data are denoted by the following IDs: 1 2 3 4 5 6. When the last vertex convention is used, the sequence of four input triangles received by the geometry shader could be reasonably assumed to be:

Case A:
123
324
345
546

The last vertex for these triangles matches with the last vertex convention for the strip. (The second and fourth triangles had two vertices swapped to get correct winding order.) On the other hand if the first vertex convention was used instead, a more reasonable triangle vertex order would be:

Case B:
123
243
345
465

With this ordering, strip first vertex convention and triangle first vertex convention still match.

At least AMD Radeon RX 6600 and Nvidia GeForce GTX 1650 behave as above. However, Intel UHD 770 and Intel HD Graphics 630 always use Case A ordering regardless of the vertex convention setting. In the PR geometry shader, Case A ordering is assumed, so flat shading works for all these GPUs. However, I have now tested with the much older Intel HD Graphics 4000 (integrated in Ivy Bridge CPUs) and it always uses Case B ordering regardless of the vertex convention setting and therefore PR gets the flat shading colors wrong for every other triangle in strips. Same problem of implementation dependence also exists in Xemu master in line mode flat shading and n-polygon flat shading (but for e.g. triangle strips with fill mode, master doesn't use the geometry shader at all.)

OpenGL and Vulkan only specify winding order, but absolute vertex order is implementation dependent. OpenGL specs leave the reader wondering if the order is specified or not, but Vulkan docs mention this explicitly:

https://github.com/KhronosGroup/Vulkan-Docs/blob/481dd347dae6cac4bf4ef8dcfae41789ab39f650/chapters/geometry.adoc
"Vertices may be in a different absolute order than specified by the topology, but must adhere to the specified winding order."

Since there isn't a built-in way to detect how GPUs order input triangle vertices in geometry shaders, one possible fix would be to add an integer vertex attribute (along with texture coordinates etc.) giving the original vertex index. (The boolean vertex attribute for a triangle fan's last vertex could be combined with this attribute such that a negative index indicates the last vertex.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have now tested with the much older Intel HD Graphics 4000 (integrated in Ivy Bridge CPUs) and it always uses Case B ordering regardless of the vertex convention setting and therefore PR gets the flat shading colors wrong for every other triangle in strips.

Thank you for testing with a variety of hardware, much appreciated! I'm a bit concerned about the impact for merging as-is with this caveat in mind

one possible fix would be to add an integer vertex attribute (along with texture coordinates etc.) giving the original vertex index

Would you be willing to do this for this pr?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one possible fix would be to add an integer vertex attribute (along with texture coordinates etc.) giving the original vertex index

Would you be willing to do this for this pr?

Yes, I can do that. It would involve these changes:

  1. Original vertex index can't be a vertex attribute per se, because the maximum is 16 and those are already all taken for NV2A. With OpenGL 4.0, Buffer Textures can be used and indexed by the built-in gl_VertexID variable in shader. With Vulkan, either Uniform Texel Buffer or Storage Buffer.
  2. The functions pgraph_gl_flush_draw and pgraph_vk_flush_draw need to be modified, especially for Inline Elements, since the draw command indexes a vertex array and the vertex index available to shader must be basically the index of the index array instead of the index into the vertex array. This involves unindexing and copying vertices into a new array so that datum of each vertex is separate. This also needs to be checked for Draw Arrays since at least in principle the start&count ranges could overlap. The Buffer Texture needs to be updated for each draw command.

Simpler would be to not use a Buffer Texture and only do vertex array copies similar to point 2 above. Then gl_VertexID in shader could be used by itself for flat shading vertex ordering, but it would be insufficient for closing line loops and n-polygon boundary loops, since gl_VertexID alone cannot be used to detect which primitive (e.g. triangle in a triangle fan) is last.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because the maximum is 16

On what hardware? I think GL specifies a minimum of 16, but it is often higher? We may still be able to use this approach.


As an aside, I've been considering alternative approaches to the current geometry shader solution, mostly because MoltenVK doesn't support them and it would be nice to be able to start using Vulkan on macOS. The straightforward solution there is rewriting primitive on the host, which aligns with the copy approach you've described.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On second thought, if it's just ancient Intel HD4000 that's affected by this, I'm open to deferring this robustness improvement to the future if this effort to correct these primitive cases is going to be substantial

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On what hardware? I think GL specifies a minimum of 16, but it is often higher? We may still be able to use this approach.

Radeon OpenGL has 32, but Intel integrated and GeForce are still at 16 at least with Mesa on hardware what I tested. Online searching gives results of users posting GeForce OpenGL parameters and it always seems to be 16. GeForce GTX 1650 has 32 with Vulkan.

Anyway, I rebased the PR and added a new commit which implements point 2 above (and this fixes HD 4000). It doesn't need a new vertex attribute and just uses gl_VertexID (and gl_VertexIndex for Vulkan). The new vertex attribute would be needed for closing n-polygon line mode boundary. I initially thought the same approach could be used for plain line loops. That would almost work, but not for the special line loop having only two vertices since triangle fan requires at least 3. So fixing Vulkan line loop needs writing an additional vertex for the drawing command anyway and if that is done in the future then might as well add an additional vertex for n-polygon line mode too.

I'll mark this PR draft for a while. Copying inline elements to a new array seems to work fine and at least Halo 2 didn't show any noticable framerate drop for me, but I'll test this more and run all the nxdk_pgraph_tests. I don't know in what CPU Intel changed triangle vertex order to what it is in UHD 770 (and 630 is the same), but who knows if they might go back to HD 4000 style ordering in the future.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rebased this PR and reworked the last commit. (The old solution is still available in the branch https://github.com/coldhex/xemu/tree/deprecated/pr/depth-precision)

Instead of modifying and copying vertex data, this now tests and detects geometry shader vertex order during initialization. Since (likely all) GPUs are consistent with their input triangle vertex ordering in geometry shaders, this seems to me sufficient and the better solution. (The previous solution would work even if the vertex ordering varied randomly at runtime.)

The detected vertex ordering is printed during initialization. For Radeon it prints "0, 0, 0, 0" indicating that no vertex reordering is needed. For Intel it prints "0, 0, 0, 2" indicating that triangle fan vertices need reordering.

@Triticum0
Copy link
Collaborator

Should fix #2191

@coldhex
Copy link
Author

coldhex commented Jun 24, 2025

Another thing about the PR: since this uses geometry shaders for triangles, triangle strips and fans, it adds to the stop-the-world pauses when shaders are compiled. Since geometry shader code generation depends only on a relatively few variables such as primitive type and polygon mode, there aren't all that many geometry shader code variations. However, currently geometry shaders are compiled every time there is a new vertex or fragment shader. It might be worth to cache geometry shaders separately to avoid recompiling the same geometry shader code.

@Triticum0
Copy link
Collaborator

Fixes the z-fighting in madden series as well fixes z-fighting in Pariah look at the gun.
https://www.youtube.com/watch?v=9ki3JeHnRKA&list=PLmPVsOuxTsQJpSGHGa1YFSWRqnUohZSrt&index=3

@mborgerson
Copy link
Member

mborgerson commented Jun 26, 2025

It might be worth to cache geometry shaders separately to avoid recompiling the same geometry shader code.

Agreed. Vertex, geometry, and fragment shaders should probably be cached independently. We'll defer it for now Done in #2324

@Triticum0
Copy link
Collaborator

Triticum0 commented Jun 28, 2025

@coldhex while I was testing some games I came across a where black triangle would cover most of the screen seem to stumble across it on multiple games and think I track it down to regression starting in PR: Perspective-correct interpolation for w-buffering it shows like this.

Bionicle - The Game
xemu-2025-06-29-00-23-50

Carmen Sandiego: The Secret of the Stolen Drums 2262
Image

ultimate beach soccer
xemu-2025-06-30-01-39-39

Only found out it was a regression today as wasn't very reproducible as only show up randomly when loading into the level and fixed itself when exiting levels. Only Bionicle - The Game was reproduceable so could only test now I found it good test case.

No rush only metioning it as this pr doesn't fix the issue. Also simiar issue in Counter Terrorist Special Forces don't know if it also related

@coldhex coldhex force-pushed the pr/depth-precision branch from af3d3bd to 5b0dde9 Compare June 29, 2025 22:11
@coldhex coldhex marked this pull request as draft June 29, 2025 22:25
@coldhex
Copy link
Author

coldhex commented Jun 29, 2025

@coldhex while I was testing some games I came across a where black triangle would cover most of the screen seem to stumble across it on multiple games and think I track it down to regression starting in PR: Perspective-correct interpolation for w-buffering it shows like this.

Bionicle and Counter Terrorist Special Forces seem to give garbled input data to a vertex shader. For the latter game, when exporting vertex data bytes to a file from Renderdoc, the data even contained an English word in ascii in it even though it is supposed to be coordinates and RGB values. Bionicle seems to have corrupt data too and possibly always has, but the regression might be caused by a bug in this if-condition:

"float clampAwayZeroInf(float t) {\n"
" if (t > 0.0 || floatBitsToUint(t) == 0) {\n"
" t = clamp(t, uintBitsToFloat(0x1F800000), uintBitsToFloat(0x5F800000));\n"
" } else {\n"
" t = clamp(t, uintBitsToFloat(0xDF800000), uintBitsToFloat(0x9F800000));\n"
" }\n"
" return t;\n"
"}\n"

If the floating-point number t is a positive subnormal number (from garbled passthrough input data w-coordinate), then t>0.0 is false and so is the second condition and therefore t gets clamped using the negative range and sign is flipped. Flushing subnormals to zero is what GPUs do and this can be taken into account by replacing the if-condition with e.g. "floatBitsToInt(t) >= 0". I'll do a PR about this later. The garbled input data is a separate issue. I don't know if that is Xemu or bug in game itself.

@Triticum0
Copy link
Collaborator

Seem the bug with ultimate beach soccer fixed with this pr.

coldhex added 3 commits July 7, 2025 22:29
This fixes nxdk_pgraph_tests W_param tests a bit more. Note that the
geometry shader approach in Xemu for polygon line mode doesn't currently
implement face culling. It could be improved by using the built-in
gl_FrontFacing variable in the geometry shader.
…ctor

1. Use barycentric coordinates to interpolate depth values. Linux, Mesa
and AMD Radeon RX 6600 with Vulkan driver currently has quite poor
interpolation precision, which results in artifacts in at least Chronicles
of Riddick. Intel integrated UHD 770 has much better precision, for
example. This commit handles depth interpolation manually. Also note that
the previous w-buffer interpolation used gl_FragCoord.w which can't produce
all w-values, e.g. 1.0f/16777046.0f equals 1.0f/16777047.0f with 32-bit
floats. This also uses depth value differences in interpolation which has
the desired property that a triangle with the same z-value on all vertices
will result in exactly that same z-value when interpolated. At least the
game Shenmue II sky rendering relies on this.

2. Computes polygon depth bias slope for both z-buffering and w-buffering.
These are computed by taking the max and abs of partial derivatives of
either of the functions z=z(x,y) or w=w(x,y), where x,y,z,w are
screen-space coordinates. This matches Xbox hardware for z-buffering where
the partial derivatives are constants over any fixed triangle. However,
for w-buffering the partial derivatives vary over any fixed triangle, but
Xbox appears to compute just a single depth slope at the first visible
pixel (where "first" means something like first in top-left order) and uses
that over the whole triangle. This commit computes the slope per-pixel.
The way to compute the partial derivatives is by using the chain-rule, e.g.
dw/dx = -w^2 * d(1/w)/dx. This is useful since 1/w is linear in
screen-space and therefore d(1/w)/dx is constant over any fixed triangle.
But, as mentioned, finding out the w-value for the first visible pixel
of a triangle is difficult in OpenGL/Vulkan and is not done here. Instead
we calculate depth slope per-pixel.
Xbox draws lines in polygon mode without trying to avoid overlaps, e.g.
internal edge lines are drawn twice for triangle strips and fans. This is
evidenced by using additive blending and also stencil adds. This commit
removes the gl_PrimitiveIDIn==0 checks which were there to avoid drawing
lines twice.

This commit also implements flat shading first/last provoking vertex
handling. This fixes triangle strip and fan flat shading in
nxdk_pgraph_tests shade model tests.
@coldhex coldhex force-pushed the pr/depth-precision branch from 5b0dde9 to 0f611d9 Compare July 8, 2025 11:17
@coldhex coldhex marked this pull request as ready for review July 8, 2025 11:26
@Triticum0
Copy link
Collaborator

Triticum0 commented Jul 8, 2025

Here Shader winding for 30 series of graphic cards.
VK geometry shader winding: 0, 0, 0, 0
GL geometry shader winding: 0, 0, 0, 0

Test Vega 8 on integrated graphics 7730U and also same as above.

Also got this on 3060 TI when looking at the log
Could not determine triangle rotation, got color: R=0, G=129, B=190
Could not determine triangle strip0 rotation, got color: R=0, G=171, B=169
Could not determine triangle strip1 rotation, got color: R=84, G=171, B=86
Unexpected inconsistency in triangle fan winding, got colors: R=0, G=169, B=171 and R=84, G=86, B=171

Test was done just on xbox boot logo.

@coldhex
Copy link
Author

coldhex commented Jul 8, 2025

Also got this on 3060 TI when looking at the log Could not determine triangle rotation, got color: R=0, G=129, B=190 Could not determine triangle strip0 rotation, got color: R=0, G=171, B=169 Could not determine triangle strip1 rotation, got color: R=84, G=171, B=86 Unexpected inconsistency in triangle fan winding, got colors: R=0, G=169, B=171 and R=84, G=86, B=171

Thanks for testing! Which one is that, OpenGL or Vulkan? Those color values indicate that something is wrong the rendering. It's just supposed to render some triangles to an offscreen image and those are RGB values are invalid.

@Triticum0
Copy link
Collaborator

Opengl

@coldhex coldhex force-pushed the pr/depth-precision branch from 0f611d9 to 7cec10d Compare July 8, 2025 18:01
@coldhex
Copy link
Author

coldhex commented Jul 8, 2025

I added error code checking and printing for OpenGL and also a glFinish call (which should be unnecessary), but nothing in particular sticks out. @Triticum0 could you try it again and see what error it gives?

@Triticum0
Copy link
Collaborator

The error code the same
Could not determine triangle rotation, got color: R=0, G=129, B=190
Could not determine triangle strip0 rotation, got color: R=0, G=171, B=169
Could not determine triangle strip1 rotation, got color: R=84, G=171, B=86
Unexpected inconsistency in triangle fan winding, got colors: R=0, G=169, B=171 and R=84, G=86, B=171
GL geometry shader winding: 0, 0, 0, 0

@coldhex
Copy link
Author

coldhex commented Jul 8, 2025

Ok, thanks. Odd, those RGB values are what would be expected if the test geometry shader just passed through colors per vertex. I'll try to figure it out.

@coldhex coldhex marked this pull request as draft July 8, 2025 19:01
@coldhex
Copy link
Author

coldhex commented Jul 9, 2025

@Triticum0 Could you test with the latest commit? I added redundant computation to the test geometry shader which may fix this if it is a driver bug.

@Triticum0
Copy link
Collaborator

Triticum0 commented Jul 9, 2025

@coldhex it does fix the driver bug. no longer show up in the logs

@Triticum0
Copy link
Collaborator

here the log
xemu.log

coldhex added 3 commits July 9, 2025 21:00
Test OpenGL/Vulkan geometry shader triangle, strip and fan vertex ordering
during backend initialization. OpenGL/Vulkan does not guarantee absolute
vertex order for geometry shader input triangles. The test results are
used to reorder input triangle vertices into the first vertex convention
order so that correct provoking vertex can be chosen for flat shading.

Also, this removes use of the Vulkan provoking vertex extension. The
default first vertex convention is now used when emitting line strips
in geometry shader. (It would of course be possible to always emit only
separate line segments and then the convention wouldn't matter at all.)
This adds redundant computation to the simple geometry shader for winding
testing based on the assumption that Nvidia GeForce compiler has a bug
which may incorrectly detect a simple geometry shader as a passthrough
shader.
… too

This is possibly not needed with Vulkan, but apply it anyway just in case.
@coldhex coldhex force-pushed the pr/depth-precision branch from 8475b13 to 9a03037 Compare July 9, 2025 20:54
@coldhex
Copy link
Author

coldhex commented Jul 9, 2025

@Triticum0 Cool, thanks!

I applied the same fix also to the Vulkan geometry shader tester, just in case. I also updated related code comments. Furthermore, now a fresh OpenGL context is created for the tester so that it always runs with clean state.

@coldhex coldhex marked this pull request as ready for review July 9, 2025 21:24
@Triticum0
Copy link
Collaborator

Great, hopefully get merged soon, as had few game was going retest once it get merged. Worried when you put it back to draft after all you hard work. Thanks.

coldhex added 2 commits July 11, 2025 20:57
Mesa OpenGL radeonsi driver has a bug where triangles in triangle strips
emitted from geometry shader may have provoking vertices not corresponding
to either first or last vertex convention when GL_FIRST_VERTEX_CONVENTION
is used.

This commit changes geometry shader such that it always emits separate
triangles and line segments and it doesn't matter what vertex OpenGL or
Vulkan implementation chooses as provoking.
@coldhex
Copy link
Author

coldhex commented Jul 13, 2025

I added another commit to work around a Radeon OpenGL bug in Mesa. Geometry shader now emits separate triangles and line segments so that OpenGL/Vulkan implementation's provoking vertex choice doesn't matter for output primitives.

I found this bug in Mesa and created a gist that reproduces it on my system:
https://gist.github.com/coldhex/1744021258fe0af686dbca5b2e5550c4

Expected output is that the right rectangle should have the same colors as the left one, but with Radeon the output is:
radeon-geom-shader-bug

@Triticum0
Copy link
Collaborator

Fixes #2394 and #2393

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shenmue II: Flickering sky The Chronicles of Riddick: Z fighting flickering Super Monkey Ball Deluxe: cutscenes not rendering
4 participants