-
Notifications
You must be signed in to change notification settings - Fork 52
WIP Mitigate hallucinations with codeblocks containing Typescript/Web Components/Javascript #64
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments but this looks cleaner!
@@ -0,0 +1,995 @@ | |||
import { z } from "zod"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've moved from storing big files likes schemas inside dev-mcp and we now store all schemas inside shopify.dev itself and fetch only what we need on demand. Do you think we could move this to shopify.dev and fetch it at runtime when needed? You can take a look at how we fetch and cache graphql schemas to get inspiration here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup! Hassan is going to work on this if I can't get to it sooner.
package.json
Outdated
@@ -3,15 +3,15 @@ | |||
"version": "1.1.0", | |||
"main": "dist/index.js", | |||
"scripts": { | |||
"build": "tsc && node -e \"require('fs').chmodSync('dist/index.js', '755')\"", | |||
"build:watch": "tsc --watch", | |||
"build": "tspc && node -e \"require('fs').chmodSync('dist/index.js', '755')\"", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the change to tpsc? Was there something failing with tsc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shit -- great catch -- it was used when I was spiking with typia. I'll revert this
0457de9
to
99d744e
Compare
Why
There is nothing more costly to productivity than LLM hallucinations within codeblocks that accepted into a codebase.
We have mitigated that risk with GraphQL. We need to do the same with Typescript, Web components, and Javascript codeblocks using Shopify's APIs.
What
Even if users aren't using Typescript (web components with polaris app home), we can still use the types of the Typescript APIs that are directly or indirectly used.
This introduces a new
validate_typescript_codeblocks
MCP tool that accepts a TS package name that will be used to detect hallucinations / errors in codeblocks LLMs generated/modified.In the case of app home, it will accept the Typescript library
@shopify/app-bridge-ui-types
that app home uses indirectly as our TS source of truth. We'll configurelearn_shopify_api
's response to require all the LLM to use this tool to validate any codeblocks they would offer return to a user.WIP
What's still in question is how we will use TS .
This uses Typia to validate TS without having a user's machine. I need to stress test the implementation with more than the 3 weak sauce evals I've been testing this with.
This previous idea added a temporary TS file and using the user's CLI to typescript compile. I worry about how many more things can go wrong with this case: the user doesn't have typescript installed because they prefer JS/Web components. This could send an agent spiraling and modify the user's machine in a way that isn't necessary. It also has the risk of not cleaning up after itself (it temporariliy creates a TS file to run the typescript compile command.