Skip to content

[FLINK-37799][model][docs] Add document for OpenAI Model Function #26671

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Jun 21, 2025

Conversation

yunfengzhou-hub
Copy link
Contributor

What is the purpose of the change

This PR is a continuation to #26652, adding document for the introduced OpenAI Model Function.

Brief change log

  • Adds document for the OpenAI Model Function.

Verifying this change

This change is a trivial rework / code cleanup without any test coverage.

Does this pull request potentially affect one of the following parts:

  • Dependencies (does it add or upgrade a dependency): no
  • The public API, i.e., is any changed class annotated with @Public(Evolving): no
  • The serializers: no
  • The runtime per-record code paths (performance sensitive): no
  • Anything that affects deployment or recovery: JobManager (and its components), Checkpointing, Kubernetes/Yarn, ZooKeeper: no
  • The S3 file system connector: no

Documentation

  • Does this pull request introduce a new feature? yes
  • If yes, how is the feature documented? docs

@yunfengzhou-hub yunfengzhou-hub marked this pull request as ready for review June 12, 2025 07:06
@flinkbot
Copy link
Collaborator

flinkbot commented Jun 12, 2025

CI report:

Bot commands The @flinkbot bot supports the following commands:
  • @flinkbot run azure re-run the last Azure build


# OpenAI

The OpenAI Model Function allows Flink SQL to call OpenAI API for inference tasks.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest a link to the OpenAI javadoc on the words OpenAI API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment! I've add the link.

@yunfengzhou-hub
Copy link
Contributor Author

Hi @fsk119 @lihaosky could you please help review this PR?


The function supports calling remote OpenAI model services via Flink SQL for prediction/inference tasks. Currently, the following tasks are supported:

* chat completions
Copy link
Contributor

@davidradl davidradl Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there needs to be more information about these 2 tasks, and why we are restricted to these tasks?
It would be good to include links to what these mean,
https://platform.openai.com/docs/api-reference/chat
https://platform.openai.com/docs/api-reference/embeddings

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. I've added a link and a brief description to each of the 2 tasks.

why we are restricted to these tasks?

Because more tasks require additional Model Function subclasses to adapt, which has not been implemented yet. Despite that the other tasks are not supported, the current 2 tasks should have been enough for examples and demos proving the end-to-end functionality of Flink Model Functions in this early phase. We can continue to add support for more tasks in future.

Given that the above message is pure developer's consideration, I would prefer not to expose such reasons to readers of this document.


## Usage examples

The following example creates a chat completions model and use it to predict sentiment labels for movie reviews.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use -> uses

WITH (
'provider'='openai',
'endpoint'='https://api.openai.com/v1/chat/completions',
'api-key' = '<YOUR KEY>',
Copy link
Contributor

@davidradl davidradl Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we define <YOUR KEY>.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about what it means to "define" <YOUR KEY>. Do you mean we should somehow define a variable or constant in this markdown document?

<td>required</td>
<td style="word-wrap: break-word;">(none)</td>
<td>String</td>
<td>Full URL of the OpenAI API endpoint, e.g., <code>https://api.openai.com/v1/chat/completions</code> or
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: e.g., -> e.g.

</tbody>
</table>

### chat/completions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chat/completions is different to chat completions that we mentioned earlier as a task. I suggest being consistent.

<td>optional</td>
<td style="word-wrap: break-word;">"You are a helpful assistant."</td>
<td>String</td>
<td>System message for chat tasks.</td>
Copy link
Contributor

@davidradl davidradl Jun 18, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be useful to point into the OpenAI docs as to what this and the other parameters mean. I notice we use the phrase system-prompt an System message, I suggest we define one and point to it an only use one way to refer to this. When we say system here - do we mean system role?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comment. system-prompt means the context message for the system role in the input request for OpenAI API. I have changed its description to clarify this ambiguity.

I suggest we define one and point to it an only use one way to refer to this.

Same as the comment above, I'm not sure how to "define one". Is it something like a variable or constant?

Copy link
Member

@fsk119 fsk119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

<td>optional</td>
<td style="word-wrap: break-word;">null</td>
<td>Double</td>
<td>Probability cutoff for token selection (used instead of temperature). See <a href=\"https://platform.openai.com/docs/api-reference/responses/create#responses-create-top_p\">top_p</a></td>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

<table class="table table-bordered">
<thead>
<tr>
<th class="text-center">Task Type</th>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this task type?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It corresponds to the tasks mentioned in the document above.

The function supports calling remote OpenAI model services via Flink SQL for prediction/inference tasks. Currently, the following tasks are supported...

There is no model option named task type yet. I have changed this line from "Task Type" to "Task" to avoid possible ambiguity.

Copy link
Member

@fsk119 fsk119 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@lihaosky lihaosky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks

@fsk119 fsk119 merged commit ceecdb7 into apache:master Jun 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants