Skip to content

Add GITHUB_API_BASE_URL environment variable #174

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 2 additions & 1 deletion api/data_pipeline.py
Original file line number Diff line number Diff line change
Expand Up @@ -443,7 +443,8 @@ def get_github_file_content(repo_url: str, file_path: str, access_token: str = N

# Use GitHub API to get file content
# The API endpoint for getting file content is: /repos/{owner}/{repo}/contents/{path}
api_url = f"https://api.github.com/repos/{owner}/{repo}/contents/{file_path}"
api_base_url = os.getenv("GITHUB_API_BASE_URL", "https://api.github.com")
api_url = f"https://{api_base_url}/repos/{owner}/{repo}/contents/{file_path}"
Comment on lines +446 to +447
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The current construction of api_url has a couple of issues:

  1. Critical: If GITHUB_API_BASE_URL is not set, api_base_url becomes "https://api.github.com". Then, api_url becomes f"https://https://api.github.com/...", which is incorrect due to the double https://.
  2. Medium: If GITHUB_API_BASE_URL is set by the user and already includes the https:// scheme (e.g., "https://my.ghe.com/api/v3"), the same double https:// issue will occur.
  3. Minor: If GITHUB_API_BASE_URL ends with a trailing slash (e.g., "https://my.ghe.com/api/v3/"), the resulting URL might have a double slash (e.g., "...v3//repos/..."), which is generally undesirable though often tolerated by servers.

Could we adjust the logic to ensure the GITHUB_API_BASE_URL is expected to contain the scheme, and then we don't prepend https://? Also, stripping any trailing slash from the base URL would make it more robust. For example:

Suggested change
api_base_url = os.getenv("GITHUB_API_BASE_URL", "https://api.github.com")
api_url = f"https://{api_base_url}/repos/{owner}/{repo}/contents/{file_path}"
raw_github_api_base_url = os.getenv("GITHUB_API_BASE_URL", "https://api.github.com")
# Ensure the base URL includes the scheme and remove any trailing slash for clean concatenation.
api_base_url = raw_github_api_base_url.rstrip('/')
api_url = f"{api_base_url}/repos/{owner}/{repo}/contents/{file_path}"


# Prepare curl command with authentication if token is provided
curl_cmd = ["curl", "-s"]
Expand Down
5 changes: 3 additions & 2 deletions src/app/[owner]/[repo]/page.tsx
Original file line number Diff line number Diff line change
Expand Up @@ -1057,8 +1057,9 @@ IMPORTANT:
let treeData = null;
let apiErrorDetails = '';

const apiBaseUrl = process.env.GITHUB_API_BASE_URL || 'https://api.github.com';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The logic for apiBaseUrl is good in that it expects the full base URL (including scheme) from process.env.GITHUB_API_BASE_URL or defaults correctly.

To make it more robust, what do you think about ensuring that apiBaseUrl doesn't have a trailing slash? If process.env.GITHUB_API_BASE_URL is set to something like "https://my.ghe.com/api/v3/", the current concatenation apiUrl = ${apiBaseUrl}/repos/...would result in"https://my.ghe.com/api/v3//repos/..."`. While many servers handle // as /, it's cleaner to avoid it.

Consider removing a potential trailing slash:

        const apiBaseUrl = (process.env.GITHUB_API_BASE_URL || 'https://api.github.com').replace(/\/$/, '');

for (const branch of ['main', 'master']) {
const apiUrl = `https://api.github.com/repos/${owner}/${repo}/git/trees/${branch}?recursive=1`;
const apiUrl = `${apiBaseUrl}/repos/${owner}/${repo}/git/trees/${branch}?recursive=1`;
const headers = createGithubHeaders(token);

console.log(`Fetching repository structure from branch: ${branch}`);
Expand Down Expand Up @@ -1099,7 +1100,7 @@ IMPORTANT:
try {
const headers = createGithubHeaders(token);

const readmeResponse = await fetch(`https://api.github.com/repos/${owner}/${repo}/readme`, {
const readmeResponse = await fetch(`${apiBaseUrl}/repos/${owner}/${repo}/readme`, {
headers
});

Expand Down