🚀 feature request
Relevant Rules
py_binary, py_test (from @rules_python)
Description
When building Python binaries or tests using Bazel (specifically with py_binary and py_test), we encounter an "Argument list too long" error during the build process.
This happens when our projects depend on a very large number of files, particularly those from large Python libraries managed by pip_parse (e.g., boto3, msgraph-sdk-python).
The root cause seems to be that Bazel calls the zipper tool by passing all file paths to be included in the package directly as command-line arguments.
From rules_python/python/private/py_executable.bzl, lines 891 to 957 at commit 9429ae6:
```starlark
def _create_zip_file(ctx, *, output, original_nonzip_executable, zip_main, runfiles):
    """Create a Python zipapp (zip with __main__.py entry point)."""
    workspace_name = ctx.workspace_name
    legacy_external_runfiles = _py_builtins.get_legacy_external_runfiles(ctx)

    manifest = ctx.actions.args()
    manifest.use_param_file("@%s", use_always = True)
    manifest.set_param_file_format("multiline")

    manifest.add("__main__.py={}".format(zip_main.path))
    manifest.add("__init__.py=")
    manifest.add(
        "{}=".format(
            _get_zip_runfiles_path("__init__.py", workspace_name, legacy_external_runfiles),
        ),
    )
    for path in runfiles.empty_filenames.to_list():
        manifest.add("{}=".format(_get_zip_runfiles_path(path, workspace_name, legacy_external_runfiles)))

    def map_zip_runfiles(file):
        if file != original_nonzip_executable and file != output:
            return "{}={}".format(
                _get_zip_runfiles_path(file.short_path, workspace_name, legacy_external_runfiles),
                file.path,
            )
        else:
            return None

    manifest.add_all(runfiles.files, map_each = map_zip_runfiles, allow_closure = True)

    inputs = [zip_main]
    if _py_builtins.is_bzlmod_enabled(ctx):
        zip_repo_mapping_manifest = ctx.actions.declare_file(
            output.basename + ".repo_mapping",
            sibling = output,
        )
        _py_builtins.create_repo_mapping_manifest(
            ctx = ctx,
            runfiles = runfiles,
            output = zip_repo_mapping_manifest,
        )
        manifest.add("{}/_repo_mapping={}".format(
            _ZIP_RUNFILES_DIRECTORY_NAME,
            zip_repo_mapping_manifest.path,
        ))
        inputs.append(zip_repo_mapping_manifest)

    for artifact in runfiles.files.to_list():
        # Don't include the original executable because it isn't used by the
        # zip file, so no need to build it for the action.
        # Don't include the zipfile itself because it's an output.
        if artifact != original_nonzip_executable and artifact != output:
            inputs.append(artifact)

    zip_cli_args = ctx.actions.args()
    zip_cli_args.add("cC")
    zip_cli_args.add(output)

    ctx.actions.run(
        executable = ctx.executable._zipper,
        arguments = [zip_cli_args, manifest],
        inputs = depset(inputs),
        outputs = [output],
        use_default_shell_env = True,
        mnemonic = "PythonZipper",
        progress_message = "Building Python zip: %{label}",
    )
```
This leads to the argument list exceeding the operating system's ARG_MAX limit.
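For a rough sense of scale, the Python sketch below approximates the string bytes that execve() must copy for one invocation. It counts each argument and environment string plus its terminating NUL and compares the total against ARG_MAX, which POSIX systems report through os.sysconf("SC_ARG_MAX"). This is an illustration under stated assumptions, not the kernel's exact accounting: the pointer arrays and Linux's per-string MAX_ARG_STRLEN limit are ignored, and both function names are hypothetical.

```python
import os
from typing import Mapping, Optional, Sequence


def estimated_exec_bytes(argv: Sequence[str], env: Mapping[str, str]) -> int:
    """Approximate the string bytes execve() copies for argv and envp.

    Each string counts its UTF-8 length plus one NUL terminator. The
    pointer arrays themselves are not counted, so the real total is higher.
    """
    arg_bytes = sum(len(a.encode("utf-8")) + 1 for a in argv)
    env_bytes = sum(len(f"{k}={v}".encode("utf-8")) + 1 for k, v in env.items())
    return arg_bytes + env_bytes


def would_exceed_arg_max(
    argv: Sequence[str],
    env: Mapping[str, str],
    arg_max: Optional[int] = None,
) -> bool:
    """Return True if the estimate is larger than ARG_MAX.

    When arg_max is None, the limit is read from the host via sysconf,
    which is available on POSIX systems only.
    """
    if arg_max is None:
        arg_max = os.sysconf("SC_ARG_MAX")
    return estimated_exec_bytes(argv, env) > arg_max
```

As a worked example, 50,000 hypothetical "zip_path=exec_path" entries averaging 120 bytes would need roughly 50,000 × 121 ≈ 6 MB of argv on their own. A file list that grows with the dependency closure therefore crosses the limit quickly if it ends up on the command line.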
This error makes our Bazel builds unstable and undermines the reliability of our CI/CD pipelines for large Python projects.
Describe the solution you'd like
We propose enhancing rules_python to always pass arguments to the zipper tool via a temporary response file, rather than directly on the command line.
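The proposal can be pictured with a small sketch. The caller writes the bulk entries to a params file, one argument per line, and passes the tool a single "@path" argument, so argv stays the same size however many files are packaged. The Python below is an illustration only, not rules_python's implementation. build_zipper_argv and write_multiline_params are hypothetical names, and the argument order ("cC", the output, then the params reference) mirrors the zipper action in the excerpt above. Inside a Bazel rule, Args.use_param_file(..., use_always = True) is the mechanism that requests this behavior.

```python
from pathlib import Path
from typing import List, Sequence


def write_multiline_params(path: Path, args: Sequence[str]) -> None:
    """Write one argument per line, with no quoting.

    A line break inside an argument cannot be represented in this layout,
    so it is rejected instead of being silently split into two arguments.
    """
    for arg in args:
        if "\n" in arg or "\r" in arg:
            raise ValueError(f"argument contains a line break: {arg!r}")
    path.write_text("".join(arg + "\n" for arg in args), encoding="utf-8")


def build_zipper_argv(
    zipper: str, output: str, entries: Sequence[str], params_path: Path
) -> List[str]:
    """Return a fixed-size argv that points the tool at a params file.

    The entries ("zip_path=exec_path" strings) go into params_path, so the
    command line no longer grows with the number of packaged files.
    """
    write_multiline_params(params_path, entries)
    return [zipper, "cC", output, "@" + str(params_path)]
```

The sketch rejects line breaks rather than escaping them. This keeps the file trivially parseable by a reader that simply splits on newlines.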
Judging from the zipper's zip_main.cc source, the tool appears to support reading arguments from a file using the @ syntax, since it contains logic that processes arguments starting with @.
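On the receiving side, response-file support usually means expanding each "@path" argument into the lines of that file before normal argument parsing. The Python sketch below illustrates this for a "multiline"-style file (one argument per line, no quoting), the format the manifest in the excerpt requests through set_param_file_format("multiline"). It is not a port of zip_main.cc. expand_response_files is a hypothetical helper, and it expands only one level.

```python
from pathlib import Path
from typing import List, Sequence


def expand_response_files(argv: Sequence[str]) -> List[str]:
    """Replace every '@path' argument with the lines of the file at path.

    Assumes one argument per line with no quoting or escaping. Expansion
    is a single level: lines that themselves start with '@' are kept as-is.
    A bare '@' is treated as an ordinary argument.
    """
    expanded: List[str] = []
    for arg in argv:
        if arg.startswith("@") and len(arg) > 1:
            text = Path(arg[1:]).read_text(encoding="utf-8")
            lines = text.split("\n")
            if lines and lines[-1] == "":
                lines.pop()  # drop the empty piece after the final newline
            expanded.extend(lines)
        else:
            expanded.append(arg)
    return expanded
```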
By consistently using this response-file capability, rules_python could avoid the OS ARG_MAX limit entirely.