Description
While going through the rules hunting for issues, I noticed a fair number have an example that doesn't give an error when copy/pasted verbatim. This is both inconvenient and misleading, so I wrote a script to try and find all of these cases. This issue is intended to serve as a tracking place for the progress on fixing them all.
There are definitely false positives in here (rules whose examples rely on specific configuration), and there might be false negatives (rules that follow a non-standard pattern that I didn't catch), but this should be a good starting point. Even for those rules that are false positives due to configuration, it is worth double-checking that they mention the needed config.
Once all of these are complete, I might work on a PR to add the checks to somewhere like `check_docs_formatted.py` so any future examples will be checked in CI.
Current progress: 51 complete, 0 in progress, 15 false positives, 2 issues with rules, 15 missing tooling, 12 need done
Complete
In progress
Group | Code | File | Issue | PR |
---|---|---|---|---|
False positive
Group | Code | File | Issue | Reason |
---|---|---|---|---|
Flake8Bugbear | B014 | duplicate_exceptions.rs:71-76 | Error code 0 | This isn't really a false positive, more an undocumented deviation since the lint currently doesn't support checking the exception hierarchy for duplicates |
Flake8Executable | EXE001 | shebang_not_executable.rs:33-35 | Error code 0 | Relies on reading the execution settings of a file |
Flake8ImportConventions | ICN002 | banned_import_alias.rs:23-25 | Error code 0 | Relies on non-default configuration setting |
Flake8ImportConventions | ICN003 | banned_import_from.rs:22-24 | Error code 0 | Relies on non-default configuration setting |
Flake8TidyImports | TID253 | banned_module_level_imports.rs:27-33 | Error code 0 | Relies on non-default configuration setting |
Isort | I002 | add_required_imports.rs:28-30 | Error code 0 | Relies on non-default configuration setting |
Pycodestyle | W505 | doc_line_too_long.rs:41-44 | Error code 0 | Relies on non-default configuration setting |
Pycodestyle | E112 | indentation.rs:104-107 | SyntaxError | Generates both the lint and the `SyntaxError`; might be worth deprecating, see #19122 |
Pycodestyle | E113 | indentation.rs:165-168 | SyntaxError | Generates both the lint and the `SyntaxError`; might be worth deprecating, see #19122 |
Pydocstyle | D104 | not_missing.rs:342-344 | Error code 0 | Only works if the file is `__init__.py` |
Pylint | PLE0116 | continue_in_finally.rs:18-24 | Error code 0 | Only works on Python < 3.8 |
Pylint | PLW0406 | import_self.rs:18-24 | Error code 0 | Relies on specific file name |
Pylint | PLR0904 | too_many_public_methods.rs:23-45 | Error code 0 | Example relies on non-default configuration |
Pylint | PLR0915 | too_many_statements.rs:24-40 | Error code 0 | Example is truncated for brevity |
Pylint | PLW0101 | unreachable.rs:20-25 | Invalid rule code | Is test-only rule |
Issue with rule
Group | Code | File | Issue | Reason and Issue/PR |
---|---|---|---|---|
Flake8Bandit | S601 | paramiko_calls.rs:18-23 | Error code 0 | Check is too restrictive, #19006 |
Flake8Bugbear | B017 | assert_raises_exception.rs:23-25 | Error code 0 | Example is false negative #19050 |
Missing tooling
This section is for the `SyntaxError`s and missing errors that are due to the code using otherwise invisible characters. Theoretically they could be rendered the way the playground does it, which would make them both valid code and much easier to understand, but as far as I know that kind of tooling does not exist yet for MkDocs. This also applies to code that relies on normally invisible characters.
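As a rough illustration of what such rendering could look like, the sketch below swaps invisible characters for visible stand-ins. The function name `show_invisibles` and the chosen symbols are assumptions made for this example; it does not reproduce the playground's behaviour and is not an existing MkDocs plugin.

```python
def _visible(ch: str) -> str:
    """Map one character to a visible stand-in (symbols are arbitrary choices)."""
    if ch == "\t":
        return "→"
    if ch == "\u200b":  # zero-width space
        return "<ZWSP>"
    if ord(ch) < 0x20:
        # C0 control characters map onto the Unicode "Control Pictures" block.
        return chr(0x2400 + ord(ch))
    return ch


def show_invisibles(source: str) -> str:
    """Render tabs, control characters, and trailing spaces visibly."""
    rendered = []
    for line in source.split("\n"):
        body = line.rstrip(" \t")
        trailing = line[len(body):]
        # Interior spaces stay as-is; only trailing spaces get a marker.
        body = "".join(_visible(ch) for ch in body)
        trailing = trailing.replace(" ", "·").replace("\t", "→")
        rendered.append(body + trailing)
    return "\n".join(rendered)
```

The output is for display only: the rendered text is no longer the code that triggers the lint, so a docs pipeline would still need the raw source for any automated check.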
Group | Code | File | Issue |
---|---|---|---|
Pycodestyle | E101 | mixed_spaces_and_tabs.rs:21-23 | SyntaxError |
Pycodestyle | W391 | too_many_newlines_at_end_of_file.rs:23-25 | SyntaxError |
Pycodestyle | W291 | trailing_whitespace.rs:19-21 | SyntaxError |
Pycodestyle | W293 | trailing_whitespace.rs:56-58 | SyntaxError |
Pycodestyle | E223 | space_around_operator.rs:18-20 | SyntaxError |
Pycodestyle | E224 | space_around_operator.rs:82-84 | SyntaxError |
Pycodestyle | E242 | space_around_operator.rs:145-147 | SyntaxError |
Pycodestyle | E273 | whitespace_around_keywords.rs:74-76 | SyntaxError |
Pycodestyle | E274 | whitespace_around_keywords.rs:103-105 | SyntaxError |
Ruff | RUF054 | indented_form_feed.rs:22-24 | SyntaxError |
Pylint | PLE2510 | invalid_string_characters.rs:20-22 | Error code 0 |
Pylint | PLE2512 | invalid_string_characters.rs:55-57 | Error code 0 |
Pylint | PLE2513 | invalid_string_characters.rs:90-92 | Error code 0 |
Pylint | PLE2514 | invalid_string_characters.rs:125-127 | Error code 0 |
Pylint | PLE2515 | invalid_string_characters.rs:159-161 | Error code 0 |
Needs done
Group | Code | File | Issue |
---|---|---|---|
Ruff | RUF032 | decimal_from_float_literal.rs:22-24 | Error code 0 |
Ruff | RUF026 | default_factory_kwarg.rs:43-46 | Error code 0 |
Ruff | RUF056 | falsy_dict_get_fallback.rs:18-21 | Error code 0 |
Ruff | RUF051 | if_key_in_dict_del.rs:19-22 | Error code 0 |
Ruff | RUF045 | implicit_classvar_in_dataclass.rs:35-39 | Error code 0 |
Ruff | RUF064 | non_octal_permissions.rs:22-24 | Error code 0 |
Ruff | RUF041 | unnecessary_nested_literal.rs:35-42 | Error code 0 |
Ruff | RUF055 | unnecessary_regular_expression.rs:25-27 | Error code 0 |
Ruff | RUF039 | unraw_re_pattern.rs:29-31 | Error code 0 |
Flake8UsePathlib | PTH208 | violations.rs:539-549 | Error code 0 |
Flake8Pyi | PYI034 | non_self_return_type.rs:24-40 | Error code 0 |
McCabe | C901 | function_is_too_complex.rs:22-34 | Error code 0 |
The script
This script assumes it is placed in the top-level `ruff` folder (i.e. next to `.git`/`crates`/`README.md`).
It relies on `uvx` being on the `PATH` (I might change this later to use a local `cargo` run command if needed during development).
import json
import re
import subprocess
from itertools import pairwise
from pathlib import Path

ruff = Path(__file__).parent
ruff_linter = ruff / "crates" / "ruff_linter" / "src"
ruff_rules = ruff_linter / "rules"
ruff_codes_content = (ruff_linter / "codes.rs").open(encoding="utf-8").read()
linter_to_prefix = {
    linter: prefix
    for prefix, linter in re.findall(
        r" +#\[prefix = \"(\w+)\"\]\n +(\w+)",
        (ruff_linter / "registry.rs").open(encoding="utf-8").read(),
    )
}
# Needs special casing since in the code it has both E and W prefix markers,
# and the lints already have those prefixes.
linter_to_prefix["Pycodestyle"] = ""

# Lines that introduce the "bad" example section of a rule's doc comment.
example_forms = [
    "## Example",
    "## Examples",
    "For example:",
    "## Example:",
]
# Lines that introduce the "good" replacement; they end the span of bad examples.
use_instead_forms = [
    "Use instead:",
    "Use instead (with default setting):",
    "On Python 3.14+, use instead:",
    "On Python 3.10+, use instead:",
    "installed, use instead:",
    "Use instead",
    "NaN. Instead, prefer `math.isnan`:",
    "Use instead (using the NumPy docstring format):",
    "Or, using the Google docstring format:",
    "Or (in the Google docstring format):",
    "Use instead (in the NumPy docstring format):",
    "Used instead:",
    "Instead, refactor into separate implementations:",
    "Or, refactor to use an `Enum`:",
    "Or, make the argument a keyword-only argument:",
    "Use instead if the batches must be of uniform length:",
    "Or if the batches can be of non-uniform length:",
    "Instead, use `.replace(tzinfo=<timezone>)`:",
    "Or, use `.astimezone()`:",
    "Instead, assign the string to a variable:",
    "The automatic fix will remove the print statement entirely:",
    "To keep the line for logging purposes, instead use something like:",
    "Instead, use a simple string comparison, such as `==` or `!=`:",
    "Assuming `multiline-quotes` is set to `double`, use instead:",
    "Assuming `docstring-quotes` is set to `double`, use instead:",
    "Assuming `inline-quotes` is set to `double`, use instead:",
    "Use one of these instead:",
]

for path in [
    *ruff_rules.rglob("*/rules/**/*.rs"),
    *ruff_rules.rglob("*/violations.rs"),
]:
    if path.name == "test_rules.rs":
        continue
    content = path.open(encoding="utf-8").read()
    if "ViolationMetadata" not in content:
        continue
    content_lines = content.splitlines()
    for content_line_index, content_line in enumerate(content_lines):
        if content_line != "#[derive(ViolationMetadata)]":
            continue
        if content_lines[content_line_index + 1].startswith("#[deprecated"):
            continue
        rule_struct_name = re.search(
            r"struct (\w+)", content_lines[content_line_index + 1]
        )[1]
        # Look up the linter and code suffix registered for this struct.
        if ruff_codes_content_search := re.search(
            rf"(?m)^ +\((\w+), \"(\w+)\"\) => \(RuleGroup::(?:Stable|Preview), [\w:]+::{rule_struct_name}\),",
            ruff_codes_content,
        ):
            (linter, rule_suffix) = ruff_codes_content_search.groups()
        else:
            continue
        rule_code = linter_to_prefix[linter] + rule_suffix

        # Walk upwards from the derive line to collect the `///` doc comment.
        doc_comment = []
        for doc_comment_line_relative_index, doc_comment_line in enumerate(
            content_lines[:content_line_index][::-1]
        ):
            if not doc_comment_line.startswith("///"):
                # 1-based line number of the first doc comment line.
                doc_comment_start_index = (
                    content_line_index - doc_comment_line_relative_index + 1
                )
                break
            if doc_comment_line.startswith("/// "):
                doc_comment.append(doc_comment_line.removeprefix("/// "))
            else:
                doc_comment.append(doc_comment_line.removeprefix("///"))
            doc_comment[-1] = doc_comment[-1].rstrip()
        doc_comment.reverse()

        example_indices = []
        for doc_comment_line_index, doc_comment_line in enumerate(doc_comment):
            if doc_comment_line in example_forms:
                example_indices.append(doc_comment_line_index)
        if not example_indices:
            continue
        use_instead_indices = []
        for doc_comment_line_index, doc_comment_line in enumerate(doc_comment):
            if doc_comment_line in use_instead_forms:
                use_instead_indices.append(doc_comment_line_index)

        # Each span runs from an example heading to the next heading, or to
        # the first "use instead" marker, whichever comes first.
        example_spans = [
            (
                start,
                min([
                    end,
                    *[index for index in use_instead_indices if start < index < end],
                ]),
            )
            for start, end in pairwise([*example_indices, len(doc_comment)])
        ]

        for start_of_examples, end_of_examples in example_spans:
            # Entries are (lines, extension, start line, end line).
            example_code_blocks = []
            inside_example_code_block = False
            for doc_comment_line_index, doc_comment_line in enumerate(
                doc_comment[start_of_examples:end_of_examples]
            ):
                if doc_comment_line.startswith("```py"):
                    # Skip interactive-session blocks.
                    if doc_comment_line.startswith("```pycon"):
                        continue
                    example_code_blocks.append((
                        [],
                        "pyi" if doc_comment_line.startswith("```pyi") else "py",
                        doc_comment_line_index
                        + start_of_examples
                        + doc_comment_start_index,
                    ))
                    inside_example_code_block = True
                elif doc_comment_line.startswith("```") and inside_example_code_block:
                    inside_example_code_block = False
                    example_code_blocks[-1] = (
                        *example_code_blocks[-1],
                        doc_comment_line_index
                        + start_of_examples
                        + doc_comment_start_index,
                    )
                elif inside_example_code_block:
                    example_code_blocks[-1][0].append(doc_comment_line)

            for (
                example_code_block,
                path_extension,
                code_block_start,
                code_block_end,
            ) in example_code_blocks:
                stdin = "\n".join(example_code_block).encode("utf-8")
                completed_process = subprocess.run(
                    [
                        "uvx",
                        "ruff",
                        "check",
                        "--isolated",
                        "--preview",
                        "--output-format",
                        "json",
                        "--select",
                        rule_code,
                        "--stdin-filename",
                        f"test.{path_extension}",
                        "-",
                    ],
                    input=stdin,
                    capture_output=True,
                    check=False,
                )
                if completed_process.returncode == 0:
                    # Check for rules that only work on a newer version
                    completed_process = subprocess.run(
                        [
                            "uvx",
                            "ruff",
                            "check",
                            "--target-version",
                            "py313",
                            "--isolated",
                            "--preview",
                            "--output-format",
                            "json",
                            "--select",
                            rule_code,
                            "--stdin-filename",
                            f"test.{path_extension}",
                            "-",
                        ],
                        input=stdin,
                        capture_output=True,
                        check=False,
                    )
                error_string = f"|{linter}|{rule_code}|[{path.name}:{code_block_start}-{code_block_end}](https://github.com/astral-sh/ruff/blob/main/{path.relative_to(ruff).as_posix()}#L{code_block_start}-L{code_block_end})|"
                if (
                    completed_process.returncode == 1
                    and completed_process.stdout
                    and all(
                        error["code"] == rule_code
                        for error in json.loads(completed_process.stdout)
                    )
                ):
                    # The example triggers the rule, and only the rule.
                    pass
                elif completed_process.returncode == 0:
                    print(error_string + "Error code 0|")
                elif (
                    completed_process.returncode == 1
                    and b"SyntaxError" in completed_process.stdout
                ):
                    print(error_string + "SyntaxError|")
                elif (
                    completed_process.returncode == 2
                    and b"--select <RULE_CODE>" in completed_process.stderr
                ):
                    print(error_string + "Invalid rule code|")
                else:
                    # Unexpected outcome: dump everything for manual inspection.
                    print(rule_code, content_line_index, path)
                    print(completed_process)
                    print(repr(stdin))