Copilot AI commented Oct 8, 2025

Problem

Vulture processed temporary and backup files created by text editors when scanning directories. For example, Emacs creates files like .#filename.py for unsaved buffers, and Vulture analyzed them even though their names are not valid Python identifiers and so they cannot be imported as modules.

Solution

This PR adds filtering so that directory scans only process Python files with valid module names. A file can only be imported with a plain import statement if its name (without the .py extension) is a valid Python identifier.

What's changed:

  • Added _is_valid_module_name() helper function that checks if a filename is a valid Python identifier (must start with a letter or underscore, followed by letters, digits, or underscores)
  • Updated get_modules() to filter out files with invalid names when using rglob() on directories
  • Files explicitly specified on the command line are still processed regardless of name (respecting user intent)

Examples of filtered files:

  • .#filename.py - Emacs temporary files
  • .dotfile.py - Hidden files starting with dots
  • 2module.py - Files starting with numbers
  • my-module.py - Files containing dashes
  • ~backup.py - Backup files

Examples of files still processed:

  • module.py - Valid lowercase names
  • _private.py - Names starting with underscore
  • __init__.py - Double underscore names
  • module_123.py - Names containing numbers (but not starting with them)
  • café.py - Unicode identifiers (valid per PEP 3131)

Testing:

Added comprehensive test coverage including:

  • 10 tests for valid/invalid module name detection
  • 3 tests for directory scanning behavior
  • Verified explicit file specification still works for invalid names
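A directory-scanning check along these lines could look like the sketch below. It is not taken from the PR's test suite: `scan_directory` and `check_directory_scan` are hypothetical names, and the filter is re-implemented inline so the example runs on its own.

```python
import tempfile
from pathlib import Path


def scan_directory(root: Path) -> list[str]:
    """Return the names of .py files under root whose stems are valid identifiers."""
    return sorted(p.name for p in root.rglob("*.py") if p.stem.isidentifier())


def check_directory_scan() -> list[str]:
    """Create valid and invalid file names in a temp directory and scan it."""
    names = ["module.py", "_private.py", ".#module.py", "my-module.py", "2module.py"]
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in names:
            (root / name).write_text("x = 1\n")
        return scan_directory(root)


# Only the two valid module names should survive the scan.
assert check_directory_scan() == ["_private.py", "module.py"]
```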

Fixes the issue where users had to manually exclude editor temporary files using patterns like "*/.#*.py" in their configuration.
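For comparison, the manual workaround relied on a glob pattern. The sketch below shows how such a pattern selects Emacs temporary files, assuming fnmatch-style matching against the file path. Whether Vulture matches exclude patterns in exactly this way is an assumption here, and the paths are made up for illustration.

```python
from fnmatch import fnmatchcase

# Pattern quoted in the issue for excluding Emacs temporary files.
EXCLUDE_PATTERN = "*/.#*.py"


def is_excluded(path: str) -> bool:
    """Return True if the path matches the exclude pattern (fnmatch semantics)."""
    return fnmatchcase(path, EXCLUDE_PATTERN)


# Hypothetical paths for illustration.
assert is_excluded("src/.#module.py")
assert not is_excluded("src/module.py")
```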

Original prompt

This section details the original issue you should resolve

<issue_title>Source file filtering is presently too broad</issue_title>
<issue_description>I use emacs for text editing, and when you have an unsaved buffer, emacs creates a temporary file ".#{original_file_name}". I'm able to exclude these in my configs via "*/.#*.py", however I think it's worth bringing this up, as no valid python source file would have such characters.

I'd suggest that by default, vulture only look for python files with valid names.

Cheers</issue_description>

<agent_instructions>Add test for this at a place that fits well into the test suite.</agent_instructions>

Comments on the Issue (you are @copilot in this section)

Fixes #402


- Add `_is_valid_module_name()` helper to check if a filename is a valid Python identifier
- Update `get_modules()` to filter out files with invalid names when scanning directories
- Files explicitly specified on command line are still processed regardless of name
- Add comprehensive tests for valid/invalid module name detection
- Fixes issue with emacs temporary files (.#filename.py) being processed

Co-authored-by: jendrikseipp <[email protected]>
Copilot AI changed the title from "[WIP] Refactor file filtering to exclude temporary emacs files" to "Filter out Python files with invalid module names during directory scanning" Oct 8, 2025
Copilot AI requested a review from jendrikseipp October 8, 2025 06:33