Add accessibility validation checks to site validator#8797
Add accessibility validation checks to site validator#8797TheAshutoshMishra wants to merge 4 commits intojenkins-infra:masterfrom
Conversation
- Added check for missing alt text on images (WCAG 2.1 Level A) - Added validation for generic alt text (image, logo, icon, etc) - Added check for icon-only buttons/links missing aria-label - Helps prevent accessibility issues during build time This enhancement will catch common accessibility problems before they reach production, improving site inclusivity for users with screen readers and assistive technologies.
There was a problem hiding this comment.
Pull request overview
This PR integrates automated accessibility validation into the site build process to enforce WCAG 2.1 Level A compliance. The validator checks for missing or generic alt text on images and verifies that icon-only interactive elements have appropriate ARIA labels.
Changes:
- Added
validate_accessibilitymethod to check pages for accessibility compliance - Implemented
check_image_alt_textto detect missing/generic alt attributes on images in both HAML and HTML - Implemented
check_aria_labelsto validate ARIA labels on icon-only buttons and links
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Find all img tags - both HAML and HTML | ||
| img_pattern = /%img\{[^}]*\}|<img[^>]*>/i |
There was a problem hiding this comment.
The regex pattern doesn't handle HAML's multiline image syntax (where attributes can span multiple lines). In HAML, you can have %img{ :src => 'path', :alt => 'text' } on separate lines with proper indentation. This pattern will only match if the closing brace is on the same line, potentially missing images that need validation. Consider using a more robust matching approach or document this limitation.
| # Find all img tags - both HAML and HTML | |
| img_pattern = /%img\{[^}]*\}|<img[^>]*>/i | |
| # Find all img tags - both HAML and HTML (support multiline HAML attributes) | |
| img_pattern = /%img\{.*?\}|<img[^>]*>/im |
| def check_aria_labels(page) | ||
| # Find buttons and links with only icons (no text content) | ||
| # Look for patterns like %button, %a with only ion-icon or svg inside | ||
| icon_only_pattern = /(%button|%a)(\{[^}]*\})?[^%\n]*(%ion-icon|<svg|<ion-icon)/i |
There was a problem hiding this comment.
The regex pattern will match buttons and links that have both text content AND icons, not just icon-only elements as the comment suggests. For example, a button with text "Download" and an icon will be flagged. The pattern should ensure the element contains ONLY an icon without text content. Consider adding negative lookahead or checking the captured content for text nodes.
content/_ext/validator.rb
Outdated
| unless img_tag.match(/:alt\s*=>|alt\s*=/i) | ||
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | ||
| else |
There was a problem hiding this comment.
The validation only checks if alt attribute exists or contains generic terms, but doesn't check for empty alt text (alt="" or alt=''). Empty alt text is different from missing alt text - decorative images should have empty alt text, but this should be an explicit choice. Consider adding a check to warn about empty alt text to ensure it's intentional rather than an oversight.
content/_ext/validator.rb
Outdated
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | ||
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] |
There was a problem hiding this comment.
Hardcoding the generic terms list makes it difficult to maintain and extend. Consider extracting this to a constant at the class level (e.g., GENERIC_ALT_TERMS) to improve code organization and make it easier to add or modify terms in the future.
content/_ext/validator.rb
Outdated
| element_line = page.content[page.content.index(match[0])..page.content.index(match[0]) + 150] | ||
|
|
||
| # Check if there's an aria-label or aria-labelledby | ||
| unless element_line.match(/aria-label|aria-labelledby/i) |
There was a problem hiding this comment.
The regex only checks for 'aria-label' and 'aria-labelledby' in standard HTML attribute format, but doesn't account for HAML's symbol syntax (e.g., :"aria-label" => "text"). This means buttons/links with ARIA labels written in HAML hash syntax might be incorrectly flagged as missing labels. The regex should also match patterns like :"aria-label"\s*=>\s* or :"aria-labelledby"\s*=>.
| element_line = page.content[page.content.index(match[0])..page.content.index(match[0]) + 150] | |
| # Check if there's an aria-label or aria-labelledby | |
| unless element_line.match(/aria-label|aria-labelledby/i) | |
| element_start = page.content.index(match[0]) | |
| element_line = page.content[element_start..element_start + 150] | |
| # Check if there's an aria-label or aria-labelledby in either HTML or HAML syntax | |
| unless element_line.match(/(aria-label\s*=|aria-labelledby\s*=|:"aria-label"\s*=>|:"aria-labelledby"\s*=>)/i) |
content/_ext/validator.rb
Outdated
| icon_only_pattern = /(%button|%a)(\{[^}]*\})?[^%\n]*(%ion-icon|<svg|<ion-icon)/i | ||
|
|
||
| page.content.scan(icon_only_pattern).each do |match| | ||
| element_line = page.content[page.content.index(match[0])..page.content.index(match[0]) + 150] |
There was a problem hiding this comment.
Critical bug: page.content.index(match[0]) can return nil if the matched text is not found, which will cause an error when attempting to create a range. This can occur because the regex captures groups and match[0] may not exist in the original content string. Consider using match.begin(0) from the MatchData object or scanning with match_index to get the position directly.
Use Regexp.last_match.begin(0) instead of string index to safely get the match position, preventing nil errors when match is not found.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 9 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
content/_ext/validator.rb
Outdated
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | ||
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] |
There was a problem hiding this comment.
The generic terms list includes 'logo' and 'icon', but these might be acceptable in certain contexts. For example, in the codebase at content/_partials/social-media-buttons.html.haml:33, there's %img{:title => "Logo", :src => expand_link(item.logo), :width => "96"} where "Logo" might be combined with other context. Consider whether warnings for these terms should be case-sensitive or whether they should allow compound descriptions like "Jenkins logo" or "GitHub icon".
| # Check if alt attribute is missing | ||
| unless img_tag.match(/:alt\s*=>|alt\s*=/i) | ||
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | ||
| else | ||
| # Check for generic/poor alt text | ||
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | ||
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | ||
|
|
There was a problem hiding this comment.
The validation doesn't check for images that use the title attribute without alt. In the codebase, several images use :title without :alt (e.g., content/2.0.html.haml:102). According to WCAG 2.1 Level A (Success Criterion 1.1.1), the title attribute alone is insufficient for accessibility because:
- Screen readers may not consistently announce title attributes
- Title attributes are only visible on hover, making them inaccessible to keyboard and touch users
- The
altattribute is the proper way to provide text alternatives for images
The validator should warn when images have title but no alt, as this suggests the developer may have mistakenly used title thinking it provides accessibility.
| # Check if alt attribute is missing | |
| unless img_tag.match(/:alt\s*=>|alt\s*=/i) | |
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| else | |
| # Check for generic/poor alt text | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | |
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | |
| # Determine presence of alt and title attributes (HAML and HTML forms) | |
| has_alt = !!img_tag.match(/:alt\s*=>|alt\s*=/i) | |
| has_title = !!img_tag.match(/:title\s*=>|title\s*=/i) | |
| if !has_alt | |
| if has_title | |
| warning "Image uses title without alt in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| else | |
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| end | |
| else | |
| # Check for generic/poor alt text | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | |
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] |
| # Validate accessibility requirements for pages | ||
| # Checks for WCAG 2.1 Level A compliance | ||
| def validate_accessibility(page) | ||
| return unless page.content | ||
|
|
||
| # Check for images without alt text | ||
| check_image_alt_text(page) | ||
|
|
||
| # Check for buttons and links that might need ARIA labels | ||
| check_aria_labels(page) | ||
| end | ||
|
|
||
| def check_image_alt_text(page) | ||
| # Find all img tags - both HAML and HTML | ||
| img_pattern = /%img\{[^}]*\}|<img[^>]*>/i | ||
| page.content.scan(img_pattern).each do |img_tag| | ||
| # Check if alt attribute is missing | ||
| unless img_tag.match(/:alt\s*=>|alt\s*=/i) | ||
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | ||
| else | ||
| # Check for generic/poor alt text | ||
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | ||
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | ||
|
|
||
| if generic_terms.include?(alt_text) | ||
| warning "Generic alt text '#{alt_text}' in #{page.output_path} - should be more descriptive" | ||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
|
|
||
| def check_aria_labels(page) | ||
| # Find buttons and links with only icons (no text content) | ||
| # Look for patterns like %button, %a with only ion-icon or svg inside | ||
| icon_only_pattern = /(%button|%a)(\{[^}]*\})?[^%\n]*(%ion-icon|<svg|<ion-icon)/i | ||
|
|
||
| page.content.scan(icon_only_pattern) do |match| | ||
| # Get position of match in content | ||
| match_pos = Regexp.last_match.begin(0) | ||
| next unless match_pos | ||
|
|
||
| # Extract surrounding context for validation | ||
| element_line = page.content[match_pos, 150] | ||
|
|
||
| # Check if there's an aria-label or aria-labelledby | ||
| unless element_line.match(/aria-label|aria-labelledby/i) | ||
| warning "Interactive element with icon needs aria-label in #{page.output_path}" | ||
| end | ||
| end | ||
| end |
There was a problem hiding this comment.
The validation approach of using regex pattern matching on raw HAML/HTML content is fragile and difficult to maintain. Consider these alternative approaches:
-
Parse HAML/HTML into a DOM: Use libraries like Nokogiri to parse the rendered HTML output, which would allow more reliable element traversal and attribute checking.
-
Hook into the rendering pipeline: If possible, validate during or after the HAML-to-HTML rendering phase when the structure is cleaner.
-
Use HAML parser: Parse HAML syntax trees directly using a HAML parser to understand the structure before HTML generation.
The current regex approach will be prone to false positives and false negatives as the codebase evolves and uses different HAML/HTML patterns. A parsing-based approach would be more robust and maintainable long-term.
| # Find buttons and links with only icons (no text content) | ||
| # Look for patterns like %button, %a with only ion-icon or svg inside | ||
| icon_only_pattern = /(%button|%a)(\{[^}]*\})?[^%\n]*(%ion-icon|<svg|<ion-icon)/i | ||
|
|
||
| page.content.scan(icon_only_pattern) do |match| | ||
| # Get position of match in content | ||
| match_pos = Regexp.last_match.begin(0) | ||
| next unless match_pos | ||
|
|
||
| # Extract surrounding context for validation | ||
| element_line = page.content[match_pos, 150] | ||
|
|
||
| # Check if there's an aria-label or aria-labelledby | ||
| unless element_line.match(/aria-label|aria-labelledby/i) | ||
| warning "Interactive element with icon needs aria-label in #{page.output_path}" | ||
| end |
There was a problem hiding this comment.
The icon-only pattern has several issues that will lead to false positives and false negatives:
- False negatives: The pattern requires icon elements to appear on the same line as the button/link (see
[^%\n]*), but in HAML, content is often indented on the next line. For example:
%button
%ion-icon{name: "copy-outline"}
This valid HAML structure won't be matched.
-
False positives: The pattern matches any button/link that contains an icon anywhere, even if it also has text content. For example, a button with both an icon and text label would still trigger a warning.
-
Incomplete icon detection: The pattern only looks for
%ion-icon,<svg, and<ion-icon, but icons could also be embedded as SVG files (e.g.,File.read("content/images/symbols/rss.svg")as seen in content/_layouts/changelog.html.haml:16).
Consider a more robust approach that parses the element structure to determine if it has text content in addition to icons.
| # Find buttons and links with only icons (no text content) | |
| # Look for patterns like %button, %a with only ion-icon or svg inside | |
| icon_only_pattern = /(%button|%a)(\{[^}]*\})?[^%\n]*(%ion-icon|<svg|<ion-icon)/i | |
| page.content.scan(icon_only_pattern) do |match| | |
| # Get position of match in content | |
| match_pos = Regexp.last_match.begin(0) | |
| next unless match_pos | |
| # Extract surrounding context for validation | |
| element_line = page.content[match_pos, 150] | |
| # Check if there's an aria-label or aria-labelledby | |
| unless element_line.match(/aria-label|aria-labelledby/i) | |
| warning "Interactive element with icon needs aria-label in #{page.output_path}" | |
| end | |
| # Find buttons and links that appear to be icon-only (no text content) | |
| # Handles HAML structures where content is on indented lines and detects | |
| # several icon patterns, including inline SVG and SVGs included via File.read. | |
| return unless page.content | |
| lines = page.content.lines | |
| lines.each_with_index do |line, index| | |
| # Match HAML-style %button or %a at the start of the line, capturing indentation | |
| next unless line =~ /^(?<indent>\s*)(%button|%a)(\{[^}]*\})?.*$/i | |
| indent = Regexp.last_match[:indent] | |
| # Collect this line and any more-indented child lines as the element block | |
| block_lines = [line] | |
| child_indent_prefix = indent + " " | |
| j = index + 1 | |
| while j < lines.length | |
| next_line = lines[j] | |
| break unless next_line.start_with?(child_indent_prefix) | |
| block_lines << next_line | |
| j += 1 | |
| end | |
| block = block_lines.join | |
| # Skip if the element already has ARIA labelling | |
| next if block.match(/aria-label\s*=|aria-label\s*=>|aria-labelledby\s*=|aria-labelledby\s*=>/i) | |
| # Detect icon usage: ion-icon, inline SVG, or SVG files included via File.read | |
| has_icon = | |
| block.match(/%ion-icon\b/i) || | |
| block.match(/<svg\b/i) || | |
| block.match(/<ion-icon\b/i) || | |
| block.match(/File\.read\([^)]*\.svg['"]?\)/i) | |
| next unless has_icon | |
| # Heuristically determine whether there is any textual content besides icons/markup. | |
| text_candidate = block.dup | |
| # Remove common icon constructs | |
| text_candidate.gsub!(/%ion-icon[^\n]*/i, "") | |
| text_candidate.gsub!(/<svg.*?<\/svg>/im, "") | |
| text_candidate.gsub!(/<ion-icon.*?\/?>/im, "") | |
| text_candidate.gsub!(/File\.read\([^)]*\.svg['"]?\)/im, "") | |
| # Strip HAML element prefixes and attribute hashes | |
| text_candidate.gsub!(/%[a-z0-9_]+(\{[^}]*\})?/i, "") | |
| text_candidate.gsub!(/\{[^}]*\}/m, "") | |
| # Remove simple Ruby string assignments/attributes like :title => "..." | |
| text_candidate.gsub!(/:[a-z0-9_]+\s*=>\s*['"].*?['"]/im, "") | |
| text_candidate.gsub!(/[a-z0-9_]+\s*=\s*['"].*?['"]/im, "") | |
| # Remove HAML comments | |
| text_candidate.gsub!(/^-#.*$/, "") | |
| # Collapse everything that is not alphanumeric to spaces | |
| text_candidate.gsub!(/[^a-z0-9]+/i, " ") | |
| # If nothing alphanumeric remains, we treat this as icon-only | |
| next unless text_candidate.strip.empty? | |
| warning "Interactive element with icon needs aria-label in #{page.output_path}" |
| # Check if there's an aria-label or aria-labelledby | ||
| unless element_line.match(/aria-label|aria-labelledby/i) |
There was a problem hiding this comment.
The ARIA label check doesn't account for different HAML syntax styles. In the codebase, ARIA attributes can be written as:
- Symbol keys with string values:
:"aria-label" => "RSS"(content/_layouts/changelog.html.haml:15) - Regular strings:
"aria-expanded" => "false"(content/_partials/dropdown.html.haml:1)
The current regex /aria-label|aria-labelledby/i won't match the symbol syntax like :"aria-label". Additionally, it doesn't check for aria-describedby which is another valid way to provide accessible labels according to WCAG.
| # Check if there's an aria-label or aria-labelledby | |
| unless element_line.match(/aria-label|aria-labelledby/i) | |
| # Check if there's an ARIA label attribute (supports HTML and HAML symbol syntax) | |
| unless element_line.match(/(:\"aria-(label|labelledby|describedby)\"|aria-(label|labelledby|describedby))/i) |
|
|
||
| # Check if there's an aria-label or aria-labelledby | ||
| unless element_line.match(/aria-label|aria-labelledby/i) | ||
| warning "Interactive element with icon needs aria-label in #{page.output_path}" |
There was a problem hiding this comment.
The error message "Interactive element with icon needs aria-label" is too generic and doesn't help developers locate the specific issue. It lacks:
- Context about which element (button vs link)
- Line number or sufficient context to find the element
- The actual element attributes for debugging
Consider including more details in the warning message, similar to how line 100 includes the first 80 characters of the img tag for context. For example: "Button/link with icon-only content needs aria-label in #{page.output_path}: #{element_line.strip[0..80]}"
| warning "Interactive element with icon needs aria-label in #{page.output_path}" | |
| # Determine element type (button or link) from the match | |
| match_data = Regexp.last_match | |
| raw_type = match_data && match_data[1] ? match_data[1].downcase : 'element' | |
| element_type = case raw_type | |
| when '%button' | |
| 'Button' | |
| when '%a' | |
| 'Link' | |
| else | |
| 'Interactive element' | |
| end | |
| # Approximate line number based on position in content | |
| line_number = page.content[0...match_pos].count("\n") + 1 | |
| # Short snippet of the offending element for context | |
| snippet = element_line.strip[0..80] | |
| warning "#{element_type} with icon-only content needs aria-label in #{page.output_path}: line #{line_number}: #{snippet}" |
|
|
||
| def check_image_alt_text(page) | ||
| # Find all img tags - both HAML and HTML | ||
| img_pattern = /%img\{[^}]*\}|<img[^>]*>/i |
There was a problem hiding this comment.
The HTML image pattern <img[^>]*> won't match self-closing img tags like <img ... /> which is valid XHTML and commonly used. The pattern should be <img[^>]*\/?> to match both <img> and <img /> formats.
| img_pattern = /%img\{[^}]*\}|<img[^>]*>/i | |
| img_pattern = /%img\{[^}]*\}|<img[^>]*\/?>/i |
content/_ext/validator.rb
Outdated
| # Check for generic/poor alt text | ||
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | ||
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | ||
|
|
||
| if generic_terms.include?(alt_text) | ||
| warning "Generic alt text '#{alt_text}' in #{page.output_path} - should be more descriptive" |
There was a problem hiding this comment.
Empty alt text (alt="") is being treated as valid, but it's semantically different from missing alt text. Empty alt text indicates a decorative image (which is valid for WCAG), but this validation doesn't distinguish between:
- Missing alt attribute (accessibility issue)
- Empty alt text alt="" (decorative image - valid)
- Generic alt text like "image" (poor accessibility)
The current regex check on line 99 will pass for empty alt text, then line 103 will try to extract it but won't warn about it. Consider explicitly checking for empty alt text and deciding whether to warn based on context (e.g., functional vs decorative images).
| # Check for generic/poor alt text | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | |
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | |
| if generic_terms.include?(alt_text) | |
| warning "Generic alt text '#{alt_text}' in #{page.output_path} - should be more descriptive" | |
| # Check for empty or generic/poor alt text | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| raw_alt_text = (alt_match[1] || alt_match[2]).to_s | |
| alt_text = raw_alt_text.strip.downcase | |
| if alt_text.empty? | |
| # Empty alt text is only appropriate for decorative images. | |
| # Treat as decorative only if explicitly marked as such. | |
| decorative = !!(img_tag.match(/aria-hidden\s*=\s*['"]true['"]/i) || | |
| img_tag.match(/role\s*=\s*['"](presentation|none)['"]/i)) | |
| unless decorative | |
| warning "Empty alt text on non-decorative image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| end | |
| else | |
| # Check for generic/poor alt text | |
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | |
| if generic_terms.include?(alt_text) | |
| warning "Generic alt text '#{alt_text}' in #{page.output_path} - should be more descriptive" | |
| end |
content/_ext/validator.rb
Outdated
| unless img_tag.match(/:alt\s*=>|alt\s*=/i) | ||
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | ||
| else | ||
| # Check for generic/poor alt text | ||
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase |
There was a problem hiding this comment.
The alt text regex pattern doesn't account for different quote types being used in HAML vs HTML. HAML can use:
- Single quotes:
:alt => 'text'oralt: 'text' - Double quotes:
:alt => "text"oralt: "text" - Symbol syntax:
alt: 'text'
The current pattern /:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i only captures hash rocket syntax for HAML (:alt =>) and doesn't capture the modern symbol syntax (alt:). Look at line 1 in content/_partials/dropdown.html.haml for an example where symbols like "aria-expanded" use the key-value syntax without hash rockets.
| unless img_tag.match(/:alt\s*=>|alt\s*=/i) | |
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| else | |
| # Check for generic/poor alt text | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| alt_text = (alt_match[1] || alt_match[2]).to_s.strip.downcase | |
| unless img_tag.match(/:alt\s*=>|alt\s*[:=]/i) | |
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| else | |
| # Check for generic/poor alt text | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*:\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| alt_text = (alt_match[1] || alt_match[2] || alt_match[3]).to_s.strip.downcase |
Handle both hash rocket (:alt =>) and symbol (alt:) syntax in addition to HTML (alt=) to properly validate all alt text formats used in the codebase.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
content/_ext/validator.rb
Outdated
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | ||
|
|
||
| if generic_terms.include?(alt_text) | ||
| warning "Generic alt text '#{alt_text}' in #{page.output_path} - should be more descriptive" | ||
| end |
There was a problem hiding this comment.
The generic terms check will flag legitimate use cases where simple alt text is appropriate. For example, a company logo image with alt="logo" might be perfectly acceptable when the company name is already in surrounding text. The list includes "logo" which appears to be used legitimately in the codebase (e.g., content/_layouts/sig.html.haml:19). Consider making this check more contextual or less strict, perhaps as a "nit" level warning rather than treating all generic terms equally.
| def validate_accessibility(page) | ||
| return unless page.content | ||
|
|
||
| # Check for images without alt text | ||
| check_image_alt_text(page) | ||
|
|
||
| # Check for buttons and links that might need ARIA labels | ||
| check_aria_labels(page) | ||
| end | ||
|
|
||
| def check_image_alt_text(page) | ||
| # Find all img tags - both HAML and HTML | ||
| img_pattern = /%img\{[^}]*\}|<img[^>]*>/i | ||
| page.content.scan(img_pattern).each do |img_tag| | ||
| # Check if alt attribute is missing | ||
| # Supports: :alt => "text", alt: "text", and alt="text" | ||
| unless img_tag.match(/:alt\s*=>|alt\s*[:=]/i) | ||
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | ||
| else | ||
| # Check for generic/poor alt text | ||
| # Matches: :alt => "text", alt: "text", or alt="text" | ||
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*:\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2] || alt_match[3]).to_s.strip.downcase | ||
| generic_terms = ['image', 'img', 'logo', 'icon', 'picture', 'photo'] | ||
|
|
||
| if generic_terms.include?(alt_text) | ||
| warning "Generic alt text '#{alt_text}' in #{page.output_path} - should be more descriptive" | ||
| end | ||
| end | ||
| end | ||
| end | ||
| end | ||
|
|
||
| def check_aria_labels(page) | ||
| # Find buttons and links with only icons (no text content) | ||
| # Look for patterns like %button, %a with only ion-icon or svg inside | ||
| icon_only_pattern = /(%button|%a)(\{[^}]*\})?[^%\n]*(%ion-icon|<svg|<ion-icon)/i | ||
|
|
||
| page.content.scan(icon_only_pattern) do |match| | ||
| # Get position of match in content | ||
| match_pos = Regexp.last_match.begin(0) | ||
| next unless match_pos | ||
|
|
||
| # Extract surrounding context for validation | ||
| element_line = page.content[match_pos, 150] | ||
|
|
||
| # Check if there's an aria-label or aria-labelledby | ||
| unless element_line.match(/aria-label|aria-labelledby/i) | ||
| warning "Interactive element with icon needs aria-label in #{page.output_path}" | ||
| end | ||
| end | ||
| end |
There was a problem hiding this comment.
The validation performs multiple full-content scans with regex operations (one for images, one for icon-only elements). For large pages with lots of content, this could be slow. Consider:
- Combining patterns where possible to scan content once
- Caching compiled regex patterns outside the methods
- Adding early termination if a page is too large
- Using more efficient parsing approaches for known content structures
While this may not be critical for build-time validation, it's worth considering the performance impact if many pages need to be validated.
Only flag the most problematic alt text like 'image', 'img', 'picture', 'photo'. Allow 'logo', 'icon' which can be legitimate in context (e.g., company logo when name is in surrounding text). This makes the validator less strict while still catching the worst accessibility issues.
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 1 out of 1 changed files in this pull request and generated 1 comment.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| # Check if alt attribute is missing | ||
| # Supports: :alt => "text", alt: "text", and alt="text" | ||
| unless img_tag.match(/:alt\s*=>|alt\s*[:=]/i) | ||
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | ||
| else | ||
| # Check for obviously poor alt text (most egregious cases only) | ||
| # Allows "logo", "icon" etc which can be legitimate in context | ||
| # Matches: :alt => "text", alt: "text", or alt="text" | ||
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*:\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | ||
| if alt_match | ||
| alt_text = (alt_match[1] || alt_match[2] || alt_match[3]).to_s.strip.downcase | ||
| # Only flag the most problematic generic terms | ||
| worst_terms = ['image', 'img', 'picture', 'photo'] | ||
|
|
||
| if worst_terms.include?(alt_text) | ||
| warning "Poor alt text '#{alt_text}' in #{page.output_path} - should describe the image content" | ||
| end |
There was a problem hiding this comment.
The regex pattern for detecting alt text presence /:alt\s*=>|alt\s*[:=]/i will match cases where 'alt' appears in other contexts, such as in URLs, comments, or other attribute values. For instance, it could incorrectly match something like data-alternative="value". While less likely to occur, consider using word boundaries or more specific patterns to ensure you're matching actual alt attributes.
| # Check if alt attribute is missing | |
| # Supports: :alt => "text", alt: "text", and alt="text" | |
| unless img_tag.match(/:alt\s*=>|alt\s*[:=]/i) | |
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| else | |
| # Check for obviously poor alt text (most egregious cases only) | |
| # Allows "logo", "icon" etc which can be legitimate in context | |
| # Matches: :alt => "text", alt: "text", or alt="text" | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*:\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| if alt_match | |
| alt_text = (alt_match[1] || alt_match[2] || alt_match[3]).to_s.strip.downcase | |
| # Only flag the most problematic generic terms | |
| worst_terms = ['image', 'img', 'picture', 'photo'] | |
| if worst_terms.include?(alt_text) | |
| warning "Poor alt text '#{alt_text}' in #{page.output_path} - should describe the image content" | |
| end | |
| # Extract alt attribute if present | |
| # Supports: :alt => "text", alt: "text", and alt="text" | |
| alt_match = img_tag.match(/:alt\s*=>\s*['"](.*?)['"]|alt\s*:\s*['"](.*?)['"]|alt\s*=\s*['"](.*?)['"]/i) | |
| unless alt_match | |
| warning "Missing alt text on image in #{page.output_path}: #{img_tag.strip[0..80]}" | |
| else | |
| # Check for obviously poor alt text (most egregious cases only) | |
| # Allows "logo", "icon" etc which can be legitimate in context | |
| alt_text = (alt_match[1] || alt_match[2] || alt_match[3]).to_s.strip.downcase | |
| # Only flag the most problematic generic terms | |
| worst_terms = ['image', 'img', 'picture', 'photo'] | |
| if worst_terms.include?(alt_text) | |
| warning "Poor alt text '#{alt_text}' in #{page.output_path} - should describe the image content" |
|
This sort of stuff should not be done with regular expressions. If you're concerned about accessibility, you can try to find actual accessibility issues (e.g. https://validator.nu/?doc=https%3A%2F%2Fwww.jenkins.io ) and fix them. |
Integrated automated accessibility checks into the site validator to enforce WCAG 2.1 Level A compliance during the build process. Changes: Added validation for missing/generic alt text and ARIA labels for icon-only interactive elements in both HAML and HTML. Impact: Prevents accessibility regressions by catching common errors before production, improving the experience for screen reader users.