Skip to content

Conversation

@shyam-ramani
Copy link

Fix Table Alignment with Unicode Characters

Issue Description

Currently, the Rich library's table rendering doesn't properly handle the visual width of Unicode characters, causing misalignment in tables containing mixed content (ASCII, Unicode, emojis, etc.). This is particularly noticeable when displaying:

  • Full-width characters (e.g., Japanese, Chinese)
  • Emojis
  • Special symbols (arrows, stars, etc.)

Solution

Implemented a new get_unicode_width function in rich/text.py that properly calculates the visual width of Unicode characters based on their properties. The function:

  • Uses unicodedata.east_asian_width() to determine character width
  • Handles different character types (full-width, wide, ambiguous, narrow, half-width, neutral)
  • Returns the correct visual width for proper table alignment

Implementation Details

  1. Added get_unicode_width function to rich/text.py:

    def get_unicode_width(text: str) -> int:
        """Calculate the visual width of a string containing Unicode characters."""
        width = 0
        for char in text:
            char_width = unicodedata.east_asian_width(char)
            if char_width in ('F', 'W'):  # Full-width or Wide
                width += 2
            elif char_width == 'A':  # Ambiguous
                width += 1
            else:  # Narrow, Half-width, or Neutral
                width += 1
        return width
  2. Modified Table.add_row in rich/table.py to use the new width calculation:

    def add_row(self, *cells: Any, style: Optional[StyleType] = None) -> None:
        # Calculate padding for each cell based on Unicode width
        padded_cells = []
        for cell, column in zip(cells, self.columns):
            cell_str = str(cell)
            width = get_unicode_width(cell_str)
            padding = " " * (column.width - width) if hasattr(column, 'width') else ""
            padded_cells.append(cell_str + padding)

Test Cases

The fix has been tested with various Unicode content:

table = Table()
table.add_column("English")
table.add_column("Japanese")
table.add_column("Emoji")

table.add_row("Hello", "こんにちは", "👋")
table.add_row("World", "世界", "��")
table.add_row("→ Arrow", "→ 矢印", "➡️")
table.add_row("★ Star", "★ 星", "⭐")

Result:
┏━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━┓
┃ English ┃ Japanese ┃ Emoji ┃
┡━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━┩
│ Hello │ こんにちは │ 👋 │
│ World │ 世界 │ 🌍 │
│ → Arrow │ → 矢印 │ ➡️ │
│ ★ Star │ ★ 星 │ ⭐ │
└─────────┴────────────┴───────┘

Impact and Considerations

  • Positive Impact:

    • Improved table alignment for international users
    • Better support for modern Unicode content (emojis, symbols)
    • More accurate visual representation of mixed content
  • Performance:

    • Minimal performance impact as the width calculation is done only when adding rows
    • Uses built-in unicodedata module for efficient character property lookup
  • Backward Compatibility:

    • Fully backward compatible with existing code
    • No changes to the public API
    • Maintains existing behavior for ASCII-only content

Additional Notes

  • The fix handles all Unicode character width properties defined in the Unicode standard
  • Special attention given to ambiguous-width characters to ensure consistent display
  • The implementation is efficient and follows Rich's existing code style and patterns

@willmcgugan
Copy link
Member

Rich already handles double cell characters. Some emoji are never going to work because terminals render them at different widths, no matter what the unicodedata says.

If you are using an LLM to write this code, you should know that they tend to produce garbage.

@willmcgugan willmcgugan closed this Apr 2, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants