Commit ec55cc7

Merge pull request #4 from i-dot-ai/file-convert
Add File converter, blacklist and cli
2 parents 2904987 + 1514b2b commit ec55cc7

9 files changed, with 1029 additions and 30 deletions

README.md

Lines changed: 38 additions & 10 deletions
@@ -6,22 +6,50 @@
 [![Commit activity](https://img.shields.io/github/commit-activity/m/i-dot-ai/uwotm8)](https://img.shields.io/github/commit-activity/m/i-dot-ai/uwotm8)
 [![License](https://img.shields.io/github/license/i-dot-ai/uwotm8)](https://img.shields.io/github/license/i-dot-ai/uwotm8)
 
-Converting American English to British English
+Converting American English to British English - a tool to automatically convert American English spelling to British English spelling in your text and code files.
 
 - **Github repository**: <https://github.com/i-dot-ai/uwotm8/>
 - **Documentation** <https://i-dot-ai.github.io/uwotm8/>
 
-To finalize the set-up for publishing to PyPI or Artifactory, see [here](https://fpgmaas.github.io/cookiecutter-poetry/features/publishing/#set-up-for-pypi).
-For activating the automatic documentation with MkDocs, see [here](https://fpgmaas.github.io/cookiecutter-poetry/features/mkdocs/#enabling-the-documentation-on-github).
-To enable the code coverage reports, see [here](https://fpgmaas.github.io/cookiecutter-poetry/features/codecov/).
+## Installation
 
-## Releasing a new version
+```bash
+pip install uwotm8
+```
 
-- Create an API Token on [PyPI](https://pypi.org/).
-- Add the API Token to your projects secrets with the name `PYPI_TOKEN` by visiting [this page](https://github.com/i-dot-ai/uwotm8/settings/secrets/actions/new).
-- Create a [new release](https://github.com/i-dot-ai/uwotm8/releases/new) on Github.
-- Create a new tag in the form `*.*.*`.
-- For more details, see [here](https://fpgmaas.github.io/cookiecutter-poetry/features/cicd/#how-to-trigger-a-release).
+## Quick Start
+
+Convert a single file:
+
+```bash
+uwotm8 example.txt
+```
+
+Read from stdin and write to stdout:
+
+```bash
+echo "I love the color gray." | uwotm8
+# Output: "I love the colour grey."
+```
+
+Use in Python code:
+
+```python
+from uwotm8 import convert_american_to_british_spelling
+
+en_gb_str = convert_american_to_british_spelling("Our American neighbors' dialog can be a bit off-color.")
+print(en_gb_str)
+# Output: "Our American neighbours' dialogue can be a bit off-colour."
+```
+
+## Features
+
+- Converts common American English spellings to British English
+- Preserves words in special contexts (code blocks, URLs, hyphenated terms)
+- Maintains a blacklist of technical terms that shouldn't be converted
+- Preserves original capitalization patterns
+
+For full documentation, examples, and advanced usage, please visit the [documentation site](https://i-dot-ai.github.io/uwotm8/).
 
 ---
 

docs/index.md

Lines changed: 21 additions & 1 deletion
@@ -15,7 +15,7 @@ LLMs are fantastic things, but sometimes they need a little help to write in the
 pip install uwotm8
 ```
 
-## Usage
+## Quick Start
 
 ```python
 from uwotm8 import convert_american_to_british_spelling
@@ -28,6 +28,26 @@ Bosh! You'll get back:
 
 > Our American **neighbours**' **dialogue** can be a bit off-**colour** when you're used to British spelling, you **recognise**?
 
+Or use it on the command line:
+
+```bash
+echo "The gray color of the theater is recognized by our neighbors." | uwotm8
+# Output: "The grey colour of the theatre is recognised by our neighbours."
+```
+
+For complete documentation on all available features and options, see the [Usage Guide](usage.md).
+
+## Features
+
+uwotm8 intelligently preserves words in certain contexts:
+
+- Code blocks (text within backticks)
+- URLs and URIs
+- Hyphenated terms (e.g., "3-color" remains "3-color" rather than becoming "3-colour")
+- Technical terms in the blacklist (e.g., "program" in computing contexts)
+
+For detailed information on how these features work, see the [Implementation Details](modules.md).
+
 ## Acknowledgements
 
 Built by the [Incubator for AI (i.AI)](https://ai.gov.uk), part of GDS in the Department for Science, Innovation and Technology (DSIT).

docs/modules.md

Lines changed: 39 additions & 0 deletions
@@ -1 +1,40 @@
 ::: uwotm8.convert.convert_american_to_british_spelling
+
+## Word Context Detection
+
+The `convert_american_to_british_spelling` function includes special handling for various text contexts:
+
+### Hyphenated Terms
+
+Words that are part of hyphenated terms are preserved in their original form. For example:
+
+- "3-color" remains "3-color" (not converted to "3-colour")
+- "x-coordinate" remains "x-coordinate" (left untouched, like any hyphenated term)
+- "multi-colored" remains "multi-colored" (not converted to "multi-coloured")
+
+This is useful for preserving technical terminology and compound adjectives where conversion might be inappropriate.
+
+### Code Blocks
+
+Words within code blocks (surrounded by backticks) are not converted, preserving code syntax and variable names.
+
+### URLs and URIs
+
+Words that appear in lines containing URLs or URIs (identified by "://" or "www.") are not converted to avoid breaking links.
+
+### Conversion Blacklist
+
+A blacklist of words that should not be converted is maintained, including technical terms that have different meanings in different contexts:
+
+- "program" vs "programme" (in computing contexts)
+- "disk" vs "disc" (in computing contexts)
+- "analog" vs "analogue" (in technical contexts)
+- And others
+
+## Capitalization Preservation
+
+The function preserves the capitalization pattern of the original word:
+
+- ALL CAPS words remain ALL CAPS
+- Title Case words remain Title Case
+- lowercase words remain lowercase
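
Reviewer note: the capitalization rules listed in `docs/modules.md` can be illustrated with a minimal sketch. The helper name `match_capitalisation` is hypothetical and is not taken from the uwotm8 source, which this diff does not show; it only demonstrates the three documented cases.

```python
def match_capitalisation(original: str, replacement: str) -> str:
    """Apply the capitalisation pattern of `original` to `replacement`.

    Hypothetical helper for illustration only, not the uwotm8 implementation.
    """
    if original.isupper():
        # ALL CAPS words remain ALL CAPS
        return replacement.upper()
    if original.istitle():
        # Title Case words remain Title Case
        return replacement.capitalize()
    # lowercase (and any other pattern) falls through unchanged
    return replacement


print(match_capitalisation("COLOR", "colour"))
print(match_capitalisation("Color", "colour"))
print(match_capitalisation("color", "colour"))
```

Mixed-case words such as "coLor" are returned as given in this sketch; how the real function treats them is not stated in the diff.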

docs/usage.md

Lines changed: 193 additions & 0 deletions
@@ -0,0 +1,193 @@
+# Usage Guide
+
+## Command Line Usage
+
+### Basic Usage
+
+Convert a single file:
+
+```bash
+uwotm8 example.txt
+```
+
+Convert multiple files:
+
+```bash
+uwotm8 file1.txt file2.md file3.py
+```
+
+Process an entire directory:
+
+```bash
+uwotm8 ./my_project/
+```
+
+Read from stdin and write to stdout:
+
+```bash
+echo "I love the color gray and my favorite food is filet mignon." | uwotm8
+# Output: "I love the colour grey and my favourite food is filet mignon."
+```
+
+### Command Line Options
+
+```
+usage: uwotm8 [-h] [--check] [--strict] [--include INCLUDE [INCLUDE ...]] [--exclude EXCLUDE [EXCLUDE ...]] [-o OUTPUT] [--version] [src ...]
+
+Convert American English spelling to British English spelling.
+
+positional arguments:
+  src                   Files or directories to convert. If not provided, reads from stdin.
+
+options:
+  -h, --help            show this help message and exit
+  --check               Don't write the files back, just return status. Return code 0 means nothing would change. Return code 1 means some files would be reformatted.
+  --strict              Raise an exception if a word cannot be converted.
+  --include INCLUDE [INCLUDE ...]
+                        File extensions to include when processing directories. Default: .py .txt .md
+  --exclude EXCLUDE [EXCLUDE ...]
+                        Paths to exclude when processing directories.
+  -o OUTPUT, --output OUTPUT
+                        Output file (when processing a single file). If not provided, content is written back to source file.
+  --version             show program's version number and exit
+```
+
+### Examples
+
+Check which files would be changed without modifying them:
+
+```bash
+uwotm8 --check myproject/
+```
+
+Convert a file and write the output to a different file:
+
+```bash
+uwotm8 american.txt -o british.txt
+```
+
+Only convert specific file types in a directory:
+
+```bash
+uwotm8 myproject/ --include .md .rst
+```
+
+Exclude specific paths:
+
+```bash
+uwotm8 myproject/ --exclude myproject/vendor/ myproject/generated/
+```
+
+## Python API Usage
+
+For more fine-grained control, you can use the Python API:
+
+### Convert a String
+
+```python
+from uwotm8 import convert_american_to_british_spelling
+
+# Basic usage
+text = "The color of the theater is gray."
+result = convert_american_to_british_spelling(text)
+print(result)  # "The colour of the theatre is grey."
+
+# With strict mode
+try:
+    result = convert_american_to_british_spelling(text, strict=True)
+except Exception as e:
+    print(f"Conversion error: {e}")
+```
+
+### Convert a File
+
+```python
+from uwotm8 import convert_file
+
+# Convert a file in-place
+convert_file("document.txt")
+
+# Convert a file and write to a new file
+convert_file("document.txt", "document_gb.txt")
+
+# Check if changes would be made without modifying the file
+would_change = convert_file("document.txt", check=True)
+if would_change:
+    print("File would be modified")
+else:
+    print("No changes needed")
+```
+
+### Process Multiple Files
+
+```python
+from uwotm8 import process_paths
+
+# Process multiple files and directories
+total, modified = process_paths(["file1.txt", "directory/"])
+print(f"Processed {total} files, modified {modified}")
+
+# Check mode
+total, modified = process_paths(["file1.txt", "directory/"], check=True)
+print(f"Would modify {modified} of {total} files")
+```
+
+### Stream Processing
+
+```python
+from uwotm8 import convert_stream
+
+# Process a stream of lines
+with open("input.txt", "r") as f:
+    for converted_line in convert_stream(f):
+        print(converted_line, end="")
+```
+
+## Special Cases and Context Handling
+
+uwotm8 includes intelligent handling of various text contexts:
+
+### Hyphenated Terms
+
+Words that are part of hyphenated terms are preserved in their original form. For example:
+
+```bash
+echo "The colors are red and blue, but a 3-color system is used." | uwotm8
+# Output: "The colours are red and blue, but a 3-color system is used."
+```
+
+This is useful for preserving technical terminology and compound adjectives where conversion might be inappropriate.
+
+### Code Blocks
+
+Words within code blocks (surrounded by backticks) are not converted:
+
+```bash
+# Single quotes stop the shell from treating the backticks as command substitution
+echo 'The `setColor(color)` function sets the color.' | uwotm8
+# Output: "The `setColor(color)` function sets the colour."
+```
+
+### URLs and URIs
+
+Words that appear in lines containing URLs or URIs are not converted:
+
+```bash
+echo "Visit http://example.com/color-picker to select a color." | uwotm8
+# Output: "Visit http://example.com/color-picker to select a colour."
+```
+
+### Technical Terms Blacklist
+
+A blacklist of technical terms that shouldn't be converted is maintained:
+
+```bash
+echo "This program uses an analog signal processor." | uwotm8
+# Output: "This program uses an analog signal processor."
+```
+
+Common blacklisted terms include:
+
+- "program" (vs "programme") in computing contexts
+- "disk" (vs "disc") in computing contexts
+- "analog" (vs "analogue") in technical contexts
+- "filet" (vs "fillet") in culinary contexts
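
Reviewer note: the hyphen, backtick, and blacklist rules described in `docs/usage.md` could be sketched roughly as follows. Everything here is an assumption for illustration: the `SPELLINGS` table is a tiny stand-in for the real word list, and `convert_line`, `in_backticks`, and `is_hyphenated` are hypothetical names, not functions from uwotm8. URL handling and capitalization are deliberately left out.

```python
import re

# Hypothetical, tiny spelling table; the real tool draws on a far larger word list.
SPELLINGS = {"color": "colour", "colors": "colours", "gray": "grey",
             "analog": "analogue", "program": "programme"}
# Terms the usage guide lists as blacklisted.
BLACKLIST = {"program", "disk", "analog", "filet"}

WORD = re.compile(r"[A-Za-z]+")


def in_backticks(line: str, start: int) -> bool:
    # An odd number of backticks before the word means it sits inside inline code.
    return line.count("`", 0, start) % 2 == 1


def is_hyphenated(line: str, start: int, end: int) -> bool:
    # A hyphen directly before or after the word marks a hyphenated term.
    before = line[start - 1] if start > 0 else ""
    after = line[end] if end < len(line) else ""
    return before == "-" or after == "-"


def convert_line(line: str) -> str:
    def swap(match: re.Match) -> str:
        word = match.group(0)
        key = word.lower()
        if key in BLACKLIST or key not in SPELLINGS:
            return word
        if in_backticks(line, match.start()) or is_hyphenated(line, match.start(), match.end()):
            return word
        # Capitalisation is ignored in this sketch; the replacement is lowercase.
        return SPELLINGS[key]

    return WORD.sub(swap, line)


print(convert_line("The colors are red and blue, but a 3-color system is used."))
print(convert_line("The `setColor(color)` function sets the color."))
print(convert_line("This program uses an analog signal processor."))
```

Checking the blacklist before the lookup table means a blacklisted word stays untouched even when a British variant exists, which matches the "program" and "analog" example in the guide.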

mkdocs.yml

Lines changed: 3 additions & 1 deletion
@@ -9,7 +9,9 @@ copyright: Maintained by <a href="https://i-dot-ai.com">i.AI</a>
 
 nav:
   - Home: index.md
-  - Modules: modules.md
+  - Usage Guide: usage.md
+  - Implementation Details: modules.md
+  - Abbreviations: abbreviations.md
 
 theme:
   logo: assets/i-dot-ai-white-invert.svg

pyproject.toml

Lines changed: 4 additions & 1 deletion
@@ -1,6 +1,6 @@
 [project]
 name = "uwotm8"
-version = "0.0.4"
+version = "0.1.0"
 description = "Converting American English to British English"
 authors = [{name = "i.AI", email = "[email protected]"}]
 repository = "https://github.com/i-dot-ai/uwotm8"
@@ -11,6 +11,9 @@ packages = [
   {include = "uwotm8"}
 ]
 
+[project.scripts]
+uwotm8 = "uwotm8.convert:main"
+
 [tool.poetry.dependencies]
 python = ">=3.9,<4.0"
 breame = "^0.1.2"
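
Reviewer note: the new `[project.scripts]` entry points the `uwotm8` command at `uwotm8.convert:main`, whose body is not part of the files shown here. As a rough sketch, an argument parser matching the help text in `docs/usage.md` might look like this. The function name `build_parser` and the hard-coded version string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented help text; not the actual uwotm8 code."""
    parser = argparse.ArgumentParser(
        prog="uwotm8",
        description="Convert American English spelling to British English spelling.",
    )
    parser.add_argument("src", nargs="*",
                        help="Files or directories to convert. If not provided, reads from stdin.")
    parser.add_argument("--check", action="store_true",
                        help="Don't write the files back, just return status.")
    parser.add_argument("--strict", action="store_true",
                        help="Raise an exception if a word cannot be converted.")
    parser.add_argument("--include", nargs="+", default=[".py", ".txt", ".md"],
                        help="File extensions to include when processing directories.")
    parser.add_argument("--exclude", nargs="+", default=[],
                        help="Paths to exclude when processing directories.")
    parser.add_argument("-o", "--output",
                        help="Output file (when processing a single file).")
    # Version string assumed from the pyproject.toml bump in this commit.
    parser.add_argument("--version", action="version", version="%(prog)s 0.1.0")
    return parser


args = build_parser().parse_args(["myproject/", "--include", ".md", ".rst"])
print(args.src, args.include, args.check)
```

With `nargs="*"` on `src`, an empty argument list yields `src == []`, which a `main` function could use as the signal to read from stdin.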
