A Swift package for Part-of-Speech (POS) tagging using BERT-based machine learning models with CoreML integration.
- 🧠 BERT-based Model: Utilizes a quantized BERT model for accurate POS tagging
- 📱 iOS/macOS Support: Compatible with iOS 16+ and macOS 12+
- ⚡ CoreML Integration: Optimized for Apple Silicon and GPU acceleration
- 🔄 Automatic Model Management: Built-in model download and extraction
- 🎯 Simple API: Clean, easy-to-use interface
- 📦 Swift Package Manager: Easy integration into your projects
Add SwiftPOSTagger to your project using Xcode:
- File → Add Package Dependencies
- Enter the repository URL: https://github.com/Otosaku/OtosakuPOSTagger-iOS
- Select the version you want to use
Or add it to your Package.swift:
dependencies: [
.package(url: "https://github.com/Otosaku/OtosakuPOSTagger-iOS", from: "1.0.0")
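]

import Foundation
import SwiftPOSTagger

// Initialize the tagger with the model directory
// (replace the placeholder path with the directory containing the model files)
let modelDirectoryURL = URL(fileURLWithPath: "/path/to/model/directory")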
let tagger = try SwiftPOSTagger(modelDirectoryURL: modelDirectoryURL)
// Perform POS tagging
let text = "The quick brown fox jumps over the lazy dog."
let results = try tagger.predict(text: text)
// Results is an array of (word, tag) tuples
for (word, tag) in results {
print("\(word) -> \(tag)")
}

The -> DT
quick -> JJ
brown -> JJ
fox -> NN
jumps -> VBZ
over -> IN
the -> DT
lazy -> JJ
dog -> NN
. -> .
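
Because each result is a plain (word, tag) pair, post-processing needs only standard collection operations. The sketch below hard-codes the sample output above instead of calling the tagger, so it runs without the model. The names sampleResults, nouns and wordsByTag are illustrative and not part of the library.

```swift
// Sample (word, tag) pairs copied from the output above; in an app these
// would come from tagger.predict(text:).
let sampleResults: [(String, String)] = [
    ("The", "DT"), ("quick", "JJ"), ("brown", "JJ"), ("fox", "NN"),
    ("jumps", "VBZ"), ("over", "IN"), ("the", "DT"), ("lazy", "JJ"),
    ("dog", "NN"), (".", "."),
]

// Keep only the words whose tag starts with "NN" (NN, NNS, ...).
let nouns = sampleResults.filter { $0.1.hasPrefix("NN") }.map { $0.0 }

// Group words by tag, preserving their order within each group.
var wordsByTag: [String: [String]] = [:]
for (word, tag) in sampleResults {
    wordsByTag[tag, default: []].append(word)
}

print(nouns)                   // ["fox", "dog"]
print(wordsByTag["JJ"] ?? [])  // ["quick", "brown", "lazy"]
```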
The library expects a directory containing:
- ModelQuantized.mlmodelc - The CoreML model file
- vocab.txt - BERT vocabulary file
- outTokens.txt - POS tag labels
You can download the pre-trained model or use your own compatible BERT-based POS tagging model.
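
It can help to verify the directory layout before constructing the tagger, so that a missing file is reported by name. The following is a minimal sketch using only Foundation. The helper missingModelFiles(in:) and the example directory name are hypothetical and not part of the library; the three file names are the ones listed above.

```swift
import Foundation

// Entries the model directory is expected to contain (from the list above).
let requiredModelFiles = ["ModelQuantized.mlmodelc", "vocab.txt", "outTokens.txt"]

/// Returns the required entries that are missing from `directory`.
/// Hypothetical helper, not part of the library.
func missingModelFiles(in directory: URL) -> [String] {
    requiredModelFiles.filter { name in
        !FileManager.default.fileExists(atPath: directory.appendingPathComponent(name).path)
    }
}

// Usage: check a candidate directory before passing it to the tagger.
// The directory name here is a placeholder.
let candidateDirectory = FileManager.default.temporaryDirectory
    .appendingPathComponent("pos-model-check-example")
let missing = missingModelFiles(in: candidateDirectory)
if missing.isEmpty {
    print("Model directory looks complete")
} else {
    print("Missing: \(missing.joined(separator: ", "))")
}
```

fileExists(atPath:) returns true for directories as well as files, so the check also works if ModelQuantized.mlmodelc is a compiled-model bundle directory.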
You can specify which compute units to use for inference:
// Use all available compute units, including the Neural Engine (default)
let tagger = try SwiftPOSTagger(modelDirectoryURL: modelURL, computeUnits: .all)
// Use CPU only
let cpuTagger = try SwiftPOSTagger(modelDirectoryURL: modelURL, computeUnits: .cpuOnly)
// Use CPU and GPU (no Neural Engine)
let gpuTagger = try SwiftPOSTagger(modelDirectoryURL: modelURL, computeUnits: .cpuAndGPU)

This library uses a quantized BERT model fine-tuned for Part-of-Speech tagging. Its key properties:
- Architecture: BERT-base with classification head
- Quantization: INT8 quantized for efficient mobile inference
- Input: Tokenized text with BERT tokenization
- Output: POS tags following Penn Treebank tagset
- Context Length: Supports up to 128 tokens per inference
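
Given the 128-token limit, longer inputs presumably need to be split before tagging; how the library itself treats over-long input is not stated here. The sketch below splits text into word-bounded chunks. It is an approximation under stated assumptions: WordPiece can split one word into several tokens and BERT adds special tokens, so a word budget well below 128 is safer. The helper chunkByWords(_:maxWords:) is hypothetical and not part of the library.

```swift
/// Splits `text` into chunks of at most `maxWords` whitespace-separated words.
/// Hypothetical helper; word count is only a rough proxy for token count.
func chunkByWords(_ text: String, maxWords: Int) -> [String] {
    precondition(maxWords > 0, "maxWords must be positive")
    let words = text.split(whereSeparator: { $0.isWhitespace })
    var chunks: [String] = []
    var index = 0
    while index < words.count {
        let end = min(index + maxWords, words.count)
        chunks.append(words[index..<end].joined(separator: " "))
        index = end
    }
    return chunks
}

// Usage with a tiny budget so the splitting is visible.
let chunks = chunkByWords("one two three four five", maxWords: 2)
print(chunks)  // ["one two", "three four", "five"]
```

Each chunk could then be passed to predict(text:) separately and the results concatenated. Cutting in the middle of a sentence removes context around the boundary, so splitting at sentence boundaries first is likely to give better tags.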
The model outputs tags from the Penn Treebank tagset, including:
- NN (Noun, singular), NNS (Noun, plural)
- VB (Verb, base form), VBZ (Verb, 3rd person singular present)
- JJ (Adjective), RB (Adverb)
- DT (Determiner), IN (Preposition or subordinating conjunction)
- CC (Coordinating conjunction)
- And many more...
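
For display purposes, the tag codes can be mapped to readable names with a simple dictionary. The sketch below covers only the tags listed above and falls back to a generic label for anything else. tagDescriptions and describe(tag:) are illustrative names, not part of the library.

```swift
// Readable names for the Penn Treebank tags listed above (not the full tagset).
let tagDescriptions: [String: String] = [
    "NN": "Noun, singular", "NNS": "Noun, plural",
    "VB": "Verb, base form", "VBZ": "Verb, 3rd person singular present",
    "JJ": "Adjective", "RB": "Adverb",
    "DT": "Determiner", "IN": "Preposition or subordinating conjunction",
    "CC": "Coordinating conjunction",
]

/// Returns a readable name for `tag`, or a fallback for tags not in the table.
func describe(tag: String) -> String {
    tagDescriptions[tag] ?? "Unknown tag (\(tag))"
}

print(describe(tag: "VBZ"))  // Verb, 3rd person singular present
print(describe(tag: "XYZ"))  // Unknown tag (XYZ)
```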
The repository includes a complete example iOS app demonstrating:
- Automatic model download and setup
- Real-time POS tagging with visual results
- Error handling and user feedback
To run the example:
- Open Example/Example.xcodeproj
- Build and run on iOS Simulator or device
- Download the model and test with your own text
The library reports failures as SwiftPOSTaggerError cases, which you can catch individually:
do {
let results = try tagger.predict(text: text)
// Process results
} catch SwiftPOSTaggerError.modelLoadingFailed(let message) {
print("Model loading failed: \(message)")
} catch SwiftPOSTaggerError.outputExtractionFailed(let message) {
print("Output extraction failed: \(message)")
} catch {
print("Other error: \(error)")
}

- iOS 16.0+ / macOS 12.0+
- Xcode 14.0+
- Swift 6.0+
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
- Built with Apple's CoreML framework
- Uses BERT architecture for natural language understanding
- Tokenization based on BERT WordPiece tokenizer