NusaCatalogue: https://indonlp.github.io/nusa-catalogue/card.html?icon
| Dataset | icon |
| Description | In this work, we publish ICON (Indonesian CONstituency treebank), a manually annotated benchmark constituency treebank for Indonesian with 10,000 sentences, approximately 124,000 constituents and 182,000 tokens, which can support the training of state-of-the-art transformer-based models. The annotation uses 15 phrase-level tags and 24 POS tags. The sentences were taken from Wikipedia (3,000) and news articles (7,000). |
| License | CC-BY-SA 4.0 |
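
As a rough sanity check on the figures quoted in the description, the per-sentence averages and the source split can be derived with a few lines of Python. This is only a sketch: the constant and function names are illustrative and not part of the dataset, and the constituent and token counts are the approximate values stated above, so the averages are approximate too.

```python
# Figures quoted in the dataset description (constituent and token counts are approximate).
SENTENCES = 10_000
CONSTITUENTS = 124_000
TOKENS = 182_000
SOURCES = {"wikipedia": 3_000, "news": 7_000}  # sentences per source


def corpus_summary():
    """Derive simple averages and source shares from the quoted figures."""
    # The per-source counts should add up to the stated corpus size.
    if sum(SOURCES.values()) != SENTENCES:
        raise ValueError("source counts do not match the total sentence count")
    return {
        "tokens_per_sentence": TOKENS / SENTENCES,
        "constituents_per_sentence": CONSTITUENTS / SENTENCES,
        "source_share": {name: count / SENTENCES for name, count in SOURCES.items()},
    }


if __name__ == "__main__":
    for key, value in corpus_summary().items():
        print(key, value)
```

Under these figures a sentence averages roughly 18 tokens and 12 constituents, and news articles make up about 70% of the sentences.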