Upload kashmiri_char_tokenizer/README.md with huggingface_hub
kashmiri_char_tokenizer/README.md
CHANGED
@@ -23,7 +23,7 @@ datasets:
 | Architecture | Character-Level |
 | Language | Kashmiri (ks / kas) |
 | Script | Perso-Arabic (Nastaliq) |
-| Vocabulary Size |
+| Vocabulary Size | 134 |
 | Training Corpus | KS-LIT-3M (3,091,180 words) |
 | License | Apache-2.0 |
 
@@ -33,9 +33,9 @@ datasets:
 |--------|-------|-------------|
 | Fertility | 5.2453 | Tokens per word (lower = better) |
 | Diacritic Preservation Score (DPS) | 0.0000 | Novel KS-specific metric (1.0 = perfect) |
-| Morphological Boundary Alignment (MBA) | 0.
+| Morphological Boundary Alignment (MBA) | 0.1994 | IoU with gold morpheme boundaries |
 | OOV Rate (held-out) | 0.0000 | Tested on unseen text |
-| Composite Quality Score (CQS) | 0.
+| Composite Quality Score (CQS) | 0.2895 | Weighted combination |
 
 ## 🎯 Recommended Use Cases
 