Omarrran committed on
Commit 994ed1b · verified · 1 parent: caae7b9

Upload kashmiri_char_tokenizer/README.md with huggingface_hub

Files changed (1)
  1. kashmiri_char_tokenizer/README.md +3 -3
kashmiri_char_tokenizer/README.md CHANGED
@@ -23,7 +23,7 @@ datasets:
 | Architecture | Character-Level |
 | Language | Kashmiri (ks / kas) |
 | Script | Perso-Arabic (Nastaliq) |
-| Vocabulary Size | 135 |
+| Vocabulary Size | 134 |
 | Training Corpus | KS-LIT-3M (3,091,180 words) |
 | License | Apache-2.0 |
 
@@ -33,9 +33,9 @@ datasets:
 |--------|-------|-------------|
 | Fertility | 5.2453 | Tokens per word (lower = better) |
 | Diacritic Preservation Score (DPS) | 0.0000 | Novel KS-specific metric (1.0 = perfect) |
-| Morphological Boundary Alignment (MBA) | 0.2434 | IoU with gold morpheme boundaries |
+| Morphological Boundary Alignment (MBA) | 0.1994 | IoU with gold morpheme boundaries |
 | OOV Rate (held-out) | 0.0000 | Tested on unseen text |
-| Composite Quality Score (CQS) | 0.2595 | Weighted combination |
+| Composite Quality Score (CQS) | 0.2895 | Weighted combination |
 
 ## 🎯 Recommended Use Cases
 
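The README describes the Fertility and MBA rows in only one phrase each: "tokens per word" and "IoU with gold morpheme boundaries". The Python sketch below shows one way such numbers could be computed. It assumes a tokenizer that emits one token per Unicode code point. It also assumes MBA is the IoU of the internal split offsets within each word. The functions `char_tokenize`, `fertility`, `boundaries`, and `boundary_iou` are hypothetical. They are not the repository's API, and this commit does not include the actual scoring code.

```python
from typing import Iterable, List, Set


def char_tokenize(word: str) -> List[str]:
    """Hypothetical character-level tokenizer: one token per Unicode code point.

    Assumption: diacritics stored as separate code points become separate tokens.
    """
    return list(word)


def fertility(words: Iterable[str]) -> float:
    """Average number of tokens per word (lower = better)."""
    words = list(words)
    if not words:
        return 0.0
    total_tokens = sum(len(char_tokenize(w)) for w in words)
    return total_tokens / len(words)


def boundaries(pieces: List[str]) -> Set[int]:
    """Internal split offsets (in code points) implied by a segmentation of one word."""
    offsets: Set[int] = set()
    pos = 0
    for piece in pieces[:-1]:  # the word end is not counted as a boundary
        pos += len(piece)
        offsets.add(pos)
    return offsets


def boundary_iou(predicted: List[str], gold: List[str]) -> float:
    """Assumed MBA for one word: |P ∩ G| / |P ∪ G| over internal boundary offsets."""
    p, g = boundaries(predicted), boundaries(gold)
    union = p | g
    if not union:
        return 1.0  # both segmentations keep the word as a single unit
    return len(p & g) / len(union)


# Placeholder strings, not real Kashmiri words or gold segmentations.
print(fertility(["ab", "cde"]))
print(boundary_iou(char_tokenize("abcde"), ["abc", "de"]))
```

Under this assumed definition, a character-level tokenizer predicts a boundary at every internal position. The gold boundaries are therefore always a subset of the predicted ones, and the per-word IoU reduces to the number of gold boundaries divided by the word length minus one. That is consistent with MBA staying low for this architecture.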