
Android Export and Runtime

AniFileBERT is used by MiruPlay as a Git submodule at tools/anime_parser.

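Export

From the root of this repository, export the published root checkpoint. The relative --android-assets-dir path below assumes the repository is checked out as the MiruPlay submodule at tools/anime_parser: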

uv sync
uv run python -m tools.export_onnx --model-dir . --max-length 128 --android-assets-dir ../../scraper/src/main/assets/anime_parser

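The exporter writes the following files. The exports/ paths are relative to this repository; the scraper/ paths are relative to the MiruPlay root: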

  • exports/anime_filename_parser.onnx
  • exports/anime_filename_parser.metadata.json
  • scraper/src/main/assets/anime_parser/anime_filename_parser.onnx
  • scraper/src/main/assets/anime_parser/vocab.json
  • scraper/src/main/assets/anime_parser/config.json

Static Graph Shape

input_ids      int64[1,128]
attention_mask int64[1,128]
logits         float32[1,128,37]

The final logits dimension (37) must match the number of labels in the schema v2 label map in config.json.
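The shape contract above can be checked mechanically before assets are shipped. The following is a minimal sketch, not the project's exporter code. It assumes that config.json stores the label map under a Hugging Face style `id2label` key, which this document does not confirm; `check_logits_shape` and `demo_config` are hypothetical names used only for illustration.

```python
MAX_LENGTH = 128  # fixed sequence length of the static graph


def check_logits_shape(logits_shape, config):
    """Raise ValueError if a logits shape disagrees with the label map.

    Assumption: `config` is the parsed config.json and keeps its label map
    under an `id2label` key (Hugging Face convention, not confirmed here).
    """
    num_labels = len(config["id2label"])
    expected = (1, MAX_LENGTH, num_labels)
    if tuple(logits_shape) != expected:
        raise ValueError(
            f"logits shape {tuple(logits_shape)} != expected {expected}"
        )
    return num_labels


# Stand-in config with 37 placeholder labels, matching float32[1,128,37].
demo_config = {"id2label": {str(i): f"LABEL_{i}" for i in range(37)}}
print(check_logits_shape((1, 128, 37), demo_config))  # 37
```

In practice the shape would come from the ONNX model's declared output and the config from the real config.json; the stand-in values here only show the comparison.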

The current export is verified against PyTorch; the maximum absolute logits difference is recorded in exports/anime_filename_parser.metadata.json.
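For readers unfamiliar with the metric, the sketch below shows what a maximum absolute difference between two logits tensors means. It is an illustration using plain nested lists, not the exporter's actual verification code; `max_abs_diff` is a hypothetical helper, and the sample values are made up.

```python
def max_abs_diff(a, b):
    """Return the largest absolute element-wise difference of two nested lists."""
    a_is_seq = isinstance(a, (list, tuple))
    b_is_seq = isinstance(b, (list, tuple))
    if a_is_seq != b_is_seq:
        raise ValueError("shape mismatch")
    if a_is_seq:
        if len(a) != len(b):
            raise ValueError("shape mismatch")
        # Recurse into each pair of sub-lists and keep the largest value.
        return max((max_abs_diff(x, y) for x, y in zip(a, b)), default=0.0)
    return abs(a - b)


# Made-up logits of shape [1, 2, 2] standing in for PyTorch and ONNX outputs.
torch_logits = [[[0.5, -1.0], [2.0, 0.25]]]
onnx_logits = [[[0.5, -1.0], [2.0, 0.375]]]
print(max_abs_diff(torch_logits, onnx_logits))  # 0.125
```

A real check would compare the full [1,128,37] outputs for the same input and fail the export if the value exceeds a chosen tolerance.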

Local ONNX Smoke Test

uv run python -m tools.onnx_inference "[GM-Team][国漫][神印王座][Throne of Seal][2022][200][AVC][GB][1080P].mp4"

Expected fields:

title=神印王座, episode=200, group=GM-Team, resolution=1080P, source=GB

Special-code example:

uv run python -m tools.onnx_inference "[YYDM&VCB-Studio] Shinsekai Yori [NCED02][Ma10p_1080p][x265_flac].mkv"

Expected fields:

title=Shinsekai Yori, episode=null, group=YYDM&VCB-Studio, special=NCED02

Runtime Contract

The ONNX graph returns token logits only. Android must implement the same pre- and post-processing as the Python tooling:

  • custom character tokenizer
  • token id lookup from vocab.json
  • fixed-length padding to 128
  • constrained BIO decoding
  • field aggregation
  • thin string/number normalization
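The steps listed above can be sketched end to end in a few lines of Python. This is an illustration under stated assumptions, not the AniFileBERT or MiruPlay implementation: it assumes one token per character with no extra special tokens, `[PAD]` and `[UNK]` entries in the vocabulary, `B-`/`I-`/`O` label names, and a reading of "constrained" in which an `I-X` label is only allowed directly after `B-X` or `I-X`. The real tokenizer, label names, and constraints may differ. All function names and the toy vocabulary and label map are hypothetical.

```python
MAX_LENGTH = 128
PAD, UNK = "[PAD]", "[UNK]"  # assumed special-token names in vocab.json


def encode(text, vocab):
    """Character-tokenize, look up ids, and pad ids and mask to MAX_LENGTH."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:MAX_LENGTH]]
    pad = MAX_LENGTH - len(ids)
    return ids + [vocab[PAD]] * pad, [1] * len(ids) + [0] * pad


def allowed(prev, label):
    """Assumed constraint: I-X may only follow B-X or I-X of the same field."""
    if label.startswith("I-"):
        return prev != "O" and prev[2:] == label[2:]
    return True


def decode(text, rows, id2label):
    """Greedy constrained BIO decoding plus aggregation of characters into fields."""
    fields, prev, name, chars = {}, "O", None, []

    def flush():
        nonlocal name, chars
        if name is not None:
            fields.setdefault(name, "".join(chars))  # keep the first span per field
        name, chars = None, []

    for ch, row in zip(text, rows):  # zip stops at the last real character
        candidates = (i for i in range(len(row)) if allowed(prev, id2label[i]))
        label = id2label[max(candidates, key=lambda i: row[i])]
        if label.startswith("B-"):
            flush()
            name, chars = label[2:], [ch]
        elif label.startswith("I-"):
            chars.append(ch)
        else:
            flush()
        prev = label
    flush()
    return fields


def normalize(fields):
    """Thin normalization: turn a purely decimal episode string into an int."""
    out = dict(fields)
    ep = out.get("episode")
    if isinstance(ep, str) and ep.strip().isdecimal():
        out["episode"] = int(ep.strip())
    return out


# Toy vocabulary, label map, and logits (one row per character of "[A]01").
vocab = {"[PAD]": 0, "[UNK]": 1, "[": 2, "A": 3, "]": 4, "0": 5, "1": 6}
id2label = {0: "O", 1: "B-group", 2: "I-group", 3: "B-episode", 4: "I-episode"}
rows = [
    [5, 0, 0, 0, 0],  # "[" -> O
    [0, 5, 0, 0, 0],  # "A" -> B-group
    [3, 0, 0, 0, 4],  # "]" -> raw argmax is I-episode, which is disallowed -> O
    [0, 0, 0, 5, 0],  # "0" -> B-episode
    [0, 0, 0, 0, 5],  # "1" -> I-episode
]
ids, mask = encode("[A]01", vocab)
print(len(ids), sum(mask))                      # 128 5
print(normalize(decode("[A]01", rows, id2label)))  # {'group': 'A', 'episode': 1}
```

The third row shows why the constraint matters: the unconstrained argmax would start an episode span with an `I-` label, while the constrained decoder falls back to the best allowed label.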

The Android runtime implementation lives in MiruPlay:

scraper/src/main/kotlin/com/miruplay/tv/scraper/filename/AnimeFilenameParser.kt

The app exposes it through FilenameMetadataParser in core:model. During a scan, ScanCoordinator passes that parser into VideoDirectoryClassifier.

Asset Update Rule

When updating the parser, keep these files in sync:

anime_filename_parser.onnx
vocab.json
config.json

Do not update only the ONNX file. Token ids, label ids, and max length are part of the runtime contract, so all three files must come from the same export.
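One way to make a partial update easier to spot is to record a digest of each asset before and after an export. The sketch below is a suggestion rather than part of the project's tooling; `asset_digests` and `changed_assets` are hypothetical helpers, and the directory passed in is whatever assets folder is being checked.

```python
import hashlib
from pathlib import Path

# The three files named in the asset update rule.
ASSET_NAMES = ("anime_filename_parser.onnx", "vocab.json", "config.json")


def asset_digests(assets_dir):
    """Return {file name: SHA-256 hex digest} for the three runtime assets."""
    root = Path(assets_dir)
    return {
        name: hashlib.sha256((root / name).read_bytes()).hexdigest()
        for name in ASSET_NAMES
    }


def changed_assets(before, after):
    """List the asset names whose digest differs between two snapshots."""
    return sorted(name for name in ASSET_NAMES if before[name] != after[name])
```

Calling `asset_digests` before and after an export and passing both results to `changed_assets` shows which files actually changed. A result containing only anime_filename_parser.onnx is a prompt to confirm that vocab.json and config.json were regenerated by the same export, not proof of a mistake, since an unchanged vocabulary or label map produces identical files.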

More Details

See onnx.md for a minimal Python ONNX Runtime reference.