Quantized GLM-4.7-Flash with llama.cpp and opencode
#66 opened 8 days ago by ghostwithahat · 3 replies

Is FIM supported for code completion?
#65 opened 13 days ago by kuang12
WARNING 02-08 00:50:40 [vllm.py:1500] `torch.compile` is turned on, but the model zai-org/GLM-4.7-Flash does not support it. Please open an issue on GitHub if you want it to be supported.
#62 opened 16 days ago by ANASDAVOODTK · 1 reply

This model keeps telling lies
#61 opened 17 days ago by awesome747 · 1 reply
Quantized variants now available for FP8 and W8A16
#60 opened 18 days ago by Geodd

Use the correct initialization with the latest transformers branch (v5.0.0)
#58 opened 25 days ago by rjmehta · 1 reply

[Docs] Add LightLLM deployment example
#57 opened 25 days ago by FubaoSu · 2 replies
Off reasoning
#56 opened 27 days ago by kmahdi · 3 replies · 👀 2

Why must the user specify whether the model thinks, instead of letting the model decide on its own?
#50 opened about 1 month ago by fudayuan
Long-context reasoning scores are bad?
#49 opened about 1 month ago by sebastienbo · 2 replies

Endless repetition? Has anyone encountered this?
#48 opened about 1 month ago by evilperson068 · 5 replies

Inference much slower compared to other A3B models
#47 opened about 1 month ago by engrtipusultan · 5 replies · 👀 👍 3
Possible to run this in 8GB VRAM + 48GB RAM?
#46 opened about 1 month ago by krigeta · 7 replies

Excellent model - short feedback
#44 opened about 1 month ago by Dampfinchen · 2 replies

Thank you Z.AI, I love this model! ❤
#43 opened about 1 month ago by MrDevolver · 4 replies · 👀 ❤️ 7
vLLM NVFP4 problem
#41 opened about 1 month ago by prudant · 1 reply

Model breaks apart when used with different languages
#38 opened about 1 month ago by nephepritou · 2 replies

Number of layers: 47 or 48?
#37 opened about 1 month ago by jKqfO84n · 2 replies
Amazing! Look what this local AI generated in 5 minutes.
#36 opened about 1 month ago by robert1968 · 7 replies · 👍 🤯 4

Problems with logical reasoning performance of GLM-4.7-Flash
#35 opened about 1 month ago by sszymczyk · 1 reply · 👀 1
There is no module or parameter named 'model.layers.5.mlp.gate.e_score_correction_bias' in TransformersMoEForCausalLM
#34 opened about 1 month ago by divinefeng · 1 reply · ➕ 12

Open-source the Tau^2 benchmark codebase?
#33 opened about 1 month ago by howtain · 👍 1

Please consider making it available through your official chat website. ❤
#32 opened about 1 month ago by MrDevolver
Do you have a plan to create a dense coding-specific model?
#31 opened about 1 month ago by hanzceo · 2 replies

config.json - "scoring_func": "sigmoid"
#28 opened about 1 month ago by algorithm · 👍 1

Question about model usage in Turkish
#27 opened about 1 month ago by 0xStego
UNEXPECTED warning appears
#26 opened about 1 month ago by shanlinguoke

glm4-moe-lite not supported
#25 opened about 1 month ago by cppowboy · 2 replies

cannot import name 'AutoModelForVision2Seq' from 'transformers'
#24 opened about 1 month ago by marsmc
Problem with model
#22 opened about 1 month ago by dwojcik · 7 replies

Why does the KV cache occupy so much GPU memory?
#21 opened about 1 month ago by yyg201708 · 13 replies

Excellent version
#19 opened about 1 month ago by luxiangyu · 5 replies · 🔥 5
Cannot run vLLM on DGX Spark: ImportError: libcudart.so.12
#18 opened about 1 month ago by yyg201708 · 4 replies

I hope GLM can release version 4.6 Air with Chinese thought processes, as version 4.7 seems to think entirely in English. Alternatively, I'd like you to release version 4.8 Air directly.
#15 opened about 1 month ago by mimeng1990 · 👀 🤗 5

Installation Video and Testing - Step by Step
#13 opened about 1 month ago by fahdmirzac · 👍 1
llama.cpp inference - 20 times (!) slower than OSS 20 on a RTX 5090
#12 opened about 1 month ago by cmp-nct · 9 replies · ➕ 1

Thank you!
#4 opened about 1 month ago by mav23 · 🔥 18

Enormous KV-cache size?
#3 opened about 1 month ago by nephepritou · 23 replies · 👍 ➕ 6
Base model
#2 opened about 1 month ago by tcpmux · 3 replies · 🔥 8

Performance Discussion
#1 opened about 1 month ago by IndenScale · 3 replies · 👀 2