DeepSignal (GGUF)

This repository provides GGUF checkpoints for local inference with traffic-signal-control models from AIMSLaboratory/DeepSignal.

Models

This repository currently contains:

DeepSignal-Phase-4B-V1: next signal-phase prediction
DeepSignal_CyclePlan-4B-V1: cycle-level green-time allocation for all phases in the upcoming signal cycle

Model Files

| Filename | Task | Quantization | Size | Notes |
| --- | --- | --- | --- | --- |
| `DeepSignal-Phase-4B_V1.F16.gguf` | Phase prediction | F16 | ~8 GB | Existing phase model |
| `DeepSignal_CyclePlan-4B-V1-F16.gguf` | Cycle planning | F16 | ~7.5 GB | Highest-fidelity CyclePlan checkpoint |
| `DeepSignal_CyclePlan-4B-V1-Q8_0.gguf` | Cycle planning | Q8_0 | ~4.0 GB | Balanced quality / speed |
| `DeepSignal_CyclePlan-4B-V1-Q4_K_M.gguf` | Cycle planning | Q4_K_M | ~2.4 GB | Recommended for local inference |

DeepSignal_CyclePlan-4B-V1

DeepSignal_CyclePlan-4B-V1 is a traffic signal cycle planning model. It takes the predicted traffic state for the next cycle and outputs the green-time allocation for each phase while respecting phase-specific minimum and maximum green constraints.

Recommended Prompt Format

System prompt

You are a traffic signal timing optimization expert.
Please carefully analyze the predicted traffic states for each phase in the next cycle, provide the timing plan for the next cycle, and give your reasoning process.
Place the reasoning process between <start_working_out> and <end_working_out>.
Then, place your final plan between <SOLUTION> and </SOLUTION>.

Input JSON format

Wrap the input with 【cycle_predict_input_json】...【/cycle_predict_input_json】 tags. The core field is prediction.phase_waits, an array of per-phase objects:

phase_id: phase index
pred_saturation: predicted saturation for the next cycle
min_green: minimum allowed green time in seconds
max_green: maximum allowed green time in seconds
capacity: reference capacity used to compute pred_saturation
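
The input format above can be assembled programmatically. The following is a minimal Python sketch, not the authors' tooling: the function name `build_cycle_prompt` and the `min_green`/`max_green` sanity check are hypothetical, while the tags, the optional `as_of` field, and the task text are copied from the quickstart prompt in this README.

```python
import json

OPEN_TAG = "【cycle_predict_input_json】"
CLOSE_TAG = "【/cycle_predict_input_json】"

# Task text copied from the quickstart prompt in this README.
TASK_TEXT = (
    "Task (must complete):\n"
    "Mainly based on prediction.phase_waits pred_saturation, output the final "
    "green-light time for each phase in the next cycle (unit: seconds) while "
    "satisfying all hard constraints."
)


def build_cycle_prompt(phase_waits, as_of=None):
    """Wrap the prediction payload in the input tags and append the task text."""
    for phase in phase_waits:
        # Hypothetical sanity check; not part of the documented format.
        if phase["min_green"] > phase["max_green"]:
            raise ValueError(f"phase {phase['phase_id']}: min_green > max_green")
    prediction = {"phase_waits": phase_waits}
    if as_of is not None:
        prediction = {"as_of": as_of, "phase_waits": phase_waits}
    payload = json.dumps({"prediction": prediction}, ensure_ascii=False, indent=2)
    return f"{OPEN_TAG}{payload}{CLOSE_TAG}\n\n{TASK_TEXT}"
```

The returned string is the user-side part of the prompt; the system prompt shown earlier would be prepended or sent as a separate system message, depending on the runtime.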

Quickstart with llama.cpp

Q4_K_M is the recommended local default:

llama-cli -m DeepSignal_CyclePlan-4B-V1-Q4_K_M.gguf \
  --ctx-size 8192 \
  --temp 0.2 \
  -p 'You are a traffic signal timing optimization expert.
Please carefully analyze the predicted traffic states for each phase in the next cycle, provide the timing plan for the next cycle, and give your reasoning process.
Place the reasoning process between <start_working_out> and <end_working_out>.
Then, place your final plan between <SOLUTION> and </SOLUTION>.

【cycle_predict_input_json】{
  "prediction": {
    "as_of": "2026-02-22T10:00:00",
    "phase_waits": [
      {"phase_id": 0, "pred_saturation": 0.80, "min_green": 20, "max_green": 60, "capacity": 100},
      {"phase_id": 1, "pred_saturation": 0.55, "min_green": 15, "max_green": 45, "capacity": 80},
      {"phase_id": 2, "pred_saturation": 0.42, "min_green": 15, "max_green": 35, "capacity": 70}
    ]
  }
}【/cycle_predict_input_json】

Task (must complete):
Mainly based on prediction.phase_waits pred_saturation, output the final green-light time for each phase in the next cycle (unit: seconds) while satisfying all hard constraints.'

Expected Output

The final answer should contain a machine-readable plan inside <SOLUTION>...</SOLUTION>, for example:

[
  {"phase_id": 0, "final": 31},
  {"phase_id": 1, "final": 24},
  {"phase_id": 2, "final": 18}
]
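
Because the plan is machine-readable, it can be extracted and checked against the input constraints before it is applied. The sketch below is one possible approach under stated assumptions: `parse_plan` and `check_constraints` are hypothetical helper names, and the code assumes the `<SOLUTION>` block contains a JSON array shaped like the example above.

```python
import json
import re

# Non-greedy match so only the first <SOLUTION> block is captured.
SOLUTION_RE = re.compile(r"<SOLUTION>(.*?)</SOLUTION>", re.DOTALL)


def parse_plan(model_output):
    """Return {phase_id: green_seconds} from the first <SOLUTION> block."""
    match = SOLUTION_RE.search(model_output)
    if match is None:
        raise ValueError("no <SOLUTION>...</SOLUTION> block found")
    entries = json.loads(match.group(1))
    return {entry["phase_id"]: entry["final"] for entry in entries}


def check_constraints(plan, phase_waits):
    """List violations of the per-phase min_green/max_green bounds."""
    problems = []
    for phase in phase_waits:
        pid = phase["phase_id"]
        if pid not in plan:
            problems.append(f"phase {pid}: missing from plan")
        elif not phase["min_green"] <= plan[pid] <= phase["max_green"]:
            problems.append(f"phase {pid}: {plan[pid]} s outside bounds")
    return problems
```

For the quickstart input and the example plan above (31, 24, and 18 seconds), `check_constraints` returns an empty list, since each value lies within its phase's `min_green` and `max_green`.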

Download Example

huggingface-cli download AIMS2025/DeepSignal DeepSignal_CyclePlan-4B-V1-Q4_K_M.gguf --local-dir .

DeepSignal-Phase-4B-V1

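DeepSignal-Phase-4B-V1 is designed for next signal-phase prediction. Given the current traffic scene and state at an intersection, it predicts which signal phase to activate next and for how long. The command below passes only the instruction prompt; append the traffic scene and state to it for an actual prediction.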

llama-cli -m DeepSignal-Phase-4B_V1.F16.gguf -p "You are a traffic management expert. You can use your traffic knowledge to solve the traffic signal control task.
Based on the given traffic scene and state, predict the next signal phase and its duration.
You must answer directly, the format must be: next signal phase: {number}, duration: {seconds} seconds
where the number is the phase index (starting from 0) and the seconds is the duration (usually between 20-90 seconds)."
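
The reply format requested by this prompt is simple enough to parse with a regular expression. The following Python sketch assumes the model follows the format literally; `parse_phase_reply` is a hypothetical helper, and the tolerance for case differences and decimal durations is an assumption rather than documented behavior.

```python
import re

# Matches "next signal phase: {number}, duration: {seconds} seconds".
PHASE_RE = re.compile(
    r"next signal phase:\s*(\d+)\s*,\s*duration:\s*(\d+(?:\.\d+)?)\s*seconds",
    re.IGNORECASE,
)


def parse_phase_reply(reply):
    """Return (phase_index, duration_seconds), or None if the format is not found."""
    match = PHASE_RE.search(reply)
    if match is None:
        return None
    return int(match.group(1)), float(match.group(2))
```

Returning `None` instead of raising lets a caller treat a malformed reply as a failed response and fall back to a default timing.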

Evaluation (Traffic Simulation)

Performance Metrics Comparison by Model (Phase) *

| Model | Avg Saturation | Avg Cumulative Queue Length (veh⋅min) | Avg Throughput (veh/5min) | Avg Response Time (s) |
| --- | --- | --- | --- | --- |
| `GPT-OSS-20B (thinking)` | 0.380 | 14.088 | 77.910 | 6.768 |
| DeepSignal-Phase-4B (thinking, Ours) | 0.422 | 15.703 | 79.883 | 2.131 |
| `Qwen3-30B-A3B` | 0.431 | 17.046 | 79.059 | 2.727 |
| `Qwen3-4B` | 0.466 | 57.699 | 75.712 | 1.994 |
| Max Pressure | 0.465 | 23.022 | 77.236 | ** |
| `LightGPT-8B-Llama3` | 0.523 | 54.384 | 75.512 | 3.025*** |

*: Each simulation scenario runs for 60 minutes. We discard the first 5 minutes as warm-up, then compute metrics over the next 20 minutes (minute 5 to 25). We cap the evaluation window because, when an LLM controls signal timing for only a single intersection, spillback from neighboring intersections may occur after ~20+ minutes and destabilize the scenario. All evaluations are conducted on a Mac Studio M3 Ultra.

**: Max Pressure is a fixed signal-timing optimization algorithm (not an LLM), so we omit its Avg Response Time; this metric is only defined for LLM-based signal-timing optimization.

***: For LightGPT-8B-Llama3, Avg Response Time is computed using only the successful responses.
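
The windowing rule in the first footnote (drop a 5-minute warm-up, then average over the next 20 minutes) can be illustrated as follows. This is only a sketch, not the authors' evaluation code: it assumes metrics are logged as `(minute, value)` samples and that the window is half-open, covering minute 5 up to but not including minute 25. The name `window_average` is hypothetical.

```python
from statistics import mean

WARMUP_MIN = 5   # minutes discarded as warm-up
WINDOW_MIN = 20  # minutes included in the evaluation window


def window_average(samples, warmup=WARMUP_MIN, window=WINDOW_MIN):
    """Average (minute, value) samples with warmup <= minute < warmup + window."""
    kept = [value for minute, value in samples if warmup <= minute < warmup + window]
    if not kept:
        raise ValueError("no samples inside the evaluation window")
    return mean(kept)
```

Samples from the rest of the 60-minute run are ignored under this rule.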

Performance Metrics Comparison by Model (CyclePlan) *

| Model | Format Success Rate (%) | Avg Queue Vehicles | Avg Delay per Vehicle (s) | Throughput (veh/min) | Avg Response Time (s) |
| --- | --- | --- | --- | --- | --- |
| DeepSignal_CyclePlan-4B-V1 F16 (thinking, Ours) | 100.0 | 3.504 | 27.747 | 8.611 | 4.351 |
| `GLM-4.7-Flash (thinking)` | 100.0 | 7.323 | 29.422 | 8.567 | 36.388 |
| DeepSignal_CyclePlan-4B-V1 Q4_K_M (thinking, Ours) | 98.1 | 4.783 | 29.891 | 7.722 | 1.674 |
| `Qwen3-30B-A3B` | 97.1 | 6.938 | 31.135 | 7.578 | 7.885 |
| `LightGPT-8B-Llama3` | 68.0 | 5.026 | 31.266 | 7.380 | 167.373 |
| `GPT-OSS-20B (thinking)` | 65.4 | 6.289 | 31.947 | 7.247 | 4.919 |
| `Qwen3-4B (thinking)` | 54.1 | 10.060 | 48.895 | 7.096 | 122.333 |

*: Each simulation scenario runs for 60 minutes. We discard the first 5 minutes as warm-up, then compute metrics over the next 20 minutes (minute 5 to 25). All evaluations are conducted on a Mac Studio M3 Ultra.

License

This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0). Commercial use is prohibited.

GGUF

Model size: 4B params
Architecture: qwen3
Hardware compatibility: 4-bit, 8-bit, 16-bit