---
license: apache-2.0
language:
  - en
library_name: smriti-memory-ai
pipeline_tag: text-generation
base_model_relation: adapter
base_model:
  - google/gemma-4-E2B-it
  - Qwen/Qwen2.5-1.5B-Instruct
  - meta-llama/Llama-3.2-1B-Instruct
  - microsoft/Phi-3-mini-4k-instruct
datasets:
  - luciferai-devil/smriti-ai-benchmarks
tags:
  - smriti-ai
  - memory
  - agent-memory
  - long-term-memory
  - external-memory
  - training-free
  - frozen-model
  - inference-time-augmentation
  - retrieval-augmented-generation
  - rag
  - semantic-search
  - knowledge-graph
  - identity-continuity
  - small-language-model
  - small-language-models
  - ai-agent
  - gemma
  - gemma-4
  - qwen
  - qwen2.5
  - llama
  - llama-3.2
  - phi-3
---

# Smriti AI

## What this is

Smriti AI is a memory-augmented inference layer for small language models. It adds external memory, semantic retrieval, knowledge-graph recall, identity continuity, and privacy-ready memory deletion without changing base model weights.

This repository layout is intended for a Hugging Face model-style deployment with a custom `handler.py`. The handler loads a base causal language model or calls a remote model endpoint, wraps it with Smriti AI memory, and returns model responses plus retrieved memories.

This model-card template targets Smriti AI v1.0.9. The companion public benchmark dataset is `luciferai-devil/smriti-ai-benchmarks`, and the CPU-safe demo Space target is `luciferai-devil/smriti-ai-demo`.

## Discovery keywords

Smriti AI is designed for people searching for Gemma memory, Qwen memory, small model memory, agent memory, external memory, long-term memory, semantic recall, graph recall, and training-free memory augmentation.

## What this is not

Smriti AI is not a newly trained foundation model. It is not a fine-tuned model unless a separate fine-tuned checkpoint is explicitly included. It is an inference-time wrapper around a base language model.

Do not interpret this repository as a standalone model checkpoint or a Gemma/Qwen release checkpoint. Use the original base-model repositories when you need the base checkpoint itself. The base model is configured through `BASE_MODEL_ID` or `HF_ENDPOINT_URL`.

## Research Lineage

Smriti AI follows four principles:

- **External memory**: conversational facts live outside model weights in a persistent, inspectable store.
- **Training-free recall**: relevant facts are retrieved and injected at inference time without fine-tuning the base model.
- **Identity continuity**: persona evidence is tracked as an embedding fingerprint so outputs can be checked for drift.
- **Small-model augmentation**: small causal language models can become more useful when paired with explicit memory and retrieval.
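
The identity-continuity principle above can be illustrated with a cosine-similarity check. This is a minimal sketch under stated assumptions, not Smriti AI's actual implementation: the function names, the `0.3` threshold, and the toy three-dimensional vectors are all hypothetical, and a real deployment would compare embeddings produced by a sentence encoder.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as sharing no direction
    return dot / (norm_a * norm_b)


def drift_score(fingerprint: Sequence[float], output_embedding: Sequence[float]) -> float:
    """Hypothetical drift measure: 1 - cosine similarity (0.0 means no drift)."""
    return 1.0 - cosine_similarity(fingerprint, output_embedding)


def has_drifted(fingerprint: Sequence[float],
                output_embedding: Sequence[float],
                threshold: float = 0.3) -> bool:
    """Flag an output whose drift exceeds an assumed, tunable threshold."""
    return drift_score(fingerprint, output_embedding) > threshold


# Toy vectors standing in for real embeddings (assumption for illustration).
persona_fingerprint = [0.9, 0.1, 0.0]
on_persona_output = [0.8, 0.2, 0.0]
off_persona_output = [0.0, 0.1, 0.9]
```

With these toy vectors, `has_drifted(persona_fingerprint, on_persona_output)` is false and `has_drifted(persona_fingerprint, off_persona_output)` is true. The threshold would need tuning against real persona data.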

Historical GodelAI-Lite results were measured on an earlier system. Current Smriti AI results are measured separately and should not be conflated with historical results.

## Architecture

```text
User request
  -> Smriti AI handler
  -> memory retrieval
  -> graph retrieval
  -> identity context
  -> base model inference
  -> response
  -> memory write/update
```

The handler supports JSON, SQLite, Redis, and Postgres memory backends. For production, use Redis/Postgres or another external durable store. Do not store private user memory in the Hugging Face model repository.
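
The request flow in the diagram can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: `ToyMemoryStore`, `handle_chat`, and `stub_model` are not part of the Smriti AI API, word overlap stands in for semantic retrieval, and the graph-retrieval and identity-context steps are omitted for brevity.

```python
from typing import Callable, Dict, List


class ToyMemoryStore:
    """In-memory stand-in for a JSON/SQLite/Redis/Postgres backend."""

    def __init__(self) -> None:
        self._entries: Dict[str, List[str]] = {}

    def retrieve(self, user_id: str, query: str, k: int = 3) -> List[str]:
        # Rank stored entries by word overlap with the query; this is a
        # crude placeholder for semantic or graph retrieval.
        query_words = set(query.lower().split())
        scored = [
            (len(query_words & set(entry.lower().split())), entry)
            for entry in self._entries.get(user_id, [])
        ]
        ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
        return [entry for score, entry in ranked[:k] if score > 0]

    def write(self, user_id: str, text: str) -> None:
        self._entries.setdefault(user_id, []).append(text)


def handle_chat(store: ToyMemoryStore,
                generate: Callable[[str], str],
                user_id: str,
                message: str) -> Dict[str, object]:
    """Retrieve memories, call the frozen model, then write the new message."""
    memories = store.retrieve(user_id, message)
    context = "\n".join(f"- {memory}" for memory in memories)
    prompt = f"Known facts about the user:\n{context}\n\nUser: {message}\nAssistant:"
    response = generate(prompt)      # base model weights are never modified
    store.write(user_id, message)    # memory write happens after the response
    return {"response": response, "memories": memories}


def stub_model(prompt: str) -> str:
    """Placeholder for base model inference."""
    return f"(stub reply to a prompt of {len(prompt)} characters)"


store = ToyMemoryStore()
first = handle_chat(store, stub_model, "customer-123",
                    "My name is Alex and I am a marine biologist.")
second = handle_chat(store, stub_model, "customer-123", "What is my name?")
```

In this sketch the first call retrieves nothing because the store is empty, while the second call retrieves the fact written by the first.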

## Supported base models

Smriti AI is model-agnostic for Hugging Face causal language models.

Supported families depend on the installed `transformers` version and endpoint hardware:

- Gemma-style causal LMs when available, including the current benchmark path `google/gemma-4-E2B-it`.
- Qwen-style causal LMs such as `Qwen/Qwen2.5-1.5B-Instruct` when supported by the runtime.
- Llama/Phi/Mistral-style causal LMs if supported by the runtime environment.

Deterministic CI checks are kept outside public benchmark claims.

## Evaluation

Current benchmark artifacts in the main Smriti AI repository report real-model
validation over generated public SmritiBench memory fixtures. They are not
MLPerf certification, HELM certification, or final external industry benchmark
evidence.

Benchmark-readiness audit status: `benchmark_invalid_provenance`.

The validation artifact is `results/current/industry_benchmark_summary.json`. It
records model IDs, seeds, hardware/provider metadata, and privacy/delete/security
counters, but it is labeled
`real_model_structured_fixture_validation_not_public_claim` until an accepted
external benchmark/dataset or third-party evaluation process is used.

## Privacy

Smriti AI stores user memory. Treat it as user data.

- Memory can be encrypted by setting `SMRITI_ENCRYPTION_KEY`.
- `delete_memory` is supported by the handler.
- Production deployments should use external memory storage such as Redis/Postgres.
- Do not store private user memory in the Hugging Face model repository.
- Public/demo deployments should not receive real PII.

## Limitations

- Retrieval quality depends on the quality and specificity of stored memory.
- Durable memory requires an external backend or persistent endpoint storage.
- Latency depends on the base model, backend, retrieval mode, and endpoint hardware.
- CPU demo mode validates handler plumbing but will not produce Gemma-quality answers.
- If neither `BASE_MODEL_ID` nor `HF_ENDPOINT_URL` is configured, the handler returns memory-only responses.

## Environment variables

| Variable | Purpose |
|---|---|
| `BASE_MODEL_ID` | Hugging Face model ID to load inside the endpoint. |
| `HF_ENDPOINT_URL` | Optional remote model endpoint URL. If set, the handler calls this URL instead of loading a local base model. |
| `HF_TOKEN` | Token for gated/private base models or protected remote endpoints. |
| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
| `SMRITI_MEMORY_PATH` | JSON user-memory directory or SQLite file path. |
| `REDIS_URL` | External Redis URL. Takes precedence when present. |
| `POSTGRES_DSN` | External Postgres DSN. Takes precedence when present and Redis is not configured. |
| `SMRITI_ENCRYPTION_KEY` | Memory encryption key. Do not commit it. |
| `SMRITI_RETRIEVAL_MODE` | `tfidf`, `semantic`, `semantic_graph`, or `semantic_graph_identity`. |
| `SMRITI_PUBLIC_DEMO` | `true` or `false`. Use `true` only for non-PII demos. |
| `SMRITI_MAX_MEMORY_ENTRIES` | Maximum retained entries per user/topic. |
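
The precedence rules in the table can be read as the following resolution logic. This is a sketch of one plausible reading, not the handler's actual code: the function names, the `json` default, and the `remote`/`local`/`memory_only` labels are assumptions made for illustration.

```python
import os
from typing import Mapping, Optional

VALID_BACKENDS = {"json", "sqlite", "redis", "postgres"}


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick a memory backend: Redis first, then Postgres, then the explicit setting."""
    env = os.environ if env is None else env
    if env.get("REDIS_URL"):
        return "redis"
    if env.get("POSTGRES_DSN"):
        return "postgres"
    # Defaulting to "json" is an assumption; the real default is not documented here.
    backend = env.get("SMRITI_MEMORY_BACKEND", "json").lower()
    if backend not in VALID_BACKENDS:
        raise ValueError(f"Unsupported SMRITI_MEMORY_BACKEND: {backend!r}")
    return backend


def resolve_model_source(env: Optional[Mapping[str, str]] = None) -> str:
    """Pick where generation happens: remote endpoint, local model, or neither."""
    env = os.environ if env is None else env
    if env.get("HF_ENDPOINT_URL"):
        return "remote"       # call the URL instead of loading a local model
    if env.get("BASE_MODEL_ID"):
        return "local"        # load the base model inside the endpoint
    return "memory_only"      # no model configured
```

Passing a plain dictionary instead of relying on `os.environ` keeps the logic easy to test, for example `resolve_backend({"REDIS_URL": "redis://cache:6379", "POSTGRES_DSN": "postgresql://db/smriti"})` resolves to `redis`.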

## How to call the endpoint

### Chat / fact injection

```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "return_memories": true
  }
}
```

### Recall

```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "return_memories": true
  }
}
```

### Delete memory

```json
{
  "inputs": {
    "operation": "delete_memory",
    "user_id": "customer-123"
  }
}
```

### Health

```json
{
  "inputs": {
    "operation": "health"
  }
}
```
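
The request bodies above can be built and sent from Python using only the standard library. This is a minimal client sketch: `build_payload` and `call_endpoint` are hypothetical helper names, the bearer-token header assumes a protected endpoint, and the response is returned as parsed JSON without assuming any particular schema.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_payload(operation: str,
                  user_id: Optional[str] = None,
                  message: Optional[str] = None,
                  retrieval_mode: Optional[str] = None,
                  **parameters: Any) -> Dict[str, Any]:
    """Assemble an {"inputs": ..., "parameters": ...} request body."""
    inputs: Dict[str, Any] = {"operation": operation}
    if user_id is not None:
        inputs["user_id"] = user_id
    if message is not None:
        inputs["message"] = message
    if retrieval_mode is not None:
        inputs["retrieval_mode"] = retrieval_mode
    payload: Dict[str, Any] = {"inputs": inputs}
    if parameters:
        payload["parameters"] = parameters
    return payload


def call_endpoint(url: str, token: str, payload: Dict[str, Any],
                  timeout: float = 60.0) -> Any:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


recall_payload = build_payload(
    "chat",
    user_id="customer-123",
    message="What do you remember about me?",
    retrieval_mode="semantic_graph_identity",
    return_memories=True,
)
health_payload = build_payload("health")
```

To send a request, pass your own endpoint URL and token, for example `call_endpoint(endpoint_url, hf_token, recall_payload)`.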

## Local test

```bash
pip install -r requirements.txt
BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN="$HF_TOKEN" \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python test_handler_local.py
```

## Custom-container deployment

If the standard Hugging Face handler is insufficient for your model size, CUDA libraries, Redis client policy, or enterprise network requirements, deploy the same files in a custom container. Use the main Smriti AI repository Dockerfiles as the starting point, install this handler, and expose a compatible HTTP API through Hugging Face Inference Endpoints custom container support.

<!-- HARNESS_EVOLUTION_RESULTS_START -->
## Harness Evolution Results

The base model remains frozen. Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.

| System | Recall | Precision@K | p95 latency ms | Token overhead | Privacy delete |
|---|---:|---:|---:|---:|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |

Cross-model harness validation:

| Model | Seed recall | Evolved recall | Gate |
|---|---:|---:|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |

Production gate report: `results/production_gate_report.md`

Historical GodelAI-Lite results remain separate lineage and are not conflated with current Smriti AI harness metrics.
Deterministic CI checks are used only for stability and never counted as public benchmark evidence.
<!-- HARNESS_EVOLUTION_RESULTS_END -->