---
license: apache-2.0
language:
- en
library_name: smriti-memory-ai
pipeline_tag: text-generation
base_model_relation: adapter
base_model:
- google/gemma-4-E2B-it
- Qwen/Qwen2.5-1.5B-Instruct
- meta-llama/Llama-3.2-1B-Instruct
- microsoft/Phi-3-mini-4k-instruct
datasets:
- luciferai-devil/smriti-ai-benchmarks
tags:
- smriti-ai
- memory
- agent-memory
- long-term-memory
- external-memory
- training-free
- frozen-model
- inference-time-augmentation
- retrieval-augmented-generation
- rag
- semantic-search
- knowledge-graph
- identity-continuity
- small-language-model
- small-language-models
- ai-agent
- gemma
- gemma-4
- qwen
- qwen2.5
- llama
- llama-3.2
- phi-3
---

# Smriti AI

## What this is

Smriti AI is a memory-augmented inference layer for small language models. It adds external memory, semantic retrieval, knowledge-graph recall, identity continuity, and privacy-ready memory deletion without changing base model weights.

This repository is laid out for a Hugging Face model-style deployment with a custom `handler.py`. The handler either loads a base causal language model or calls a remote model endpoint. It then wraps that model with Smriti AI memory and returns the model response together with the retrieved memories.

This model-card template targets Smriti AI v1.0.9. The companion public benchmark dataset is `luciferai-devil/smriti-ai-benchmarks`, and the CPU-safe demo Space target is `luciferai-devil/smriti-ai-demo`.

## Discovery keywords

Smriti AI is intended for people searching for Gemma memory, Qwen memory, small-model memory, agent memory, external memory, long-term memory, semantic recall, graph recall, and training-free memory augmentation.

## What this is not

Smriti AI is not a newly trained foundation model. It is not a fine-tuned model unless a separate fine-tuned checkpoint is explicitly included. It is an inference-time wrapper around a base language model.

Do not treat this repository as a standalone model checkpoint or as a Gemma/Qwen release checkpoint.
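Here is a minimal sketch of the general inference-time memory pattern: retrieve stored facts, prepend them to the prompt, call an unchanged model, then write the turn back to memory. It is a hypothetical illustration, not Smriti AI's `handler.py`. The `MemoryWrapper` class, its in-process `dict` store, and its word-overlap retrieval are assumptions that keep the example self-contained. They stand in for the real backends and the semantic, graph, and identity retrieval modes.

```python
import re
from dataclasses import dataclass, field
from typing import Callable


def _tokens(text: str) -> set[str]:
    # Lowercased alphanumeric words; punctuation is ignored.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


@dataclass
class MemoryWrapper:
    """Hypothetical memory wrapper. The base model is only called, never trained."""

    generate: Callable[[str], str]  # frozen base model: local LM or remote endpoint call
    store: dict[str, list[str]] = field(default_factory=dict)  # user_id -> stored memories
    top_k: int = 3

    def retrieve(self, user_id: str, message: str) -> list[str]:
        # Toy lexical retrieval: rank stored memories by word overlap with the message.
        query = _tokens(message)
        scored = [(len(query & _tokens(m)), m) for m in self.store.get(user_id, [])]
        scored = [pair for pair in scored if pair[0] > 0]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [memory for _, memory in scored[: self.top_k]]

    def chat(self, user_id: str, message: str) -> dict:
        memories = self.retrieve(user_id, message)
        context = "\n".join(f"- {m}" for m in memories)
        prompt = f"Known facts about the user:\n{context}\n\nUser: {message}\nAssistant:"
        response = self.generate(prompt)  # inference only; model weights are untouched
        # Memory write: this sketch simply stores the raw user message as a new memory.
        self.store.setdefault(user_id, []).append(message)
        return {"response": response, "memories": memories}

    def delete_memory(self, user_id: str) -> bool:
        # Drop every stored memory for this user; True if anything was deleted.
        return self.store.pop(user_id, None) is not None
```

Any callable can serve as `generate`, for example a hypothetical function that posts to `HF_ENDPOINT_URL`. Because of this, the model itself never changes. Durability and recall quality come entirely from the store and the retrieval step.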
Use the original base-model repositories when you need the base checkpoint itself. The base model is configured through `BASE_MODEL_ID` or `HF_ENDPOINT_URL`.

## Research Lineage

Smriti AI follows four principles:

- **External memory**: conversational facts live outside model weights in a persistent, inspectable store.
- **Training-free recall**: relevant facts are retrieved and injected at inference time without fine-tuning the base model.
- **Identity continuity**: persona evidence is tracked as an embedding fingerprint so that outputs can be checked for drift.
- **Small-model augmentation**: small causal language models can become more useful when paired with explicit memory and retrieval.

Historical GodelAI-Lite results were measured on an earlier system. Current Smriti AI results are measured separately and should not be conflated with them.

## Architecture

```text
User request
  -> Smriti AI handler
  -> memory retrieval
  -> graph retrieval
  -> identity context
  -> base model inference
  -> response
  -> memory write/update
```

The handler supports JSON, SQLite, Redis, and Postgres memory backends. For production, use Redis, Postgres, or another external durable store. Do not store private user memory in the Hugging Face model repository.

## Supported base models

Smriti AI is model-agnostic across Hugging Face causal language models. Which families work depends on the installed `transformers` version and the endpoint hardware:

- Gemma-style causal LMs where available, including `google/gemma-4-E2B-it`, the model used in the current benchmark path.
- Qwen-style causal LMs such as `Qwen/Qwen2.5-1.5B-Instruct`, when supported by the runtime.
- Llama-, Phi-, and Mistral-style causal LMs, when supported by the runtime environment.

Deterministic CI checks are kept separate from public benchmark claims.

## Evaluation

Current benchmark artifacts in the main Smriti AI repository report real-model validation over generated public SmritiBench memory fixtures.
These artifacts are not MLPerf- or HELM-certified results, nor are they final external industry benchmark evidence. The benchmark-readiness audit status is `benchmark_invalid_provenance`.

The validation artifact is `results/current/industry_benchmark_summary.json`. It records model IDs, seeds, hardware/provider metadata, and privacy/delete/security counters. It remains labeled `real_model_structured_fixture_validation_not_public_claim` until an accepted external benchmark or dataset, or a third-party evaluation process, is used.

Historical GodelAI-Lite results were measured on an earlier system and should not be conflated with current Smriti AI results.

## Privacy

Smriti AI stores user memory. Treat it as user data.

- Memory can be encrypted by setting `SMRITI_ENCRYPTION_KEY`.
- The handler supports a `delete_memory` operation.
- Production deployments should use external memory storage such as Redis or Postgres.
- Do not store private user memory in the Hugging Face model repository.
- Public/demo deployments should not receive real PII.

## Limitations

- Retrieval quality depends on the quality and specificity of stored memory.
- Durable memory requires an external backend or persistent endpoint storage.
- Latency depends on the base model, backend, retrieval mode, and endpoint hardware.
- CPU demo mode validates handler plumbing but will not produce Gemma-quality answers.
- If neither `BASE_MODEL_ID` nor `HF_ENDPOINT_URL` is configured, the handler returns memory-only responses.

## Environment variables

| Variable | Purpose |
|---|---|
| `BASE_MODEL_ID` | Hugging Face model ID to load inside the endpoint. |
| `HF_ENDPOINT_URL` | Optional remote model endpoint URL. If set, the handler calls this URL instead of loading a local base model. |
| `HF_TOKEN` | Token for gated/private base models or protected remote endpoints. |
| `SMRITI_MEMORY_BACKEND` | `json`, `sqlite`, `redis`, or `postgres`. |
| `SMRITI_MEMORY_PATH` | JSON user-memory directory or SQLite file path. |
| `REDIS_URL` | External Redis URL. Takes precedence when present. |
| `POSTGRES_DSN` | External Postgres DSN. Takes precedence when present and Redis is not configured. |
| `SMRITI_ENCRYPTION_KEY` | Memory encryption key. Do not commit it. |
| `SMRITI_RETRIEVAL_MODE` | `tfidf`, `semantic`, `semantic_graph`, or `semantic_graph_identity`. |
| `SMRITI_PUBLIC_DEMO` | `true` or `false`. Use `true` only for non-PII demos. |
| `SMRITI_MAX_MEMORY_ENTRIES` | Maximum retained entries per user/topic. |

## How to call the endpoint

### Chat / fact injection

```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "My name is Alex and I am a marine biologist.",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "max_new_tokens": 256,
    "temperature": 0.7,
    "top_p": 0.9,
    "return_memories": true
  }
}
```

### Recall

```json
{
  "inputs": {
    "operation": "chat",
    "user_id": "customer-123",
    "message": "What do you remember about me?",
    "retrieval_mode": "semantic_graph_identity"
  },
  "parameters": {
    "return_memories": true
  }
}
```

### Delete memory

```json
{
  "inputs": {
    "operation": "delete_memory",
    "user_id": "customer-123"
  }
}
```

### Health

```json
{
  "inputs": {
    "operation": "health"
  }
}
```

## Local test

```bash
pip install -r requirements.txt
BASE_MODEL_ID=google/gemma-4-E2B-it \
HF_TOKEN=$HF_TOKEN \
SMRITI_MEMORY_BACKEND=json \
SMRITI_MEMORY_PATH=/tmp/smriti_hf_test.json \
python test_handler_local.py
```

## Custom-container deployment

The standard Hugging Face handler may not be sufficient for your model size, CUDA libraries, Redis client policy, or enterprise network requirements. In that case, deploy the same files in a custom container. Start from the Dockerfiles in the main Smriti AI repository, install this handler, and expose a compatible HTTP API through Hugging Face Inference Endpoints custom-container support.

## Harness Evolution Results

The base model remains frozen.
Smriti AI is not fine-tuned; these numbers come from memory-harness evaluation.

| System | Recall | Precision@K | p95 latency (ms) | Token overhead | Privacy delete |
|---|---:|---:|---:|---:|---|
| baseline_frozen_model | 0.000 | 0.000 | 0.000 | 0 | True |
| smriti_seed_harness | 1.000 | 0.333 | 0.525 | 328 | True |
| smriti_evolved_harness | 1.000 | 0.333 | 0.168 | 328 | True |

Cross-model harness validation:

| Model | Seed recall | Evolved recall | Gate |
|---|---:|---:|---|
| google/gemma-4-E2B-it | 1.000 | 1.000 | pass |
| meta-llama/Llama-3.2-1B | 1.000 | 1.000 | pass |
| microsoft/Phi-3-mini-4k-instruct | 1.000 | 1.000 | pass |
| mistralai/Mistral-7B-Instruct-v0.3 | 1.000 | 1.000 | pass |
| Qwen/Qwen2.5-1.5B-Instruct | 1.000 | 1.000 | pass |

Production gate report: `results/production_gate_report.md`

Historical GodelAI-Lite results remain a separate lineage and are not conflated with current Smriti AI harness metrics. Deterministic CI checks are used only for stability and are never counted as public benchmark evidence.
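This card does not state how Recall, Precision@K, or p95 latency are computed. As a reference point only, here is a minimal sketch of one common set of retrieval-style definitions:

- recall@K is the fraction of relevant memories found in the top K.
- Precision@K is the number of relevant hits divided by K.
- p95 uses the nearest-rank percentile.

The function names are hypothetical. The Smriti AI harness may define these metrics differently, for example by scoring recall on the generated answer rather than on the retrieved memories.

```python
def recall_and_precision_at_k(
    retrieved: list[str], relevant: set[str], k: int
) -> tuple[float, float]:
    """Retrieval recall@K and Precision@K for one query (one common definition)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Count each relevant memory at most once, so recall never exceeds 1.0.
    hits = len({item for item in retrieved[:k] if item in relevant})
    recall = hits / len(relevant) if relevant else 0.0
    precision = hits / k  # divides by K even if fewer than K items were retrieved
    return recall, precision


def percentile_nearest_rank(values: list[float], pct: int = 95) -> float:
    """Nearest-rank percentile, e.g. p95 over per-request latencies in ms."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # ceil(pct / 100 * n) without float rounding
    return ordered[rank - 1]
```

Under these definitions, a query with exactly one relevant memory that appears among K = 3 retrieved memories scores recall 1.0 and Precision@K 1/3. Integer arithmetic for the rank avoids off-by-one errors that floating-point products such as `0.95 * n` can introduce.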