A 165M parameter Memory-Augmented Language Model (MALM) for semantic code search, trained on CodeParrot.
```bash
# Install dependencies
pip install mlx huggingface_hub numpy

# Download model
huggingface-cli download codelion/malm-165m --local-dir ./malm-165m

# Run semantic search
python malm-165m/inference.py --query "function that sorts a list"
```
Example output:

```text
Query: function that sorts a list
------------------------------------------------------------
1. array_sort (score: 0.9526)
   Signature: array_sort(col)
   Docstring: Collection function: sorts the input array in ascending order...

2. sort_array (score: 0.7707)
   Signature: sort_array(col, asc)
   Docstring: Collection function: sorts the input array in ascending or descending order...
```
```python
import sys
from pathlib import Path

from huggingface_hub import snapshot_download

# Download the repo and import its standalone inference script
model_path = snapshot_download("codelion/malm-165m")
sys.path.insert(0, model_path)
from inference import load_model, search_functions

# Load the model, tokenizer, and memory bank of functions
model, tokenizer, functions, config = load_model(Path(model_path))
print(f"Loaded {len(functions)} functions")

# Search
results = search_functions(
    model, tokenizer, functions,
    query="connect to database",
    top_k=5,
)

for name, signature, docstring, score in results:
    print(f"{name}: {score:.4f}")
```
MALM combines a transformer with learned memory retrieval for semantic code search. Instead of autoregressively generating text like a standard decoder-only LLM, it encodes the query and scores it against a memory bank of function representations. Because this memory-augmented architecture differs from standard LLMs, it is not compatible with `mlx-lm generate`, so a custom inference script is provided.
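In outline, the retrieval step amounts to embedding the query and ranking memory entries by similarity. The sketch below shows this with cosine similarity over a toy memory bank in plain NumPy; the function name and arrays here are illustrative, not part of the released code, and MALM's actual encoders are transformers rather than fixed vectors.

```python
import numpy as np

def cosine_top_k(query_vec, memory, k=2):
    """Rank rows of `memory` by cosine similarity to `query_vec`."""
    q = query_vec / np.linalg.norm(query_vec)
    m = memory / np.linalg.norm(memory, axis=1, keepdims=True)
    scores = m @ q                      # one similarity score per memory entry
    top = np.argsort(-scores)[:k]       # indices of the k best matches
    return [(int(i), float(scores[i])) for i in top]

# Toy "memory bank": one embedding per stored function
memory = np.array([[1.0, 0.0],
                   [0.0, 1.0],
                   [0.7, 0.7]])
query = np.array([0.9, 0.1])

print(cosine_top_k(query, memory))  # entry 0 ranks first, entry 2 second
```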
| Component | Parameters |
|---|---|
| Embedding | 11.1M |
| Position Embedding | 0.1M |
| Query Encoder (4 layers) | 28.4M |
| Value Encoder (4 layers) | 28.4M |
| Decoder (12 layers) | 85.1M |
| Output Projection | 11.1M |
| Total | ~165M |
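The table's numbers are consistent with the configuration below: the embedding and output projection are each `vocab_size × d_model`, and a standard transformer layer has roughly 12·d_model² weights (≈4·d_model² for attention plus ≈8·d_model² for a 4× MLP). The 12·d_model² per-layer estimate is an assumption about the layer shape, not stated in the release; it ignores biases and LayerNorms, which is why the totals come out slightly under the table's values.

```python
# Rough parameter arithmetic for the component table above.
vocab_size, d_model = 14407, 768
per_layer = 12 * d_model ** 2  # ~4*d^2 attention + ~8*d^2 MLP weights

print(f"Embedding / output proj: {vocab_size * d_model / 1e6:.1f}M")  # ~11.1M each
print(f"4-layer encoder:         {4 * per_layer / 1e6:.1f}M")         # ~28.3M
print(f"12-layer decoder:        {12 * per_layer / 1e6:.1f}M")        # ~84.9M
```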
```json
{
  "vocab_size": 14407,
  "d_model": 768,
  "n_heads": 12,
  "n_layers": 12,
  "n_query_layers": 4,
  "max_seq_len": 128,
  "num_parameters": 165123656,
  "num_functions": 2000
}
```
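The config is plain JSON, so it can be read with the standard library; derived quantities like the per-head dimension follow directly from its fields. A minimal sketch (the config is inlined here for illustration; in practice you would read `config.json` from the downloaded snapshot):

```python
import json

# Parse the configuration shown above.
config = json.loads("""{
  "vocab_size": 14407, "d_model": 768, "n_heads": 12, "n_layers": 12,
  "n_query_layers": 4, "max_seq_len": 128,
  "num_parameters": 165123656, "num_functions": 2000
}""")

head_dim = config["d_model"] // config["n_heads"]
print(f"head_dim = {head_dim}")  # 768 / 12 = 64
```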
| File | Description |
|---|---|
| `model.npz` | Model weights (MLX-compatible NumPy format) |
| `config.json` | Model configuration |
| `tokenizer.json` | Tokenizer vocabulary |
| `functions.json` | Memory bank of 2000 Python functions |
| `inference.py` | Standalone inference script |
Trained on CodeParrot with a focus on Python function retrieval.
```bibtex
@article{sharma2026malm,
  title={Reverse Engineering a $500M Mystery: From HashHop to Memory-Augmented Language Models},
  author={Sharma, Asankhaya},
  year={2026},
  url={https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop}
}
```
Part of the HashHop project exploring long-context evaluation and memory-augmented architectures.
License: Apache 2.0