Long-Term Memory Patterns for Production Agents

Long-term memory is what separates agents that get smarter over time from agents that start from scratch on every session. This post covers the four memory types, how to implement episodic and semantic memory, and the engineering decisions that make agent memory practical.

The context window gives agents short-term memory. But context windows clear between sessions. A customer service agent that asks the same clarifying questions every interaction, an engineering assistant that re-learns your codebase conventions each session, a research agent that loses all prior findings on restart — these are agents without long-term memory.

Long-term memory is what allows agents to build on prior interactions rather than starting fresh every time.

graph TD
    A[Session Ends] --> B[Store Episode<br/>What happened]
    B --> C[Extract Semantic Facts<br/>User preferences / domain knowledge]
    C --> D[(Memory Store<br/>Vector DB + Key-Value)]
    E[New Session Starts] --> F[Retrieve Similar Episodes<br/>by embedding similarity]
    F --> G[Retrieve High-confidence Beliefs<br/>from semantic memory]
    G --> H[Inject into System Prompt]
    H --> I[Agent starts with context]
    D --> F
    D --> G

The Four Memory Types

Episodic memory: records of what happened. “In session 47, the user asked about X and we concluded Y.” Stored as structured or unstructured records indexed by time and session.

Semantic memory: generalised knowledge extracted from episodes. “This user prefers concise responses.” “This codebase uses Repository pattern for data access.” Stored as facts that apply across sessions.

Procedural memory: learned workflows. “When the user asks about billing issues, always check account status first, then payment history.” Stored as structured processes or examples that shape agent behaviour.

Working memory: the active context window content — what’s being held in mind right now. Covered in Day 9’s context window management post.
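Procedural memory is the one type the rest of this post does not implement, so here is a minimal sketch of one way it could be stored and rendered for prompt injection. The `Procedure` class, the substring-based trigger matching, and the function names are all illustrative assumptions, not an established pattern from any library.

```python
from dataclasses import dataclass


@dataclass
class Procedure:
    trigger: str      # phrase that activates the workflow (hypothetical matching rule)
    steps: list[str]  # ordered steps the agent should follow


def match_procedures(user_message: str, procedures: list[Procedure]) -> list[Procedure]:
    """Return procedures whose trigger phrase appears in the message (case-insensitive)."""
    text = user_message.lower()
    return [p for p in procedures if p.trigger.lower() in text]


def render_procedures(procedures: list[Procedure]) -> str:
    """Format matched procedures as numbered steps suitable for a system prompt."""
    blocks = []
    for p in procedures:
        steps = "\n".join(f"{i}. {s}" for i, s in enumerate(p.steps, start=1))
        blocks.append(f"When handling '{p.trigger}':\n{steps}")
    return "\n\n".join(blocks)


billing = Procedure("billing", ["Check account status", "Check payment history"])
matched = match_procedures("I have a billing issue", [billing])
prompt_text = render_procedures(matched)
```

Substring matching is the simplest possible trigger; a production system would more likely match on a classified intent or on embedding similarity.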


Implementing Episodic Memory

The simplest memory system: store a structured record after each session and retrieve relevant records at the start of new sessions.

from dataclasses import dataclass, asdict
from datetime import datetime
import json

@dataclass
class EpisodicRecord:
    session_id: str
    timestamp: datetime
    user_id: str
    task_type: str
    summary: str          # What happened in this session
    outcomes: list[str]   # What was accomplished
    key_facts: list[str]  # Facts learned that may be reusable
    embedding: list[float] # For semantic retrieval

# Assumes `llm.generate` and `embed` are async helpers defined elsewhere in the application.
async def store_episode(session_data: dict, vector_store) -> None:
    # Generate summary of the session
    summary = await llm.generate(f"""
    Summarise this session in 2-3 sentences, focusing on:
    - What the user was trying to accomplish
    - What was achieved
    - Any important facts or preferences revealed
    
    Session: {json.dumps(session_data)}
    """)
    
    # Extract reusable facts
    facts = await llm.generate(f"""
    Extract any durable facts from this session that would be useful in future sessions.
    Focus on user preferences, domain knowledge, and constraints.
    Return as a JSON list of strings.
    
    Session summary: {summary}
    """)
    
    record = EpisodicRecord(
        session_id=session_data["session_id"],
        timestamp=datetime.now(),
        user_id=session_data["user_id"],
        task_type=session_data["task_type"],
        summary=summary,
        outcomes=session_data.get("outcomes", []),
        key_facts=json.loads(facts),
        embedding=await embed(summary)
    )
    
    await vector_store.upsert(asdict(record))

# Returns plain dicts, matching what store_episode upserts via asdict().
async def retrieve_relevant_episodes(user_id: str, current_task: str, vector_store) -> list[dict]:
    query_embedding = await embed(current_task)
    return await vector_store.search(
        query_embedding,
        filter={"user_id": user_id},
        top_k=3
    )
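The functions above call `vector_store.upsert` and `vector_store.search` without showing the store itself. As a rough sketch of what that interface might look like, assuming records are plain dicts with an `embedding` field, here is a hypothetical in-memory stand-in that ranks by cosine similarity. `InMemoryVectorStore` and `cosine_similarity` are illustrative names, not part of any particular vector database.

```python
import asyncio
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database; not a real library API."""

    def __init__(self) -> None:
        self._records: dict[str, dict] = {}

    async def upsert(self, record: dict) -> None:
        # Keyed by session_id, so re-storing a session overwrites the old record.
        self._records[record["session_id"]] = record

    async def search(self, query_embedding: list[float], filter: dict, top_k: int = 3) -> list[dict]:
        # Apply the metadata filter first so one user's episodes never reach another user.
        candidates = [
            r for r in self._records.values()
            if all(r.get(k) == v for k, v in filter.items())
        ]
        candidates.sort(
            key=lambda r: cosine_similarity(query_embedding, r["embedding"]),
            reverse=True,
        )
        return candidates[:top_k]


async def demo() -> list[str]:
    store = InMemoryVectorStore()
    await store.upsert({"session_id": "s1", "user_id": "u1", "summary": "billing", "embedding": [1.0, 0.0]})
    await store.upsert({"session_id": "s2", "user_id": "u1", "summary": "refactor", "embedding": [0.0, 1.0]})
    await store.upsert({"session_id": "s3", "user_id": "u2", "summary": "other user", "embedding": [1.0, 0.0]})
    hits = await store.search([0.9, 0.1], filter={"user_id": "u1"}, top_k=2)
    return [h["summary"] for h in hits]


result = asyncio.run(demo())
```

The two-dimensional embeddings are toy values chosen so the ranking is easy to follow. A linear scan like this is only reasonable for small stores; a real deployment would use an indexed vector database.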

Implementing Semantic Memory

Episodic records are specific. Semantic memory generalises them into persistent beliefs about the user or domain.

import json
from datetime import datetime

class SemanticMemoryStore:
    def __init__(self, db):
        self.db = db
    
    async def update_belief(self, user_id: str, belief_key: str, new_evidence: str):
        existing = await self.db.get(f"belief:{user_id}:{belief_key}")
        
        if existing is None:
            # First evidence for this belief
            belief = await llm.generate(f"Extract a concise belief from: {new_evidence}")
            confidence = 0.6
        else:
            # Update existing belief with new evidence
            updated = await llm.generate(f"""
            Existing belief: {existing['belief']} (confidence: {existing['confidence']})
            New evidence: {new_evidence}
            
            Update the belief based on the new evidence. Return JSON with two keys:
            "belief" (string) and "confidence" (number between 0 and 1).
            """)
            result = json.loads(updated)
            belief = result["belief"]
            confidence = result["confidence"]
        
        await self.db.set(f"belief:{user_id}:{belief_key}", {
            "belief": belief,
            "confidence": confidence,
            "last_updated": datetime.now().isoformat()
        })
    
    async def get_relevant_beliefs(self, user_id: str) -> dict:
        keys = await self.db.keys(f"belief:{user_id}:*")
        beliefs = {}
        for key in keys:
            # Strip the known prefix so belief keys that contain ":" stay intact
            belief_key = key.removeprefix(f"belief:{user_id}:")
            beliefs[belief_key] = await self.db.get(key)
        return {k: v for k, v in beliefs.items() if v["confidence"] > 0.5}
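`SemanticMemoryStore` relies on a `db` object with async `get`, `set`, and `keys(pattern)` methods. The sketch below shows one hypothetical in-memory version of that interface, assuming glob-style key patterns, together with the same confidence filter applied at read time. `InMemoryKV` and `confident_beliefs` are illustrative names, and the 0.5 threshold mirrors the value used above.

```python
import asyncio
from fnmatch import fnmatchcase


class InMemoryKV:
    """Hypothetical async key-value store exposing get, set, and keys(pattern)."""

    def __init__(self) -> None:
        self._data: dict[str, dict] = {}

    async def get(self, key: str):
        return self._data.get(key)  # None when the key is absent

    async def set(self, key: str, value: dict) -> None:
        self._data[key] = value

    async def keys(self, pattern: str) -> list[str]:
        # Glob-style matching, e.g. "belief:u1:*"
        return sorted(k for k in self._data if fnmatchcase(k, pattern))


async def confident_beliefs(db: InMemoryKV, user_id: str, threshold: float = 0.5) -> dict:
    """Return belief text for one user, keeping only beliefs above the threshold."""
    prefix = f"belief:{user_id}:"
    out = {}
    for key in await db.keys(prefix + "*"):
        value = await db.get(key)
        if value["confidence"] > threshold:
            out[key[len(prefix):]] = value["belief"]
    return out


async def demo() -> dict:
    db = InMemoryKV()
    await db.set("belief:u1:tone", {"belief": "Prefers concise responses", "confidence": 0.8})
    await db.set("belief:u1:editor", {"belief": "Uses Vim", "confidence": 0.3})
    await db.set("belief:u2:tone", {"belief": "Prefers detail", "confidence": 0.9})
    return await confident_beliefs(db, "u1")


beliefs = asyncio.run(demo())
```

Filtering on confidence at read time means low-confidence beliefs stay in the store and can be strengthened by later evidence, while never reaching the prompt until they cross the threshold.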

Memory Injection at Session Start

At the start of each session, retrieve and inject relevant memory into the context:

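# Assumes module-level `vector_store` and `memory_store` (a SemanticMemoryStore) are initialised elsewhere.
async def build_session_context(user_id: str, current_task: str) -> str: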
    # Retrieve relevant episodes
    episodes = await retrieve_relevant_episodes(user_id, current_task, vector_store)
    
    # Retrieve semantic memory
    beliefs = await memory_store.get_relevant_beliefs(user_id)
    
    if not episodes and not beliefs:
        return ""  # No memory to inject
    
    context_parts = []
    
    if beliefs:
        belief_text = "\n".join([f"- {k}: {v['belief']}" for k, v in beliefs.items()])
        context_parts.append(f"What I know about this user:\n{belief_text}")
    
    if episodes:
        episode_text = "\n".join([f"- {ep['summary']}" for ep in episodes])
        context_parts.append(f"Relevant prior sessions:\n{episode_text}")
    
    return "\n\n".join(context_parts)

The injected memory is appended to the system prompt, so the agent starts the session with relevant context rather than starting cold.
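One way the returned context string might be combined with a base system prompt is sketched below. The function name `compose_system_prompt`, the section label, and the character budget are assumptions for illustration; the budget is there because injected memory competes with everything else for context window space.

```python
def compose_system_prompt(base_prompt: str, memory_context: str, max_memory_chars: int = 2000) -> str:
    """Append retrieved memory to the base system prompt, capped at a character budget."""
    if not memory_context:
        return base_prompt  # cold start: nothing to inject
    if len(memory_context) > max_memory_chars:
        clipped = memory_context[:max_memory_chars]
        # Cut back to the last complete line that fits, so no bullet is truncated mid-sentence.
        memory_context = clipped[: clipped.rfind("\n")] if "\n" in clipped else clipped
    return f"{base_prompt}\n\nMemory from prior sessions:\n{memory_context}"


prompt = compose_system_prompt("You are a support agent.", "- tone: Prefers concise responses")
```

A character cap is a crude proxy for tokens. If a tokenizer is available, budgeting in tokens would be more accurate.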


The Privacy and Retention Problem

Long-term memory raises legitimate concerns: what are you storing, who can see it, how long is it retained?

For enterprise agents, answer these explicitly before implementation:

  • Is memory user-scoped, team-scoped, or global?
  • How long do records persist? (Where GDPR applies, so does the right to erasure.)
  • Who has access to memory records?
  • What happens to memory when a user leaves the organisation?
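Once those questions have answers, the retention and erasure rules can be enforced in code. The following is a minimal sketch, assuming records are dicts with `user_id` and a timezone-aware `timestamp`; the function names and the 90-day window are hypothetical, and the right retention period is a policy decision, not a technical one.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(records: list[dict], retention_days: int, now: datetime) -> list[dict]:
    """Keep only records that fall inside the retention window."""
    cutoff = now - timedelta(days=retention_days)
    return [r for r in records if r["timestamp"] >= cutoff]


def erase_user(records: list[dict], user_id: str) -> list[dict]:
    """Drop every record belonging to one user (erasure request or offboarding)."""
    return [r for r in records if r["user_id"] != user_id]


now = datetime(2025, 1, 31, tzinfo=timezone.utc)
records = [
    {"user_id": "u1", "timestamp": datetime(2025, 1, 30, tzinfo=timezone.utc)},
    {"user_id": "u1", "timestamp": datetime(2024, 6, 1, tzinfo=timezone.utc)},
    {"user_id": "u2", "timestamp": datetime(2025, 1, 15, tzinfo=timezone.utc)},
]
fresh = purge_expired(records, retention_days=90, now=now)
remaining = erase_user(fresh, "u1")
```

In a real system the same deletion would also need to reach embeddings, derived semantic beliefs, and backups, since facts extracted from an episode outlive the episode record itself.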

The technical implementation is the easy part. The governance model for what you’re allowed to remember is the hard part, and it needs to be decided before you build.


Day 17 of the Production Agentic AI series. Previous: Multi-Model Orchestration

This post is licensed under CC BY 4.0 by the author.