💠

DISCORD → LETTA → CLAUDE

💠 Architecture Documentation 💠

Interface → Memory → Reasoning

Letta Architecture Overview

Core Concept
Letta implements memory-augmented agents through a block-based memory architecture. Each agent maintains discrete memory blocks (core, human, persona, system) that persist across sessions. Memory blocks are injected into the LLM context at runtime, enabling stateful conversations with semantic continuity.
Memory Blocks
Core Memory: Fixed-size working memory (2048 tokens). Human Block: User profile, preferences, context. Persona Block: Agent identity, behavior instructions. System Block: Tool schemas, execution context. Blocks stored in PostgreSQL with embedding vectors for semantic retrieval.
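The injection step can be sketched as follows. MemoryBlock and build_system_prompt are illustrative names for this document, not Letta's actual classes; the principle is that each block is rendered into a delimited section of the system prompt at runtime.

```python
from dataclasses import dataclass

@dataclass
class MemoryBlock:
    name: str     # "core", "human", "persona", or "system"
    content: str
    limit: int    # token budget, e.g. 2048 for core memory

    def render(self) -> str:
        # Each block becomes a delimited section of the system prompt.
        return f"<{self.name}>\n{self.content}\n</{self.name}>"

def build_system_prompt(blocks: list[MemoryBlock]) -> str:
    # Blocks are concatenated and injected into LLM context at runtime.
    return "\n\n".join(b.render() for b in blocks)
```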
Tool System
Tools serialized as JSON schemas with function signatures. LLM generates structured tool calls parsed by Letta runtime. Tool execution creates async jobs with state tracking (pending → running → completed/failed). Results appended to message history for multi-turn reasoning.
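A minimal sketch of the parse-and-dispatch step, assuming the {tool, params} output shape described later in this document; the TOOLS registry and the add tool are illustrative, not Letta internals.

```python
import json

# Illustrative tool registry: maps tool names to callables.
TOOLS = {
    "add": lambda a, b: a + b,
}

def dispatch(raw: str):
    # The LLM emits structured JSON: {"tool": "...", "params": {...}}.
    call = json.loads(raw)
    fn = TOOLS[call["tool"]]
    # Deserialize params and invoke; the result would be appended
    # to message history for the next reasoning turn.
    return fn(**call["params"])
```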
Async Job Manager
Job queue processes tool executions with timeout handling (default 30s). Job statuses exposed via REST API (/agents/{id}/jobs/{job_id}). Supports parallel tool execution for independent operations. Failed jobs retry with exponential backoff (max 3 attempts).
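The retry policy can be sketched as a helper like this; run_with_retry is a hypothetical name for illustration (Letta's job queue implements this internally), with delays of 1s, 2s, 4s for the three attempts.

```python
import time

def run_with_retry(fn, max_attempts=3, base_delay=1.0):
    # Exponential backoff: delay doubles each attempt (1s, 2s, 4s).
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # final attempt failed -> job marked failed
            time.sleep(base_delay * 2 ** attempt)
```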

Architecture Diagram: Discord → Letta → Claude

sequenceDiagram
    participant D as Discord
    participant B as Bot Layer
    participant L as Letta Agent
    participant M as Memory Store
    participant C as Claude API
    D->>B: WebSocket: message.content
    B->>B: Extract metadata<br/>(user_id, channel_id, timestamp)
    B->>L: POST /api/agents/{id}/messages<br/>{role: "user", text: content}
    L->>M: Vector search<br/>SELECT * FROM embeddings<br/>ORDER BY cosine_similarity
    M-->>L: Top-K relevant memories<br/>(K=10, threshold=0.7)
    L->>L: Inject memory blocks<br/>into system prompt
    L->>C: POST /v1/messages<br/>{model: "sonnet-4.5",<br/>messages: [...context]}
    C-->>L: Stream tokens<br/>event: message_delta
    L->>M: Store conversation<br/>INSERT INTO messages
    L->>M: Update embeddings<br/>INSERT INTO embeddings
    L-->>B: {status: "success",<br/>message: {text, usage}}
    B->>D: channel.send(text)<br/>via Discord API

Memory System Architecture

graph TB
    subgraph "Agent Memory Lifecycle"
        A[User Message] --> B{Memory Blocks}
        B -->|Core Memory| C["Working Context<br/>2048 tokens"]
        B -->|Human Block| D["User Profile<br/>Preferences, History"]
        B -->|Persona Block| E["Agent Identity<br/>Behavior Rules"]
        B -->|System Block| F["Tool Schemas<br/>Function Defs"]
    end
    subgraph "Storage Layer"
        C --> G[PostgreSQL]
        D --> G
        E --> G
        F --> G
        G --> H["Embedding Model<br/>text-embedding-3-small"]
        H --> I["Vector Index<br/>pgvector HNSW"]
    end
    subgraph "Retrieval"
        I --> J{Semantic Search}
        J -->|cosine_similarity| K[Top-K Messages]
        K --> L["Context Window<br/>8192 tokens"]
    end
    L --> M[LLM Input]
    style C fill:#e8f0ff
    style D fill:#e8f0ff
    style E fill:#e8f0ff
    style F fill:#e8f0ff
    style I fill:#7fb5ff
    style L fill:#5a9cff
Block Update Protocol
Memory blocks are modified via core_memory_append(name, content) and core_memory_replace(name, old, new). Updates trigger re-embedding with timestamp versioning. Block size limits are enforced: core=2048, human=4096, persona=1024. Overflow triggers automatic summarization with Claude.
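A hedged sketch of the update protocol above, using a word count as a crude stand-in for real token counting; the LIMITS table mirrors the stated sizes.

```python
# Block size limits from the protocol above (tokens).
LIMITS = {"core": 2048, "human": 4096, "persona": 1024}

def core_memory_append(blocks: dict, name: str, content: str) -> None:
    updated = (blocks[name] + "\n" + content).strip()
    # Word count as a crude token proxy; real enforcement counts tokens.
    if len(updated.split()) > LIMITS[name]:
        # In Letta, overflow triggers summarization with Claude instead.
        raise OverflowError(f"{name} block over {LIMITS[name]} tokens")
    blocks[name] = updated

def core_memory_replace(blocks: dict, name: str, old: str, new: str) -> None:
    blocks[name] = blocks[name].replace(old, new)
```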

Tool Execution Pipeline

flowchart LR
    A[LLM Response] --> B{Parse Output}
    B -->|Text| C[Send to User]
    B -->|Tool Call| D["Extract JSON<br/>tool_name, params"]
    D --> E{Validate Schema}
    E -->|Invalid| F["Return Error<br/>to LLM"]
    E -->|Valid| G["Create Job<br/>status: pending"]
    G --> H[Job Queue]
    H --> I{Execute Function}
    I -->|Success| J["Job Result<br/>status: completed"]
    I -->|Failure| K["Job Error<br/>status: failed"]
    I -->|Timeout| L["Job Timeout<br/>after 30s"]
    J --> M["Append to<br/>Message History"]
    K --> M
    L --> M
    M --> N{More Tools?}
    N -->|Yes| D
    N -->|No| O["Return to LLM<br/>for next turn"]
    O --> A
    style D fill:#e8f0ff
    style G fill:#7fb5ff
    style I fill:#5a9cff
    style M fill:#e8f0ff
Tool Serialization
Tools defined as Python functions with type hints. Pydantic models generate JSON Schema for LLM consumption. Schema includes: name, description, parameters (type, required, default). LLM outputs structured JSON: {tool: "function_name", params: {...}}. Runtime deserializes and invokes via reflection.
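A simplified, standard-library-only stand-in for the Pydantic-based schema generation described above: derive a JSON-Schema-like description from a function's type hints. The archival_memory_search stub reuses a built-in tool signature from the next section.

```python
import inspect

TYPE_MAP = {int: "integer", str: "string", float: "number", bool: "boolean"}

def tool_schema(fn) -> dict:
    # Walk the signature: annotated params become typed properties;
    # params without defaults are marked required.
    sig = inspect.signature(fn)
    props, required = {}, []
    for name, p in sig.parameters.items():
        props[name] = {"type": TYPE_MAP.get(p.annotation, "string")}
        if p.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": props,
                       "required": required},
    }

def archival_memory_search(query: str, page: int = 0) -> str:
    """Semantic retrieval from archive."""
    ...
```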
Built-in Tools
send_message(text): Reply to user. core_memory_append(name, content): Add to memory block. core_memory_replace(name, old, new): Update memory. archival_memory_insert(content): Long-term storage. archival_memory_search(query, page): Semantic retrieval from archive.

Untapped Capabilities: RAG + GitHub Integration

graph TB
    subgraph "Current State"
        A[Discord Message] --> B[Letta Agent]
        B --> C[Memory Blocks Only]
        C --> D[Limited Context]
    end
    subgraph "RAG Extension"
        E[Document Upload] --> F["Chunking Strategy<br/>512 token overlap"]
        F --> G["Embed with<br/>text-embedding-3-large"]
        G --> H["Vector Store<br/>Pinecone/Qdrant"]
        B --> I{Query Type}
        I -->|Factual| J[RAG Search]
        J --> H
        H --> K["Top-K Chunks<br/>K=5, threshold=0.8"]
        K --> L[Inject into Context]
        L --> M["Claude with<br/>Augmented Knowledge"]
    end
    subgraph "GitHub Automation"
        N["User: Create PR"] --> O[Parse Intent]
        O --> P{GitHub Tool}
        P -->|create_branch| Q["gh api<br/>POST /repos/{owner}/{repo}/git/refs"]
        P -->|commit_changes| R["gh api<br/>PUT /repos/{owner}/{repo}/contents/{path}"]
        P -->|create_pr| S["gh api<br/>POST /repos/{owner}/{repo}/pulls"]
        P -->|list_prs| T["gh api<br/>GET /repos/{owner}/{repo}/pulls"]
        Q --> U["Job: Branch Created"]
        R --> V["Job: Files Committed"]
        S --> W["Job: PR #123 Created"]
        T --> X["Job: List of PRs"]
        U --> Y[Return to Agent]
        V --> Y
        W --> Y
        X --> Y
    end
    M --> Z[Enhanced Response]
    Y --> Z
    style H fill:#7fb5ff
    style M fill:#5a9cff
    style W fill:#7fb5ff
    style Z fill:#e8f0ff
RAG Implementation
Add document_upload tool: chunk PDF/MD with 512 token overlap, embed with text-embedding-3-large, store in Qdrant with metadata (source, page, timestamp). At query time: embed user question, vector search with MMR (maximal marginal relevance) for diversity, inject top-K chunks into system prompt before memory blocks. Cost: ~$0.13 per 1M tokens for embeddings.
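The chunking step can be sketched as a sliding window. The 64-token overlap below is an assumption for illustration: "512 token overlap" above most plausibly means 512-token chunks with partial overlap, and the exact overlap width is a tuning choice.

```python
def chunk_tokens(tokens: list[str], size: int = 512,
                 overlap: int = 64) -> list[list[str]]:
    # Sliding window: each chunk shares `overlap` tokens with the previous
    # one so facts spanning a boundary appear intact in at least one chunk.
    step = size - overlap
    # Stop before a final window that would contain only overlap tokens.
    return [tokens[i:i + size]
            for i in range(0, max(len(tokens) - overlap, 1), step)]
```

Each chunk would then be embedded and upserted into Qdrant with its (source, page, timestamp) metadata.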
GitHub Tool Suite
github_create_branch(repo, branch_name, from_branch="main"): Create feature branch. github_commit_file(repo, branch, path, content, message): Commit changes. github_create_pr(repo, title, body, head, base): Open pull request. github_list_prs(repo, state="open"): List PRs with status. github_merge_pr(repo, pr_number, method="squash"): Merge PR. All tools async with 60s timeout.
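These tools can be sketched as pure builders that return the REST call shown in the diagram as (method, path, payload) rather than executing it; a real implementation would send the call via PyGithub or gh api. Names mirror the suite above; the from_sha parameter is an assumption, since the GitHub refs endpoint takes the SHA to branch from, so a real tool first resolves from_branch to a SHA.

```python
def github_create_branch(repo: str, branch_name: str, from_sha: str):
    # POST /repos/{owner}/{repo}/git/refs with the new ref and base SHA.
    return ("POST", f"/repos/{repo}/git/refs",
            {"ref": f"refs/heads/{branch_name}", "sha": from_sha})

def github_create_pr(repo: str, title: str, body: str,
                     head: str, base: str = "main"):
    # POST /repos/{owner}/{repo}/pulls to open the pull request.
    return ("POST", f"/repos/{repo}/pulls",
            {"title": title, "body": body, "head": head, "base": base})

def github_list_prs(repo: str, state: str = "open"):
    # GET /repos/{owner}/{repo}/pulls filtered by state.
    return ("GET", f"/repos/{repo}/pulls?state={state}", None)
```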
Implementation Path
1) Install PyGithub/ghapi for GitHub API client. 2) Register tools in Letta with JSON schemas. 3) Add GitHub token to agent environment variables. 4) Test with create_branch → commit_file → create_pr workflow. 5) For RAG: pip install qdrant-client openai, implement chunking with langchain.text_splitter, connect to Qdrant Cloud (free tier: 1GB). 6) Update agent system prompt to use RAG when user asks about documents.

Architecture

Discord
WebSocket gateway for real-time message streaming. Discord.py async event loop. User authentication and channel routing.
Letta
Stateful agent orchestration. Vector embeddings for semantic memory. PostgreSQL persistence layer. Context injection middleware.
Claude
Anthropic API integration. Sonnet 4.5 inference engine. Context-aware reasoning with augmented prompts. Token streaming support.

Stack

Interface Layer
Discord.py · WebSocket · REST API
Memory Layer
Letta Platform · Vector DB · PostgreSQL
LLM Provider
Anthropic Claude API · Sonnet 4.5
Infrastructure
Nginx · Docker · Linux
Protocol
HTTP/2 · TLS 1.3 · WebSocket

Protocol Flow

01 · User message received via Discord WebSocket
02 · Bot extracts content, metadata, and channel context
03 · HTTP POST to Letta agent endpoint
04 · Vector search retrieves semantic conversation history
05 · Context-augmented prompt construction
06 · Claude API request with injected memory context
07 · Inference with chain-of-thought reasoning
08 · Response stored in conversation memory
09 · Formatted response returned to Discord
10 · Message delivered via channel webhook
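Steps 01-03 can be sketched as a pure request builder; the payload shape is an assumption based on the flow above, not Letta's exact wire format.

```python
def build_letta_request(agent_id: str, user_id: str,
                        channel_id: str, content: str):
    # Step 02: metadata extracted by the bot layer travels alongside
    # the message text. Step 03: POST this payload to the agent endpoint.
    path = f"/api/agents/{agent_id}/messages"
    payload = {
        "role": "user",
        "text": content,
        "metadata": {"user_id": user_id, "channel_id": channel_id},
    }
    return path, payload
```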

Features

Persistent Memory
Long-term context retention. User profiles and preferences. Semantic retrieval across sessions.
Vector Embeddings
Contextual awareness through semantic search. Dynamic memory injection based on relevance scoring.
Async Processing
Non-blocking event loops. Concurrent request handling. Real-time typing indicators.