How GhostLoop Learns Your Voice
A deep dive into voice extraction, self-learning feedback loops, and why your AI gets better every week.
TL;DR
- Day 1: GhostLoop extracts an 11-dimension voice profile from your posts using Claude Opus (tone, hooks, closers, anti-patterns, vocabulary).
- It learns at three speeds: minutes (Redis, same-session), days (database patterns), weeks (LLM aggregation into a learning brief).
- Acceptance rate climbs from ~65% in week 1 to 90%+ by month 3. Every approve, reject, and edit teaches the system.
- The learning brief overrides the initial voice profile. What you show the system (through actions) beats what it assumed.
Day 1 posts need editing. Day 7 posts barely need a glance. Here's what happens in between.
Most AI writing tools give you the same quality on day 100 that they gave you on day 1. You paste a prompt, you get output, you fix it. Next time, same thing. The AI learns nothing from the interaction. Every session starts from zero.
We built GhostLoop to work differently. The system that generates your posts today is not the same system that generated them last week. It has watched what you approve, what you reject, what you edit, and how you edit it. It has adjusted. And the gap between "AI output" and "something you would actually post" gets smaller with every interaction.
This post is a technical walkthrough of how AI voice learning works inside GhostLoop. Not marketing claims. The actual architecture, the actual numbers, and the actual trajectory from "this needs work" to "this is ready to post."
The Voice Profile: What Gets Extracted on Day 1
Before GhostLoop generates a single draft, it builds a voice profile from your existing content. You paste in tweets (or we pull your recent posts), and the system runs a one-time extraction using Claude Opus, the most capable model available for this kind of nuanced language analysis.
This is not a "describe your tone in three words" form. It is an 11-dimension profile that captures the structural DNA of how you write. Here is what gets extracted:
Tone Axes
Four axes, each scored 1-10 with evidence citations from your actual posts: formality, humor, confidence, and provocativeness. The scoring matters because "confident" means different things at 6/10 versus 9/10. A 6 is assured. A 9 is combative. The model cites specific tweets to justify each score, so the profile is grounded in your real output, not guesswork.
Sentence Structure
Average sentence length, variety patterns, fragment usage. Some creators write in short punchy fragments. Others build complex sentences with multiple clauses. Some alternate between the two for rhythm. The extraction captures your specific pattern, not a generic "short sentences work best on Twitter" template.
Vocabulary Fingerprint
Reading level, jargon domain, signature phrases you use repeatedly, and (this one matters) a list of 15+ words to avoid. Every writer has words they never use. If you never say "leverage" or "synergy" or "game-changer," the profile encodes that as a hard constraint. One wrong word can make a post feel like it was written by someone else.
Hooks and Openers
Preferred hook types (do you lead with questions? Bold claims? Stories? Data?), first-word analysis, and 5+ banned openers. If you never start a tweet with "Did you know..." or "Here's the thing about...," those get flagged immediately. The draft generator will not use them.
Closers and Endings
How you end posts: with a call to action, a question, a one-liner, a trailing thought. The system detects engagement bait patterns (like "Repost if you agree") and checks whether you use them or avoid them. It catalogs 5+ examples of your actual closers and builds a list of banned closing patterns.
Anti-Patterns
This is the dimension most AI tools skip entirely. Anti-patterns are the structures, behaviors, and conventions you never use. Minimum 5 rules per category. Maybe you never use hashtags. Maybe you never thread. Maybe you never use emojis mid-sentence. Maybe you never ask rhetorical questions. These "never do" rules are as important as the "always do" ones. Generic AI content fails because it ignores what makes you different, and what makes you different is often what you refuse to do.
Voice Feel
Messiness level, energy, self-awareness, vulnerability. These are the qualities that make writing feel human. Some creators are polished. Others are raw. Some are self-deprecating. Others are relentlessly positive. The voice feel dimension captures the emotional texture of your writing, the quality that makes readers say "that sounds like them."
Calibration Samples
The extraction identifies 3 authentic samples (posts that represent your voice at its most recognizable), 3 outlier samples (posts that deviate from your typical style), and generates 1 intentionally bad draft example. The bad example is a teaching tool for the generation model: "this is what NOT to produce for this user."
All of this runs once, using Claude Opus because the voice profile is the foundation everything else builds on. A sloppy profile means every draft starts from a bad baseline. We use the best model for this step and cheaper models for everything downstream.
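Taken together, the extracted profile might look something like the plain data structure below. This is a hypothetical sketch: the field names and example values are illustrative, not GhostLoop's actual schema.

```python
# Illustrative shape of an 11-dimension voice profile. Field names and
# values are hypothetical, not GhostLoop's real schema.
voice_profile = {
    "tone_axes": {  # each axis scored 1-10 with evidence citations
        "formality": {"score": 4, "evidence": ["tweet_id_123"]},
        "humor": {"score": 7, "evidence": ["tweet_id_456"]},
        "confidence": {"score": 8, "evidence": ["tweet_id_789"]},
        "provocativeness": {"score": 5, "evidence": ["tweet_id_012"]},
    },
    "sentence_structure": {"avg_length": 11, "uses_fragments": True},
    "vocabulary": {
        "reading_level": "8th grade",
        "signature_phrases": ["ship it", "zero fluff"],
        "banned_words": ["leverage", "synergy", "game-changer"],  # 15+ in practice
    },
    "hooks": {"preferred": ["bold_claim", "story"],
              "banned_openers": ["Did you know", "Here's the thing about"]},
    "closers": {"examples": ["One-line kicker."],
                "banned": ["Repost if you agree"]},
    "anti_patterns": ["never uses hashtags", "never threads",
                      "never uses emojis mid-sentence"],  # minimum 5 per category
    "voice_feel": {"messiness": "low", "energy": "high", "vulnerability": "medium"},
    "calibration": {"authentic_samples": 3, "outlier_samples": 3, "bad_examples": 1},
}
```

The point of a structured profile over free-text notes: every field can be injected into a prompt, validated against generated drafts, or updated independently as feedback arrives.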
Why One-Time Setup Is Not Enough for AI Voice Learning
If voice profiles were static, we could stop here. Build the profile, inject it into prompts, done. But voice is not static.
Your writing evolves. You pick up new phrases from people you follow. You drop old habits. Your confidence on certain topics grows. Your audience shifts, and you adjust how you talk to them. The topics you cover change with the news cycle, with your business, with your mood.
A voice profile from 60 days ago is already drifting. A profile from 6 months ago might be actively wrong.
Beyond natural evolution, there is the problem of preferences that only emerge through use. You might not know that you hate question hooks until you see 5 posts that start with questions and reject all of them. You might not realize you prefer 200-character posts until you consistently edit 280-character posts down to 200. These preferences are invisible at setup. They only surface through interaction.
This is why ongoing learning of your writing style matters as much as the initial extraction. A one-time pass gets you to 60-70% accuracy. The other 30% comes from feedback over time, and that 30% is the difference between "useful tool" and "this actually sounds like me."
Three-Layer Learning: Minutes, Days, Weeks
GhostLoop's learning architecture operates on three timescales. Each one captures a different type of signal, and they work together to continuously refine your voice profile.
- Minutes: reject a post at 9am, and the 11am post already reflects it
- Days: patterns detected across recent interactions
- Weeks: deep voice profile updates from all feedback
Each layer builds on the others. Speed where it matters, depth where it counts.
Layer 1: Minutes (Redis Cache)
When you reject a post, the feedback is cached in Redis immediately. If the system is generating your next batch of posts in that same session, the new batch already reflects what you just told it.
Example: you reject a post because the opening feels clickbaity. You click reject and add the note "too clickbait." The next posts generated in that session will pull back on sensational openers. No waiting for a nightly batch. No "we'll incorporate this feedback next week." Minutes.
This layer handles the frustration problem. Nothing kills trust in an AI tool faster than rejecting something and seeing the same mistake in the next output. Redis caching makes the system feel responsive, like it is listening in real time.
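The mechanics of this layer are simple: session-scoped feedback notes with an expiry. The sketch below uses an in-process stand-in for Redis so it is self-contained; a production version would use a Redis client with TTL-based key expiry, and the class and method names here are hypothetical.

```python
import time

# In-process stand-in for the Redis session cache (illustrative only;
# production would use redis-py with TTL expiry on real keys).
class SessionFeedbackCache:
    def __init__(self, ttl_seconds=3600):
        self.ttl = ttl_seconds
        self.store = {}  # user_id -> (expires_at, list of feedback notes)

    def add_feedback(self, user_id, note):
        expires_at, notes = self.store.get(user_id, (0.0, []))
        if time.time() > expires_at:
            notes = []  # previous session expired; start fresh
        notes.append(note)
        self.store[user_id] = (time.time() + self.ttl, notes)

    def session_feedback(self, user_id):
        expires_at, notes = self.store.get(user_id, (0.0, []))
        return notes if time.time() <= expires_at else []

cache = SessionFeedbackCache()
cache.add_feedback("user_42", "too clickbait")
# The next generation call in the same session reads these notes
# and softens sensational openers before the nightly batch ever runs.
print(cache.session_feedback("user_42"))  # ['too clickbait']
```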
Layer 2: Days (PostgreSQL Storage)
Every interaction is stored in PostgreSQL with full context: the draft, the action (approve/edit/reject), the timestamp, any notes, and if edited, the before-and-after diff. Pattern detection runs on your recent history.
This layer catches the patterns you do not explicitly state. Maybe you have approved 8 out of 10 posts that use a specific structure (bold claim, supporting evidence, one-line closer). Maybe you have rejected 6 out of 8 posts about a certain topic. The database sees these patterns accumulate over days and adjusts generation parameters accordingly.
The database layer also handles feedback weighting. Not all feedback is equal:
- Edit with >20% change: 3.0x weight. You rewrote a significant chunk. Strong signal.
- Edit with <20% change: 2.0x weight. Minor tweaks. Moderate signal.
- Reject with a reason: 2.0x weight. You told us why. Valuable.
- Direction feedback: 1.5x weight. Implicit preference signal.
- Clean approve: 1.0x weight. Baseline positive signal.
- Reject without a reason: 0.5x weight. We know something was wrong, but not what.
All feedback decays over time with a 30-day half-life: a signal from 30 days ago carries half the weight of one from today. What you preferred a month ago matters less than what you preferred yesterday, which prevents old patterns from overriding your current voice.
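The weighting and decay rules above fit in a few lines. The weights and the 30-day half-life come from this post; the function and constant names are illustrative.

```python
# Base weights per feedback type, as described above. Names illustrative.
BASE_WEIGHTS = {
    "edit_major": 3.0,     # edit with >20% change
    "edit_minor": 2.0,     # edit with <20% change
    "reject_reason": 2.0,  # reject with a stated reason
    "direction": 1.5,      # direction click ("shorter", "spicier", ...)
    "approve": 1.0,        # clean approve
    "reject_silent": 0.5,  # reject without a reason
}
HALF_LIFE_DAYS = 30

def feedback_weight(kind, age_days):
    """Base weight scaled by exponential decay with a 30-day half-life."""
    decay = 0.5 ** (age_days / HALF_LIFE_DAYS)
    return BASE_WEIGHTS[kind] * decay

# A heavy edit from today outweighs the same edit from a month ago 2:1.
print(feedback_weight("edit_major", 0))   # 3.0
print(feedback_weight("edit_major", 30))  # 1.5
print(feedback_weight("approve", 60))     # 0.25
```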
Layer 3: Weeks (LLM Aggregation)
Once a week, an LLM reviews all feedback from the past week and generates structured updates to your voice profile. This is where deeper pattern shifts get captured.
The weekly aggregation produces a learning brief: a structured output containing voice corrections, content preferences, banned phrases, hook preferences, format preferences, positive patterns, and edit patterns. This gets compressed into a ~280 token injection that is added directly to the generation prompt.
Think of it this way: Layer 1 (minutes) is reflexes. Layer 2 (days) is habits. Layer 3 (weeks) is identity. Reflexes handle immediate corrections. Habits handle recurring patterns. Identity handles who you are becoming as a writer.
The weekly re-extraction also refreshes voice samples from your latest organic tweets, so the profile stays synchronized with how you are actually writing on the platform, not just how you were writing when you signed up.
How the Feedback Loop Works in Practice
Abstract architecture is meaningless without concrete examples. Here is how the learning loop plays out in a real scenario.
Scenario: You Hate Question Hooks
Monday morning. GhostLoop delivers 5 posts. Two of them open with questions: "Ever wonder why..." and "What if I told you..." You reject both. You approve the other three, which all open with direct statements.
Minutes layer: The Redis cache flags "question openers rejected." If more posts generate that afternoon (via brief mode or the next slot), they will already avoid question hooks.
Days layer: The database records 2 rejections for question-hook posts, 3 approvals for direct-statement posts. The system detects a preference signal but does not yet make it permanent (could be coincidence).
Wednesday. Two more question-hook posts appear (from earlier generation). You reject both again. That is now 4 rejections of question hooks.
Days layer: Pattern threshold hit. Three or more validated rejections of the same pattern trigger a systemic change. Question hooks move from "available opener type" to "deprioritized opener type."
Weeks layer: At the weekly aggregation, the LLM sees the pattern across all feedback. It adds "question openers" to the banned hooks list in your voice profile. The injection text now includes an explicit rule: "Do not open with questions. Use direct statements or bold claims."
From this point forward, question hooks are hard-banned. They will not appear in your posts unless the voice profile is re-extracted (which would only happen if your actual organic tweets start using question hooks again).
Scenario: Your Edits Reveal a Length Preference
Over two weeks, you approve 20 posts with edits. The system compares your edited versions to the originals. Pattern: you consistently shorten posts by 20-30%. You remove filler sentences. You cut the setup and get to the point faster.
The edit pattern detection flags this: "User consistently shortens by 20-30%. Remove unnecessary buildup. Tighten phrasing." The next generation cycle produces posts that are already 20% shorter than the model's default output.
You stop needing to edit for length. The approval rate for post length goes from ~60% to ~90%.
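Length-preference detection of this kind can be sketched as a simple statistic over edit pairs. The 20-30% band matches the scenario above; the sample minimum and function names are illustrative.

```python
# Sketch of detecting a consistent shortening pattern across edits.
# The 20-30% band mirrors the scenario above; min_samples is illustrative.
def shortening_ratio(original, edited):
    """Fraction by which the edit shortened the draft (0.25 = 25% shorter)."""
    return 1 - len(edited) / len(original)

def detect_length_preference(edit_pairs, min_samples=10):
    """Flag a length preference if the median shortening falls in the band."""
    ratios = sorted(shortening_ratio(o, e) for o, e in edit_pairs)
    if len(ratios) < min_samples:
        return None  # not enough evidence yet; could be coincidence
    median = ratios[len(ratios) // 2]
    if 0.20 <= median <= 0.30:
        return f"User consistently shortens by ~{median:.0%}; tighten drafts."
    return None

# 20 edits where the user cut a 280-character draft down to 210 characters.
pairs = [("x" * 280, "x" * 210)] * 20
print(detect_length_preference(pairs))
# User consistently shortens by ~25%; tighten drafts.
```

Using the median rather than the mean keeps one outlier rewrite from skewing the detected preference.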
Scenario: Banned Phrases Accumulate
You reject a post that ends with "What do you think?" Next week, another post ends with "Drop your take below." You reject that too. A third post ends with "Agree or disagree?" Rejected.
The system detects: three different engagement-bait closers, all rejected. Each one is now hard-banned individually. But more than that, the pattern "engagement bait closers" is flagged as a category to avoid. Future posts will not end with any variation of "tell me what you think" prompts.
The rule: if the same closer pattern appears in 3+ rejected posts, it is hard-banned. If the same opener pattern appears in 2+ rejections, it is banned. These are not suggestions. They are constraints that the generation model cannot override.
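The ban rule reduces to counting rejections per pattern against a per-category threshold. The thresholds (2+ for openers, 3+ for closers) come from the rule above; the function name and pattern labels are illustrative.

```python
from collections import Counter

# Thresholds from the rule above: 2+ rejections bans an opener pattern,
# 3+ bans a closer pattern. Pattern labels are illustrative.
BAN_THRESHOLDS = {"opener": 2, "closer": 3}

def banned_patterns(rejections):
    """rejections: list of (category, pattern) tuples from rejected posts."""
    counts = Counter(rejections)
    return {
        pattern
        for (category, pattern), n in counts.items()
        if n >= BAN_THRESHOLDS[category]
    }

rejections = [
    ("opener", "question_hook"),
    ("opener", "question_hook"),
    ("opener", "question_hook"),
    ("opener", "question_hook"),   # 4 rejections: well past the 2+ threshold
    ("closer", "engagement_bait"),
    ("closer", "engagement_bait"),  # only 2: one short of the 3+ threshold
]
print(banned_patterns(rejections))  # {'question_hook'}
```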
Brief Mode as Accelerated Learning
Auto posts teach GhostLoop what you like and do not like. But brief mode (where you give the AI a topic or idea and it generates a post on demand) is a faster teacher.
Why? Because briefs create denser feedback loops. With auto posts, you review 5 posts in the morning and move on. With brief mode, you might generate a post, refine it with a direction ("shorter," "more direct," "spicier"), refine again, then approve or start over. That is 3-4 interactions in 2 minutes, each one a training signal.
Direction feedback is its own category of learning data. When you click "shorter," the system records: this post was too long. When you click "more direct," it records: too much buildup. "Add hook" means the opening was weak. "Different angle" means the framing was wrong.
Each direction click carries a 1.5x weight in the feedback system. And when the same direction appears 3+ times across different briefs, it triggers a systemic change to generation defaults. If you keep clicking "shorter" on brief outputs, the model learns to generate shorter first posts across the board, not just in brief mode.
The available directions tell you something about what the system can learn:
- More direct: Cut the buildup, get to the point
- Add hook: Start with something that stops the scroll
- Shorter: Tighter, every word earns its place
- Different angle: Same idea, completely different approach
- Spicier: More opinionated, might ruffle feathers
- Softer: Less aggressive, more approachable
- More personal: Add "I" statements, make it human
Each refinement attempt is free (you get 3 per brief, they do not count against your daily brief limit). This encourages experimentation. And every experiment teaches the system something.
Users who run brief mode daily see their acceptance rate climb roughly twice as fast as those who only review auto posts. More interactions, more data, faster convergence on your voice.
The Acceptance Rate Trajectory
Numbers matter more than promises. Here is what the learning curve looks like for a typical GhostLoop user:
Every approve, reject, and edit compounds.
- Week 1: 60-70% acceptance. The voice profile carries most of the weight. Posts sound roughly like you but miss nuances. You are editing frequently.
- Week 2: 70-75%. The system has learned what to stop doing. Banned phrases and patterns are accumulating. Fewer obvious misses.
- Week 3: 75-80%. Now it knows what to do more of. Positive patterns from your approvals shape generation. Posts start feeling proactive, not just reactive.
- Week 4: 80-85%. Performance data from your actual posts feeds back in. The system knows which of your approved posts performed well on X and weights those patterns higher.
- Month 2: 85-90%. Voice profile re-extraction runs on your latest organic tweets. The profile catches up to how you write now, not how you wrote at signup. Prompt self-optimization kicks in.
- Month 3+: 90%+. The system is effectively writing as you. Most posts need a glance, not an edit. The remaining 10% are edge cases: new topics, experimental formats, areas where your voice has not been established yet.
90%+ does not mean perfect. It means 9 out of 10 posts are either ready to publish or need a single small tweak. The goal is not to replace your judgment. It is to make your judgment the only step between an idea and a published post.
What "Voice Accuracy" Actually Means
There is a common misconception that AI voice learning means perfect mimicry. That the goal is to produce output indistinguishable from the human. That is not what we are building, and it is not what works.
Perfect mimicry is a dead end for three reasons:
- Your voice has range. You do not write the same way about every topic. A post about a personal failure sounds different from a post about an industry trend. Mimicking a single "average" of your voice flattens this range into something that sounds monotone.
- Your best posts are better than your average. If the AI reproduces your average output, it is not useful. The system should produce posts at the quality level of your best work, in your voice. That means understanding not just how you write, but how you write when you are writing well.
- Augmented authenticity beats synthetic authenticity. The goal is posts that you read and think "I would say this, but I might not have thought to say it this way." The AI should bring ideas, angles, and structures that surprise you, while still sounding like you. If every post is something you would have written anyway, the tool is just saving you typing time. It should save you thinking time too.
Voice accuracy in GhostLoop means: the post matches your tone, vocabulary, structure, and anti-patterns closely enough that you could publish it without anyone noticing, AND it brings something you did not already have in your head. That is the target. Not a photocopy. An augmented version of you.
Under the Hood: How Learning Gets Injected into Generation
The learning system produces a structured output after each weekly aggregation. The output is a JSON object containing:
- voice_corrections: Specific adjustments to tone, formality, or energy based on recent feedback
- content_preferences: Topics, angles, and formats that perform well
- banned_phrases: Words, phrases, and patterns that are hard-banned from generation
- hook_preferences: Which opener types to use and which to avoid
- format_preferences: Length, structure, thread vs. single post preferences
- positive_patterns: Structures and approaches from approved posts
- edit_patterns: Consistent changes you make when editing (shorten, remove hashtags, replace X with Y)
This gets compressed into a ~280 token prompt injection that is prepended to every generation call. Two hundred and eighty tokens is not a lot, but it is dense. Every word in that injection is earned through your feedback. No filler, no generic instructions, just the distilled patterns of what makes your voice yours.
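Compressing the brief into an injection is, at its core, rendering the structured fields into terse imperative lines. The sketch below is hypothetical: the field names mirror the list above, but the rendering logic and phrasing are illustrative, not GhostLoop's actual implementation.

```python
# Hypothetical weekly learning brief. Field names mirror the list above;
# the example values and rendering are illustrative.
learning_brief = {
    "voice_corrections": ["Lower formality slightly."],
    "banned_phrases": ["game-changer", "What do you think?"],
    "hook_preferences": {"use": ["bold_claim"], "avoid": ["question"]},
    "format_preferences": {"target_length": 200},
    "edit_patterns": ["Shorten by ~25%."],
}

def build_injection(brief):
    """Render the brief as dense imperative lines for the generation prompt."""
    lines = []
    if brief.get("banned_phrases"):
        lines.append("Never use: " + ", ".join(brief["banned_phrases"]))
    hooks = brief.get("hook_preferences", {})
    if hooks.get("avoid"):
        lines.append("Avoid openers: " + ", ".join(hooks["avoid"]))
    if hooks.get("use"):
        lines.append("Prefer openers: " + ", ".join(hooks["use"]))
    fmt = brief.get("format_preferences", {})
    if "target_length" in fmt:
        lines.append(f"Target ~{fmt['target_length']} characters.")
    lines.extend(brief.get("voice_corrections", []))
    lines.extend(brief.get("edit_patterns", []))
    return "\n".join(lines)  # kept under the ~280 token budget in practice

injection = build_injection(learning_brief)
print(injection)
```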
The injection updates weekly, so the generation prompt is a living document. The base voice profile provides the foundation. The injection provides the ongoing refinements. Together, they produce posts that reflect both who you are and who you are becoming.
Edit Pattern Detection: The Richest Signal
Of all the feedback types, edits are the most valuable. A rejection tells us "no." An approval tells us "yes." An edit tells us "almost, but here is exactly what to change."
When you edit a post, the system compares the original to your version. It looks for consistent patterns across your edits:
- "User consistently removes hashtags" (they find them cringy or performative)
- "User shortens by 20-30%" (posts are too long by default)
- "User replaces question hooks with direct statements" (prefers assertion over inquiry)
- "User adds personal anecdotes to openings" (wants more lived experience, less abstraction)
- "User removes calls to action from closers" (does not want to appear needy)
Each of these patterns gets encoded into the learning brief and injected into future prompts. The edit comparison is why edits with more than 20% change carry 3x the weight of a clean approve. A heavy edit is the user literally showing the AI how to write better.
This is also why GhostLoop encourages editing over rejecting. Rejection is a binary signal. Editing is a gradient. The more you edit, the faster the system learns the specific gap between its output and your preference.
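One reasonable way to measure that "more than 20% change" cutoff is a character-level similarity ratio over the before-and-after diff. The sketch below uses Python's `difflib` as the metric; the 20% threshold and the 3.0x/2.0x weights come from this post, while the function names are illustrative.

```python
import difflib

# Score how heavy an edit was, to pick the 3.0x vs 2.0x feedback weight.
# difflib's similarity ratio is one reasonable metric; names illustrative.
def edit_change_fraction(original, edited):
    """0.0 = identical, 1.0 = completely rewritten."""
    return 1 - difflib.SequenceMatcher(None, original, edited).ratio()

def edit_weight(original, edited, threshold=0.20):
    """>20% change is the strong signal (3.0x); anything less is 2.0x."""
    return 3.0 if edit_change_fraction(original, edited) > threshold else 2.0

light = edit_weight("Shipping the new feature today.",
                    "Shipping the new feature today!")
heavy = edit_weight("Shipping the new feature today.",
                    "We just shipped. Here's what broke and what we learned.")
print(light, heavy)  # 2.0 3.0
```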
Why Most AI Tools Cannot Do This
If this learning architecture sounds straightforward, you might wonder why other tools do not do it. A few reasons:
Most AI writing tools are prompt-and-response. You type, it generates, the conversation ends. There is no persistent state. No memory between sessions. No feedback loop. Every interaction starts from scratch. Building a learning system requires infrastructure (caching layers, databases, scheduled aggregation jobs) that prompt-based tools do not have. Feedback loops are the missing ingredient in AI content tools, and building them requires a fundamentally different architecture.
Voice extraction is hard to do well. Most tools that claim "voice matching" are doing surface-level analysis: tone (formal/casual), length (short/long), maybe emoji usage. An 11-dimension profile with anti-patterns, calibration samples, and a bad-example teaching draft is a different level of depth. It requires a capable model (Opus is not cheap for this), careful prompt engineering, and validation against real output.
Feedback weighting is non-obvious. Treating every user action as equal signal produces noisy learning. A rejected draft without a reason is almost meaningless. An edit with 30% changes is gold. Most teams do not invest in the nuance of differential feedback weighting because the payoff is not immediate. It takes weeks to show up in acceptance rates. But over months, it is the difference between a tool that plateaus at 70% and one that reaches 90%+.
Multi-layer architecture is complex. Running Redis for immediate feedback, PostgreSQL for pattern accumulation, and weekly LLM aggregation for profile updates requires coordinating three different systems on three different timescales. Most teams pick one layer and stop. One layer gets you partial learning. Three layers get you compound learning.
What Happens When Your Voice Changes
People worry about AI locking them into a past version of themselves. If the system learned your voice from tweets you wrote 6 months ago, what happens when your voice evolves?
Three mechanisms handle this:
Recency decay. All feedback has a 30-day half-life. Patterns from 30 days ago carry half the weight of patterns from today. Patterns from 60 days ago carry a quarter. The system naturally forgets old preferences as new ones emerge.
Voice sample refresh. Twice a month, GhostLoop scrapes your latest organic tweets and runs a partial re-extraction. If your writing style has shifted (shorter posts, different topics, new vocabulary), the voice profile updates to reflect the change. This is not a full Opus re-extraction (that would be expensive). It is a targeted update that catches drift.
Feedback override. Your active feedback always beats the historical profile. If the profile says you like question hooks but you have rejected 4 question hooks this week, the feedback wins. The profile is a prior. Your feedback is the evidence. Evidence updates the prior.
The net effect: GhostLoop's model of your voice is always a blend of your historical baseline and your recent behavior, weighted toward the recent. It follows you as you evolve, rather than anchoring you to who you were.
The 8-Minute Setup and What Comes After
Setting up GhostLoop takes about 8 minutes. Paste your competitor handles, paste some of your tweets (or let us pull them), and the voice extraction runs. First posts arrive the same day.
Those first posts will be good, not great. 60-70% acceptance rate. You will edit some, reject some, approve some. And every action teaches the system.
By the end of week 1, you will notice posts that feel closer to your voice. By week 3, you will be approving posts with minor edits or no edits at all. By month 2, you will open your post queue, scan it, approve most of it, and move on with your day.
The curve is not theoretical. It is built into the architecture. Three learning layers. Weighted feedback. Edit pattern detection. Banned phrase accumulation. Recency decay. Voice sample refresh. Each mechanism compounds on the others.
This is what separates an AI that writes generic content from an AI that writes your content. Not a better base model. Not a cleverer prompt. A system that watches, learns, and adapts, every day, getting closer to the version of you that your audience already follows.
GhostLoop learns your voice through an 11-dimension profile, three-layer feedback architecture, and weighted learning signals. Setup takes 8 minutes. The improvement curve does the rest.
Set up in 8 minutes and see the improvement curve yourself →
