
At Freeday, we've already shown how AI predicts CSAT for 100% of conversations, even the ones that never get a survey click. That system gives us a predicted 1–5 score plus a short explanation for the many conversations that would otherwise stay silent. CSAT is our compass: it tells us directly how satisfied users are, and higher CSAT is strongly linked to loyalty and retention.

But CSAT alone doesn’t always tell the full story. It’s still an outcome. To truly improve, we need to understand the why. A conversation might end with a low satisfaction score for many reasons: was the assistant too formal, unclear, off-topic, or unhelpful when something went wrong?

That’s where the Freeday Conversation Principles come in.

This framework lets us assess how our digital employees communicate, focusing on empathy, clarity, understanding, and more. Together with CSAT, it gives us both the outcome and the explanation behind it.

Why we built the framework

Measuring the quality of communication in conversations has always been tricky. We already had CSAT to show whether users were satisfied, but CSAT doesn’t always pinpoint what needs to be improved.

Imagine your digital employee finishes a conversation and the CSAT prediction comes back as 2 out of 5. You know the user wasn’t happy, but why not? Was the assistant unclear in its answers? Too formal in tone? Did it misunderstand the request?

Manual reviews using the Conversation Principles could, in theory, answer those questions. But going through thousands of conversations per day by hand is slow, inconsistent, and not scalable.

This left us with three main challenges:

  • We lacked a clear picture of communication quality.
  • It was hard to spot patterns and improvement areas.
  • CSAT gave us the outcome, but not the deeper insight.

To keep raising satisfaction, we needed a way to measure how conversations were being handled: reliably, consistently, and at scale.

Principles of the Perfect Digital Conversation Style

To go beyond CSAT and understand why conversations succeed or fail, we created the Freeday Conversation Principles: a practical framework for evaluating and improving how our digital employees interact with users.

Rather than just providing correct answers, our digital employees should be clear, relevant, empathetic, and helpful. Each principle focuses on a key element of great communication. By applying these principles, we can not only measure but also consistently raise conversation quality at scale.

The six principles are:

  • 🪞 Empathy — Recognize user emotions and respond appropriately. Acknowledge frustration or confusion, match tone to context, and make the user feel heard.
  • 👂🏼 Understanding — Grasp the user’s true intent, not just the literal words. Ask clarifying questions when needed and address the root cause of the problem.
  • 🎯 Relevancy — Stay focused on the user’s question and provide only useful information. Avoid unnecessary details and ensure responses are accurate and context-appropriate.
  • Simple & Clear — Keep language straightforward and structured. Use short sentences, logical flow, and formatting like bullets or numbered steps.
  • 🗺️ Proactive — Guide the conversation with timely prompts and next steps. Anticipate user needs and keep interactions flowing naturally.
  • 🚑 Failure Recovery — Quickly acknowledge errors and provide a clear path forward. Offer solutions, alternatives, or escalation when necessary.

Together, these principles help our digital employees not just answer questions, but deliver meaningful, human-like experiences that feel natural, helpful, and reliable.

How we measure them: AI + a scorecard

Every conversation is scored on the six principles using a structured scorecard, the same kind of approach that’s commonly used to evaluate real customer support agents. Each evaluation returns:

  1. A principle-specific score (e.g., Empathy: Good / Adequate / Unacceptable; Understanding: Excellent / Good / Adequate / Unacceptable).
  2. A short explanation of the reasoning behind the score, similar to how a human agent might explain their judgment.

The score itself isn’t a final verdict; the explanation is what makes it actionable. If the evaluator can justify its judgment in a clear, human-like way, the score becomes meaningful; if not, we refine the scoring rubric and alignment.
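To make this concrete, here is a minimal sketch of what a single scorecard evaluation could look like as data. It is illustrative only: the principle names come from the framework above, but the `Rating` scale, field names, and `ConversationScorecard` structure are assumptions, not our production schema.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    # Illustrative scale; in practice some principles use a subset of these levels.
    EXCELLENT = "Excellent"
    GOOD = "Good"
    ADEQUATE = "Adequate"
    UNACCEPTABLE = "Unacceptable"


@dataclass
class PrincipleScore:
    principle: str    # e.g. "Empathy", "Understanding", "Relevancy", ...
    rating: Rating    # the principle-specific score
    explanation: str  # short, human-readable reasoning behind the score


@dataclass
class ConversationScorecard:
    conversation_id: str
    scores: list[PrincipleScore]  # one entry per principle
```

Keeping the explanation next to the rating in the same record is what lets us audit a score later and refine the rubric when the reasoning doesn’t hold up.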

Using this scorecard approach gives us three key advantages at scale:

  • Diagnostics: not just “low CSAT,” but why (e.g., low empathy or weak failure recovery).
  • Trends: track which principles degrade over time.
  • Actionability: identify the improvements that truly affect the user experience.

A quick example

Customer: “I can no longer log in to my account.”

Digital Employee: Provides instructions for password reset, email/phone re-verification, and 2FA issues.

Customer: “I’ve already tried all of that.”

Digital Employee: Escalates to a human agent: “I’m forwarding your request to a team member. You’ll receive a reply within 24 hours.”

Principles:

  • Empathy → Adequate (assistant did not directly acknowledge the user’s frustration; escalation was appropriate but lacked a brief acknowledgment like, “I understand this must be frustrating.”)
  • Understanding → Good (recognized that the user cannot log in and provided all relevant recovery options).
  • Relevancy → Adequate (initial instructions included multiple options, but the user had already tried them).
  • Simple & Clear → Good (steps were clear and structured).
  • Proactive → Adequate (provided all necessary steps and escalation, but could have confirmed resolution or acknowledged frustration before closing).
  • Failure Recovery → Good (correctly escalated to a human agent when automated help failed).

Action: Adjust the flow to include a short acknowledgment of frustration or repeated attempts before escalating. For example:

“I see you’ve already tried these steps. I understand that must be frustrating. I’m forwarding your request to a team member who will follow up shortly.”
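For illustration, the same evaluation could be written down as a plain record like the one below. The field names and identifier are hypothetical; the ratings and explanations simply mirror the assessment above.

```python
# Hypothetical serialized form of the example evaluation (field names are illustrative).
example_evaluation = {
    "conversation_id": "login-issue-demo",
    "principles": {
        "Empathy": ("Adequate", "Escalated correctly, but never acknowledged the user's frustration."),
        "Understanding": ("Good", "Recognized the login problem and covered the relevant recovery options."),
        "Relevancy": ("Adequate", "Initial instructions repeated steps the user had already tried."),
        "Simple & Clear": ("Good", "Steps were short, structured, and easy to follow."),
        "Proactive": ("Adequate", "Escalated, but did not confirm resolution or acknowledge frustration first."),
        "Failure Recovery": ("Good", "Handed off to a human agent once automated help failed."),
    },
    "suggested_action": "Acknowledge repeated attempts and frustration before escalating.",
}
```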

The traffic-light view

To make principle scores actionable, we surface them in our portal using a simple traffic-light logic:

  • 🟢 Green — principle performance is healthy
  • 🟡 Yellow — some degradation; coaching or small fixes recommended
  • 🔴 Red — urgent attention; likely contributor to low CSAT

When AI-CSAT is low, the principle scores and their explanations highlight the root cause. And when a principle trends red across a flow, CSAT erosion usually follows: a clear signal for product or content improvements.

Clients get a quick overview via the traffic-light dashboard, and can drill down into individual conversations to see why a principle (like Understanding) scored low.
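As a minimal sketch of how a traffic-light rollup could work, the function below averages recent ratings for one principle and maps the result to a status. The numeric weights and thresholds are assumptions chosen for illustration, not the portal’s actual logic.

```python
# Illustrative rollup: aggregate a principle's recent ratings into a traffic-light status.
RATING_VALUES = {"Excellent": 3, "Good": 2, "Adequate": 1, "Unacceptable": 0}


def traffic_light(ratings: list[str]) -> str:
    """Map the ratings of one principle (across many conversations) to a status."""
    if not ratings:
        return "no data"
    avg = sum(RATING_VALUES[r] for r in ratings) / len(ratings)
    if avg >= 1.8:   # mostly Good or better
        return "🟢 green"
    if avg >= 1.0:   # noticeable degradation
        return "🟡 yellow"
    return "🔴 red"  # urgent attention


# Example: Empathy ratings for one flow over a day
print(traffic_light(["Good", "Adequate", "Adequate", "Unacceptable"]))  # -> 🟡 yellow
```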


What this unlocks for our clients

✅ Measure conversation quality like you measure agent performance, for every digital employee, every conversation.

✅ Link communication problems to CSAT.

✅ Prioritize the fixes that actually affect customer experience.

✅ Run experiments and track principle scores to validate whether a change improved how we talk, not just what we achieved.

What’s next

We’re now focusing on closing the feedback loop: giving clients the tools to act on issues themselves, from refining phrasing to adding missing content to their knowledge base.

Even at this stage, the Conversation Principles system gives us something new:

A clear, actionable, human-centered way to measure how conversations succeed or fail, not just whether they do.

This isn’t a nice-to-have. It’s a step toward a new standard in customer support, where quality is visible, measurable, and improvable.

At Freeday, we’re not just adapting to that future; we’re helping shape it, and we’ll keep sharing what we learn along the way.