
How we at Freeday measure Customer Satisfaction at scale

Published on
August 19, 2025

At Freeday, our digital employees handle a high volume of customer conversations every single day, across chat, email and voice. They help customers schedule mechanic appointments, answer questions, recommend holiday destinations, create support tickets, and more.

They’re fast, tireless, and consistent. But like any member of your support team, they carry your brand in every interaction. Which raises the essential question:

How do we know they’re doing a good job?

The Survey Blind Spot

Customer Satisfaction (CSAT) surveys are one of the most widely used ways to measure support quality. After a conversation, customers are asked to rate their experience, typically on a scale from 1 (very dissatisfied) to 5 (very satisfied), with an optional comment or thumbs up/down.

The problem isn’t that CSAT is bad. In fact, with 10–15% response rates, it’s actually better than many other survey-based metrics. The problem is that the majority of customers, 85–90%, don’t respond at all.

What about all those silent conversations? We have no direct feedback, no score, and often no idea whether the customer left happy, neutral, or frustrated.

And even when customers do respond, there’s another challenge:

  • Scores often sit at the extremes: the very happy or the very unhappy.
  • It’s hard to know why someone gave a particular score, especially if they left no comment.

The result? We end up with an incomplete, skewed view of the real customer experience. One that risks overrepresenting the loudest voices and ignoring the “quiet middle,” where most interactions actually happen.

For a high-volume support operation, whether digital or human, that’s a serious risk:

  • Subtle issues affecting the average customer go unnoticed.
  • Consistently “just okay” experiences never get flagged for improvement.
  • Teams may end up optimizing for edge cases instead of the overall experience.

In other words, you’re making decisions with only part of the story.

We needed a way to fill in those blanks, to get reliable CSAT-like insights for 100% of conversations, not just the 10–15% where someone clicked a button.

So we built one.

From Gut Feeling to Data

Instead of relying only on survey responses, we use AI to simulate how a customer might rate their own experience, even if they never filled out a survey.

The AI reviews the entire conversation, from start to finish, and predicts a score from 1 to 5. Alongside the score, it also provides a short explanation, just like a customer might if you asked them “Why did you give that rating?”

To make these predictions more accurate, the AI considers five core factors:

  • Resolution: Was the issue actually solved?
  • Sentiment trajectory: Did the user’s mood improve or decline over time?
  • Tone & clarity: Was the interaction polite, clear, and easy to follow?
  • Efficiency: Did the conversation flow smoothly, or were there delays?
  • Accuracy: Was the information helpful?


Whenever possible, we also feed the AI extra context. That includes things like thumbs up/down, written comments, and timestamps. These signals help it better understand tone, pacing, and the final outcome.
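As a sketch of how such a prediction might be wired up: the `llm_complete` stub, the prompt wording, and the JSON reply format below are illustrative assumptions, not Freeday's actual implementation.

```python
import json

# Hypothetical LLM call; in practice this would hit whatever model
# provider is in use. Stubbed with a fixed reply so the sketch runs.
def llm_complete(prompt: str) -> str:
    return json.dumps({"score": 4, "explanation": "Issue resolved politely, minor delay."})

# The five core factors the model is asked to weigh.
FACTORS = ["resolution", "sentiment trajectory", "tone & clarity", "efficiency", "accuracy"]

def predict_csat(transcript: str, extra_context: str = "") -> tuple[int, str]:
    """Ask the model to rate a conversation 1-5 and explain why."""
    prompt = (
        "Rate this customer conversation from 1 (very dissatisfied) to 5 "
        "(very satisfied), considering: " + ", ".join(FACTORS) + ".\n"
        'Reply as JSON: {"score": <1-5>, "explanation": "..."}\n\n'
        f"Extra context: {extra_context}\n\nTranscript:\n{transcript}"
    )
    result = json.loads(llm_complete(prompt))
    score = int(result["score"])
    if not 1 <= score <= 5:
        raise ValueError(f"score out of range: {score}")
    return score, result["explanation"]

score, why = predict_csat(
    "Customer: My invoice is wrong.\nAgent: Sorry about that, I've corrected it.",
    extra_context="thumbs_up=true",
)
print(score, why)
```

Extra signals like thumbs up/down or timestamps would simply be folded into `extra_context` in this sketch.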

How Close Is AI to the Real Thing?

This is the million-dollar question. To answer it, we first need to understand what CSAT scores actually tell us.

While CSAT is often presented as a five-point scale, in practice it functions more like a two-category system: happy or unhappy.

  • Positive Experience: Scores of 4 or 5
  • Negative Experience: Scores of 1, 2, or 3

What this means:

  • A one-point difference within the same category (like 1 vs. 2, 2 vs. 3, or 4 vs. 5) still points to the same outcome. Both the AI and the user agreed on whether the experience was positive or negative.
  • A one-point difference across the boundary (like 3 vs. 4) is more significant, because it flips the outcome from unhappy to happy (or vice versa).

And exact scores are subjective: one person’s 4 might be another person’s 5. That’s why we count a 4 predicted on a 5-rated interaction as a success: the AI still identified a positive experience.

In short, the AI gets the outcome right 74% of the time, giving us a reliable, consistent view of every customer interaction, not just the ones where a customer clicked a survey.
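The two-category view above can be sketched as a simple check, assuming we have paired (AI prediction, survey score) values to compare:

```python
def outcome(score: int) -> str:
    # Scores of 4-5 count as a positive experience; 1-3 as negative.
    return "positive" if score >= 4 else "negative"

def outcome_agreement(pairs) -> float:
    """Fraction of conversations where AI and customer agree on the outcome."""
    hits = sum(outcome(ai) == outcome(user) for ai, user in pairs)
    return hits / len(pairs)

# (AI prediction, actual survey score): note that (4, 5) and (5, 4)
# count as agreement, while (3, 4) crosses the happy/unhappy boundary.
pairs = [(4, 5), (2, 1), (3, 4), (5, 4)]
print(outcome_agreement(pairs))  # 3 of 4 agree -> 0.75
```

Measuring agreement on the outcome rather than the exact score is what makes a one-point miss within a category harmless.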

The obvious next question is:

Can AI ever match every customer’s score exactly?

Probably not, and that’s okay. Here’s why:

1. The Human Element

Two people can have nearly identical conversations and give totally different scores. One is just having a good day and gives a 5. The other has had a bad experience with a chatbot in the past (probably not a Freeday one 🙂) and gives a 2. These emotional layers are impossible to model fully, even with the best AI.

2. Less context = more guesswork

Some support chats are just a few messages long. That doesn’t leave much room for the model to read into tone or outcome. The shorter the chat, the more guesswork required.

3. The Sarcasm Barrier

Even polite sarcasm or subtle humor can be challenging for AI to interpret perfectly. For example, “Thanks… that was helpful” might be genuine or slightly frustrated; context matters.

The key point: the AI doesn’t need to be perfect to provide value. Its predictions give a reliable, consistent view of whether a customer left satisfied or dissatisfied, which is what drives meaningful improvements.

What This Unlocks for Freeday and Our Clients

With AI-generated CSAT, we now have visibility into 100% of our digital employees’ conversations. That opens up entirely new possibilities:

✅ We can spot underperforming flows or patterns early, even when no user complains.

✅ We can slice and dice the data by channel, team, conversation type, or any other dimension to uncover hidden insights.

✅ We get consistent benchmarks across channels, teams, and use cases.

✅ We can improve performance continuously based on this metric, not just when feedback happens to come in.
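Slicing predicted CSAT by a dimension can be sketched as a small aggregation. The record layout and field names here are hypothetical; in practice these would come from the conversation store with the AI-predicted score attached.

```python
from collections import defaultdict

# Hypothetical per-conversation records with AI-predicted scores.
conversations = [
    {"channel": "chat",  "predicted_score": 5},
    {"channel": "chat",  "predicted_score": 3},
    {"channel": "email", "predicted_score": 4},
    {"channel": "voice", "predicted_score": 2},
    {"channel": "voice", "predicted_score": 4},
]

def csat_by(dim: str, records) -> dict:
    """Share of positive outcomes (score >= 4) per value of `dim`."""
    totals, positives = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r[dim]] += 1
        positives[r[dim]] += r["predicted_score"] >= 4
    return {k: positives[k] / totals[k] for k in totals}

print(csat_by("channel", conversations))
# -> {'chat': 0.5, 'email': 1.0, 'voice': 0.5}
```

Because every conversation has a score, the same aggregation works for any dimension (team, conversation type, language) without waiting for survey responses.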

For our clients, this means we can deliver a higher level of quality control at scale.

And because this system doesn’t depend on users clicking a survey, it works quietly in the background, always on, always improving.

What’s Next

We’re continuing to refine the model, exploring things like score calibration across languages, smarter weighting of different conversation types, and improvements in edge cases like sarcasm detection.

But even in its current form, this system gives us something we didn’t have before:

A scalable, consistent, intelligent way to measure the quality of every customer interaction, not just the loud ones.

This isn’t a nice-to-have. It’s a step toward a new standard in customer support.

At Freeday, we’re not just adapting to that future, we’re helping shape it. And we’ll keep sharing what we learn along the way.


FAQ

Common questions about AI agents, automation, and enterprise deployment answered.

How do AI agents reduce costs?

AI agents handle repetitive workflows continuously without fatigue or error, eliminating the need for proportional headcount increases. Enterprises using Freeday reduce contact center costs by up to 92% while maintaining industry-leading CSAT scores. The agents process one million monthly calls with consistency that human teams cannot match, handling customer service inquiries, KYC verification, accounts payable processing, and healthcare intake simultaneously across voice, chat, and email channels.

What workflows can be automated?

Any workflow that follows consistent rules and doesn't require complex human judgment can be automated. This includes customer service inquiries, KYC verification, accounts payable processing, patient intake, appointment scheduling, booking modifications, returns management, and insurance verification. The platform connects to over 100 business applications including Salesforce, SAP, and Epic, enabling agents to access the systems your organization already uses.

Is AI deployment secure and compliant?

Freeday maintains ISO 27001 certification with full GDPR and CCPA compliance built into the platform foundation. Security and governance requirements are not afterthoughts but core architectural principles. Your customer data and business processes receive protection that matches the sensitivity of the information involved, with enterprise-grade controls for organization-wide AI deployment.

How does Performance Intelligence work?

Performance Intelligence tracks conversation metrics and auto-scores CSAT in real time, detecting issues before escalation becomes necessary. The system provides visibility into what agents are doing, why they're making decisions, and whether they're complying with regulations. This eliminates manual reporting that consumes time and introduces errors.

What makes the platform model-agnostic?

Freeday's architecture supports any AI model, protecting your investment as technology evolves. You're not locked into a single vendor's approach and can experiment with different models to choose what works best for your specific workflows. This flexibility ensures your platform remains current as the AI landscape changes.
