AI That Talks, Moves, Writes and More: Hello From Chrisynthetic


Something I love about the emergence of AI is how it levels some fields. This means smoothing out some hills – stuff that was hard to do before is made easier – but also filling some holes or gaps – stuff that just…wasn’t doable is made possible. One way we explore this at Airlock is with Chrisynthetic.

Chrisynthetic is a little bit of my (actual Chris) sandbox in AI. I try stuff on this AI so you don’t have to, or so I know how it might work for you, or just to try out interesting ideas. One of those was giving the AI some “ambassadorship” of Airlock. If Chrisynthetic is going to sit there in the corner of the website, where else might we use AI to create some kind of relationship to the brand? To do that, I needed to give it – Chrisynthetic – a presence.

Who is Chrisynthetic?

I talked a bit about Chrisynthetic in this post about creating high-quality content using AI.

Chrisynthetic as an AI writer

I trained ChatGPT to have a communication style closer to mine. This involved:

  • Using a custom GPT I built, I shared dozens of samples of my actual writing with ChatGPT. Then, I asked it to analyze the writing and describe its characteristics. It came back with 9 primary characteristics. For instance:
    • Voice and Tone
      • Objective: Craft content that is professional, yet engaging, balancing expertise with approachability.
      • Qualities: Innovative, encouraging, persuasive; should inspire action and reflection.
    • Sentence Structure
      • Strategy: Employ a mix of complex and shorter, impactful sentences to emphasize key points, enhancing clarity and engagement.
      • Tool: Utilize the em-dash for conversational tone—“The dogs played—they loved it.”
  • I started a document where I pasted all of these characteristics, workshopping them a bit and creating the Chris Bintliff Comprehensive Writing and Style Guide.
  • This document is frequently updated and iterated. In addition to the characteristics, for instance, I have a list of Words To Never Use – bad habit words that AI uses all the time and that I never use, like “prowess” or “delve.” I have additional instructions, like “eliminate temporal or contextual introducers,” which helps to stop AI from starting sentences with, “In the bustling landscape of business today,” and other nonsense. I’m fine-tuning this document often.
  • I have a few other blog- and article-specific instructions that reflect my tastes (AI loves a title or section header with a colon, for instance: The Blog Title: More To Say Here. I do not.), and the Writing and Style Guide is our super reference.
  • Now any time I want AI to communicate more like me, one of the first things I do is share this writing guide with it as a point of reference.

Some of the best parts of this guide are in the AI Writing Clarity Kit, which you can get for free here.
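
As a minimal sketch of what enforcing a slice of the guide could look like in code: the word list and phrases below are just the examples mentioned above, and a real guide carries far more.

```python
# A tiny "style guide" checker: flag AI habit words and temporal
# introducers before a draft goes out. The entries here are only the
# examples from this post; a real guide would carry many more.

BANNED_WORDS = {"prowess", "delve"}
TEMPORAL_INTRODUCERS = ("in the bustling landscape", "in today's fast-paced world")

def style_violations(text: str) -> list[str]:
    """Return a list of style-guide violations found in `text`."""
    lowered = text.lower()
    issues = [f"banned word: {w}" for w in sorted(BANNED_WORDS) if w in lowered]
    issues += [f"temporal introducer: {p!r}" for p in TEMPORAL_INTRODUCERS
               if lowered.startswith(p)]
    return issues

draft = "In the bustling landscape of business today, let's delve into AI."
print(style_violations(draft))
```

A checker like this catches the mechanical stuff; the voice and tone characteristics still need a human (or a well-prompted model) to judge.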

Chrisynthetic as a Conversational AI Agent

Down there in the right hand corner is a little icon to have a conversation with Chrisynthetic. How that all works is a bigger conversation, but part of it involves sharing the Writing and Style Guide with the Knowledge Base that feeds that AI. So in addition to training the AI on everything about Airlock’s services, it will also communicate more like I actually would.
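
Conceptually, having the guide ride along with every conversation might look like the sketch below. The guide text is a stand-in and the helper is hypothetical; the message shape just mirrors common chat-completion APIs.

```python
# Prepend the Writing and Style Guide to every conversation so the model
# answers in the guide's voice. The guide text here is a placeholder; the
# list-of-role/content-dicts format mirrors common chat-completion APIs.

STYLE_GUIDE = (
    "Voice: professional yet engaging. Mix complex and short sentences. "
    "Never use 'prowess' or 'delve'. No temporal introducers."
)

def build_messages(user_question: str, knowledge: str = "") -> list[dict]:
    """Assemble a chat payload: style guide + knowledge base + user turn."""
    system = STYLE_GUIDE + ("\n\nKnowledge base:\n" + knowledge if knowledge else "")
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user_question},
    ]

msgs = build_messages("What does Airlock do?", knowledge="Airlock builds custom AI.")
print(msgs[0]["role"], "->", msgs[1]["content"])
```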

Making AI a Visual Being

The AI has some connection to personality, but for it to be able to make any kind of real connection, it needs some appearance. If you open the little icon in the corner again (and maybe start a new chat) you’ll see a video of my head moving around a bit. Go ahead and press play on this one.

Let’s talk about how this guy is made.

Making an AI Avatar (That Looks Like Me)

Image of Chris Bintliff, founder of Airlock. The initial image used to create an AI avatar, Chrisynthetic.

This AI avatar starts from a real picture of me – nothing artificial there. It’s a photo I took in my office (that’s what you see in the background), and it’s tidied up in Photoshop with the basics like color correction, curves and levels. The only AI applied is with Photoshop’s tools to apply a bit of depth of field, where the background becomes blurry to sharpen the subject’s appearance.

From here, the AI magic starts, but before we discuss that, it’s important to understand why I chose some tools rather than others. One of the essential features of anything I was using was whether it had an API.

Understanding the AI Workflow

The key for me and how I’m using AI in this context is for it to largely run on its own. That’s automation – spending time and energy at the start so I can spend hardly any time or energy later. To do that we connect different apps or platforms together, which meant anything I was considering had to have an API readily available. In real time, the flow of how Chrisynthetic works is like this:

Write Something (ChatGPT) > Turn The Words Into Voice > Send The Voice To Make A Video > Publish The Video
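
The flow above is really just function composition. Here's a sketch with each stage as a stub standing in for the real service call:

```python
# The four-stage flow as plain function composition. Each stage here is a
# stub standing in for a real service call (ChatGPT, ElevenLabs, D-ID).

def write_script(topic: str) -> str:          # ChatGPT in production
    return f"Hey, Chris here - quick thought on {topic}."

def synthesize_voice(script: str) -> bytes:   # ElevenLabs in production
    return script.encode("utf-8")             # pretend this is an mp3

def render_video(audio: bytes) -> str:        # D-ID in production
    return f"video:{len(audio)}-bytes-of-audio"

def publish(video: str) -> str:               # CMS / social APIs in production
    return f"published {video}"

result = publish(render_video(synthesize_voice(write_script("conversational AI"))))
print(result)
```

Because each stage only depends on the previous stage's output, any one service can be swapped out without touching the rest of the chain.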

Let’s start by explaining how the script is written for a short video that’s designed to promote a blog post and shared on social media.

Script Creation with ChatGPT

Let’s use this blog post to create our script and, ultimately, Chrisynthetic-presented promotion or introduction of the topic. The post is about how Conversational AI can be useful in onboarding new employees.

There are four steps to generating this script with ChatGPT.

  1. Provide ChatGPT with the link to or text of the article and prompt it to identify the most essential, actionable piece of advice it can derive. I want the videos we create from these to be somewhat educational – a nugget of insight you don’t need to actually click through anything in order to find useful.
  2. Turn that advice or insight into a script of around a hundred words. This is a tricky mix of prompting ChatGPT to write something with a specific kind of introduction (“Hey, Chris here…” or similar), to speak in the first person, to not include any kind of stage or scene direction, to sound natural and more. ChatGPT wants to default to writing, not talking. So it takes a lot of fine-tuning for it to create something that sounds conversational and more “like me.”
  3. Undo AI’s bad habits. Here’s where that Writing and Style Guide comes into play as I prompt ChatGPT to take the output of the last response and clean it up. This is where a great deal of personal preference, characteristics and style gets applied.
  4. Some final assembly and clean up. At Airlock we don’t use “chatbot,” we say Conversational AI. So we prompt ChatGPT to recognize this and tidy up any effusive or flowery language or bad habits that might have trickled in despite all the prompting earlier.
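
Those four steps can be sketched as an ordered prompt plan. The wording below is illustrative, not my actual prompts, and the URL is a placeholder:

```python
# The four prompting steps as an ordered plan: each entry pairs a step
# name with the prompt it would send. The wording is illustrative.

ARTICLE_URL = "https://example.com/onboarding-post"  # placeholder URL

PROMPT_PLAN = [
    ("extract insight", f"Read {ARTICLE_URL} and identify the single most "
                        "actionable piece of advice."),
    ("draft script", "Turn that advice into a ~100-word first-person spoken "
                     "script opening with 'Hey, Chris here'. No stage direction."),
    ("apply style guide", "Rewrite the script to follow the attached Writing "
                          "and Style Guide."),
    ("final cleanup", "Replace 'chatbot' with 'Conversational AI' and remove "
                      "any flowery language."),
]

def run_plan(ask_model) -> str:
    """Feed each prompt to `ask_model` in order; return the final script."""
    output = ""
    for _step, prompt in PROMPT_PLAN:
        output = ask_model(prompt + ("\n\nPrevious draft:\n" + output if output else ""))
    return output

# With a trivial echo "model" we can at least watch the chain run end to end.
final = run_plan(lambda p: p.splitlines()[0])
print(final)
```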

The output we get looks like this:

Hi, I’m Chris and let’s get right to it. Chatbots or as I like to call them ‘conversational AI’ think of them as your new frontline in customer service. These AI assistants are a smart simple step toward transforming your business operations. They handle those repetitive inquiries leaving your team free to focus on more complex customer issues.

It’s a smooth way to introduce AI into your workflow keeping things manageable. You’re not wrestling with big data or navigating complex technologies. Starting with conversational AI is low-risk but the advantages are clear providing immediate improvements to efficiency and customer relationships.

Using AI doesn’t have to be overwhelming it’s about making small intelligent shifts that make a significant difference allowing your business to respond quickly and maintain a competitive edge.

It’s pretty good. It’s not perfect, but we won’t achieve perfection with AI, so “pretty good” is pretty good.

Now let’s jump to the next two steps – we need to give these words a voice, and we need to give that voice a source or “person” – an avatar. The tool I’m using for the avatar is from D-ID.

Avatar Creation With D-ID

D-ID is an easy-to-use platform where you can, in minutes, create an AI avatar, give it a voice and some words, and you’re off and running. There’s a large library of existing characters and voices to choose from, you can instill some basic personality traits like “happy” or “serious” and adjust other aspects like a background or on-screen text. Here’s a character I made as a proof of concept called Maya – the image of Maya is AI, something I made in Midjourney. Fun fact – everything about Maya, from her script to her name, job title, physical description, and final result in the video, is a result of AI.

Some things to notice about Maya:

  • Her head moves, but the rest of her doesn’t. There are some settings in D-ID to adjust this, but its technology does a pretty amazing job of isolating what it recognizes as the head and then manipulating it to appear to move. It does the same thing with the image I gave it of me, making my head move around.
  • Her voice is one of the off-the-shelf options. These are getting better all the time, and what you’re hearing in this video is already pretty outdated from a natural and performance perspective. She sounds fine – but obviously not real. A bit gimmicky.

There are some better, more realistic options for this kind of use case. One is Veed.io, but it doesn’t have an API. Another is Synthesia, which does have an API, but the combination of features for custom avatar and API isn’t cost-effective. The point here is that we’re in the early days of this kind of Generative AI. There are trade-offs. But innovation is happening so quickly that next week, things could change.

With D-ID I can upload my image and create a moving, talking AI agent. Pretty cool. After a little more graphic design for branding, this is the image that my avatar references to create Chrisynthetic. D-ID is responsible for making the head move as it talks and the mouth move in a way that’s natural to the voice. Had I wanted to, I could apply any of the libraries of D-ID’s built-in, ready-to-go voices to this avatar. Instead, I want my voice.

Voice Creation with ElevenLabs

ElevenLabs is an AI startup that, in some ways, represents a lot of the zeitgeist around AI. It’s incredibly cool and requires responsible use. It’s natural text-to-speech generative AI. That means you can type in any text, assign it a voice, and hear the text read back with natural inflections and cadence. Of course, it can be misused, so ElevenLabs has a lot of safeguards in place to keep its AI ethical.

I uploaded several recorded samples of my speech to train the AI. These range from reading content aloud to snippets of audio from interviews and podcasts. The AI is trained on all aspects of my speech so that it can then represent my voice.

This part of the process is pretty important for my purpose. There’s a significant difference between the way we write and the way we talk, and a benefit I have is that I’ve done enough voiceover performance and recording in my career to bridge this gap. If the training materials I give the AI are just perfect, audio-book-like read-alouds of the written word, then the output I can expect from that training will be a more perfect, performative representation. Think perfect pacing and enunciation.

But I want the AI to talk, not read. So I uploaded examples to help it understand my natural talking cadence. I often move quickly between sentences. I’ll speed up, then slow down. I might um or ah. Same as any of us when talking. Too much of that becomes as distracting as anything else, and it’s always interesting to see how the AI will mash its training data together into something it thinks is the “right” thing.

So it took some trial and error: giving it samples, hearing it generate speech with those samples, then making adjustments. Like a lot of AI, it’s an iterative process. Here’s an AI-generated script spoken by my AI voice. It’s, well, pretty good.

Combining D-ID Imagery and ElevenLabs Sound

A normal workflow to put these resources together might be:

  1. Open ChatGPT and prompt it to write some kind of script on some kind of topic.
  2. Open ElevenLabs, choose my custom voice, paste in the script and export the output mp3 file.
  3. Open D-ID, choose my custom avatar, upload the mp3 file from ElevenLabs and let D-ID do its AI magic in making “me” say the words.
  4. Export the video D-ID creates, ready to upload to the website or social media or whatever.

None of those steps are particularly difficult at this point. The hard work was creating the avatar image and fine-tuning the voice. But the steps—moving between 3 different applications and into whatever final delivery I’ll choose—are tedious and, in the long run, time-consuming.
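
To give a feel for what the automated equivalent exchanges with those services, here's a sketch that builds the two request payloads without making any network calls. The endpoint paths and field names follow the public ElevenLabs and D-ID REST docs as I understand them – treat them as assumptions and verify against the current API references before use; the voice ID and image URL are placeholders.

```python
# Payloads the automated chain would send, built without network calls.
# Endpoint paths and field names are assumptions based on the public
# ElevenLabs and D-ID REST docs - verify against current references.

VOICE_ID = "my-cloned-voice"                         # placeholder voice id
AVATAR_IMAGE_URL = "https://example.com/chris.png"   # placeholder avatar image

def tts_request(script: str) -> dict:
    """ElevenLabs text-to-speech request (the response is mp3 audio)."""
    return {
        "url": f"https://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}",
        "json": {"text": script, "model_id": "eleven_multilingual_v2"},
    }

def talk_request(audio_url: str) -> dict:
    """D-ID 'talks' request: animate the avatar image with the audio."""
    return {
        "url": "https://api.d-id.com/talks",
        "json": {
            "source_url": AVATAR_IMAGE_URL,
            "script": {"type": "audio", "audio_url": audio_url},
        },
    }

req = tts_request("Hi, I'm Chris and let's get right to it.")
print(req["url"])
```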

But let’s look at the outcome of all our effort:

Pretty cool. Now, I see the same things you see. Here’s what I don’t love.

  • D-ID’s AI has no actual reference for how my face looks when I talk. It’s made up the mouth movements, even my bottom teeth as the mouth opens. And if you look closely, you can see how much of the talking is a warping effect around the mouth. For me, the top lip is the most distracting part of the video. But if you don’t know me or how I look when I talk, you probably wouldn’t think much about it.
  • I’d love the eyes to glance away occasionally instead of being laser-locked on the camera.
  • I’d love the face movements to be more supportive of the audio file. Nodding or emphasizing what the audio file suggests should be emphasized. The face movements are cool, but generic and vague.
  • And simply improved expressiveness. Raising of eyebrows, widening of eyes, the slight smile or frown as the character moves and talks.

As I said, there are other platforms out there that are better at this kind of thing. And I’m sure these ideas are on D-ID’s roadmap. But as-is, remembering the use case is a short social media clip, the key question (by the way, this is true for all of AI in these early days) isn’t necessarily how great is this but rather how problematic is it. Is it uncanny? Is it horrifying? Awful? It’s none of those things. It’s pretty good. Compared to what was possible even a year ago, it’s science fiction. Compared to what will be possible a year from now, it’s not even close. It supports the brand, it’s useful and interesting. It’s a lot of fun, and I’m pretty happy with it. And its broader use cases are immediately apparent: this kind of AI can be used as a digital coach, a customer service or support agent, or a character to guide or role-play for learning.

Automate Everything

Remember, our workflow looks like this:

Write Something (ChatGPT) > Turn The Words Into Voice > Send The Voice To Make A Video > Publish The Video

At scale, I don’t do any of what I’ve walked through here. Everything is automated. When a new blog post is created, ChatGPT receives the prompts and writes the script, which is sent to ElevenLabs, then to D-ID, then to platforms of my choice. I don’t touch anything at this point; the entire process is on autopilot.

This is how something like this goes from cool party trick to whoa, what now. I could set up an automation that:

  • Watches for new subscribers to a list or entries to a CRM
  • Uses ChatGPT to create a custom welcome message, using that subscriber’s first name
  • Creates the audio and avatar video of “me” welcoming them, making an offer, or responding to a question
  • Uploads or embeds the video back into an email or web portal of some kind

And the customer magically receives a specific, customized message.
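
The personalization step in that chain is simple templating. Here's a sketch of the piece that turns a new CRM entry into the text the avatar would speak; the field names and wording are illustrative:

```python
# A personalized-welcome script generator: the piece of the automation
# that turns a new CRM entry into the text the avatar will speak.
# Field names and wording are illustrative, not a real CRM schema.

def welcome_script(subscriber: dict) -> str:
    """Build a short, personalized spoken script for a new subscriber."""
    first_name = subscriber.get("first_name", "there")
    return (
        f"Hey {first_name}, Chris here. Thanks for subscribing - "
        "here's one idea you can put to work this week."
    )

print(welcome_script({"first_name": "Maya", "email": "maya@example.com"}))
```

The output of this function would then flow through the same voice-then-video chain described above.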

Chrisynthetic As Proof of Concept

So, we finish where we started. Everything about Chrisynthetic is about exploring ideas, opportunities, and occasional limits. I’ll continue to innovate and iterate, and as AI advances, so will this AI. But it’s important to remember that AI is just another tool. How we use it is where the impact will be experienced.

Published on May 7, 2024.

Chris Bintliff founded and steers the gears at Airlock AI. Chris is an award-winning strategist and problem-solver, a cross and multi-platform automation pro and an MIT-certified AI & Machine Learning expert. He writes posts like these because it’s essential that you put AI to work for you, and you shouldn’t have to master AI’s essentials to do it.