Ellis Crosby

AI Expert - CTO at Incremento

Technical Guide

Convert YouTube Video to Text: Content Writing For People Who Hate Writing Content

Explore the transformative power of AI in content creation with our latest guide. Delve into a novel method that blends sketches with GPT-4's capabilities, offering a streamlined approach to writing and developing content. Perfect for those looking to enhance efficiency and inject innovation into their content strategy.


Alert: This is a technical guide and requires a basic understanding of Python.

Efficient Content Writing With AI

I absolutely loathe writing content. It feels cumbersome and counterintuitive to me. My wife suggests it might be because I lack a “voice” in my head, but who’s to say? The only time words flow freely is when I unleash a rant on my keyboard, much like with this piece! Hence, I frequently lean on AI to aid in content creation. Today, I’m sharing a novel method I’ve been experimenting with. Let’s dive in!

The idea

Last week I wrote this tweet:

Overpowered development process:

  1. Sketch out what you are thinking using @excalidraw
  2. Give a screenshot to GPT-4 in ChatGPT
  3. Have it describe what you’ve drawn (so you can check it interpreted everything correctly)
  4. Get it to build (or amend) code based on the new logic.

I got bogged down in the orchestration of a bunch of functions and the data that each one needed so I spent a couple of minutes rethinking and sketching out the logic before having GPT-4 edit my code to the new logic. This worked amazingly well on only the second try.


Following this method, I refined a complex set of functions and data requirements by sketching out the logic, then having GPT-4 modify my code accordingly. Remarkably, it succeeded on the second attempt.

I use sketches (hat tip Excalidraw) a lot, for everything from backend development flows and frontend UIs to explaining ideas while teaching and even making nice graphs to tweet. So, given how well GPT-4 understood my sketch in this case, I wanted to see if I could apply the same approach to another area.

The Use Case

I produced a video narrating a sales process sketch, discussing efficiency improvements via recent AI tool developments. Theoretically, providing GPT-4 with the transcript and sketch should suffice for article generation.

You can check out the video and sketches to see what I mean.

Let’s delve into the coding aspect:

If you want to skip ahead and run this yourself you can access the Colab notebook here

Initial Setup

First, there are a couple of packages we’ll use:

!pip install openai
!pip install youtube-transcript-api

Make sure you have an OpenAI API key (available here if you don’t already have one):

from getpass import getpass

# OpenAI API Key
openai_api_key = getpass('Enter your OpenAI API Key: ')

Something you should always do when playing with AI APIs is keep track of the cost, to avoid bankrupting yourself. This little class handles that:

class CostTracker:
    def __init__(self):
        self.total_cost = 0

    def track_cost(self, input_tokens, output_tokens):
        input_token_cost_per_mille = 0.01
        output_tokens_cost_per_mille = 0.03
        cost_to_add = ((input_tokens / 1000) * input_token_cost_per_mille) + ((output_tokens / 1000) * output_tokens_cost_per_mille)
        self.total_cost += cost_to_add
        print(f"This call cost {round(cost_to_add,4)}, the total cost is now {round(self.total_cost,4)}")

    def track_image_cost(self):
        image_cost = 0.04
        self.total_cost += image_cost
        print(f"This call cost {round(image_cost,4)}, the total cost is now {round(self.total_cost,4)}")

cost_tracker = CostTracker()
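CostTracker applies one price to every chat call, which works here because the article bills the vision call and the GPT-4 Turbo calls at the same per-token rates. If you start mixing models, a sketch like the one below keeps the arithmetic explicit per model. The class and price tables are hypothetical, and the prices are simply the figures from CostTracker, so check OpenAI’s current pricing before relying on them:

```python
# Sketch of per-model cost tracking. Prices are ASSUMPTIONS copied from the
# CostTracker figures above (USD per 1,000 tokens, or per image).
PRICES_PER_1K = {
    "gpt-4-turbo-preview": {"input": 0.01, "output": 0.03},
    "gpt-4-vision-preview": {"input": 0.01, "output": 0.03},
}
IMAGE_PRICES = {("dall-e-3", "standard", "1024x1024"): 0.04}


class ModelCostTracker:
    def __init__(self):
        self.total_cost = 0.0
        self.calls = []  # (label, cost) pairs for a per-call breakdown

    def track_chat(self, model, input_tokens, output_tokens):
        prices = PRICES_PER_1K[model]  # a KeyError flags a model with no price entry
        cost = (input_tokens / 1000) * prices["input"] + (output_tokens / 1000) * prices["output"]
        return self._add(model, cost)

    def track_image(self, model="dall-e-3", quality="standard", size="1024x1024"):
        return self._add(model, IMAGE_PRICES[(model, quality, size)])

    def _add(self, label, cost):
        self.calls.append((label, cost))
        self.total_cost += cost
        return cost


tracker = ModelCostTracker()
# 2,000 prompt tokens and 500 completion tokens: 2 * 0.01 + 0.5 * 0.03 = 0.035
print(round(tracker.track_chat("gpt-4-turbo-preview", 2000, 500), 4))
print(round(tracker.track_image(), 4))
print(round(tracker.total_cost, 4))
```

The per-call list in calls makes it easier to see which step of the pipeline is eating the budget.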

Step 1. Getting a YouTube video transcript “with AI”

Sure, we could use Whisper or another speech-to-text model, but that would cost both time and money. YouTube has usually already generated captions, and the youtube-transcript-api package can fetch them as a list of timestamped text segments. This code grabs that transcript and merges it into a single string (and yes, I say “Um” a lot).

from youtube_transcript_api import YouTubeTranscriptApi

video_link = "https://www.youtube.com/watch?v=dwwHe055q2E"
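video_id = video_link.split("v=")[1].split("&")[0]  # drop any trailing parameters such as &t=42s

# get_transcript is the static method from the 0.x releases of youtube-transcript-api;
# 1.x releases use YouTubeTranscriptApi().fetch(video_id) instead
transcript = YouTubeTranscriptApi.get_transcript(video_id)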

full_transcript = ' '.join([item['text'] for item in transcript])
full_transcript[:100]

> 'hey I'm Ellis the lead AI developer at incremento um and last week we released an article about um a'
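Splitting on "v=" works for the link above, but it would break on a short youtu.be link or on a URL carrying extra parameters such as a timestamp. Below is a slightly sturdier sketch, plus an optional cleaner that strips filler words before the transcript goes into a prompt. Both functions are hypothetical helpers of my own, not part of youtube-transcript-api; clean_transcript assumes segments shaped like the {'text': ...} dictionaries joined above, and the filler list is an assumption you’d tune to your own speech:

```python
import re
from urllib.parse import parse_qs, urlparse


def extract_video_id(url):
    """Return the video ID from a youtube.com/watch?v=... or youtu.be/... link."""
    parsed = urlparse(url)
    if parsed.hostname in ("youtu.be", "www.youtu.be"):
        return parsed.path.lstrip("/").split("/")[0]
    # parse_qs ignores extra query parameters such as &t=42s or &list=...
    video_ids = parse_qs(parsed.query).get("v")
    if not video_ids:
        raise ValueError(f"No video ID found in {url!r}")
    return video_ids[0]


# Hypothetical filler list (um, umm, uh, erm); extend it for your own verbal habits.
FILLER_PATTERN = re.compile(r"\b(?:um+|uh+|erm)\b[,.]?\s*", re.IGNORECASE)


def clean_transcript(segments):
    """Join transcript segments into one string and drop filler words."""
    text = " ".join(segment["text"] for segment in segments)
    text = FILLER_PATTERN.sub("", text)
    # Captions often contain line breaks; collapse all whitespace runs to one space
    return re.sub(r"\s+", " ", text).strip()
```

With these, the join above becomes full_transcript = clean_transcript(transcript). Dropping fillers mainly trims the prompt; keep them if you want the narration verbatim.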

Step 2. Describing the image with AI

Next, I downloaded all 4 sketches I made. We can use the GPT-4 Vision model to describe them, and it can handle several images in a single request. Because the sketches are local files rather than hosted images, we’ll base64-encode them and send them as data URLs in an old-school HTTP request:

import base64
import requests

def upload_files():
  # Colab-only: opens a file picker and saves each uploaded file locally
  from google.colab import files
  uploaded = files.upload()
  for filename, data in uploaded.items():
    with open(filename, 'wb') as f:
      f.write(data)
  return list(uploaded.keys())

uploaded_screenshots = upload_files()


# Function to encode the image
def encode_image(image_path):
  with open(image_path, "rb") as image_file:
    return base64.b64encode(image_file.read()).decode('utf-8')

# Getting the base64 string
base64_images = [encode_image(image_path) for image_path in uploaded_screenshots]
message = "I have sketched out a few things here to illustrate how my company is augmenting sales teams. There are 4 sketches, one of the sales process and 3 of the various tools we have built to augment parts of this process. The first is a tool that extracts insights from Slack conversations, and can act as a chatbot to respond with information. The second is a chatbot that checks a lead's qualification status before they pass to a salesperson - the image shows two conversations, one qualified and one not, the unqualified one is directed to a 2nd chatbot that sells them a cheaper course. The last is a tool that enriches leads intelligently by reasoning through the information missing and how it can find it. Please describe the images and processes for the three tools in as much detail as possible so that we can write some content about them."

headers = {
  "Content-Type": "application/json",
  "Authorization": f"Bearer {openai_api_key}"
}

content = [
        {
          "type": "text",
          "text": message
        },
      ]

for base64_image in base64_images:
  content.append(
        {
          "type": "image_url",
          "image_url": {
            "url": f"data:image/jpeg;base64,{base64_image}"
          }
        })

payload = {
  "model": "gpt-4-vision-preview",
  "messages": [
    {
      "role": "user",
      "content": content
    }
  ],
  "max_tokens": 3000
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)
response.raise_for_status()  # fail loudly on auth/quota errors instead of a KeyError below
response_json = response.json()

image_descriptions = response_json['choices'][0]['message']['content']

# Don't forget to track the cost!
input_tokens = response_json['usage']['prompt_tokens']
output_tokens = response_json['usage']['completion_tokens']
cost_tracker.track_cost(input_tokens,output_tokens)

image_descriptions[:300]

> 'Certainly, let's go through each image and describe the processes and tools illustrated.

The first image appears to be a detailed sketch of an interactive and responsive sales tool that's based on extracting insights from Slack conversations. The sketch portrays a conversation between two individua'
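One detail in the request above: every image is sent as data:image/jpeg, whatever format it actually is, and screenshots are often PNGs. The API may tolerate the mismatch, but labelling each file correctly is cheap. A minimal sketch, assuming local image files; the helper names are hypothetical, and the returned list is meant to drop straight into the "content" field of the payload:

```python
import base64
import mimetypes


def image_content_part(filename, image_bytes):
    """Build an 'image_url' content part, guessing the MIME type from the extension."""
    mime_type = mimetypes.guess_type(filename)[0] or "image/jpeg"  # fallback if unknown
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return {"type": "image_url", "image_url": {"url": f"data:{mime_type};base64,{encoded}"}}


def build_vision_content(prompt_text, image_paths):
    """Text instruction first, then one image part per file, in the order given."""
    parts = [{"type": "text", "text": prompt_text}]
    for path in image_paths:
        with open(path, "rb") as image_file:
            parts.append(image_content_part(path, image_file.read()))
    return parts
```

Used here, content = build_vision_content(message, uploaded_screenshots) would replace the manual loop that appends each image.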

Converting your script into text, article ideas & structures with AI

Next we’ll start putting the article together. A best practice is to separate the writing process into 1) ideas, 2) structure and 3) the writing. This gives you more interesting ideas, a more coherent structure and a longer article than generating the whole thing with less guidance, and it means a human can weigh in on which idea is most interesting. Here I’ve put the general instructions in the system prompt and the task-specific instruction in the user message. I’ve merged the ideas and structure steps, as the two go together without much additional token overhead.

There are some of the usual prompt engineering best practices here such as:

  • Giving role context (“You are a content writer…”)
  • Giving task context (“For each article I’ll send you…”)
  • Output constraints (“using only the information from”)
  • Well-labeled and clearly separated task context (“Description of sketches:” & “\n########\n”)
  • Giving a goal (“To get leads from Sales Directors”)
  • Detailing the task last to ensure full comprehension (“Create 3 ideas for this article”)
  • And even the “take a deep breath” modifier that sometimes can improve the output

Note that we want some creativity in this ideation step, so I’ve set the temperature to 0.85. I want the model to stop writing once it has completed the task rather than run out of tokens, so I’ve also set a generous max_tokens of 3000.

from openai import OpenAI

openai_client = OpenAI(api_key=openai_api_key)

rough_idea_for_the_article = "How to 10X sales for a sales team using AI"
goal_for_the_article = "To get leads from Sales Directors who are interested in us building similar solutions for them"

# Each piece ends with a space so the concatenated sentences don't run together
system_prompt = "You are a content writer working for Incremento. Your boss Ellis is lazy and doesn't write content by himself, instead he draws sketches and records himself talking about the content. "
system_prompt += "For each article I'll send you a description of his sketches, a transcript of his narration, the idea he has for the article, the goal of the article, and your current task in creating the article. "
system_prompt += "Your job is to make sense of it all and write a great piece of content using only the information from the narration and the sketches. First take a deep breath, I understand that this is a frustrating job."

message="Description of sketches: \n"
message+=image_descriptions
message+="\n########\n"

message+="Narration transcript: \n"
message+=full_transcript
message+="\n########\n"

message+="Idea for the article: \n"
message+=rough_idea_for_the_article
message+="\n########\n"

message+="Goal for the article: \n"
message+=goal_for_the_article
message+="\n########\n"

message+="Current Task: \n"
message+="Create 3 ideas for this article, including a small summary and the outline of the article. These will be sent to your boss for him to choose the best one"
message+="\n########\n"

response = openai_client.chat.completions.create(
    model = "gpt-4-turbo-preview",
    messages = [
        {"role":"system","content":system_prompt},
        {"role":"user","content":message}
    ],
    temperature=0.85,
    max_tokens=3000
)

content = response.choices[0].message.content

input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens  # completion tokens are billed at the output rate
cost_tracker.track_cost(input_tokens,output_tokens)
print(content)

>"### Idea 1: Transform Your Sales Process: AI-Powered Strategies for Exponential Growth

**Summary:**
This article will dive into the transformative power of AI in the sales process, showcasing real-world applications and client success stories from Incremento. By integrating AI tools for lead qualification, enrichment, and sales process efficiency, businesses can significantly enhance their sales output. The ultimate goal is to spark interest among Sales Directors in exploring these AI solutions to 10X their sales team's performance.

**Outline:**
1. **Introduction** 
    - Brief overview of the challenges faced by sales teams
    - The promise of AI in revolutionizing sales processes
2. **Case Study: The AI-Powered Sales Qualification Chatbot**
    - Description of the chatbot's mechanism and its dual focus on qualifying leads and maximizing value from unqualified leads
    - Client success story highlighting increased sales and efficiency
3. **The Power of Lead Enrichment Tools**
    - How AI-driven lead enrichment can fill information gaps and improve lead quality
    - Example of a client who experienced a more refined lead pool and better-targeted sales strategies
4. **Streamlining Sales Operations with AI Insights**
    - Insight into the tool that integrates with Slack for capturing valuable sales information and tasks
    - A success story of how this tool facilitated better focus, follow-ups, and closed sales
5. **Conclusion**
    - Recap of the benefits of incorporating AI into the sales process
    - Call to action for Sales Directors to reach out for a custom AI solution consultation with Incremento

---
..."
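Both prompts in this guide are assembled from the same pattern: a label, the content, and a ######## separator. If you plan to rerun this for other videos, a small helper keeps that layout consistent and makes it easy to add or reorder sections. A minimal sketch (build_message is a hypothetical name, not an OpenAI function) that reproduces the same string the += chain above produces:

```python
SEPARATOR = "\n########\n"


def build_message(sections):
    """Join (label, body) pairs in the prompt layout used in this guide:
    the label, a colon, a space and a newline, then the body, then the separator."""
    return "".join(f"{label}: \n{body}{SEPARATOR}" for label, body in sections)


example_message = build_message([
    ("Idea for the article", "How to 10X sales for a sales team using AI"),
    ("Current Task", "Create 3 ideas for this article"),
])
```

For the real call you’d pass ("Description of sketches", image_descriptions), ("Narration transcript", full_transcript) and so on; the writing step then only needs one extra pair for the outline instead of another block of += lines.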

Writing a full article using AI

The previous step gave us 3 ideas. I chose my favourite and added it back into a prompt fairly similar to the one above. I’ve started this prompt from scratch rather than continuing the chat, because I’ve found the model completes the task much more accurately when the conversation contains only the relevant context rather than leftovers from earlier calls (the same goes for starting a new chat for each task in the ChatGPT interface rather than continuing an existing one).

The task here is a pretty basic “Write an article” prompt with the features below:

  • Constraint of only using the information given to keep things relevant and reduce hallucination/imagination (“using only the information given above, sticking to the idea and outline given” & a temperature setting of 0.6)
  • Markdown formatting to work with the blog tool I use (“Output this with markdown formatting”)
  • Output length controls (“This article should be detailed and verbose”, “6-12 paragraphs, totalling around 2000 words” & max_tokens set to the model’s maximum output of 4096). N.b. this isn’t likely to end up at 2000 words, but asking for it increases the length to ~800-1000 words rather than ~500
  • Weak tone controls (“casual, professional tone”). We could do a lot more here but the use case doesn’t call for a more controlled or stylised tone

idea_for_the_article = """**Leveraging AI for Unprecedented Sales Growth: A Guide for Sales Directors**

*Summary*: This article is a comprehensive guide for Sales Directors on leveraging AI to overhaul their sales process, focusing on Incremento's expertise in building AI solutions that streamline lead qualification, enrich leads, and enhance sales efficiency. Through detailed explanations and client success stories, Sales Directors will be encouraged to partner with Incremento for their AI needs.

"""
outline = """**Outline:**
1. **Introduction**
    - The evolving role of AI in the competitive landscape of sales
2. **Revolutionizing Lead Qualification with AI**
    - An in-depth look at AI chatbots' role in identifying and nurturing qualified leads
    - Highlighting a transformative impact on a client's sales process
3. **The Strategic Advantage of AI Lead Enrichment**
    - Discussing how AI can uncover valuable lead insights for more effective sales pitches
    - Client story: achieving a more accurate and action-driven lead database
4. **Boosting Sales Efficiency Through AI Insights**
    - Examination of how AI tools can extract actionable tasks and insights from sales communications
    - Sharing a case where improved sales team focus led to dramatic increases in sales outcomes
5. **Conclusion**
    - Reinforcing the need for Sales Directors to adopt AI for sales growth
    - A call to action for consulting with Incremento to explore bespoke AI solutions"""

# Each piece ends with a space so the concatenated sentences don't run together
system_prompt = "You are a content writer working for Incremento. Your boss Ellis is lazy and doesn't write content by himself, instead he draws sketches and records himself talking about the content. "
system_prompt += "For each article I'll send you a description of his sketches, a transcript of his narration, the idea for the article, the goal of the article, the outline for this article, and your current task in creating the article. "
system_prompt += "Your job is to make sense of it all and write a great piece of content using only the information from the narration and the sketches. First take a deep breath, I understand that this is a frustrating job."

message="Description of sketches: \n"
message+=image_descriptions
message+="\n########\n"

message+="Narration transcript: \n"
message+=full_transcript
message+="\n########\n"

message+="Idea for the article: \n"
message+=idea_for_the_article
message+="\n########\n"

message+="Goal for the article: \n"
message+=goal_for_the_article
message+="\n########\n"

message+="Outline for the article: \n"
message+=outline
message+="\n########\n"

message+="Current Task: \n"
message+="Write this article using only the information given above, sticking to the idea and outline given. Output this with markdown formatting. This article should be detailed and verbose, therefore it should contain between 6-12 paragraphs, totalling around 1000 words. This should have a casual, professional tone."
message+="\n########\n"

response = openai_client.chat.completions.create(
    model = "gpt-4-turbo-preview",
    messages = [
        {"role":"system","content":system_prompt},
        {"role":"user","content":message}
    ],
    temperature=0.6,
    max_tokens=4096
)

content = response.choices[0].message.content

input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens  # completion tokens are billed at the output rate
cost_tracker.track_cost(input_tokens,output_tokens)
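The word and paragraph targets in the prompt are soft constraints (see the N.b. above), so it’s worth measuring what actually came back, and checking that the model stopped on its own rather than running into max_tokens. A rough sketch; describe_output is a hypothetical helper, and its word count is approximate because markdown symbols such as # count as words:

```python
import re


def describe_output(text, finish_reason=None):
    """Rough length report for a markdown article.

    Words are whitespace-separated tokens; paragraphs are blocks separated by
    blank lines, excluding headings (blocks starting with '#').
    """
    blocks = [block.strip() for block in re.split(r"\n\s*\n", text) if block.strip()]
    paragraphs = [block for block in blocks if not block.startswith("#")]
    return {
        "words": len(text.split()),
        "paragraphs": len(paragraphs),
        # "length" means the model hit max_tokens, so the article is cut off
        "truncated": finish_reason == "length",
    }
```

For example, describe_output(content, response.choices[0].finish_reason) right after the call above; a truncated result is the cue to raise max_tokens or split the article into smaller calls.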

Woo! We now have an article. If you want to read the output in full, you can do so on our blog in our AI for sales prospecting article (https://incremen.to/ai-for-sales-prospecting). This has been tweaked very slightly in line with our SEO strategy but is basically the complete output of the AI.

Generating images for the article with AI

Next we’ll get an image to go alongside this on our blog. My go-to strategy for blog images is to 1) get an image idea from the article with an LLM and 2) use that idea as the prompt for an image generation model. Because we have limited control over LLM-generated ideas/prompts, and DALL-E 3 is forgiving of bad prompts, DALL-E 3 is the image model we’ll use here.

We can generate the image idea by passing in some relevant information about the article from above, e.g. the idea or the full text. I’ve gone for the full text here, but if we were using this in production we’d probably want to save on tokens by using just the idea. I’ve also included:

  • a bit of guidance (“colourful, retro and realistic”)
  • a constraint to keep text out of the image (“It should have no text in the image”)
  • a small request to fit in with the rest of our blog (“Include some robots”)
  • a format control, so that we can put the output straight into the image model (“Respond only with the idea”)
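As noted above, a production version would likely send just the chosen idea rather than the whole article. A sketch of that variant, assuming you keep the same instructions as the prompt below; the helper names are hypothetical, and the 4-characters-per-token estimate is only a rule of thumb for English text (a real tokenizer such as tiktoken gives exact counts):

```python
IMAGE_IDEA_INSTRUCTIONS = (
    "Think of an idea for an image for this blog post. This should be colourful, "
    "retro and realistic. It should have no text in the image. Include some robots. "
    "Respond only with the idea."
)


def build_image_idea_prompt(source_text, label="Article"):
    """Same instructions and layout as the prompt below, with a swappable source text."""
    return f"{IMAGE_IDEA_INSTRUCTIONS}\n {label}: \n{source_text}"


def rough_token_estimate(text):
    # ASSUMPTION: roughly 4 characters per token; rounds up so short text isn't 0
    return (len(text) + 3) // 4
```

Comparing rough_token_estimate(content) with rough_token_estimate(idea_for_the_article) shows how much a call like build_image_idea_prompt(idea_for_the_article, label="Idea") would save before you spend anything.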

First the image idea:

prompt = "Think of an idea for an image for this blog post. This should be colourful, retro and realistic. It should have no text in the image. Include some robots. Respond only with the idea."
prompt += "\n Article: \n"
prompt += content

response = openai_client.chat.completions.create(
    model = "gpt-4-turbo-preview",
    messages = [
        {"role":"user","content":prompt}
    ],
    temperature=0.85,
    max_tokens=4096
)

image_prompt = response.choices[0].message.content

input_tokens = response.usage.prompt_tokens
output_tokens = response.usage.completion_tokens  # completion tokens are billed at the output rate
cost_tracker.track_cost(input_tokens,output_tokens)
print(image_prompt)

> 'An image featuring a vibrant, 80s-style office space with pastel and neon colors highlighting the retro aesthetic. In the center, a humanoid robot with a sleek, chrome design, reminiscent of vintage sci-fi, is seated at a desk. The desk is equipped with futuristic yet retro-looking computers and gadgets, pulsing with soft neon lights. The robot is engaging in a conversation with holographic displays that project images of potential clients, symbolizing AI's role in lead qualification and enrichment. Around the robot, other robots perform various tasks such as organizing files, analyzing data on screens, and interacting with more holograms, showcasing the efficiency and multitasking capabilities AI brings to the sales process. The background subtly features analog synthesizers and vinyl records, reinforcing the retro vibe, while the windows offer a view of a city skyline at dusk, bathed in the glow of neon signs and holographic advertisements, hinting at a world where AI seamlessly integrates into everyday business operations.'

This is quite an interesting, albeit very detailed, prompt. Thankfully DALL-E 3 will automatically optimise the prompt further as it sees fit, so we should get something pretty nice.

Then we generate the image:

response = openai_client.images.generate(
    model="dall-e-3",
    prompt=image_prompt,
    quality="standard",
    size="1024x1024",
    n=1
)
image_url = response.data[0].url
cost_tracker.track_image_cost()  # flat per-image price for a standard 1024x1024 DALL-E 3 image
print(image_url)

Great! This is exactly the kind of thing we wanted, and it does kind of look like a sales team of robots. You can see the image below.

Conclusion

So how does this article look overall? It’s decent, maybe even good. It isn’t a masterpiece, but it accurately showcases a few of the things we’ve built, and it saved me time and effort.

We could improve on this further with more control over the tone. For example, we could first have the AI describe the features and tone of one of our previous blogs, then have it write this article in the same style. We could also split the AI calls to generate each section individually and assemble them at the end to get a longer article.
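Splitting the writing into one call per section starts with turning the outline into sections. A minimal sketch, assuming the numbered, bold-titled outline format GPT-4 returned earlier; split_outline and section_task are hypothetical helpers, and the per-section word target is an arbitrary placeholder:

```python
import re


def split_outline(outline):
    """Split a numbered markdown outline into (title, bullet points) sections.

    Expects lines like '1. **Introduction**' followed by '    - point' bullets;
    any other lines (e.g. an '**Outline:**' header) are ignored.
    """
    sections = []
    for line in outline.splitlines():
        heading = re.match(r"\s*\d+\.\s+\*\*(.+?)\*\*", line)
        bullet = re.match(r"\s*-\s+(.*)", line)
        if heading:
            sections.append((heading.group(1).strip(), []))
        elif bullet and sections:
            sections[-1][1].append(bullet.group(1).strip())
    return sections


def section_task(title, points, words=300):
    """Hypothetical 'Current Task' text for a single per-section API call."""
    covering = "; ".join(points) if points else "no extra notes"
    return (f"Write only the '{title}' section of this article, covering: {covering}. "
            f"Use markdown formatting, a casual, professional tone, and around {words} words.")
```

Each task would be sent with the same context sections as the writing prompt, and the returned sections concatenated in order. Whether that beats one long call is worth testing, since the repeated context costs extra input tokens per section.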

Let’s not forget the cost:

print(f"This entire article cost: ${round(cost_tracker.total_cost,4)}")
> This entire article cost: $0.3004

Not bad! $0.30 is pretty cheap, and a lot cheaper than my time would have been if I’d written this myself. If we really wanted to, we could optimise the prompts and how they’re chained to reduce the tokens used, or adapt the prompts to work with cheaper models, but as we’re not putting this into production, $0.30 is an acceptable cost.

Links:

YouTube video

Colab notebook

Finished article

If you want to talk to one of our friendly AI developers to see how we could help your content strategy, you can book a free 15 minute consultation through this link here.
