With more than 40% of companies investing in AI for marketing, sales, and customer support, this area ranks second only to IT and cybersecurity in organizational focus. Among the many applications of generative AI, we expect the fastest growth where it can bridge existing communication gaps between businesses and customers.
Many executives in the advertising and marketing space find themselves at a crossroads when it comes to integrating new technologies into their operations. With numerous Large Language Models (LLMs) available, users are left uncertain about which one to choose, let alone deciding between open-source and closed-source options. They are apprehensive about investing an inordinate sum in a cutting-edge technology whose potential is still uncertain.
While corporations may opt for pre-packaged conversational AI tools, building bespoke solutions within their organization becomes essential when these technologies assume a core role in their operations.
To alleviate concerns surrounding the build process, I have decided to share the key findings from our internal analysis of which LLM is best suited to building our conversational AI. When weighing LLM providers, it is essential to consider the costs associated with each option, taking into account both the base price and any additional fees tied to the expected usage patterns of your target audience.
We chose to compare OpenAI's flagship model, GPT-4o, with Meta's Llama 3. These are two of the leading models companies are likely to consider, both regarded as among the highest-quality options available. They also let us pit a closed-source model (GPT-4o) against an open-source one (Llama 3), making for a useful side-by-side comparison.
The cost of LLMs used in conversational AI typically depends on the model's complexity, scale, and desired level of customization. Model size is a key factor: bigger models require more computational resources and memory, making them more expensive to run.
When selecting an LLM, two key financial considerations arise: the upfront setup cost and the subsequent processing cost.
The setup cost covers everything needed to get the LLM to the point of deployment, including development and operational expenditures aligned with your desired outcome. The processing cost is the cost of each dialogue once your product is live.
The cost-to-value ratio of setting up an LLM depends on its intended use and frequency of application. If speed to market is paramount, you may be willing to pay a premium for a model that works with minimal setup out of the box, such as GPT-4o. Setting up Llama 3, by contrast, can take weeks; in that time, you could already have refined a GPT-4o-based product and brought it to market.
On the other hand, if you're handling high volumes of customers or need greater control over your LLM, it can pay to absorb the larger upfront setup cost in exchange for long-term benefits.
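To make that trade-off concrete, here is a minimal sketch of the break-even point at which a higher setup cost pays for itself. All figures below are hypothetical placeholders for illustration, not quotes from any provider:

```python
def total_cost(setup: float, per_dialogue: float, dialogues: int) -> float:
    """Total cost = one-time setup + per-dialogue processing."""
    return setup + per_dialogue * dialogues

def break_even_dialogues(setup_a: float, per_a: float,
                         setup_b: float, per_b: float):
    """Number of dialogues at which option B (higher setup, cheaper
    per dialogue) becomes cheaper than option A.
    Returns None if B never catches up."""
    if per_a <= per_b:
        return None
    return (setup_b - setup_a) / (per_a - per_b)

# Hypothetical: managed model (low setup) vs. self-hosted (high setup)
n = break_even_dialogues(setup_a=1_000, per_a=0.16,
                         setup_b=20_000, per_b=0.08)
print(f"Self-hosting pays off after ~{n:,.0f} dialogues")  # ~237,500
```

The useful takeaway is the shape of the curve, not the specific numbers: the higher your dialogue volume, the faster a cheaper per-dialogue rate amortizes a larger setup investment.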
We will assess dialogue processing costs by examining token usage, which allows for the most direct comparison. LLMs such as GPT-4o and Llama 3 work in units called tokens — fundamental pieces of textual information that these models consume as input and produce as output. Tokens are not standardized across LLMs: depending on the model, a token may correspond to a word, a subword, a character, or some other unit.
Given the diverse nature of large language models, it is challenging to establish a straightforward comparison among them; nonetheless, we endeavored to normalize their inherent costs to facilitate a more meaningful evaluation.
While GPT-4o has lower upfront costs, our analysis shows that Llama 3 becomes significantly cheaper as usage grows. Let's walk through the setup first.
Every LLM rests on the same foundations — vocabulary, grammar, and contextual understanding — which underpin its ability to process, generate, and comprehend large volumes of linguistic data.
Before diving into the cost-per-dialogue of each Large Language Model (LLM), we must consider the total investment required to reach that milestone.
GPT-4o is a closed-source model hosted by OpenAI. All you need to do is set up your software to interact with GPT-4o's infrastructure via a straightforward API; there's minimal setup.
Llama 3 is an open-source model that must be self-hosted, either on your own servers or through a cloud infrastructure provider. You can download the model weights at no cost; it's then up to you to host them.
The cost of hosting is a crucial factor here. You typically don't buy your own servers outright; instead, you pay a cloud provider for access to their infrastructure, and each provider offers its own pricing structure.
Many cloud hosting providers offer on-demand computing resources, charging for the time you use them; AWS's ml.g5.12xlarge instance, for example, is billed per hour of usage. Other providers bundle usage into fixed-rate plans, billed monthly or annually depending on your requirements.
Amazon Bedrock takes a different approach: it prices primarily by the volume of tokens processed rather than by server time, which can make it cost-effective regardless of usage levels. Bedrock is a managed, serverless platform from AWS, which simplifies deployment by handling the underlying infrastructure for you.
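The two billing models can be compared with a rough sketch. The hourly rate and monthly volume below are assumptions chosen for illustration, not actual AWS prices — always check current provider rates:

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_instance_cost(hourly_rate: float,
                          hours: float = HOURS_PER_MONTH) -> float:
    """Always-on instance billed per hour of uptime."""
    return hourly_rate * hours

def monthly_token_cost(dialogues: int, tokens_per_dialogue: int,
                       price_per_1k_tokens: float) -> float:
    """Serverless, pay-per-token billing (Bedrock-style)."""
    return dialogues * tokens_per_dialogue / 1_000 * price_per_1k_tokens

# Hypothetical month: 10,000 dialogues of ~30,000 tokens each
instance = monthly_instance_cost(hourly_rate=7.00)       # assumed rate
tokens = monthly_token_cost(10_000, 30_000, 0.00265)
print(f"instance: ${instance:,.2f}/mo, per-token: ${tokens:,.2f}/mo")
```

At low or spiky volumes, per-token billing tends to win because you aren't paying for idle uptime; at sustained high volumes, a dedicated instance can flip the comparison.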
Deploying conversational AI on Llama 3 also means investing time and resources beyond the sticker price: selecting and configuring a server or serverless solution, plus ongoing upkeep. You will likely also need supporting tooling for the LLM servers, such as error logging and system alerting, for when things go wrong.
The key considerations when weighing the setup cost-to-value ratio are: time to deployment; the scope of product usage, where high volumes can quickly offset initial setup costs; and the level of control you need over the product and your data, where open-source, self-hosted models have the edge.
Per-dialogue prices for the leading LLMs vary by model, provider, and usage scenario, and published rates change frequently. There may also be additional charges for things like data storage, transfer, or API usage, so always check a provider's pricing page directly for current figures. For this comparison, we focus on GPT-4o and Llama 3.
Now we’re empowered to unearth the intrinsic worth of every unit of dialogue.
To normalize the units, we modelled our data on the assumption that 1,000 words equate to approximately 7,515 characters and 1,870 tokens.
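These ratios give a simple way to convert between units. A minimal sketch, using the assumed ratios above (real tokenizers vary by model, so treat these strictly as estimates):

```python
# Unit-conversion heuristics based on the assumption that
# 1,000 words ≈ 7,515 characters ≈ 1,870 tokens.
CHARS_PER_WORD = 7.515   # assumed ratio
TOKENS_PER_WORD = 1.87   # assumed ratio

def words_to_chars(words: float) -> float:
    return words * CHARS_PER_WORD

def words_to_tokens(words: float) -> float:
    return words * TOKENS_PER_WORD

def chars_to_tokens(chars: float) -> float:
    # ≈ 4.02 characters per token under these assumptions
    return chars / CHARS_PER_WORD * TOKENS_PER_WORD
```

For exact counts against a specific model, use that model's own tokenizer rather than these heuristics.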
Our simulated benchmark dialogue was a conversation between the AI and a customer spanning 16 exchanges. It consumed approximately 29,920 input tokens and produced roughly 470 output tokens, for a combined total of around 30,390 tokens. The high input count is largely due to prompt instructions and supporting logic sent with each exchange.
On GPT-4o, the cost per 1,000 input tokens is $0.005 and the cost per 1,000 output tokens is $0.015, putting the benchmark dialogue at roughly $0.16.
| | Tokens | Price per 1,000 tokens | Cost |
| --- | --- | --- | --- |
| Input tokens | 29,920 | $0.00500 | $0.14960 |
| Output tokens | 470 | $0.01500 | $0.00705 |
| **Total** | 30,390 | | $0.15665 |
For Llama 3-70B on AWS Bedrock, the cost per 1,000 input tokens is $0.00265 and the cost per 1,000 output tokens is $0.00350, putting the benchmark dialogue at approximately $0.08.
| | Tokens | Price per 1,000 tokens | Cost |
| --- | --- | --- | --- |
| Input tokens | 29,920 | $0.00265 | $0.07929 |
| Output tokens | 470 | $0.00350 | $0.00165 |
| **Total** | 30,390 | | $0.08093 |
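The arithmetic behind both figures can be reproduced with a small helper. The per-1,000-token rates are the ones quoted above; check provider pricing pages for current values:

```python
def dialogue_cost(input_tokens: int, output_tokens: int,
                  in_price_per_1k: float, out_price_per_1k: float) -> float:
    """Cost of one dialogue given token counts and per-1,000-token rates."""
    return (input_tokens / 1_000 * in_price_per_1k
            + output_tokens / 1_000 * out_price_per_1k)

# Benchmark dialogue: 29,920 input tokens, 470 output tokens
gpt4o = dialogue_cost(29_920, 470, 0.005, 0.015)     # ≈ $0.15665
llama3 = dialogue_cost(29_920, 470, 0.00265, 0.00350)  # ≈ $0.08093
print(f"GPT-4o:  ${gpt4o:.5f}")
print(f"Llama 3: ${llama3:.5f}")
print(f"Llama 3 is ~{(1 - llama3 / gpt4o):.0%} cheaper per dialogue")
```

Because input tokens dominate this benchmark, the input rate drives the comparison far more than the output rate does.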
With both models fully integrated, a dialogue run on Llama 3 costs roughly 50% less than the same dialogue on GPT-4o, for comparable results. Remember, though, that server costs must still be added to the Llama 3 side of the ledger.
Token charges are not the whole of an LLM's cost, however. As you build your product, many other variables come into play, such as whether you take a multi-prompt or single-prompt approach.
For firms where conversational AI is not a core service, the investment required to build it in-house may not be justified by the returns, especially given the quality of off-the-shelf solutions already on the market.
Whichever path you choose, incorporating conversational AI can be enormously beneficial. Stay guided by what creates strategic value for your organization and for your customers.
Welcome to the VentureBeat community!
DataDecisionMakers is where experts, including the technical people doing data work, can share data-related insights and innovation.
If you want to read about cutting-edge ideas, up-to-date information, best practices, and the future of data and data tech, join us at DataDecisionMakers.
You might even consider contributing an article of your own!