Understanding the Unique Challenges of LLM Projects in Established Businesses
Enterprise projects built on Large Language Models (LLMs) differ significantly from the startup-to-consumer paradigm. There is no shortage of advice for LLM projects, but most of it comes from a startup perspective. This blog series looks at LLMs in the enterprise, covering the challenges that are specific to that setting and the strategies that lead to success. The author, an experienced B2B software vendor, draws on that experience to guide builders and leaders in the enterprise space.
Enterprise: A Distinct Universe
Before we dive into the specifics, it's crucial to understand what "enterprise" means in this context. If you've recently consumed content related to LLMs, you'll likely have noticed a trend: it predominantly targets consumer users and startup providers. This is partly because publicly available products dominate online discussion, and partly because the startup ecosystem moves quickly. So, where does one find guidance tailored to enterprises, and is it trustworthy?
A simple provider-user matrix highlighting (⭐) where enterprise challenges and risks are involved. Note that a product provided to an enterprise user could end up with a consumer end-user, e.g. a chatbot provided to a financial services company for their clients.
Surprisingly, very few enterprises have completed LLM projects. Traditional consulting organizations, for instance, often repackage web content instead of relying on actual experience. The lack of accumulated enterprise experience results from the relatively recent adoption of LLMs in this sphere. Consequently, there's a gap in guidance specific to enterprises.
In this blog series, titled "LLMs in Enterprise," the author aims to bridge this guidance gap. Drawing from extensive experience in developing and implementing LLM projects for enterprises, the author will educate enterprise builders and leaders on the path to project success.
Data: The Heart of Enterprise LLM Projects
This post, "Enterprise is Different: Data," explores one of the most critical dimensions that distinguish enterprise LLM projects from their startup and consumer counterparts: data. Data-backed applications are prevalent among enterprise LLM use cases, and using LLMs to work with data offers immense value, so it is essential to understand how enterprise data differs and what that implies for your project.
Qualitative comparison of the data behind LLM applications in consumer and enterprise environments along several axes. It’s a cartoon, so relative sizes are approximate and there are many exceptions.
Open vs. Closed Domains
Enterprise data tends to fall into closed domains, specific to particular industries or fields. Conversely, consumer applications often deal with open-domain data, covering a wide array of topics. Closed-domain data includes specialized terminology, acronyms, and concepts not found in open-domain data.
The implications for your project are significant. Most commercial LLMs are trained on open-domain data, so an application that ignores the closed nature of its domain can end up ineffective. However, be cautious not to jump into costly retraining projects unnecessarily. First assess whether your domain is genuinely closed and whether structured information about it exists.
Real-world examples:
- Enterprise Data
  - Pharmaceutical R&D documentation 🚪
  - E-commerce product listings 🌍/🚪
  - Customer support records 🌍/🚪
  - Company policies 🚪
  - Quarterly earnings reports 🌍
- Consumer Data
  - Social media posts 🌍
  - Personal finance records 🌍
  - Podcast transcripts 🌍/🚪
  - Recipes 🌍
  - Travel journals 🌍
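One rough way to start assessing whether a domain is genuinely closed is to measure how much of a document sample falls outside a general-purpose word list. The sketch below is illustrative only: the function name, the `general_vocab` word list, and the `min_count` threshold are assumptions, and a real assessment would use a much larger vocabulary and input from domain experts.

```python
import re
from collections import Counter


def unfamiliar_terms(documents, general_vocab, min_count=2):
    """Return (rate, terms) for a sample of documents.

    rate  -- share of tokens that are not in general_vocab
    terms -- unfamiliar tokens that occur at least min_count times
    """
    tokens = [
        token
        for doc in documents
        for token in re.findall(r"[A-Za-z][A-Za-z0-9-]*", doc.lower())
    ]
    if not tokens:
        return 0.0, []
    # Count only the tokens missing from the (hypothetical) general word list.
    unknown = Counter(t for t in tokens if t not in general_vocab)
    rate = sum(unknown.values()) / len(tokens)
    recurring = sorted(t for t, n in unknown.items() if n >= min_count)
    return rate, recurring
```

A high rate combined with many recurring unfamiliar terms (acronyms, product codes, specialist jargon) suggests a closed domain. The recurring terms are also a natural starting point for a glossary, which is one form of structured information about the domain.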
Size Matters: Population vs. Unit
Enterprises handle more data than individual consumers. They operate over longer periods, resulting in denser records. Understanding whether your project deals with a single entity or many is essential. Population-level data can push your project beyond manageable limits, making it necessary to filter data down to a unit level.
Real-world examples:
- Enterprise Data
  - E-commerce product listings 👪
  - Customer support records 👪
  - Customer support records for a particular customer 🧑
  - A company’s meeting transcripts 👪
  - A team’s meeting transcripts 🧑
  - A company’s GitHub repositories 👪
- Startup/Consumer Data
  - Personal call transcripts 🧑
  - GitHub repositories 🧑
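Filtering population-level data down to a unit can be as simple as selecting on an entity key before any text reaches the model. The following is a minimal sketch, assuming each record carries a `customer_id` field; the `SupportRecord` type and the `unit_slice` helper are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupportRecord:
    customer_id: str  # assumed entity key; use whatever identifies your unit
    text: str


def unit_slice(records, customer_id, max_records=None):
    """Keep only the records that belong to one customer (the unit)."""
    selected = [r for r in records if r.customer_id == customer_id]
    # Optionally cap the slice so it stays within a manageable size.
    return selected if max_records is None else selected[:max_records]
```

For example, `unit_slice(all_records, "acme")` turns "customer support records" into "customer support records for a particular customer", which is the move from 👪 to 🧑 in the list above.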
Archival vs. Fresh Data
Enterprises love bookkeeping and accumulate vast amounts of archival data. Archives introduce noise, and in particular many highly similar records such as successive versions of the same document. Addressing this issue from the start is crucial, whether through curation, data slicing, or instructing your generation system.
Real-world examples:
- Enterprise Data
  - Quarterly voice of customer reports 💾
  - Project report draft versions 💾
  - Product spec versions 💾
  - Quarterly OKR meeting transcripts 💾
- Startup/Consumer Data
  - Recipes 🍎
  - Travel journals 🍎
  - Social media posts 🍎
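One simple form of curation is to keep only the newest version of each document, so that superseded drafts never reach the model. The sketch below assumes each record is a dictionary with `doc_id`, `updated`, and `text` keys; these field names and the `latest_versions` helper are illustrative assumptions, not a prescribed schema.

```python
from datetime import date


def latest_versions(records):
    """Return one record per doc_id: the one with the most recent 'updated' date."""
    newest = {}
    for record in records:
        current = newest.get(record["doc_id"])
        if current is None or record["updated"] > current["updated"]:
            newest[record["doc_id"]] = record
    return list(newest.values())


# Hypothetical example: two drafts of one spec plus an unrelated report.
drafts = [
    {"doc_id": "spec", "updated": date(2023, 1, 10), "text": "Spec v1"},
    {"doc_id": "report", "updated": date(2023, 2, 1), "text": "Q1 report"},
    {"doc_id": "spec", "updated": date(2023, 3, 5), "text": "Spec v2"},
]
current_docs = latest_versions(drafts)
```

This only handles explicit versions. Records that are highly similar without sharing an identifier, such as quarterly reports built from the same template, need a different treatment, for example slicing by time period.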
Modalities: Many vs. Fewer
Enterprises often deal with a multitude of data modalities, from text to audio, images, and structured data like tables and graphs. Consumer applications tend to have fewer overall categories but greater variety within them. The advice here is to operate only on the types of content necessary for your use case, treating different modalities as separate projects.
Real-world examples:
- Enterprise Data 🌈
  - Project kickoff reports
  - Click behavior data
  - CRM data
  - Product spec sheets
  - Quarterly OKR meeting transcripts
  - Property brochures
- Startup/Consumer Data 🏁
  - Notion pages
  - To-do lists
  - Phone calls
  - Social media posts
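Operating only on the content types a use case needs can be enforced with an explicit allow-list at ingestion time. The sketch below uses file suffixes as a stand-in for modality; the suffix set and the `select_supported` helper are assumptions for illustration, and a real pipeline might inspect MIME types or source systems instead.

```python
from pathlib import PurePath

# Assumed allow-list: this hypothetical use case handles plain text only.
SUPPORTED_SUFFIXES = {".txt", ".md"}


def select_supported(paths, supported=SUPPORTED_SUFFIXES):
    """Split paths into (kept, skipped) according to the allowed suffixes."""
    kept, skipped = [], []
    for path in paths:
        suffix = PurePath(path).suffix.lower()
        (kept if suffix in supported else skipped).append(path)
    return kept, skipped
```

Returning the skipped files instead of silently dropping them makes it easier to see how much content falls outside the current scope and whether another modality deserves its own project.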
Data Regularity
Enterprise data often follows reliable norms and procedures, resulting in more structured content. Recognizing these patterns can help you design your system more effectively.
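As one illustration of exploiting regularity, documents that follow a fixed template can be split into named sections before they are indexed or passed to a model. This is a minimal sketch that assumes section headings appear on their own line and are known in advance; the `split_by_headings` helper and the heading names are hypothetical.

```python
def split_by_headings(text, headings):
    """Split templated text into {heading: body} using known heading lines."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in headings:
            # A known heading starts a new section.
            current = stripped
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(lines).strip() for name, lines in sections.items()}
```

With sections separated this way, a system can, for instance, retrieve only the part of each report that a question is about rather than the whole document.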
Access Control
Access control in enterprise applications is complex, with numerous access groups and tiers. Implementing access control correctly requires careful engineering and integration with identity providers.
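At its core, access control for a data-backed LLM application means filtering documents by the requesting user's permissions before anything is retrieved or placed in a prompt. The sketch below assumes that group membership has already been resolved by an identity provider and that each document carries an `allowed_groups` set; both field names and the `visible_documents` helper are illustrative assumptions.

```python
def visible_documents(documents, user_groups):
    """Return the documents that share at least one group with the user."""
    groups = set(user_groups)
    return [doc for doc in documents if groups & set(doc["allowed_groups"])]


# Hypothetical documents with group-based permissions.
corpus = [
    {"id": "policy", "allowed_groups": {"all-staff"}},
    {"id": "salaries", "allowed_groups": {"hr"}},
    {"id": "roadmap", "allowed_groups": {"product", "exec"}},
]
allowed = visible_documents(corpus, ["all-staff", "product"])
```

Applying the filter before retrieval, rather than after generation, keeps restricted content out of the model's context entirely. Real deployments typically also need tiers, inheritance, and revocation, which this sketch does not cover.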
Understanding these dimensions of enterprise LLM projects can save you time and guide you in forming a well-informed approach. Stay tuned for more in this blog series, where we'll explore use cases, security, compliance, and various strategies for success in enterprise LLM projects. If you found this helpful, share your thoughts on what you'd like to learn more about in future posts.