跳转到主要内容

标签(标签)

资源精选(342) Go开发(108) Go语言(103) Go(99) angular(83) LLM(79) 大语言模型(63) 人工智能(53) 前端开发(50) LangChain(43) golang(43) 机器学习(39) Go工程师(38) Go程序员(38) Go开发者(36) React(34) Go基础(29) Python(24) Vue(23) Web开发(20) Web技术(19) 精选资源(19) 深度学习(19) Java(18) ChatGTP(17) Cookie(16) android(16) 前端框架(13) JavaScript(13) Next.js(12) 安卓(11) 聊天机器人(10) typescript(10) 资料精选(10) NLP(10) 第三方Cookie(9) Redwoodjs(9) ChatGPT(9) LLMOps(9) Go语言中级开发(9) 自然语言处理(9) PostgreSQL(9) 区块链(9) mlops(9) 安全(9) 全栈开发(8) OpenAI(8) Linux(8) AI(8) GraphQL(8) iOS(8) 软件架构(7) RAG(7) Go语言高级开发(7) AWS(7) C++(7) 数据科学(7) 智能体(6) whisper(6) Prisma(6) 隐私保护(6) JSON(6) DevOps(6) 数据可视化(6) wasm(6) 计算机视觉(6) 算法(6) Rust(6) 微服务(6) 隐私沙盒(5) FedCM(5) 语音识别(5) Angular开发(5) 快速应用开发(5) 提示工程(5) Agent(5) LLaMA(5) 低代码开发(5) Go测试(5) gorm(5) REST API(5) kafka(5) 推荐系统(5) WebAssembly(5) GameDev(5) CMS(5) CSS(5) machine-learning(5) 机器人(5) 游戏开发(5) Blockchain(5) Web安全(5) nextjs(5) Kotlin(5) 低代码平台(5) 机器学习资源(5) Go资源(5) Nodejs(5) PHP(5) Swift(5) RAG架构(4) devin(4) Blitz(4) javascript框架(4) Redwood(4) GDPR(4) 生成式人工智能(4) Angular16(4) Alpaca(4) 编程语言(4) SAML(4) JWT(4) JSON处理(4) Go并发(4) 移动开发(4) 移动应用(4) security(4) 隐私(4) spring-boot(4) 物联网(4) 网络安全(4) API(4) Ruby(4) 信息安全(4) flutter(4) 专家智能体(3) Chrome(3) CHIPS(3) 3PC(3) SSE(3) 人工智能软件工程师(3) LLM Agent(3) Remix(3) Ubuntu(3) GPT4All(3) 软件开发(3) 问答系统(3) 开发工具(3) 最佳实践(3) RxJS(3) SSR(3) Node.js(3) Dolly(3) 移动应用开发(3) 低代码(3) IAM(3) Web框架(3) CORS(3) 基准测试(3) Go语言数据库开发(3) Oauth2(3) 并发(3) 主题(3) Theme(3) earth(3) nginx(3) 软件工程(3) azure(3) keycloak(3) 生产力工具(3) gpt3(3) 工作流(3) C(3) jupyter(3) 认证(3) prometheus(3) GAN(3) Spring(3) 逆向工程(3) 应用安全(3) Docker(3) Django(3) R(3) .NET(3) 大数据(3) Hacking(3) 渗透测试(3) C++资源(3) Mac(3) 微信小程序(3) Python资源(3) JHipster(3) 语言模型(2) 可穿戴设备(2) JDK(2) SQL(2) Apache(2) Hashicorp Vault(2) Spring Cloud Vault(2) Go语言Web开发(2) Go测试工程师(2) WebSocket(2) 容器化(2) AES(2) 加密(2) 输入验证(2) ORM(2) Fiber(2) Postgres(2) Gorilla Mux(2) Go数据库开发(2) 模块(2) 泛型(2) 指针(2) HTTP(2) PostgreSQL开发(2) Vault(2) K8s(2) Spring boot(2) R语言(2) 深度学习资源(2) 半监督学习(2) semi-supervised-learning(2) architecture(2) 普罗米修斯(2) 嵌入模型(2) productivity(2) 编码(2) Qt(2) 前端(2) Rust语言(2) NeRF(2) 神经辐射场(2) 元宇宙(2) CPP(2) 数据分析(2) spark(2) 流处理(2) Ionic(2) 人体姿势估计(2) human-pose-estimation(2) 视频处理(2) deep-learning(2) kotlin语言(2) kotlin开发(2) burp(2) Chatbot(2) npm(2) quantum(2) OCR(2) 游戏(2) game(2) 内容管理系统(2) MySQL(2) python-books(2) pentest(2) opengl(2) IDE(2) 漏洞赏金(2) Web(2) 知识图谱(2) PyTorch(2) 数据库(2) reverse-engineering(2) 数据工程(2) swift开发(2) rest(2) robotics(2) ios-animation(2) 知识蒸馏(2) 安卓开发(2) nestjs(2) solidity(2) 爬虫(2) 面试(2) 容器(2) C++精选(2) 人工智能资源(2) Machine Learning(2) 备忘单(2) 编程书籍(2) angular资源(2) 速查表(2) cheatsheets(2) SecOps(2) mlops资源(2) R资源(2) DDD(2) 架构设计模式(2) 量化(2) Hacking资源(2) 强化学习(2) flask(2) 设计(2) 性能(2) Sysadmin(2) 系统管理员(2) Java资源(2) 机器学习精选(2) android资源(2) android-UI(2) Mac资源(2) iOS资源(2) Vue资源(2) flutter资源(2) JavaScript精选(2) JavaScript资源(2) Rust开发(2) deeplearning(2) RAD(2)

category

🚀 People ask me what it takes to build a scalable Large Language Model (LLM) app in 2023. When we talk about scalable we mean 100s of users and millisecond latency. Let me share some of our learning experience with you.

1. Architecture:

The emerging field of LlmOps is not only about accessing GPT-4 and GPT-3.5 (or an LLM), it is a complete echo-system where you need to have a knowledge center (where your LLM can get up to date knowledge), data pipelines (where you digest text/non-textual data), Caching ( when you scale you want to save costs) and playground. One of the best reads so far on architecture is from a16z that I totally recommend before diving into the next sections [1].

Modern LLM architecture. Source: https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/
A modern LLM app architecture proposed by a16z. Source: https://a16z.com/2023/06/20/emerging-architectures-for-llm-applications/

2. Backend ( Python Vs No-Python):

FastAPI is turning into the best tool for developing scalable backends in Python. Source: https://fastapi.tiangolo.com/

Save your team time and effort and use python (if you want to build on top of the state of the art work in NLP). Thanks to FastAPI [2] (props to Sebastián Ramírez Montaño), you can now build asynchronous apps with very small latency.

“[…] I’m using FastAPI a ton these days. […] I’m actually planning to use it for all of my team’s ML services at Microsoft. Some of them are getting integrated into the core Windows product and some Office products.” Kabir Khan — Microsoft [3].

Some of the very interesting features of FastAPI:

  1. Fully asynchronous ( it can scale to many users)
  2. Easy integration with relational databases such as SQL through SqlModel [4]
  3. Support for WebSockets (important for chatbots) and Server Sent Events (SSE — important for streaming responses) — More on this on the next articles

3. LangChain:

Every modern LLM app needs an orchestration module such as LangChain. Source: https://datasciencedojo.com/blog/understanding-langchain/

You will need to use Langchain one way or another depending on your application. However, our experience is that you cant just build scalable apps using Langchain out of the box. Many components in Langchain as of today are not asynchronous by nature and while you see many tutorials telling you, you can create an app in 3 lines of code, trust me your app wont be scalable. The most important NON-async components (as of today) in LangChain are vector-stores (retrievers) and Tools (SerpAPI, etc).

What do we mean by Tools?

According to Langchain docs: Tools are interfaces that an agent can use to interact with the world.

It is often relatively straightforward to subclass tools in Langchain and add an asynchronous module to them.

How about vector-stores (retrievers)?

Vector-stores are essential for many modern LLM apps. We leverage vector-stores to provide knowledge access to our LLM (more on this in the next section).

4. Knowledge-Base (VectorStore)

You see Linkedin Heros using LangChain and Chroma to build a Doc Q&A in a few lines of code. The reality could not be further from the truth. Vectorstores such as Chroma while easy to use have multiple issues. First, their Langchain integration is sync (not scalable). Second, these vectorstore’s latencies are not low enough to scale to many users.

Our experience is that you are better off using a vendor provided VectorStore, if you don’t have sensitive data (e.g. pinecone) or deploy your own.

In the later case, Redis (if you aim to have users in range of 100s) and Qdrant (more than 1000s- written in Rust), due to the fact that they are low latency, provide great search functionalities, scalable and very easy to integrate.

I recommend leveraging async implementation of most of the vector-stores from ChatGPT-Retrieval plugin.

5. Caching:

Semantic and exact matching are essential for scalability and saving costs. Source: https://github.com/zilliztech/GPTCache

Caching and saving costs are crucial for your application as you scale. There are great libraries out there such as GPTCache that offer semantic caching. Semantic caching is useful when you want to cache semantically similar prompts.

Though, I dont recommend using GPTCache if you are using Redis in your stack. You might be better off building your own using Redis Asyncio and its integrated vectorstore.

6. Validation and Robustness of responses:

Where there is no guidance, a model fails, but in an abundance of instructions there is safety.
GPT 11:14

Microsoft Guidance is a great tool for constrained generation. What does constrained generation mean you ask? It means you can force a language model to generate responses in a more predictable way. This is extremely helpful if you use use LLMs as an Agent to use other tools.

7. Document Digestion:

A simple unified tool to parse different file formats (PDF, docx, etc). Source: https://github.com/Unstructured-IO/unstructured-api

Document Digestion is no easy task, since there are many file formats that you often need to support in your document for digesting knowledge. As of today, I recommend leveraging unstructured.io that automates all of these under a single API.

In the next series of my articles, I try to delve into each component of the LlmOps architecture in more details.

Stay tuned!