
For the past 6 months, I’ve been working on LLM-powered applications using GPT and other AI-as-a-Service providers. Along the way, I produced a set of illustrations to help visualize and explain some general architectural concepts.

Below is the first batch; I hope to add more later.

Table of Contents

1. Basic Prompt
2. Dynamic Prompt
3. Prompt Chaining
4. Question-answering without hallucinations (RAG)
5. Vector database
6. Chat Prompt
7. Chat Conversation
8. Compressing long discussions

1. Basic Prompt

The most fundamental LLM concept:

  • you send a piece of text (called prompt) to the model,
  • and it responds with another piece of text (often called completion).

It’s essential to remember that, fundamentally, LLMs only try to predict the most statistically probable next words to complete your prompt, hence the term “completion”. Keeping this in mind is key to understanding and leveraging some of these models’ behaviors.
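To make the “most probable next word” idea concrete, here is a toy sketch. It is not how a real LLM works internally: real models operate on tokens and compute probabilities with a neural network. The `NEXT_WORD_PROBS` table and the `complete` function are made up for illustration, and the loop always picks the single most probable word (greedy decoding).

```python
# Toy next-word probabilities, invented purely for illustration.
# A real LLM learns these from data and works on tokens, not whole words.
NEXT_WORD_PROBS: dict[str, dict[str, float]] = {
    "the": {"cat": 0.5, "dog": 0.3, "end": 0.2},
    "cat": {"sat": 0.6, "ran": 0.4},
    "sat": {"on": 0.9, "down": 0.1},
    "on": {"the": 1.0},
}


def complete(prompt: str, max_words: int = 5) -> str:
    """Greedily append the most probable next word, one word at a time."""
    words = prompt.lower().split()
    if not words:
        return ""
    completion: list[str] = []
    for _ in range(max_words):
        candidates = NEXT_WORD_PROBS.get(words[-1])
        if not candidates:
            break  # no known continuation for the last word: stop
        next_word = max(candidates, key=candidates.get)
        words.append(next_word)
        completion.append(next_word)
    return " ".join(completion)


print(complete("The cat", max_words=3))  # → sat on the
```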

2. Dynamic Prompt

A common practice in LLM apps is to create a “prompt template” and dynamically replace parts of it with the user data before sending the final prompt to the LLM.
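A minimal sketch of a prompt template using Python’s standard `string.Template`. The template wording and the `product`/`question` fields are hypothetical; a real app would fill them from its own user data.

```python
from string import Template

# Hypothetical support-bot template; $product and $question are placeholders
# that get replaced with user data before the prompt is sent to the LLM.
SUPPORT_TEMPLATE = Template(
    "You are a helpful support agent for $product.\n"
    "Answer the customer's question in one short paragraph.\n\n"
    "Question: $question\n"
    "Answer:"
)


def build_prompt(product: str, question: str) -> str:
    """Fill the template; substitute() raises KeyError if a placeholder is left unfilled."""
    return SUPPORT_TEMPLATE.substitute(product=product, question=question)


print(build_prompt("AcmeNet Fiber", "Why is my connection slow?"))
```

`string.Template` is used here instead of `str.format` so that literal curly braces in a prompt (common when it contains JSON examples) don’t need escaping.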

3. Prompt Chaining

In some cases, a single LLM call may not be enough, either because the task is too complex or because the full response wouldn’t fit in the context window (the maximum number of tokens the model can read and write in a single request).

That’s when you can use prompt chaining: incorporating the response from the first call into the prompt of the next one.
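A sketch of a two-step chain, under some assumptions: `llm` stands for any function that takes a prompt and returns a completion (a hypothetical placeholder, not a real client library), and `FakeLLM` is an offline stand-in that returns canned replies so the example runs. The key line is the second prompt, which embeds the first completion.

```python
from typing import Callable

# Hypothetical LLM client: any function mapping a prompt to a completion.
LLM = Callable[[str], str]


def chain(llm: LLM, ticket: str) -> str:
    """Two-step prompt chain: summarize a ticket, then draft a reply from the summary."""
    # Step 1: the first call condenses the raw input.
    summary = llm(f"Summarize the customer's issue in one sentence:\n\n{ticket}")
    # Step 2: the first completion is incorporated into the second prompt.
    return llm(f"Write a polite reply to a customer whose issue is: {summary}")


class FakeLLM:
    """Offline stand-in for a model: records prompts and returns canned completions."""

    def __init__(self, replies: list[str]) -> None:
        self.replies = list(replies)
        self.prompts: list[str] = []

    def __call__(self, prompt: str) -> str:
        self.prompts.append(prompt)
        return self.replies.pop(0)


llm = FakeLLM(["Slow internet since Monday.", "Sorry to hear that! Let's fix it."])
print(chain(llm, "Hi, my internet has been crawling since Monday..."))
print(llm.prompts[1])  # the second prompt contains the first completion
```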

4. Question-answering without hallucinations (RAG)

Because LLMs generate answers from learned statistical patterns rather than by looking up facts, they sometimes produce incorrect answers, called “hallucinations”. This is a problem for fact-based use cases such as customer support.

However, while LLMs are not good with facts, they are excellent at extracting and rephrasing information. One way to mitigate hallucinations is to provide the model with several relevant resources and ask it to either answer using only those resources or respond with “I don’t know”.

The technique of retrieving resources to augment the prompt and improve the output is known as RAG (Retrieval-Augmented Generation).
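A minimal RAG sketch. The knowledge base `DOCS` and the helper names are hypothetical, and the word-overlap `retrieve` function is a deliberately naive stand-in for a real retriever (the vector-database approach in the next section is the better option). The point is the shape of the final prompt: the retrieved resources, then an explicit instruction to answer only from them or say “I don’t know”.

```python
import re

# Hypothetical knowledge base; in a real app these would be your docs or FAQ entries.
DOCS = [
    "Restarting the router fixes most slow-connection issues.",
    "Invoices are emailed on the 1st of each month.",
    "Fiber plans include a free router upgrade.",
]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    """Naive retriever: rank docs by how many words they share with the question."""
    q = words(question)
    return sorted(docs, key=lambda d: len(q & words(d)), reverse=True)[:k]


def build_rag_prompt(question: str, docs: list[str]) -> str:
    """Augment the prompt with retrieved resources and an explicit fallback answer."""
    context = "\n".join(f"- {d}" for d in retrieve(question, docs))
    return (
        "Answer the question using ONLY the resources below. "
        'If the answer is not in them, reply exactly "I don\'t know".\n\n'
        f"Resources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


print(build_rag_prompt("How do I fix a slow connection?", DOCS))
```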

5. Vector database

Retrieving relevant resources for the use case above can be challenging with traditional databases and exact keyword matching, because the same word can have different meanings in different contexts.

In a vector database, items (words, sentences, or full documents) are indexed by a sequence of numbers, called an embedding, that represents their meaning.

These numbers are coordinates in a multi-dimensional space (often with 1,500+ dimensions). Intuitively, some of these dimensions can be thought of as capturing concepts such as “natural ←→ synthetic”, “positive ←→ negative”, “colorful ←→ bland”, or “round ←→ pointy”.

Then, using geometry (for example, cosine similarity or Euclidean distance), it’s easy to calculate how close the meanings of two items are and find the closest matches to our input.
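A toy illustration of vector search, assuming cosine similarity as the measure (one common choice among several). The three-dimensional vectors and their axis meanings (natural, positive, colorful) are invented for this example; real embeddings come from an embedding model and have far more dimensions, and real vector databases use approximate indexes rather than this brute-force scan.

```python
import math

# Hand-made 3-D "embeddings" for illustration only. The axes are assumed to mean
# (natural, positive, colorful), each in [-1, 1].
INDEX: dict[str, list[float]] = {
    "fresh strawberries": [0.9, 0.8, 0.9],
    "plastic flowers": [-0.8, 0.3, 0.8],
    "rotten bananas": [0.9, -0.9, 0.2],
    "grey concrete": [-0.7, -0.2, -0.9],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """1.0 = same direction (similar meaning), 0 = unrelated, -1.0 = opposite."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def nearest(query: list[float], k: int = 2) -> list[str]:
    """Brute-force nearest-neighbor search over every indexed item."""
    ranked = sorted(INDEX, key=lambda item: cosine_similarity(query, INDEX[item]), reverse=True)
    return ranked[:k]


# A query meaning "natural, pleasant, colorful" (think: the embedding of "ripe fruit").
print(nearest([0.8, 0.7, 0.7], k=1))  # → ['fresh strawberries']
```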

6. Chat Prompt

Introduced by OpenAI with their popular GPT-3.5-turbo model, chat prompts use a slightly different format that represents a discussion, where the completion is the next message in the sequence.

☝ It’s important to note that, while GPT’s chat API expects a sequence of role/message pairs and the chat models are fine-tuned (optimized) for this specific structure, a chat prompt is equivalent to a text prompt formatted as follows:

SYSTEM: You're an AI customer support agent...
AI: Hi, how may I help you today?
USER: My internet connection is very slow
AI:
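The equivalence can be sketched in a few lines. This hypothetical `to_text_prompt` helper flattens a list of role/message pairs (using the `system`, `user`, and `assistant` role names of OpenAI’s chat API) into the text format above; the `ROLE_LABELS` mapping is an assumption chosen to match that example.

```python
from typing import TypedDict


class Message(TypedDict):
    role: str     # "system", "user", or "assistant"
    content: str


# Labels used in the flattened text format (assumed mapping).
ROLE_LABELS = {"system": "SYSTEM", "user": "USER", "assistant": "AI"}


def to_text_prompt(messages: list[Message]) -> str:
    """Flatten a chat into a plain-text prompt ending with an open 'AI:' turn."""
    lines = [f"{ROLE_LABELS[m['role']]}: {m['content']}" for m in messages]
    lines.append("AI:")  # the completion is the model's next message
    return "\n".join(lines)


chat: list[Message] = [
    {"role": "system", "content": "You're an AI customer support agent..."},
    {"role": "assistant", "content": "Hi, how may I help you today?"},
    {"role": "user", "content": "My internet connection is very slow"},
]
print(to_text_prompt(chat))
```

Running it prints the same four lines as the text prompt shown above.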

☝ Also, because OpenAI’s chat models perform better and are cheaper than their non-chat counterparts, we often use them even for tasks that don’t involve a discussion. You can place your instruction in the first system message and get your response in the first AI message.

For all the examples below, you can use chat prompts and chat-formatted text prompts interchangeably.

7. Chat Conversation

LLMs are stateless, meaning they don’t retain any data from previous calls and treat each request independently. However, when you chat with an AI, it needs to be aware of what was said before your last question.

To address this, you can store the chat history in your app and include it with each request.
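A minimal sketch of keeping the history on the app side. `ChatLLM` is a hypothetical stand-in for a chat-model client that takes the full message list and returns the reply; `echo_llm` is an offline fake that just reports how many messages it received, which makes it easy to see that the whole history goes out with every request.

```python
from typing import Callable

Message = dict[str, str]
# Hypothetical client: takes the full message list, returns the assistant's reply.
ChatLLM = Callable[[list[Message]], str]


class Conversation:
    """Stores the chat history in the app, since the model itself is stateless."""

    def __init__(self, llm: ChatLLM, system_prompt: str) -> None:
        self.llm = llm
        self.history: list[Message] = [{"role": "system", "content": system_prompt}]

    def ask(self, user_text: str) -> str:
        self.history.append({"role": "user", "content": user_text})
        # The WHOLE history is sent with every request (a copy, so the client can't mutate it).
        reply = self.llm(list(self.history))
        self.history.append({"role": "assistant", "content": reply})
        return reply


def echo_llm(messages: list[Message]) -> str:
    """Offline fake that reports how much context it received."""
    return f"I can see {len(messages)} messages."


conv = Conversation(echo_llm, "You're an AI customer support agent.")
print(conv.ask("My internet is slow"))   # → I can see 2 messages.
print(conv.ask("It started on Monday"))  # → I can see 4 messages.
```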

8. Compressing long discussions

LLMs have a limit, known as the “context window”, on the number of tokens (≈ words) they can read and write in a single request.

For example, if a model with a window size of 4,096 tokens receives a 3,900-token prompt, its completion will be limited to 196 tokens (4,096 − 3,900).

To avoid hitting this limit and to reduce API costs (since providers charge based on input/output tokens), a solution is to summarize older messages using an intermediate prompt.

🧐 Because each summarization costs you an additional LLM request, there is a sweet spot between doing it too often and not often enough.
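The sketch below compresses a conversation only once it crosses a token budget, which is one simple way to look for that sweet spot. Several things are assumptions: `count_tokens` is a crude word count (a real app should use the provider’s tokenizer), `summarize` is a hypothetical function wrapping the intermediate LLM request, the first message is assumed to be the system prompt, and `max_tokens` and `keep_recent` are illustrative values to tune.

```python
from typing import Callable

Message = dict[str, str]


def count_tokens(text: str) -> int:
    """Crude approximation (~1 token per word); use your provider's tokenizer in practice."""
    return len(text.split())


def compress_history(
    history: list[Message],
    summarize: Callable[[str], str],
    max_tokens: int = 3000,
    keep_recent: int = 4,
) -> list[Message]:
    """Replace older messages with a single summary once the history gets too long.

    Assumes history[0] is the system prompt. `summarize` wraps the (hypothetical)
    intermediate LLM request; `max_tokens` and `keep_recent` are illustrative values.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(count_tokens(m["content"]) for m in history)
    if total <= max_tokens or len(history) <= keep_recent + 1:
        return history  # under budget, or nothing old to compress: no extra LLM call
    system, older, recent = history[0], history[1:-keep_recent], history[-keep_recent:]
    transcript = "\n".join(f"{m['role']}: {m['content']}" for m in older)
    summary = summarize(f"Summarize this conversation concisely:\n{transcript}")
    summary_msg: Message = {"role": "system", "content": f"Summary of the earlier discussion: {summary}"}
    return [system, summary_msg, *recent]
```

Keeping the last few messages verbatim preserves the immediate context, while the summary carries the gist of everything older.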

☝ Also note that larger context windows are rarely the solution: models with larger context windows are more expensive, larger requests are slower, and too much context can be counterproductive.

You can already read Part 2 of Building LLM-Powered Products, which explores how to give the AI tools it can interact with.