[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

Suchun-sv · 2024-09-06T11:18:09Z

Ⅰ. Describe what this PR did

给ai-cache插件添加基于语文本向量相似度召回缓存的能力

Ⅱ. Does this pull request fix one issue?

update update: 注意在使用http协议的时候不要用tls update: add lobechat add: makefile for ai-proxy fix bugs fix bugs fix: redis connection fix: dashvector and dashscope cluster fix: change vdb collection feat: add chroma logic docs: 增加 api 说明 update: no callback version fix: change to callback fix: finish chrome remove: key update: gitignore

codecov-commenter · 2024-09-06T11:39:52Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.49%. Comparing base (ef31e09) to head (28c629c).
Report is 168 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1290      +/-   ##
==========================================
+ Coverage   35.91%   43.49%   +7.57%     
==========================================
  Files          69       76       +7     
  Lines       11576    12320     +744     
==========================================
+ Hits         4157     5358    +1201     
+ Misses       7104     6626     -478     
- Partials      315      336      +21

see 69 files with indirect coverage changes

plugins/wasm-go/extensions/ai-cache/cache/provider.go

plugins/wasm-go/extensions/ai-cache/cache/redis.go

plugins/wasm-go/extensions/ai-cache/config/config.go

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go

plugins/wasm-go/extensions/ai-cache/main.go

plugins/wasm-go/extensions/ai-cache/embedding/dashscope.go

plugins/wasm-go/extensions/ai-cache/config/config.go

plugins/wasm-go/extensions/ai-cache/core.go

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go

plugins/wasm-go/extensions/ai-cache/core.go

plugins/wasm-go/extensions/ai-cache/embedding/dashscope.go

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go

plugins/wasm-go/extensions/ai-cache/cache/redis.go

plugins/wasm-go/extensions/ai-cache/README.md

 ## 配置说明
+配置分为 3 个部分：向量数据库（vector）；文本向量化接口（embedding）；缓存数据库（cache），同时也提供了细粒度的 LLM 请求/响应提取参数配置等。


CH3CHO · 2024-10-24T05:58:56Z

在配置中添加了cacheKeyStrategy配置，决定如何根据历史问题生成缓存键的策略。可选值: "lastQuestion" (使用最后一个问题), "allQuestions" (拼接所有问题) 或 "disable" (禁用缓存)。在main.go中添加相应的逻辑为:

	var key string
	if config.CacheKeyStrategy == "lastQuestion" {
		key = bodyJson.Get("[email protected]").String()
	} else if config.CacheKeyStrategy == "allQuestions" {
		// Retrieve all user messages and concatenate them
		messages := bodyJson.Get("messages").Array()
		var userMessages []string
		for _, msg := range messages {
			if msg.Get("role").String() == "user" {
				userMessages = append(userMessages, msg.Get("content").String())
			}
		}
		key = strings.Join(userMessages, " ")
	} else if config.CacheKeyStrategy == "disable" {
		log.Debugf("[onHttpRequestBody] cache key strategy is disabled")
		ctx.DontReadRequestBody()
		return types.ActionContinue
	} else {
		log.Warnf("[onHttpRequestBody] unknown cache key strategy: %s", config.CacheKeyStrategy)
		ctx.DontReadRequestBody()
		return types.ActionContinue
	}

这里的"disabled"选项对应的就是不缓存，直接resume request了，不知道是否可行？

我觉得可以。配成 disabled 也就是让 ai-cache 插件不工作了，对吧？ @johnlanni 觉得呢？

CH3CHO

So far so good.

plugins/wasm-go/extensions/ai-cache/README.md

Suchun-sv · 2024-10-26T00:26:51Z

添加了对当前版本配置的支持
在config.go的FromJson函数中调用

func (c *PluginConfig) FromJson(json gjson.Result) {
	c.vectorProviderConfig.FromJson(json.Get("vector"))
	c.embeddingProviderConfig.FromJson(json.Get("embedding"))
	c.cacheProviderConfig.FromJson(json.Get("cache"))
	if json.Get("redis").Exists() {
		// compatible with legacy config
		c.cacheProviderConfig.ConvertLegacyJson(json.Get("redis"))
	}

ConvertLegacyJson函数定义为:

func (c *ProviderConfig) ConvertLegacyJson(json gjson.Result) {
	c.FromJson(json)
	c.typ = "redis"
}

测试文件为

{
  "redis": {
    "serviceName": "redis_cluster",
    "timeout": 2000
  }
}

测试通过

其次，添加对其他非redis配置的支持，主要包括：

	"cacheKeyFrom.requestBody"=>        "cacheKeyFrom",
	"cacheValueFrom.requestBody"=>      "cacheValueFrom",
	"cacheStreamValueFrom.requestBody"=> "cacheStreamValueFrom",
	"returnResponseTemplate"=>           "responseTemplate",
	"returnStreamResponseTemplate"=>     "streamResponseTemplate",

现采用方法为在config.go的FromJson函数中的最后一行加上 convertLegacyMapFields(c, json, log), 对应的实现为：

func convertLegacyMapFields(c *PluginConfig, json gjson.Result, log wrapper.Log) {
	keyMap := map[string]string{
		"cacheKeyFrom.requestBody":         "cacheKeyFrom",
		"cacheValueFrom.requestBody":       "cacheValueFrom",
		"cacheStreamValueFrom.requestBody": "cacheStreamValueFrom",
		"returnResponseTemplate":           "responseTemplate",
		"returnStreamResponseTemplate":     "streamResponseTemplate",
	}

	for oldKey, newKey := range keyMap {
		if json.Get(oldKey).Exists() {
			log.Debugf("[convertLegacyMapFields] mapping %s to %s", oldKey, newKey)
			setField(c, newKey, json.Get(oldKey).String(), log)
		} else {
			log.Debugf("[convertLegacyMapFields] %s not exists", oldKey)
		}
	}
}

func setField(c *PluginConfig, fieldName string, value string, log wrapper.Log) {
	switch fieldName {
	case "cacheKeyFrom":
		c.CacheKeyFrom = value
	case "cacheValueFrom":
		c.CacheValueFrom = value
	case "cacheStreamValueFrom":
		c.CacheStreamValueFrom = value
	case "responseTemplate":
		c.ResponseTemplate = value
	case "streamResponseTemplate":
		c.StreamResponseTemplate = value
	}
	log.Debugf("[setField] set %s to %s", fieldName, value)
}

但是这种写法只针对returnStreamResponseTemplate和returnResponseTemplate这两个不加点的配置项有效，其他配置项无效

plugins/wasm-go/extensions/ai-cache/config/config.go

plugins/wasm-go/extensions/ai-cache/main.go

CH3CHO · 2024-10-26T01:48:28Z

plugins/wasm-go/extensions/ai-cache/config/config.go

@@ -0,0 +1,225 @@
+package config


Bug 1: LLM 响应中如果包含 "\n"，基于响应生成的响应格式会有问题

将字符串转义，并trim两边的"来兼容现有的模版

// Escape the response to ensure consistent formatting escapedResponse := strings.Trim(strconv.Quote(response), "\"") ctx.SetContext(CACHE_KEY_CONTEXT_KEY, nil) if stream { proxywasm.SendHttpResponseWithDetail(200, "ai-cache.hit", [][2]string{{"content-type", "text/event-stream; charset=utf-8"}}, []byte(fmt.Sprintf(c.StreamResponseTemplate, escapedResponse)), -1) } else { proxywasm.SendHttpResponseWithDetail(200, "ai-cache.hit", [][2]string{{"content-type", "application/json; charset=utf-8"}}, []byte(fmt.Sprintf(c.ResponseTemplate, escapedResponse)), -1) }

CH3CHO · 2024-10-26T01:52:21Z

plugins/wasm-go/extensions/ai-cache/config/config.go

@@ -0,0 +1,225 @@
+package config
+


Bug 2: 原始请求使用 stream 响应时，缓存结果为空

应该是对于"data: [DONE]"的判定出现的问题，删除后目前的测试"stream": true返回结果符合预期。

plugins/wasm-go/extensions/ai-cache/README.md

plugins/wasm-go/extensions/ai-cache/core.go

plugins/wasm-go/extensions/ai-cache/main.go

plugins/wasm-go/extensions/ai-cache/util.go

plugins/wasm-go/extensions/ai-cache/main.go

plugins/wasm-go/extensions/ai-cache/core.go

plugins/wasm-go/extensions/ai-cache/main.go

plugins/wasm-go/extensions/ai-cache/util.go

CH3CHO · 2024-10-26T11:08:45Z

plugins/wasm-go/extensions/ai-cache/util.go

+	// Check if the ResponseBody field exists
+	if !responseBody.Exists() {
+		// Return an empty string if we cannot extract the content
+		log.Warnf("[%s] [processSSEMessage] cannot extract content from message: %s", PLUGIN_NAME, message)


stopReason不为空的时候，是不是会进到这个分支？

确实，此处修改为log.Warn，不抛出错误

那是不是每次都会记个warn。。。

我仔细看了一下SSE的chunk流程，其实finish_reason为null或者stop也可以提取出空字符串，只有传入的数据只是标志SSE结束的"data: [DONE]"才会解析不出来，我在processSSEMessage函数中加了特判:

if strings.TrimSpace(bodyJson) == "[DONE]" { return "", nil }

修改响应处理逻辑：当当前responseBody无法提取但已有缓存内容时，只记录 debug 级别日志并返回空字符串；如果当前没有缓存任何内容，则返回 error，并跳过后续的 resp 处理。

if ctx.GetContext(CACHE_CONTENT_CONTEXT_KEY) != nil { log.Debugf("[%s] [processSSEMessage] unable to extract content from message; cache content is not nil: %s", PLUGIN_NAME, message) return "", nil } return "", fmt.Errorf("[%s] [processSSEMessage] unable to extract content from message; cache content is nil: %s", PLUGIN_NAME, message)

CH3CHO

LGTM

johnlanni

LGTM

johnlanni and others added 17 commits August 1, 2024 15:09

fix bugs

4f7bfbd

fix bugs

0f9e816

fix bugs

ff1bce6

fix conflict

f2a9ff6

Merge branch 'alibaba:main' into main

5cbae03

alter some errors

27b2f71

fix: embedding error

130f2ee

fix bugs && update interface design

56314d7

fix bugs && refine the variable names

85549d0

update design for cache to support extension

8444f5e

Merge branch 'alibaba:main' into main

a655bc4

Refined the code; README.md content needs to be updated.

d68fa88

fix bugs, README.md to be updated

5179392

fix bugs, refine variable name, update README.md

ece7e2f

Merge branch 'alibaba:main' into main

e868a1a

delete folder

138a526

Suchun-sv requested review from johnlanni, WeixinX and CH3CHO as code owners September 6, 2024 11:18

Suchun-sv and others added 3 commits September 6, 2024 12:59

fix typos

e8ad550

fix typos

c83f5c4

change append to appendMsg

f3d3292

CH3CHO requested changes Sep 8, 2024

View reviewed changes

Suchun-sv and others added 2 commits September 11, 2024 00:52

fix bugs and refine code

b0cf29d

Merge branch 'main' into main

4a18f96

CH3CHO reviewed Sep 11, 2024

View reviewed changes

plugins/wasm-go/extensions/ai-cache/embedding/dashscope.go Outdated Show resolved Hide resolved

plugins/wasm-go/extensions/ai-cache/config/config.go Outdated Show resolved Hide resolved

CH3CHO reviewed Sep 11, 2024

View reviewed changes

plugins/wasm-go/extensions/ai-cache/core.go Outdated Show resolved Hide resolved

plugins/wasm-go/extensions/ai-cache/vector/dashvector.go Show resolved Hide resolved

plugins/wasm-go/extensions/ai-cache/core.go Outdated Show resolved Hide resolved

CH3CHO reviewed Sep 12, 2024

View reviewed changes

plugins/wasm-go/extensions/ai-cache/core.go Show resolved Hide resolved

EnableAsync added 2 commits October 24, 2024 09:37

update

81bde6d

fix: bugs

ea34f4a

CH3CHO requested changes Oct 24, 2024

View reviewed changes

Suchun-sv added 4 commits October 24, 2024 09:31

Merge branch 'main' into main

784740f

add support for skip-cache

f5b50fd

update README.md and change to FQDNCluster

a1fe701

change to FQDNCluster

730d951

CH3CHO reviewed Oct 24, 2024

View reviewed changes

plugins/wasm-go/extensions/ai-cache/README.md Outdated Show resolved Hide resolved

Suchun-sv and others added 4 commits October 25, 2024 23:36

provide support for the legacy configuration

335c04c

simplify resp func, add func name when debug

59bddf6

Merge branch 'alibaba:main' into main

e4901d9

change *.typ to *

36f0d77

Suchun-sv added 2 commits October 26, 2024 01:13

add support for legacy config

009a1b1

update content_type in stream resp

4515f43

CH3CHO requested changes Oct 26, 2024

View reviewed changes

CH3CHO reviewed Oct 26, 2024

View reviewed changes

plugins/wasm-go/extensions/ai-cache/main.go Show resolved Hide resolved

plugins/wasm-go/extensions/ai-cache/util.go Outdated Show resolved Hide resolved

CH3CHO requested changes Oct 26, 2024

View reviewed changes

plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved

plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved

fix bugs

c048280

CH3CHO requested changes Oct 26, 2024

View reviewed changes

Suchun-sv and others added 5 commits October 26, 2024 11:15

add support for legacy configuration

0ec24f3

fix bugs

a658bfe

handle the data: [DONE] and return in escaped string

a199144

dont read resp when ERROR_PARTIAL_MESSAGE_KEY not nil

77f05d6

Update redis_wrapper.go

28c629c

CH3CHO approved these changes Oct 27, 2024

View reviewed changes

johnlanni requested changes Oct 27, 2024

View reviewed changes

johnlanni approved these changes Oct 27, 2024

View reviewed changes

johnlanni merged commit acec48e into alibaba:main Oct 27, 2024
12 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

Suchun-sv commented Sep 6, 2024

codecov-commenter commented Sep 6, 2024 •

edited

Loading

This comment was marked as resolved.

CH3CHO commented Oct 24, 2024

CH3CHO left a comment

Suchun-sv commented Oct 26, 2024 •

edited

Loading

CH3CHO Oct 26, 2024

Suchun-sv Oct 26, 2024

CH3CHO Oct 26, 2024

Suchun-sv Oct 26, 2024

CH3CHO Oct 26, 2024

Suchun-sv Oct 26, 2024

CH3CHO Oct 26, 2024

Suchun-sv Oct 26, 2024 •

edited

Loading

CH3CHO left a comment

johnlanni left a comment

		## 配置说明
		配置分为 3 个部分：向量数据库（vector）；文本向量化接口（embedding）；缓存数据库（cache），同时也提供了细粒度的 LLM 请求/响应提取参数配置等。

[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

Conversation

Suchun-sv commented Sep 6, 2024

Ⅰ. Describe what this PR did

Ⅱ. Does this pull request fix one issue?

codecov-commenter commented Sep 6, 2024 • edited Loading

Codecov Report

This comment was marked as resolved.

CH3CHO commented Oct 24, 2024

CH3CHO left a comment

Choose a reason for hiding this comment

Suchun-sv commented Oct 26, 2024 • edited Loading

CH3CHO Oct 26, 2024

Choose a reason for hiding this comment

Suchun-sv Oct 26, 2024

Choose a reason for hiding this comment

CH3CHO Oct 26, 2024

Choose a reason for hiding this comment

Suchun-sv Oct 26, 2024

Choose a reason for hiding this comment

CH3CHO Oct 26, 2024

Choose a reason for hiding this comment

Suchun-sv Oct 26, 2024

Choose a reason for hiding this comment

CH3CHO Oct 26, 2024

Choose a reason for hiding this comment

Suchun-sv Oct 26, 2024 • edited Loading

Choose a reason for hiding this comment

CH3CHO left a comment

Choose a reason for hiding this comment

johnlanni left a comment

Choose a reason for hiding this comment

codecov-commenter commented Sep 6, 2024 •

edited

Loading

Suchun-sv commented Oct 26, 2024 •

edited

Loading

Suchun-sv Oct 26, 2024 •

edited

Loading