Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ai-cache] Implement a WASM plugin for LLM result retrieval based on vector similarity #1290

Merged
merged 56 commits into from
Oct 27, 2024

Conversation

Suchun-sv
Copy link
Contributor

Ⅰ. Describe what this PR did

给ai-cache插件添加基于语文本向量相似度召回缓存的能力

Ⅱ. Does this pull request fix one issue?

fixes #1040

@codecov-commenter
Copy link

codecov-commenter commented Sep 6, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 43.49%. Comparing base (ef31e09) to head (28c629c).
Report is 168 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1290      +/-   ##
==========================================
+ Coverage   35.91%   43.49%   +7.57%     
==========================================
  Files          69       76       +7     
  Lines       11576    12320     +744     
==========================================
+ Hits         4157     5358    +1201     
+ Misses       7104     6626     -478     
- Partials      315      336      +21     

see 69 files with indirect coverage changes

plugins/wasm-go/extensions/ai-cache/cache/provider.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/cache/provider.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/cache/provider.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/config/config.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/vector/dashvector.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/vector/dashvector.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/embedding/dashscope.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/vector/dashvector.go Outdated Show resolved Hide resolved
## 配置说明
配置分为 3 个部分:向量数据库(vector);文本向量化接口(embedding);缓存数据库(cache),同时也提供了细粒度的 LLM 请求/响应提取参数配置等。

This comment was marked as resolved.

@CH3CHO
Copy link
Collaborator

CH3CHO commented Oct 24, 2024

在配置中添加了cacheKeyStrategy配置,决定如何根据历史问题生成缓存键的策略。可选值: "lastQuestion" (使用最后一个问题), "allQuestions" (拼接所有问题) 或 "disable" (禁用缓存)。 在main.go中添加相应的逻辑为:

	var key string
	if config.CacheKeyStrategy == "lastQuestion" {
		key = bodyJson.Get("[email protected]").String()
	} else if config.CacheKeyStrategy == "allQuestions" {
		// Retrieve all user messages and concatenate them
		messages := bodyJson.Get("messages").Array()
		var userMessages []string
		for _, msg := range messages {
			if msg.Get("role").String() == "user" {
				userMessages = append(userMessages, msg.Get("content").String())
			}
		}
		key = strings.Join(userMessages, " ")
	} else if config.CacheKeyStrategy == "disable" {
		log.Debugf("[onHttpRequestBody] cache key strategy is disabled")
		ctx.DontReadRequestBody()
		return types.ActionContinue
	} else {
		log.Warnf("[onHttpRequestBody] unknown cache key strategy: %s", config.CacheKeyStrategy)
		ctx.DontReadRequestBody()
		return types.ActionContinue
	}

这里的"disabled"选项对应的就是不缓存,直接resume request了,不知道是否可行?

我觉得可以。配成 disabled 也就是让 ai-cache 插件不工作了,对吧? @johnlanni 觉得呢?

Copy link
Collaborator

@CH3CHO CH3CHO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So far so good.

plugins/wasm-go/extensions/ai-cache/README.md Outdated Show resolved Hide resolved
@Suchun-sv
Copy link
Contributor Author

Suchun-sv commented Oct 26, 2024

添加了对当前版本配置的支持
config.goFromJson函数中调用

func (c *PluginConfig) FromJson(json gjson.Result) {
	c.vectorProviderConfig.FromJson(json.Get("vector"))
	c.embeddingProviderConfig.FromJson(json.Get("embedding"))
	c.cacheProviderConfig.FromJson(json.Get("cache"))
	if json.Get("redis").Exists() {
		// compatible with legacy config
		c.cacheProviderConfig.ConvertLegacyJson(json.Get("redis"))
	}

ConvertLegacyJson函数定义为:

func (c *ProviderConfig) ConvertLegacyJson(json gjson.Result) {
	c.FromJson(json)
	c.typ = "redis"
}

测试文件为

{
  "redis": {
    "serviceName": "redis_cluster",
    "timeout": 2000
  }
}

测试通过


其次,添加对其他非redis配置的支持,主要包括:

	"cacheKeyFrom.requestBody"=>        "cacheKeyFrom",
	"cacheValueFrom.requestBody"=>      "cacheValueFrom",
	"cacheStreamValueFrom.requestBody"=> "cacheStreamValueFrom",
	"returnResponseTemplate"=>           "responseTemplate",
	"returnStreamResponseTemplate"=>     "streamResponseTemplate",

现采用方法为在config.go的FromJson函数中的最后一行加上 convertLegacyMapFields(c, json, log), 对应的实现为:

func convertLegacyMapFields(c *PluginConfig, json gjson.Result, log wrapper.Log) {
	keyMap := map[string]string{
		"cacheKeyFrom.requestBody":         "cacheKeyFrom",
		"cacheValueFrom.requestBody":       "cacheValueFrom",
		"cacheStreamValueFrom.requestBody": "cacheStreamValueFrom",
		"returnResponseTemplate":           "responseTemplate",
		"returnStreamResponseTemplate":     "streamResponseTemplate",
	}

	for oldKey, newKey := range keyMap {
		if json.Get(oldKey).Exists() {
			log.Debugf("[convertLegacyMapFields] mapping %s to %s", oldKey, newKey)
			setField(c, newKey, json.Get(oldKey).String(), log)
		} else {
			log.Debugf("[convertLegacyMapFields] %s not exists", oldKey)
		}
	}
}

func setField(c *PluginConfig, fieldName string, value string, log wrapper.Log) {
	switch fieldName {
	case "cacheKeyFrom":
		c.CacheKeyFrom = value
	case "cacheValueFrom":
		c.CacheValueFrom = value
	case "cacheStreamValueFrom":
		c.CacheStreamValueFrom = value
	case "responseTemplate":
		c.ResponseTemplate = value
	case "streamResponseTemplate":
		c.StreamResponseTemplate = value
	}
	log.Debugf("[setField] set %s to %s", fieldName, value)
}

但是这种写法只针对returnStreamResponseTemplate和returnResponseTemplate这两个不加点的配置项有效,其他配置项无效

plugins/wasm-go/extensions/ai-cache/config/config.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved
@@ -0,0 +1,225 @@
package config
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug 1: LLM 响应中如果包含 "\n",基于响应生成的响应格式会有问题

image

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

将字符串转义,并trim两边的"来兼容现有的模版

// Escape the response to ensure consistent formatting
escapedResponse := strings.Trim(strconv.Quote(response), "\"")

ctx.SetContext(CACHE_KEY_CONTEXT_KEY, nil)

if stream {
proxywasm.SendHttpResponseWithDetail(200, "ai-cache.hit", [][2]string{{"content-type", "text/event-stream; charset=utf-8"}}, []byte(fmt.Sprintf(c.StreamResponseTemplate, escapedResponse)), -1)
} else {
proxywasm.SendHttpResponseWithDetail(200, "ai-cache.hit", [][2]string{{"content-type", "application/json; charset=utf-8"}}, []byte(fmt.Sprintf(c.ResponseTemplate, escapedResponse)), -1)
}

@@ -0,0 +1,225 @@
package config

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug 2: 原始请求使用 stream 响应时,缓存结果为空

image

image

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

应该是对于"data: [DONE]"的判定出现的问题,删除后目前的测试"stream": true返回结果符合预期。
image

image image

plugins/wasm-go/extensions/ai-cache/README.md Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/README.md Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/core.go Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/core.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/core.go Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/main.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/util.go Outdated Show resolved Hide resolved
plugins/wasm-go/extensions/ai-cache/util.go Show resolved Hide resolved
// Check if the ResponseBody field exists
if !responseBody.Exists() {
// Return an empty string if we cannot extract the content
log.Warnf("[%s] [processSSEMessage] cannot extract content from message: %s", PLUGIN_NAME, message)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

stopReason不为空的时候,是不是会进到这个分支?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

确实,此处修改为log.Warn,不抛出错误

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

那是不是每次都会记个warn。。。

Copy link
Contributor Author

@Suchun-sv Suchun-sv Oct 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我仔细看了一下SSE的chunk流程,其实finish_reason为null或者stop也可以提取出空字符串,只有传入的数据只是标志SSE结束的"data: [DONE]"才会解析不出来,我在processSSEMessage函数中加了特判:

if strings.TrimSpace(bodyJson) == "[DONE]" {
return "", nil
}

修改响应处理逻辑:当当前responseBody无法提取但已有缓存内容时,只记录 debug 级别日志并返回空字符串;如果当前没有缓存任何内容,则返回 error,并跳过后续的 resp 处理。

if ctx.GetContext(CACHE_CONTENT_CONTEXT_KEY) != nil {
	log.Debugf("[%s] [processSSEMessage] unable to extract content from message; cache content is not nil: %s", PLUGIN_NAME, message)
	return "", nil
}
return "", fmt.Errorf("[%s] [processSSEMessage] unable to extract content from message; cache content is nil: %s", PLUGIN_NAME, message)

Copy link
Collaborator

@CH3CHO CH3CHO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Collaborator

@johnlanni johnlanni left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@johnlanni johnlanni merged commit acec48e into alibaba:main Oct 27, 2024
12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

【开源之夏】实现基于向量相似度实现LLM结果召回的WASM插件
5 participants