
Feat: Add Spark LLM support for plugins/ai-proxy #1139

Merged · 7 commits into alibaba:main · Aug 8, 2024

Conversation


@urlyy urlyy commented Jul 20, 2024

Ⅰ. Describe what this PR did

Add support for the iFlytek SparkLLM AI model. API documentation: https://www.xfyun.cn/doc/spark/HTTP%E8%B0%83%E7%94%A8%E6%96%87%E6%A1%A3.html

Ⅱ. Does this pull request fix one issue?

fixes #949

Ⅲ. Why don't you add test cases (unit test/integration test)?

Ⅳ. Describe how to verify it

  1. ai-proxy/docker-compose.yaml
version: '3.7'
services:
  envoy:
    image: higress-registry.cn-hangzhou.cr.aliyuncs.com/higress/gateway:v1.4.0-rc.1
    entrypoint: /usr/local/bin/envoy
    # Note: debug-level logging is enabled for wasm here; production deployments use the default info level
    command: -c /etc/envoy/envoy.yaml --component-log-level wasm:debug
    depends_on:
    - httpbin
    networks:
    - wasmtest
    ports:
    - "10000:10000"
    volumes:
    - ./envoy.yaml:/etc/envoy/envoy.yaml
    - ./plugin.wasm:/etc/envoy/plugin.wasm

  httpbin:
    image: kennethreitz/httpbin:latest
    networks:
    - wasmtest
    ports:
    - "12345:80"

networks:
  wasmtest: {}
  2. ai-proxy/envoy.yaml
admin:
  address:
    socket_address:
      protocol: TCP
      address: 0.0.0.0
      port_value: 9901
static_resources:
  listeners:
    - name: listener_0
      address:
        socket_address:
          protocol: TCP
          address: 0.0.0.0
          port_value: 10000
      filter_chains:
        - filters:
            - name: envoy.filters.network.http_connection_manager
              typed_config:
                "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
                scheme_header_transformation:
                  scheme_to_overwrite: https
                stat_prefix: ingress_http
                # Output envoy logs to stdout
                access_log:
                  - name: envoy.access_loggers.stdout
                    typed_config:
                      "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
                # Modify as required
                route_config:
                  name: local_route
                  virtual_hosts:
                    - name: local_service
                      domains: [ "*" ]
                      routes:
                        - match:
                            prefix: "/"
                          route:
                            cluster: spark
                            timeout: 300s
                http_filters:
                  - name: wasmtest
                    typed_config:
                      "@type": type.googleapis.com/udpa.type.v1.TypedStruct
                      type_url: type.googleapis.com/envoy.extensions.filters.http.wasm.v3.Wasm
                      value:
                        config:
                          name: wasmtest
                          vm_config:
                            runtime: envoy.wasm.runtime.v8
                            code:
                              local:
                                filename: /etc/envoy/plugin.wasm
                          configuration:
                            "@type": "type.googleapis.com/google.protobuf.StringValue"
                            value: |
                              {
                               "provider": {
                                 "type": "spark",
                                 "apiTokens": [
                                    "APIKey:APISecret"
                                  ],
                                 "timeout": 1200000,
                                 "modelMapping": {
                                  "gpt-4o": "generalv3.5",
                                   "gpt-4": "generalv3",
                                   "*": "general"
                                 }
                               }
                              }
                  - name: envoy.filters.http.router
  clusters:
    - name: httpbin
      connect_timeout: 30s
      type: LOGICAL_DNS
      # Comment out the following line to test on v6 networks
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: httpbin
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: httpbin
                      port_value: 80
    - name: spark
      connect_timeout: 30s
      type: LOGICAL_DNS
      dns_lookup_family: V4_ONLY
      lb_policy: ROUND_ROBIN
      load_assignment:
        cluster_name: spark
        endpoints:
          - lb_endpoints:
              - endpoint:
                  address:
                    socket_address:
                      address: spark-api-open.xf-yun.com
                      port_value: 443
      transport_socket:
        name: envoy.transport_sockets.tls
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
          "sni": "spark-api-open.xf-yun.com"
  3. Write and run ai-proxy/build.sh
go mod tidy
tinygo build -o plugin.wasm -scheduler=none -target=wasi -gc=custom -tags="custommalloc nottinygc_finalizer" ./main.go
sudo docker compose up
  4. Send requests
  • OpenAI protocol, non-streaming
curl --location 'http://127.0.0.1:10000/v1/chat/completions' \
--header 'Content-Type:  application/json' \
--data '{
    "model":"gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "你是一个专业的开发人员!"
        },
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ]
}'

Response

{
    "id": "cha000c26dd@dx190ef0e2e28b8f2532",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "您好!我是一名专业的软件开发人员,拥有丰富的编程经验和技术知识。我擅长解决各种编程难题,无论是前端设计还是后端开发,我都能够提供专业的解决方案。有什么可以帮到您的吗?"
            }
        }
    ],
    "created": 1721997605,
    "model": "generalv3.5",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 41,
        "total_tokens": 51
    }
}
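
The response above reports "model": "generalv3.5" even though the request asked for gpt-4o: the name was translated by the modelMapping entry in the plugin configuration (exact matches first, then the "*" wildcard). A minimal Go sketch of that lookup, with a function name that is an assumption rather than the plugin's actual helper:

package main

import "fmt"

// mapModel sketches the modelMapping lookup configured above: an exact match
// wins, otherwise the "*" wildcard entry is used, and an unmapped name falls
// through unchanged.
func mapModel(mapping map[string]string, requested string) string {
    if target, ok := mapping[requested]; ok {
        return target
    }
    if target, ok := mapping["*"]; ok {
        return target
    }
    return requested
}

func main() {
    mapping := map[string]string{
        "gpt-4o": "generalv3.5",
        "gpt-4":  "generalv3",
        "*":      "general",
    }
    fmt.Println(mapModel(mapping, "gpt-4o"))     // generalv3.5
    fmt.Println(mapModel(mapping, "some-model")) // general
}
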
  • OpenAI protocol, streaming
curl --location 'http://127.0.0.1:10000/v1/chat/completions' \
--header 'Content-Type:  application/json' \
--data '{
    "model":"gpt-4o",
    "messages": [
        {
            "role": "system",
            "content": "你是一名专业的开发人员!"
        },
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ],
    "stream": true
}'

Response

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"你好"}}],"created":1721997642,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"!我是一名"}}],"created":1721997642,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"专业的开发人员,拥有"}}],"created":1721997642,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"深厚的"}}],"created":1721997642,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"编程知识和丰富的实践经验"}}],"created":1721997643,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"。"}}],"created":1721997643,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"我擅长解决各种技术难题,从复杂的后端系统开发到精致的前端用户体验设计,我都能够提供高效和创新的解决方案。"}}],"created":1721997645,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"无论是在构建大型应用程序、优化数据库性能还是实现最新的技术趋势方面,我都能确保项目的质量和进度。"}}],"created":1721997646,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c278c@dx190ef0ec91db8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"如果你有任何开发相关的问题或需要技术支持,随时可以向我咨询。"}}],"created":1721997646,"model":"generalv3.5","object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":76,"total_tokens":86}}
  • Spark protocol, non-streaming
curl --location 'http://127.0.0.1:10000/v1/chat/completions' \
--header 'Content-Type:  application/json' \
--data '{
    "model":"generalv3.5",
    "messages": [
        {
            "role": "system",
            "content": "你是一名专业的开发人员!"
        },
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ]
}'

Response

{
    "id": "cha000c2941@dx190ef106619b8f2532",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "你好!我是一名专业的开发人员,擅长编写和优化代码,解决技术难题,并协助团队在软件开发项目中实现目标。如果你有任何编程或开发相关的问题,我很乐意为你提供帮助。"
            }
        }
    ],
    "created": 1721997751,
    "model": "generalv3.5",
    "object": "chat.completion",
    "usage": {
        "prompt_tokens": 10,
        "completion_tokens": 39,
        "total_tokens": 49
    }
}
  • Spark protocol, streaming
curl --location 'http://127.0.0.1:10000/v1/chat/completions' \
--header 'Content-Type:  application/json' \
--data '{
    "model":"generalv3.5",
    "messages": [
        {
            "role": "system",
            "content": "你是一名专业的开发人员!"
        },
        {
            "role": "user",
            "content": "你好,你是谁?"
        }
    ],
    "stream": true
}'

Response

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"你好"}}],"created":1721997830,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"!我是一名"}}],"created":1721997830,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"专业的软件开发人员"}}],"created":1721997830,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":",拥有丰富的编程"}}],"created":1721997831,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"经验和技术知识。"}}],"created":1721997831,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"我擅长解决各种计算机相关的问题,无论是关于编程语言、软件开发流程、系统架构还是性能优化等方面的问题,我都乐于为你提供帮助。"}}],"created":1721997833,"model":"generalv3.5","object":"chat.completion","usage":{}}

data: {"id":"cha000c2a8c@dx190ef11a5cdb8f2532","choices":[{"index":0,"delta":{"role":"assistant","content":"有什么我可以帮助你的吗?"}}],"created":1721997834,"model":"generalv3.5","object":"chat.completion","usage":{"prompt_tokens":10,"completion_tokens":49,"total_tokens":59}}

Ⅴ. Special notes for reviews

I have a few questions:

  • I did not put a real, usable key and secret into the envoy.yaml above; if needed, I can edit the text of this PR again.
  • When using Spark's Lite model, the model does not seem to make use of the system prompt.
  • I don't fully understand the request-header removal and the use of contextCache; I wrote those parts by imitating the existing providers, so please point out any mistakes.

I used Tongyi Lingma's code-optimization suggestions while coding.



CLAassistant commented Jul 20, 2024

CLA assistant check
All committers have signed the CLA.


cr7258 commented Jul 22, 2024

I set an unused apiTokens value in envoy.yaml; it seems an error is reported if it is not set. Maybe I could replace the sparkAuthSecret I added with apiTokens[0]?

proxywasm.ReplaceHttpRequestHeader(authorizationKey, "Bearer "+p.config.sparkAuthKey+":"+p.config.sparkAuthSecret)

Because you use sparkAuthKey and sparkAuthSecret to set the Authorization header, apiTokens is never used.
iFlytek Spark's API token is effectively just APIKey and APISecret concatenated, so I think you don't need the extra sparkAuthKey and sparkAuthSecret fields. Keep using the generic apiTokens field and simply tell users in your example that each apiTokens value takes the form APIKey:APISecret. A bonus is that multiple apiTokens can then be configured.
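
A minimal sketch of that suggestion (the helper name is an assumption, not code from this PR): each apiTokens entry already holds the ready-made APIKey:APISecret string, so the provider only needs to prepend "Bearer ".

package main

import "fmt"

// buildSparkAuthHeader returns the Authorization value for a configured
// apiToken of the form "APIKey:APISecret".
func buildSparkAuthHeader(apiToken string) string {
    return "Bearer " + apiToken
}

func main() {
    fmt.Println(buildSparkAuthHeader("my-api-key:my-api-secret"))
}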

Comment on lines 254 to 270
// Copied from request_helper::insertContextMessage
fileMessage := chatMessage{
    Role:    roleSystem,
    Content: content,
}
var firstNonSystemMessageIndex int
for i, message := range request.Messages {
    if message.Role != roleSystem {
        firstNonSystemMessageIndex = i
        break
    }
}
if firstNonSystemMessageIndex == 0 {
    request.Messages = append([]chatMessage{fileMessage}, request.Messages...)
} else {
    request.Messages = append(request.Messages[:firstNonSystemMessageIndex], append([]chatMessage{fileMessage}, request.Messages[firstNonSystemMessageIndex:]...)...)
}
Collaborator


Why copy the body of insertContextMessage here instead of calling insertContextMessage directly?


@urlyy urlyy Jul 22, 2024


Because insertContextMessage takes a chatCompletionRequest parameter rather than my custom sparkRequest. My understanding is that chatCompletionRequest is the request object of the OpenAI protocol.
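
(For illustration, a sketch of the reuse being discussed, assuming the insertion logic were extracted over the shared []chatMessage slice so that both chatCompletionRequest and sparkRequest could call it; this is not code from the PR:)

package main

import "fmt"

const roleSystem = "system"

type chatMessage struct {
    Role    string
    Content string
}

// insertSystemMessage only depends on the message slice, so any request type
// exposing []chatMessage could reuse it.
func insertSystemMessage(messages []chatMessage, content string) []chatMessage {
    sys := chatMessage{Role: roleSystem, Content: content}
    for i, m := range messages {
        if m.Role != roleSystem {
            // Insert right before the first non-system message.
            return append(messages[:i:i], append([]chatMessage{sys}, messages[i:]...)...)
        }
    }
    // No non-system message found: prepend, matching the original snippet.
    return append([]chatMessage{sys}, messages...)
}

func main() {
    msgs := []chatMessage{{Role: "user", Content: "你好"}}
    fmt.Println(insertSystemMessage(msgs, "context"))
}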


@urlyy urlyy Jul 22, 2024


Sorry, the issue of the system prompt seemingly not being applied appears to be that Spark's Lite model simply does not use the system prompt. I just verified that under both the openai and spark protocols the Lite model never adopts the role set in the system prompt, while switching to the Max model works. I probably missed this earlier because a cached Max answer was being reused.

Collaborator


You cannot tell from the answer's content alone whether the system role was set successfully; even with Max, the answer sometimes gives no hint.

curl -i -k -X POST 'https://spark-api-open.xf-yun.com/v1/chat/completions' \
--header 'Authorization: Bearer <APIKEY>:<APISECRET>' \
--header 'Content-Type: application/json' \
--data '{
    "model":"generalv3.5",
    "messages": [
        {
            "role": "system",
            "content": "你是一名专业的翻译!"
        },
        {
            "role": "user",
            "content": "你是谁?"
        }
    ]
}'
HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 588
Connection: keep-alive
Access-Control-Allow-Origin: *
Cache-Control: no-cache
Date: Fri, 26 Jul 2024 00:42:58 GMT
X-Kong-Upstream-Latency: 6252
X-Kong-Proxy-Latency: 1
Via: kong/1.3.0

{"code":0,"message":"Success","sid":"cha000c7216@dx190ec7d9771b8f2532","choices":[{"message":{"role":"assistant","content":"您好,我是讯飞星火认知大模型,由科大讯飞构建。\n作为一款先进的认知智能大模型,我的设计宗旨在于模仿人类的认知过程,以便更自然地与人交流、解决问题,并在各种领域内提供智能支持。我的能力覆盖了从语言理解到复杂问题解答的广泛范围,旨在帮助用户高效完成各类任务。"},"index":0}],"usage":{"prompt_tokens":8,"completion_tokens":76,"total_tokens":84}}%                                                                                                                                                                                                                 


urlyy commented Jul 22, 2024

I revised it again and removed Spark's authKey and authSecret. However, if I don't re-run go mod tidy locally, the dependency "github.com/higress-group/proxy-wasm-go-sdk/proxywasm/types" is flagged as unresolved; I'm not sure whether others hit this too. This commit also includes the changes to the two go.sum files; if they shouldn't be changed, I can revert them.

@johnlanni johnlanni requested a review from cr7258 July 25, 2024 12:46

urlyy commented Jul 26, 2024

@cr7258 Thank you very much for the patient review! I made the following changes based on your suggestions:

  1. Switched the request path to the fixed one and set it in OnRequestHeaders (see the sketch at the end of this comment).
  2. Changed the expected model names from the Lite style to the general style, and removed the mapping of model names not covered by modelMapping.
  3. Removed insertXXContext, because Spark's completions API does not support uploading files.
  4. Switched to setting Delta in the streaming path.
  5. Updated the content of this PR description and README.md accordingly.

Besides that: although returning the name of the model that was actually invoked should really be the LLM vendor's responsibility, I still set the model field of the response from the user's model parameter. However, I found that when Spark is used with the original protocol and the requested model name does not exist, generalv3 is invoked anyway. So with protocol: original and request { model: qwer }, Higress would return response { model: qwer } even though generalv3 was actually invoked. I'm not sure whether this should be handled.
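
A minimal sketch of item 1, using the fixed endpoint visible in the examples above (the constant and function names are assumptions, not the plugin's actual identifiers):

package main

import "fmt"

// Spark exposes a single OpenAI-compatible chat endpoint, so the provider can
// rewrite the pseudo-headers to constants while handling request headers.
const (
    sparkHost                = "spark-api-open.xf-yun.com"
    sparkChatCompletionsPath = "/v1/chat/completions"
)

func sparkRequestHeaders() map[string]string {
    return map[string]string{
        ":authority": sparkHost,
        ":path":      sparkChatCompletionsPath,
    }
}

func main() {
    for k, v := range sparkRequestHeaders() {
        fmt.Println(k, "=", v)
    }
}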


cr7258 commented Jul 27, 2024

[screenshot] Verified with NextChat; it works as expected.


cr7258 commented Jul 27, 2024

Besides that: although returning the name of the model that was actually invoked should really be the LLM vendor's responsibility, I still set the model field of the response from the user's model parameter. However, I found that when Spark is used with the original protocol and the requested model name does not exist, generalv3 is invoked anyway. So with protocol: original and request { model: qwer }, Higress would return response { model: qwer } even though generalv3 was actually invoked. I'm not sure whether this should be handled.

I think this doesn't need to be handled. Spark's raw response contains no model information, so the only option is to set the response's model from the model parameter the user supplied.
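
A minimal sketch of the agreed behaviour (type and function names are assumptions): since the raw Spark response carries no model field, the response echoes whatever model name the client requested.

package main

import "fmt"

type chatCompletionResponse struct {
    ID    string `json:"id"`
    Model string `json:"model"`
}

// fillResponseModel echoes the requested model name back to the client,
// because Spark does not report which model actually answered.
func fillResponseModel(resp *chatCompletionResponse, requestedModel string) {
    if resp.Model == "" {
        resp.Model = requestedModel
    }
}

func main() {
    resp := chatCompletionResponse{ID: "cha000c2941@dx190ef106619b8f2532"}
    fillResponseModel(&resp, "generalv3.5")
    fmt.Println(resp.Model)
}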


@cr7258 cr7258 left a comment


🐉 LGTM


codecov-commenter commented Jul 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 36.03%. Comparing base (ef31e09) to head (a22bd38).
Report is 17 commits behind head on main.


@@            Coverage Diff             @@
##             main    #1139      +/-   ##
==========================================
+ Coverage   35.91%   36.03%   +0.12%     
==========================================
  Files          69       69              
  Lines       11576     9483    -2093     
==========================================
- Hits         4157     3417     -740     
+ Misses       7104     5750    -1354     
- Partials      315      316       +1     

see 67 files with indirect coverage changes

@cr7258 cr7258 requested a review from johnlanni August 6, 2024 13:04

@johnlanni johnlanni left a comment


LGTM

@johnlanni johnlanni merged commit c78ef70 into alibaba:main Aug 8, 2024
12 checks passed
Linked issue (#949): AI 代理 Wasm 插件对接讯飞星火认知大模型 (integrate the iFlytek Spark cognitive LLM with the AI proxy Wasm plugin)