vllm-project / vllm Public

Pull requests: vllm-project/vllm

859 Open 10,437 Closed

Pull requests list

[Misc] create/add dp bundles with current platform's device key

Labels: v1

#21726 opened Jul 28, 2025 by thxCode

Fix Arcee model weight loading: Add custom load_weights

#21725 opened Jul 28, 2025 by alyosha-swamy

[Bugfix] [AMD] [ROCm] pd disagg for rocm fix

Labels: rocm (Related to AMD ROCm)

#21724 opened Jul 28, 2025 by seungrokj

3 of 4 tasks

fix the mxfp4 packed qk weight loading issue for llama4

Labels: llama (Related to Llama models)

#21722 opened Jul 28, 2025 by xuebwang • Draft

4 tasks

[Frontend] Add LLM.reward specified to reward models

Labels: documentation (Improvements or additions to documentation), frontend

#21720 opened Jul 28, 2025 by noooop

4 tasks

[Model] Prioritize Transformers fallback over suffix matching

Labels: multi-modality (Related to multi-modality (#4194)), new-model (Requests to new models), ready (ONLY add when PR is ready to merge/full CI is needed)

#21719 opened Jul 28, 2025 by DarkLight1337

1 of 4 tasks

[Bugfix] Fix Ernie4_5_MoeForCausalLM shared experts

#21717 opened Jul 28, 2025 by jeejeelee

4 tasks

[Feat] Support Flashinfer TRT-LLM FP8-query/output Attention Kernel

Labels: llama (Related to Llama models), performance (Performance-related issues), rocm (Related to AMD ROCm)

#21716 opened Jul 28, 2025 by elvischenv • Draft

4 tasks

[Bugfix][Frontend] Fix the problem that the non-existent handler returns a 200 status code

Labels: frontend

#21705 opened Jul 28, 2025 by kebe7jun

3 of 4 tasks

[XPU] IPEX-optimized Punica Wrapper on XPU

#21703 opened Jul 28, 2025 by chaojun-zhang

update flashinfer to v0.2.9rc2

Labels: ci/build

#21701 opened Jul 28, 2025 by weireweire

3 of 4 tasks

[Benchmark] Support ready check timeout in vllm bench serve

Labels: performance (Performance-related issues), ready (ONLY add when PR is ready to merge/full CI is needed)

#21696 opened Jul 28, 2025 by yeqcharlotte

3 of 4 tasks

[Misc] Add unit tests for chunked local attention

Labels: v1

#21692 opened Jul 27, 2025 by sarckk

3 of 4 tasks

[BugFix] Potential fix for FlashMLA full cuda-graph + DP

Labels: v1

#21691 opened Jul 27, 2025 by LucasWilkinson • Draft

4 tasks

Deprecate V0

Labels: ci/build, v1

#21690 opened Jul 27, 2025 by WoosukKwon

[Kernel][Triton] add bfloat16 support for awq

#21688 opened Jul 27, 2025 by mandeeplearning

4 tasks

Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema

Labels: ready (ONLY add when PR is ready to merge/full CI is needed)

#21684 opened Jul 27, 2025 by bbeckca

[feature] add log non default args in LLM frontend

#21680 opened Jul 27, 2025 by lengrongfu

4 tasks

[Misc] Remove duplicate code and fix comment errors to improve code readability

Labels: v1

#21673 opened Jul 27, 2025 by tanruixiang

3 of 4 tasks

[Model] [Draft PR] Add support for SmallThinker model series

Labels: documentation (Improvements or additions to documentation), new-model (Requests to new models)

#21670 opened Jul 27, 2025 by SorryMaker2022 • Draft

4 tasks

Introduce RayPPCommunicator for ray-based PP

#21660 opened Jul 26, 2025 by ruisearch42

3 of 4 tasks

Keep reasoning content before applying chat template

Labels: frontend

#21655 opened Jul 26, 2025 by lhdeng-gh

3 of 4 tasks

Limit concurrent long partial prefills via max_long_partial_prefills

Labels: v1

#21651 opened Jul 26, 2025 by pansicheng

3 of 4 tasks

Fix(benchmarks): Correct tqdm import to resolve TypeError in benchmark_w8a8_block_fp8.py

Labels: performance (Performance-related issues)

#21650 opened Jul 26, 2025 by Aymendje

4 tasks done

[Misc] refactor code return slice without brackets

Labels: frontend, llama (Related to Llama models), new-model (Requests to new models), performance (Performance-related issues), structured-output, tool-calling, tpu (Related to Google TPUs)

#21649 opened Jul 26, 2025 by andyxning

4 tasks
