Pull requests: vllm-project/vllm
[Misc] create/add dp bundles with current platform's device key
v1
#21726
opened Jul 28, 2025 by
thxCode
[Bugfix] [AMD] [ROCm] pd disagg for rocm fix
rocm
Related to AMD ROCm
#21724
opened Jul 28, 2025 by
seungrokj
3 of 4 tasks
[Frontend] Add LLM.reward specified to reward models
documentation
Improvements or additions to documentation
frontend
#21720
opened Jul 28, 2025 by
noooop
4 tasks
[Model] Prioritize Transformers fallback over suffix matching
multi-modality
Related to multi-modality (#4194)
new-model
Requests to new models
ready
ONLY add when PR is ready to merge/full CI is needed
#21719
opened Jul 28, 2025 by
DarkLight1337
1 of 4 tasks
[Feat] Support Flashinfer TRT-LLM FP8-query/output Attention Kernel
llama
Related to Llama models
performance
Performance-related issues
rocm
Related to AMD ROCm
v1
#21716
opened Jul 28, 2025 by
elvischenv
Draft
4 tasks
[Bugfix][Frontend] Fix the problem that the non-existent handler returns a 200 status code
frontend
#21705
opened Jul 28, 2025 by
kebe7jun
3 of 4 tasks
[Benchmark] Support ready check timeout in vllm bench serve
performance
Performance-related issues
ready
ONLY add when PR is ready to merge/full CI is needed
#21696
opened Jul 28, 2025 by
yeqcharlotte
3 of 4 tasks
[Misc] Add unit tests for chunked local attention
v1
#21692
opened Jul 27, 2025 by
sarckk
3 of 4 tasks
[BugFix] Potential fix for FlashMLA full cuda-graph + DP
v1
#21691
opened Jul 27, 2025 by
LucasWilkinson
Draft
4 tasks
Migrate InternVLImageInputs and InternVLVideoInputs to TensorSchema
ready
ONLY add when PR is ready to merge/full CI is needed
#21684
opened Jul 27, 2025 by
bbeckca
[Misc] Remove duplicate code and fix comment errors to improve code readability
v1
#21673
opened Jul 27, 2025 by
tanruixiang
3 of 4 tasks
[Model] [Draft PR] Add support for SmallThinker model series
documentation
Improvements or additions to documentation
new-model
Requests to new models
#21670
opened Jul 27, 2025 by
SorryMaker2022
Draft
4 tasks
Keep reasoning content before applying chat template
frontend
#21655
opened Jul 26, 2025 by
lhdeng-gh
3 of 4 tasks
Limit concurrent long partial prefills via max_long_partial_prefills
v1
#21651
opened Jul 26, 2025 by
pansicheng
3 of 4 tasks
Fix(benchmarks): Correct tqdm import to resolve TypeError in benchmark_w8a8_block_fp8.py
performance
Performance-related issues
#21650
opened Jul 26, 2025 by
Aymendje
4 tasks done
[Misc] refactor code return slice without brackets
frontend
llama
Related to Llama models
new-model
Requests to new models
performance
Performance-related issues
structured-output
tool-calling
tpu
Related to Google TPUs
v1
#21649
opened Jul 26, 2025 by
andyxning
4 tasks