vLLM Worker Controller

📄️ Background

This project, my Final Year Project at NTU, re-engineers the vLLM worker-controller architecture to reduce cold-start latency in large language model inference. You can view the repository here

📄️ Proposed Solution

Worker Controller

📄️ Technical Implementation

The Worker Controller is built on Python's native process management and inter-process communication, primarily the multiprocessing module. This approach gives fine-grained control over GPU worker lifecycles and allows efficient resource sharing.
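
As a rough illustration of the multiprocessing-based approach described above, the sketch below shows a controller that starts worker processes once and then reuses them across requests over a pipe. It is not the project's actual code: `worker_loop`, `WorkerController`, `demo`, and the `"run"`/`"shutdown"` commands are hypothetical names chosen for this example, and the "inference" step is a placeholder that just upper-cases a string.

```python
import multiprocessing as mp
import os


def worker_loop(conn):
    """Hypothetical worker: stays alive and serves commands from its pipe end."""
    # Assumption: a real worker would do its expensive one-time setup here
    # (for example, initializing a GPU context), so later requests skip it.
    while True:
        command, payload = conn.recv()
        if command == "shutdown":
            conn.close()
            break
        if command == "run":
            # Placeholder for inference: echo the payload with this worker's PID.
            conn.send({"pid": os.getpid(), "result": payload.upper()})


class WorkerController:
    """Hypothetical controller that keeps workers warm between requests."""

    def __init__(self, num_workers, start_method=None):
        ctx = mp.get_context(start_method)
        self._workers = []
        for _ in range(num_workers):
            parent_conn, child_conn = ctx.Pipe()
            proc = ctx.Process(target=worker_loop, args=(child_conn,), daemon=True)
            proc.start()
            child_conn.close()  # the controller keeps only its own end
            self._workers.append((proc, parent_conn))

    def run(self, index, payload):
        """Send one request to an already-running worker and wait for its reply."""
        _, conn = self._workers[index]
        conn.send(("run", payload))
        return conn.recv()

    def shutdown(self):
        """Ask every worker to exit, then wait for it to finish."""
        for proc, conn in self._workers:
            conn.send(("shutdown", None))
            proc.join(timeout=5)
            conn.close()


def demo():
    # "fork" keeps this sketch runnable as a plain script where it is available.
    # Real GPU workers usually need "spawn", since CUDA cannot be used in a
    # forked child once the parent has initialized it.
    method = "fork" if "fork" in mp.get_all_start_methods() else "spawn"
    controller = WorkerController(num_workers=2, start_method=method)
    try:
        first = controller.run(0, "hello")
        second = controller.run(0, "again")
    finally:
        controller.shutdown()
    return first, second


if __name__ == "__main__":
    first, second = demo()
    # Both replies come from the same long-lived process, not a fresh one.
    print(first["result"], second["result"], first["pid"] == second["pid"])
```

The point of the design is that the process start-up cost is paid once in `WorkerController.__init__`, while each later `run` call only pays for a pipe round trip to a worker that is already alive.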

📄️ Results

Summary