Deploying Many Models Efficiently with Ray Serve

Model Composition

Recent pattern: one application served by multiple models. Two concerns follow from it:

Efficient hardware usage
- Share resources
- Independent scaling
Operational overhead
- Testing / DevOps
- Observability (Monitoring)
- Independent upgrades

Existing solution: monolith

The ‘shove all of your models into one box’ solution

Existing solution: microservices

Separating models into individual microservices

Capabilities
- Independent scaling
- Independent upgrades
Limitations
- Cannot share resources between models
- Requires a lot of setup:
  - Inter-service communication
  - DB for each microservice
  - Monitoring each microservice
  - Testing each microservice
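
The resource-sharing limitation can be illustrated with a small back-of-the-envelope calculation. This is only a sketch: the model names and GPU fractions are hypothetical (not taken from the talk), and the shared case assumes ideal packing, which a real scheduler may not achieve.

```python
import math

# Hypothetical per-model GPU needs, as fractions of one GPU (illustrative only).
gpu_need = {"detector": 0.25, "classifier": 0.5, "ranker": 0.25}


def gpus_dedicated(needs):
    """One microservice per model: each model is rounded up to whole GPUs."""
    return sum(math.ceil(n) for n in needs.values())


def gpus_shared(needs):
    """Shared pool: fractional requests are summed first, then rounded up.

    This is a lower bound that assumes the fractions pack perfectly.
    """
    return math.ceil(sum(needs.values()))


print("dedicated:", gpus_dedicated(gpu_need))
print("shared:   ", gpus_shared(gpu_need))
```

Under these assumed numbers, isolating every model in its own service reserves three GPUs, while a shared pool could in principle fit the same models on one.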

Ray Serve model composition

One application, multiple models composed together in a single Ray cluster

Capabilities
- Different hardware resource requirements for each model
- Each model autoscales independently
- Fractional resource allocations
Limitations
- Within a single application, models still cannot be upgraded independently

Multi-app model composition

Anyscale. (2023, October 12). Deploying Many Models Efficiently with Ray Serve. YouTube. https://www.youtube.com/watch?v=QUYucglQzBw