Model Composition
Recent pattern: a single application is served by multiple models.
Challenges
- Efficient hardware usage
- Share resources
- Independent scaling
- Operational overhead
- Testing / DevOps
- Observability (Monitoring)
- Independent upgrades
Existing Solutions
Monolith
The ‘shove all of your models into one box’ solution
- Limitations
  - No independent scaling
  - No independent upgrades
Microservices
Separating models into individual microservices
- Capabilities
  - Independent scaling
  - Independent upgrades
- Limitations
  - Cannot share resources between models
  - Need to set up a lot of things:
    - Intra-service communications
    - DB for each microservice
    - Monitoring each microservice
    - Testing each microservice
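The resource-sharing limitation can be made concrete with a little arithmetic. The sketch below assumes each microservice reserves whole GPUs for itself, while a shared cluster can pack fractional requests onto the same device. The model names and GPU fractions are hypothetical, and real schedulers add bin-packing and memory constraints that are ignored here.

```python
import math

# Hypothetical GPU demand per model replica (fraction of one GPU).
gpu_demand = {"detector": 0.5, "classifier": 0.25, "ranker": 0.25}

# Microservices: every service rounds its own demand up to whole GPUs.
dedicated_gpus = sum(math.ceil(d) for d in gpu_demand.values())

# Shared pool: demands are summed first, then rounded up once.
shared_gpus = math.ceil(sum(gpu_demand.values()))

# Fraction of the reserved hardware that is actually used in each setup.
total_demand = sum(gpu_demand.values())
dedicated_utilization = total_demand / dedicated_gpus
shared_utilization = total_demand / shared_gpus

print(f"dedicated: {dedicated_gpus} GPUs, {dedicated_utilization:.0%} used")
print(f"shared:    {shared_gpus} GPUs, {shared_utilization:.0%} used")
```

Under these assumed numbers the dedicated layout reserves three GPUs for work that fits on one.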
Ray Serve Model Composition
- One application, multiple models composed together in a single Ray cluster
  - Different HW resource requirements for each model
  - Autoscaled independently
  - Fractional resource allocations
- Limitation: within a single application, models still cannot be upgraded independently
Multi Application
- Multiple endpoints (apps) per cluster
- Different lifecycle per application
- Easily add, delete, or update applications independently of other applications
- Applications can share HW resources
References
- Anyscale. (2023, October 12). Deploying Many Models Efficiently with Ray Serve. YouTube. https://www.youtube.com/watch?v=QUYucglQzBw