Model Composition

Recent pattern: a single application backed by multiple models.

Challenges

  • Efficient hardware usage
    • Share resources
    • Independent scaling
  • Operational overhead
    • Testing / DevOps
    • Observability (Monitoring)
    • Independent upgrades

Existing Solutions

Monolith

[Figure: existing solution, monolith]

The ‘shove all of your models into one box’ solution

  • Limitations
    • No independent scaling
    • No independent upgrades
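
The cost of having no independent scaling can be made concrete with a little arithmetic. The sketch below is illustrative only: the model names, request rates, per-replica capacities, and GPU counts are all hypothetical assumptions, not measurements. It compares the GPUs needed when every box carries one copy of each model against the GPUs needed when each model is scaled on its own.

```python
import math


def replicas_needed(load_rps: float, capacity_rps: float) -> int:
    """Replicas required to serve load_rps when one replica handles capacity_rps."""
    return math.ceil(load_rps / capacity_rps)


# Hypothetical workload: name -> (load in req/s, capacity per replica in req/s, GPUs per replica)
models = {
    "detector": (400, 100, 1),
    "ranker": (50, 50, 1),
}

per_model = {name: replicas_needed(load, cap) for name, (load, cap, _) in models.items()}

# Monolith: each box holds one copy of every model, so the number of boxes
# is dictated by the busiest model.
boxes = max(per_model.values())
gpus_per_box = sum(gpus for _, _, gpus in models.values())
monolith_gpus = boxes * gpus_per_box

# Independent scaling: each model gets only the replicas it needs.
independent_gpus = sum(per_model[name] * gpus for name, (_, _, gpus) in models.items())

print(per_model)                        # {'detector': 4, 'ranker': 1}
print(monolith_gpus, independent_gpus)  # 8 5
```

With these assumed numbers the monolith pays for three idle copies of the lightly loaded model.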

Microservices

[Figure: existing solution, microservices]

Separating models into individual microservices

  • Capabilities
    • Independent scaling
    • Independent upgrades
  • Limitations
    • Cannot share resources between models
    • Need to set up a lot of things:
      • Inter-service communication
      • DB for each microservice
      • Monitoring each microservice
      • Testing each microservice

Ray Serve Model Composition

[Figure: Ray Serve model composition]

  • One application, multiple models composed together in a single Ray cluster
  • Different HW resource requirements for each model
  • Autoscaled independently
  • Fractional resource allocations
  • Within a single application, models still cannot be upgraded independently
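
To illustrate why fractional resource allocations matter, here is a toy placement model. It does not call Ray and is not Ray's scheduler: it is a plain first-fit sketch, and the replica names and GPU fractions are hypothetical. In Ray Serve itself, such fractions are typically declared per deployment through `ray_actor_options` (for example `num_gpus`).

```python
from dataclasses import dataclass, field


@dataclass
class Gpu:
    free: float = 1.0                      # fraction of the device still unallocated
    replicas: list = field(default_factory=list)


def place(requests, num_gpus):
    """First-fit placement of (replica_name, gpu_fraction) requests onto GPUs."""
    gpus = [Gpu() for _ in range(num_gpus)]
    for name, fraction in requests:
        for gpu in gpus:
            if gpu.free + 1e-9 >= fraction:  # small tolerance for float rounding
                gpu.free -= fraction
                gpu.replicas.append(name)
                break
        else:
            raise RuntimeError(f"no GPU has {fraction} free for {name}")
    return gpus


# Hypothetical replicas of three models with different GPU needs.
requests = [
    ("embedder-0", 0.25),
    ("embedder-1", 0.25),
    ("classifier-0", 0.5),
    ("generator-0", 1.0),
]

layout = [gpu.replicas for gpu in place(requests, num_gpus=2)]
print(layout)
```

Under these assumptions the four replicas fit on two GPUs. If each replica had to claim a whole device, the same workload would need four.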

Multi Application

[Figure: multi-application model composition]

  • Multiple endpoints (apps) per cluster
  • Different lifecycle per application
  • Easily add, delete, or update applications independently of other applications
  • Applications can share HW resources
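
The independent lifecycle can be sketched as edits to a per-cluster list of applications. The dictionary below is modeled on the general shape of a Serve multi-application config (`applications`, each with `name`, `route_prefix`, `import_path`), but treat the field names as an assumption to check against the Serve documentation for your version. The application names, routes, and import paths are hypothetical, and how the config is applied to a cluster is outside this sketch.

```python
import copy
import json

# Hypothetical two-application config; import paths are placeholders.
config = {
    "applications": [
        {"name": "search", "route_prefix": "/search", "import_path": "search_app:app_v1"},
        {"name": "recommend", "route_prefix": "/recommend", "import_path": "recommend_app:app_v1"},
    ]
}


def update_app(config, name, **changes):
    """Return a copy of config in which only the named application is changed."""
    new_config = copy.deepcopy(config)
    for app in new_config["applications"]:
        if app["name"] == name:
            app.update(changes)
            return new_config
    raise KeyError(name)


def remove_app(config, name):
    """Return a copy of config without the named application."""
    new_config = copy.deepcopy(config)
    new_config["applications"] = [a for a in new_config["applications"] if a["name"] != name]
    return new_config


# Upgrade "search" to a new version; "recommend" is left untouched.
upgraded = update_app(config, "search", import_path="search_app:app_v2")
print(json.dumps(upgraded, indent=2))
```

Because each entry is keyed by its own name, upgrading or removing one application leaves the entries for the others byte-for-byte identical.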

References