Image source: GitHub - allegroai/clearml-serving
This setup orchestrates a scalable ML serving infrastructure using ClearML, integrating Kafka for message streaming, ZooKeeper for Kafka coordination, Prometheus for monitoring, Alertmanager for alerts, Grafana for visualization, and Triton for optimized model serving.
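In docker-compose terms, the stack looks roughly like this. The service and image names below are an assumption based on the clearml-serving repository layout; check docker-compose-triton.yml in the repo for the authoritative version:

```yaml
# Rough shape of docker-compose-triton.yml (illustrative sketch, not the real file)
services:
  zookeeper:          # coordinates Kafka
    image: bitnami/zookeeper
  kafka:              # streams inference/statistics messages
    image: bitnami/kafka
    depends_on: [zookeeper]
  prometheus:         # scrapes serving metrics
    image: prom/prometheus
  alertmanager:       # routes Prometheus alerts
    image: prom/alertmanager
  grafana:            # dashboards on top of Prometheus
    image: grafana/grafana
  clearml-serving-inference:   # pre/post processing + request routing
    image: allegroai/clearml-serving-inference
  clearml-serving-triton:      # optimized model execution
    image: nvcr.io/nvidia/tritonserver
```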
pip install clearml-serving  # you probably did this already
clearml-serving create --name "aiorhu demo"
New Serving Service created: id=ooooahah12345
Let's look at this in ClearML. Next, edit CLEARML_EXTRA_PYTHON_PACKAGES and add the packages your model needs; we'll add ours here.
CLEARML_EXTRA_PYTHON_PACKAGES: ${CLEARML_EXTRA_PYTHON_PACKAGES:-textstat empath torch transformers nltk openai datasets diffusers benepar spacy sentence_transformers optuna interpret markdown bs4}
Edit the environment file (docker/example.env) with your clearml-server credentials and Serving Service UID. For example, you should have something like:
CLEARML_WEB_HOST="https://app.clear.ml"
CLEARML_API_HOST="https://api.clear.ml"
CLEARML_FILES_HOST="https://files.clear.ml"
CLEARML_API_ACCESS_KEY="<access_key_here>"
CLEARML_API_SECRET_KEY="<secret_key_here>"
CLEARML_SERVING_TASK_ID="<serving_service_id_here>"
cd docker && docker-compose --env-file example.env -f docker-compose-triton.yml up
Notice: any model registered with the Triton engine runs its pre/post-processing code on the Inference service container, while the model inference itself is executed on the Triton Engine container.
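To make that split concrete, here is a minimal sketch of the pre/post-processing code that lives on the Inference service container. The `Preprocess` class name and method signatures follow the clearml-serving examples, but treat the exact interface as an assumption and check the repo's examples for the current contract:

```python
# preprocess.py -- minimal sketch of the code that runs on the Inference
# service container; Triton itself only ever sees the prepared tensors.

from typing import Any


class Preprocess:
    """Plain class loaded by the clearml-serving inference container."""

    def __init__(self):
        # Called once when the endpoint is loaded; put tokenizer/encoder
        # setup here rather than in preprocess() itself.
        pass

    def preprocess(self, body: dict, state: dict,
                   collect_custom_statistics_fn=None) -> Any:
        # Convert the incoming request JSON into the tensor(s) the model
        # expects. "values" is a hypothetical request field for this sketch.
        return [[float(x) for x in body.get("values", [])]]

    def postprocess(self, data: Any, state: dict,
                    collect_custom_statistics_fn=None) -> dict:
        # Convert the raw model output back into a JSON-serializable response.
        return {"predictions": data}
```

The important design point is that this class is plain Python with no serving imports, so you can unit-test your feature engineering locally before registering the model.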
Let’s review what we did.