Cloud native ML pipelines using Kubeflow
Kubeflow pipelines + Argo workflows on Kubernetes
Kubeflow is now GA: certain components are stable, while others are still in beta. Based on my experience trying some of the features so far, Kubeflow could provide a full-fledged cloud native ML solution.
Most cloud providers (GCP, Azure, AWS, IBM etc.) are actively contributing to Kubeflow.
Coming to the Kubeflow components, the central dashboard covers only some of the components and it’s still a work in progress.
Kubeflow pipelines + Argo has the potential to be a game changer
For me, the most interesting component is pipelines. The pipelines component uses Argo (specifically Argo Workflows) as the workflow orchestrator. The Argo project itself has four parts: Argo Workflows, Argo Events, Argo CD and Argo Rollouts. Additionally, in my experience Argo has better UI and API integration than Airflow, which makes a difference in the areas where Airflow is found lacking.
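To make the orchestrator's role more concrete, here is a minimal sketch of the kind of Argo Workflow definition a two-step pipeline boils down to, built as a plain Python dict. The step names (`prepare`, `train`), the container image and the inline scripts are hypothetical placeholders; the manifest that Kubeflow actually generates will contain considerably more detail.

```python
import json


def container_template(name: str, image: str, script: str) -> dict:
    """One Argo template that runs a short Python script in a container."""
    return {
        "name": name,
        "container": {
            "image": image,
            "command": ["python", "-c"],
            "args": [script],
        },
    }


def build_workflow(image: str = "python:3.11-slim") -> dict:
    """Build a two-step Argo Workflow in which 'train' runs only after 'prepare'.

    The image and step names are illustrative assumptions, not Kubeflow defaults.
    """
    return {
        "apiVersion": "argoproj.io/v1alpha1",
        "kind": "Workflow",
        "metadata": {"generateName": "demo-pipeline-"},
        "spec": {
            "entrypoint": "main",
            "templates": [
                {
                    # The DAG template declares the ordering between steps.
                    "name": "main",
                    "dag": {
                        "tasks": [
                            {"name": "prepare", "template": "prepare"},
                            {
                                "name": "train",
                                "template": "train",
                                "dependencies": ["prepare"],
                            },
                        ]
                    },
                },
                container_template("prepare", image, "print('preparing data')"),
                container_template("train", image, "print('training model')"),
            ],
        },
    }


if __name__ == "__main__":
    # Print the manifest as JSON so it can be inspected or saved to a file.
    print(json.dumps(build_workflow(), indent=2))
```

Each step is just a container, and the `dependencies` list is what tells Argo to run `train` only once `prepare` has finished.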
The Kubeflow pipelines sample code is currently closely tied to GCP. As it’s based on Kubernetes, it should be relatively easy to set this up on AWS, Azure and OpenShift.
You can create your own pipelines or re-use pipelines created by others. To create a pipeline you need to use the pipelines SDK. You can version your pipelines as well.
Experiments and Runs
You can schedule your serving/experiments/runs (cron format) or just do one-off runs. There is a Kubeflow Pipelines API as well, which could potentially be integrated with Argo Events triggers (Kafka, AWS SNS/SQS, file watcher etc.) and Argo Rollouts (Blue/Green, Canary etc.). All the information about pipelines, experiments and runs is stored in metadata (no additional setup is required) and there is an API server for metadata.
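As a rough illustration of how an event trigger might hand off to the Pipelines API, the sketch below turns an incoming event into an HTTP request that would start a run. Several things here are assumptions rather than anything I have verified against a live cluster: the `/apis/v1beta1/runs` path and request body shape (check the API reference for your Kubeflow version), the host, the pipeline and experiment IDs, and the shape of the `event` dict, which is made up for this example. The request is only built, not sent.

```python
import json
import urllib.request


def build_run_request(host: str, pipeline_id: str, experiment_id: str,
                      event: dict) -> urllib.request.Request:
    """Translate a (hypothetical) event payload into a 'create run' request.

    `event` is assumed to look like {"source": "...", "params": {...}}.
    """
    body = {
        "name": f"run-for-{event['source']}",
        "pipeline_spec": {
            "pipeline_id": pipeline_id,
            # Parameter values are passed as strings; sort for a stable order.
            "parameters": [
                {"name": key, "value": str(value)}
                for key, value in sorted(event["params"].items())
            ],
        },
        # Attach the run to an existing experiment.
        "resource_references": [
            {
                "key": {"type": "EXPERIMENT", "id": experiment_id},
                "relationship": "OWNER",
            }
        ],
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/apis/v1beta1/runs",  # assumed API path
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    request = build_run_request(
        host="http://localhost:8888",  # hypothetical address of the API server
        pipeline_id="pipe-123",        # hypothetical IDs
        experiment_id="exp-456",
        event={"source": "kafka", "params": {"epochs": 5}},
    )
    print(request.get_method(), request.full_url)
    print(request.data.decode("utf-8"))
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`, with whatever authentication your cluster requires.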
The pipeline could be a simple data preparation or featurization process, or a very complex ML pipeline (A/B testing, parallel experiments, automated recursive serving etc.).
Another update is coming at the end of June, which is expected to enhance the pipelines component.
Summary after an initial trial run: Kubeflow has the potential to supercharge and democratize the whole machine learning industry. Everyone from enterprises to startups to research organizations could benefit from this open source initiative.