Background: I tried Kubeflow 1.0 in May 2020, with a narrow focus on cloud-native ML pipelines.
With the latest release, Kubeflow 1.3, the project has streamlined the setup process and improved both security and the user experience. Even with these updates, there is still a learning curve for non-technical/non-engineering users. Another improvement is the ability to pick and choose the components you want to install.
IMO, the ideal use-case is a cross-functional Data Science team with a mix of Platform Engineers, ML Engineers and Data Scientists.
History: Feast has been through several revisions in the past year. With the current version (0.9), it's possible to set up Feast end-to-end on a barebones k8s cluster.
The Feast team is currently working on version 0.10, to be released in April 2021, which is expected to further simplify the architecture and the setup. Companies around the world are already using Feast or are in the process of integrating it.
Background (From Feast website): Feast (Feature Store) is an operational data system for managing and serving machine learning features to models in production.
As you can see below, even with the basic…
RayML + Kubernetes = Finally, a truly scalable Distributed ML solution
Background (From Ray website): Ray is an open-source distributed execution framework that makes it easy to scale your applications and to leverage state of the art machine learning libraries. Ray provides a simple, universal API for building distributed applications (supports Python and Java API).
I first came across RayML on the Software Engineering Daily podcast in July 2020.
In September 2020, I attended Ray Summit 2020 organized by Anyscale (A startup founded by the creators of Ray from the UC Berkeley RISELab, the successor to the AMPLab, that created Apache…
Kubeflow pipelines + Argo workflows on Kubernetes
Kubeflow is now GA: certain components are stable, while others are still in beta. Based on my experience trying some of the features so far, Kubeflow could provide a full-fledged cloud-native ML solution.
Most cloud providers (GCP, Azure, AWS, IBM etc.) are actively contributing to Kubeflow.
Coming to the Kubeflow components, the central dashboard covers only some of the components and it’s still a work in progress.
For me, the most interesting component is pipelines. The pipelines component uses Argo as the workflow orchestrator. Argo itself has four options: Argo Workflows…
Why #BlackLivesMatter: To better understand systemic anti-Black racism and the Black perspective
This book list is obviously highly subjective and based on my limited reading.
This is by far one of my favorite books ever. Malcolm X’s deep honesty, moral clarity and magnetic personality grab your attention.
2. Americanah by Chimamanda Ngozi Adichie
Great perspective from an Immigrant Black Woman. Even though it’s a fictional story, it rings true. Chimamanda’s other books We Should All Be Feminists and Half of a Yellow Sun are great reads as well.
How Agile transformed the tech industry and supercharged software delivery
Background: I first came across Agile methodologies during my graduate program (Distributed and Multimedia Information Systems at Heriot-Watt University in Edinburgh, UK). My first thought after learning concepts such as the Agile Unified Process (AUP), Extreme Programming (XP), test-driven development (TDD) and pair programming was: why would companies or teams use anything other than these new software development practices?
Later, when I worked at PayPal, Yodlee and CareFirst BCBS, I could clearly see the benefits of using Agile methodologies in the real world. …
Basic misunderstandings of each other's work
Background: For the past 3 years, I have been working at the intersection of Cloud (AWS, OpenShift/Kubernetes, Docker, Snowflake), Software Engineering (UI and Microservices), and Data Science (Python, Spark, H2O, R, SAS). It's been a great learning experience working with talented engineers to build the AIR9 Data Science Platform.
This post is about my observations working with Data Engineers and Data Scientists/Analysts, and their blind-spots when it comes to Machine learning projects.
Side note #1: There is an interesting back story to the word “data scientist”. …
Set up Kubernetes on an AWS EC2 instance
What? : https://microk8s.io/
sudo snap list
sudo snap install microk8s --classic --channel=stable
3. Check if the cluster is up and running
sudo microk8s.kubectl cluster-info
4. Enable DNS, Storage, Dashboard, Ingress, Istio and Prometheus
sudo microk8s.enable dns storage dashboard ingress istio prometheus
5. Set up kubectl
sudo snap alias microk8s.kubectl kubectl
sudo usermod -a -G microk8s ubuntu
6. Set up kube config
sudo chown -f -R ubuntu ~/.kube
sudo microk8s.kubectl config view…
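Before enabling add-ons or running kubectl commands, it can help to confirm that MicroK8s is installed and has finished starting. The following is a minimal sketch, not part of the original steps: the function name `wait_for_microk8s` is hypothetical, and it assumes the `microk8s` command from the snap is on the PATH.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not from the original steps):
# confirm that MicroK8s is installed, then block until it reports ready.
wait_for_microk8s() {
  # Bail out early with a clear message if the snap is not installed.
  if ! command -v microk8s >/dev/null 2>&1; then
    echo "microk8s is not installed"
    return 1
  fi
  # `microk8s status --wait-ready` blocks until the cluster services are up.
  sudo microk8s status --wait-ready
}
```

Sourcing this snippet and calling `wait_for_microk8s` right after the install step would keep later commands, such as enabling add-ons, from running against a cluster that is still starting.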