I introduced KServe as a scalable, cloud native, open source model server in the previous article. This tutorial will walk you through all the steps required to install and configure KServe on a Google Kubernetes Engine cluster powered by Nvidia T4 GPUs. We will then deploy a TensorFlow model to perform inference.
Step 1 – Launch a GKE Cluster with T4 GPU Node
Assuming you have access to Google Cloud Platform, run the following command to launch a 3-node cluster in which each node has one Nvidia T4 GPU attached. Replace the project, zone, and other values to reflect your environment.
gcloud beta container clusters create "tns-kserve" \
  --project "janakiramm-sandbox" \
  --zone "asia-southeast1-c" \
  --no-enable-basic-auth \
  --cluster-version "1.22.4-gke.1501" \
  --machine-type "n1-standard-4" \
  --accelerator "type=nvidia-tesla-t4,count=1" \
  --num-nodes "3" \
  --image-type "UBUNTU_CONTAINERD" \
  --disk-type "pd-standard" \
  --disk-size "100" \
  --scopes "https://www.googleapis.com/auth/devstorage.read_only","https://www.googleapis.com/auth/logging.write","https://www.googleapis.com/auth/monitoring","https://www.googleapis.com/auth/servicecontrol","https://www.googleapis.com/auth/service.management.readonly","https://www.googleapis.com/auth/trace.append"

Add a cluster-admin role for the GCP user.
kubectl create clusterrolebinding cluster-admin-binding \
  --clusterrole=cluster-admin \
  --user=$(gcloud config get-value core/account)
Install the Nvidia driver for the T4 GPU with the driver installer DaemonSet, then check that the GPU device plugin pods are running.
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nvidia-driver-installer/ubuntu/daemonset-preloaded.yaml

kubectl get pods -n kube-system -l k8s-app=nvidia-gpu-device-plugin
Create a pod based on the Nvidia CUDA image to test GPU access. Save the following specification as gpu-pod.yaml.
apiVersion: v1
kind: Pod
metadata:
  name: my-gpu-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      limits:
        nvidia.com/gpu: 1
kubectl apply -f gpu-pod.yaml
Run the nvidia-smi command inside the pod to test GPU access.
kubectl exec -it my-gpu-pod -- nvidia-smi

With the infrastructure in place, let’s proceed with KServe installation.
Step 2 – Installing Istio
Istio is an essential prerequisite for KServe. Knative Serving relies on Istio ingress to expose KServe API endpoints. For version compatibility, check the documentation.
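Download the Istio release to your local workstation and run the istioctl CLI to install Istio. The download script unpacks a versioned istio directory, and istioctl sits in its bin folder, so add that folder to your PATH first.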
curl -L https://istio.io/downloadIstio | sh -
istioctl install --set profile=demo -y
Verify that all pods in the istio-system namespace are in the Running state.
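One way to script this check, instead of reading the pod list by eye, is to parse the JSON printed by kubectl get pods -n istio-system -o json. The sketch below is a minimal illustration under a few assumptions: the helper names pods_not_running and fetch_pods are hypothetical, kubectl is assumed to be on the PATH, and only the pod phase is inspected, which is a coarser signal than container readiness.

```python
import json
import subprocess

# Phases treated as healthy in this sketch (an assumption, not an Istio rule).
HEALTHY_PHASES = {"Running", "Succeeded"}


def pods_not_running(pod_list):
    """Return the names of pods whose phase is not Running or Succeeded."""
    return [
        item["metadata"]["name"]
        for item in pod_list.get("items", [])
        if item.get("status", {}).get("phase") not in HEALTHY_PHASES
    ]


def fetch_pods(namespace):
    """Call kubectl and return the parsed pod list for one namespace."""
    result = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


# Offline demonstration with a hand-written pod list (hypothetical pod names).
sample = {
    "items": [
        {"metadata": {"name": "istiod-abc"}, "status": {"phase": "Running"}},
        {"metadata": {"name": "istio-ingressgateway-xyz"},
         "status": {"phase": "Pending"}},
    ]
}
print(pods_not_running(sample))  # ['istio-ingressgateway-xyz']
```

Against the live cluster, pods_not_running(fetch_pods("istio-system")) should return an empty list once the installation has settled. The same helper could be pointed at the knative-serving and kserve namespaces in the later steps.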

Step 3 – Installing Knative Serving
Install Knative CRDs and core services.
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-crds.yaml
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-core.yaml
To integrate Knative with the Istio ingress, run the following commands.
kubectl apply -l knative.dev/crd-install=true -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/istio.yaml
kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/istio.yaml

kubectl apply -f https://github.com/knative/net-istio/releases/download/knative-v1.2.0/net-istio.yaml
Finally, configure the DNS for Knative that points to the sslip.io domain.
kubectl apply -f https://github.com/knative/serving/releases/download/knative-v1.2.0/serving-default-domain.yaml
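sslip.io is a public DNS service that resolves any hostname with an embedded IP address to that address, so no DNS records need to be created. The sketch below shows how the service hostname used later in this tutorial is composed, assuming Knative's default domain template of service name, namespace, and domain. The function name sslip_host is hypothetical.

```python
import ipaddress


def sslip_host(service, namespace, ingress_ip):
    """Build the hostname <service>.<namespace>.<ingress_ip>.sslip.io."""
    # Validate the address; raises ValueError if it is not a valid IPv4 address.
    ipaddress.IPv4Address(ingress_ip)
    return f"{service}.{namespace}.{ingress_ip}.sslip.io"


print(sslip_host("dogs-vs-cats", "default", "34.126.156.171"))
# dogs-vs-cats.default.34.126.156.171.sslip.io
```

The IP address here is the external address of the Istio ingress gateway from this tutorial's cluster; yours will differ.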
Make sure that Knative Serving is successfully running.

Step 4 – Installing Certificate Manager
Install cert manager with the following command:
kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.7.1/cert-manager.yaml

Step 5 – Install KServe Model Server
We are now ready to install the KServe model server on the GKE Cluster.
kubectl apply -f https://github.com/kserve/kserve/releases/download/v0.7.0/kserve.yaml

kubectl get pods -n kserve

KServe also installs a couple of custom resources. Check them out with the following command:
kubectl get crd | grep "kserve"

Step 5 – Configuring Google Cloud Storage Bucket and Uploading a TensorFlow Model
KServe can pull models from a Google Cloud Storage (GCS) Bucket to serve them for inference. Let’s create the bucket and upload the model.
For this scenario, we will use the model from one of my previous tutorials, which trained a CNN to classify dogs and cats. You can download the pre-trained TensorFlow model from here. Unzip the file and run the following commands to create the GCS bucket and upload the model artifacts.
gsutil mb gs://tns-kserve
gsutil iam ch allUsers:objectViewer gs://tns-kserve
gsutil cp -R model/ gs://tns-kserve

For simplicity, we enabled public access to the bucket. But you may want to secure it and add the service account key as a secret for KServe to access the private bucket.
Step 6 – Creating and Deploying the TensorFlow Inference Service
Let’s go ahead and create an inference service pointing to the model uploaded to the GCS bucket. Notice that we request one GPU through the resource limits and requests, which ensures that the service is scheduled on a GPU node and uses the GPU for acceleration.
apiVersion: "serving.kserve.io/v1beta1"
kind: "InferenceService"
metadata:
  name: "dogs-vs-cats"
spec:
  predictor:
    tensorflow:
      storageUri: "gs://tns-kserve/model"
      resources:
        limits:
          nvidia.com/gpu: 1
        requests:
          nvidia.com/gpu: 1
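If you plan to deploy more than one model, the same manifest can be generated rather than edited by hand. The following is a minimal sketch, not part of KServe itself: the function name inference_service and its parameters are hypothetical, and it simply reproduces the fields of the specification above as JSON, which kubectl accepts alongside YAML.

```python
import json


def inference_service(name, storage_uri, gpus=1):
    """Return an InferenceService manifest mirroring the YAML in this step."""
    gpu = {"nvidia.com/gpu": gpus}
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "tensorflow": {
                    "storageUri": storage_uri,
                    # Same GPU count for limits and requests, as in the YAML.
                    "resources": {"limits": dict(gpu), "requests": dict(gpu)},
                }
            }
        },
    }


manifest = inference_service("dogs-vs-cats", "gs://tns-kserve/model")
print(json.dumps(manifest, indent=2))
```

Saved under a file name of your choice, the script's output can be piped to kubectl apply -f - to create the service.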
Save the specification to a file, apply it with kubectl apply -f, and wait for KServe to generate the endpoint for the inference service.
kubectl get inferenceservice
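The wait can also be scripted. The sketch below assumes the JSON form of a single service, as printed by kubectl get inferenceservice dogs-vs-cats -o json, and returns the endpoint only once the Ready condition is True. The function name ready_url is hypothetical, and the sample status is hand-written for illustration.

```python
def ready_url(isvc):
    """Return the service URL if the Ready condition is True, else None."""
    status = isvc.get("status", {})
    ready = any(
        cond.get("type") == "Ready" and cond.get("status") == "True"
        for cond in status.get("conditions", [])
    )
    return status.get("url") if ready else None


# Hand-written sample resembling the status of a ready service.
sample = {
    "metadata": {"name": "dogs-vs-cats"},
    "status": {
        "url": "http://dogs-vs-cats.default.34.126.156.171.sslip.io",
        "conditions": [{"type": "Ready", "status": "True"}],
    },
}
print(ready_url(sample))
# http://dogs-vs-cats.default.34.126.156.171.sslip.io
```

A service that is still starting has no Ready condition with status True, so the function returns None and the caller can retry after a short pause.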

Step 7 – Performing Inference with KServe and TensorFlow
Install the following Python modules in a virtual environment:
pip install pillow \
  h5py \
  tensorflow \
  requests \
  numpy
Save the following client code as infer.py, then run it with sample images of dogs and cats to see the inference in action.
import argparse
import json

import numpy as np
import requests
import tensorflow
import PIL
from tensorflow.keras.preprocessing import image

# Parse the image path and the model server URI from the command line
ap = argparse.ArgumentParser()
ap.add_argument("-i", "--image", required=True,
                help="path of the image")
ap.add_argument("-u", "--uri", required=True,
                help="URI of model server")

args = vars(ap.parse_args())

image_path = args['image']
uri = args['uri']

# Load the image, resize it to 128x128, and scale pixel values to [0, 1]
img = image.img_to_array(image.load_img(image_path, target_size=(128, 128))) / 255.

# The request body names the model's input tensor, conv2d_input
payload = {
    "instances": [{'conv2d_input': img.tolist()}]
}

# Call the predict endpoint of the dogs-vs-cats inference service
r = requests.post(uri + '/v1/models/dogs-vs-cats:predict', json=payload)
pred = json.loads(r.content.decode('utf-8'))
# The index of the highest score decides the class: 1 is Dog, otherwise Cat
predict = np.asarray(pred['predictions']).argmax(axis=1)[0]
print("Dog" if predict == 1 else "Cat")
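The response handling at the end of the client can be tried without a cluster or TensorFlow. The sketch below uses only the standard library and assumes the response body has the shape the client expects: a predictions list holding one score vector per input image, where index 1 means Dog. The function name label_from_response and the sample bodies are hypothetical.

```python
import json


def label_from_response(body):
    """Map a predict response body to "Dog" or "Cat"."""
    data = json.loads(body)
    if "predictions" not in data:
        # Surface anything unexpected, such as an error message from the server.
        raise ValueError(f"unexpected response: {data}")
    scores = data["predictions"][0]
    # Pure-Python argmax over the first (and only) prediction.
    best = max(range(len(scores)), key=scores.__getitem__)
    return "Dog" if best == 1 else "Cat"


print(label_from_response('{"predictions": [[0.1, 0.9]]}'))  # Dog
print(label_from_response('{"predictions": [[0.8, 0.2]]}'))  # Cat
```

Keeping the decoding in a small function like this makes it easier to check the label mapping separately from the network call.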


python infer.py \
  -u http://dogs-vs-cats.default.34.126.156.171.sslip.io \
  -i sample1.jpg


python infer.py \
  -u http://dogs-vs-cats.default.34.126.156.171.sslip.io \
  -i sample2.jpg
This concludes the end-to-end tutorial on KServe, which covered everything you need to start exploring the popular model server.
Feature Image by Rudy and Peter Skitterians from Pixabay.


