lab/ds/25-1/5/03_model_deployment_for_inference.ipynb
2026-02-13 14:03:28 +03:00


{
"cells": [
{
"cell_type": "markdown",
"id": "5c62cc95-69e4-4820-9a0c-e605ad87b25e",
"metadata": {},
"source": [
"<a href=\"https://www.nvidia.com/dli\"> <img src=\"images/DLI_Header.png\" alt=\"Header\" style=\"width: 400px;\"/> </a>"
]
},
{
"cell_type": "markdown",
"id": "fd0863b8-d8e1-4ae8-88d8-154e69e14de1",
"metadata": {
"tags": []
},
"source": [
"# Computer Vision for Industrial Inspection #"
]
},
{
"cell_type": "markdown",
"id": "157e3b9e-1612-460a-a3b1-1560f0651441",
"metadata": {
"tags": []
},
"source": [
"## Part 3 - Model Deployment for Inference ##\n",
"In this notebook, we will take our previously trained classification model, export it as a TensorRT engine, and deploy it on Triton Inference Server. TensorRT is a highly optimized package that takes trained models and optimizes them for inference. We'll learn how to create the model directory structures and configuration files within Triton Inference Server and how to send inference requests to the models deployed within it.\n",
"\n",
"**Table of Contents**\n",
"<br>\n",
"This notebook covers the below sections: \n",
"1. [Setting Up Environment](#s3-1)\n",
" * [Set Up Environment Variables](#s3-1.1)\n",
" * [TAO Toolkit Model Export](#s3-1.2)\n",
" * [TensorRT - Programmable Inference Accelerator](#s3-1.3)\n",
" * [Export the Trained Model](#s3-1.4)\n",
"2. [Introduction to Triton Inference Server](#s3-2)\n",
" * [Server](#s3-2.1)\n",
" * [Client](#s3-2.2)\n",
" * [Model Repository](#s3-2.3)\n",
" * [Exercise #1 - Model Configuration](#s3-e1)\n",
"3. [Run Inference on Triton Inference Server](#s3-3)\n",
" * [Server Health Status](#s3-3.1)\n",
" * [Prepare Data](#s3-3.2)\n",
" * [Exercise #2 - Pre-process Inputs](#s3-e2)\n",
" * [Send Inference Request to Server](#s3-3.3)\n",
" * [Measure Performance](#s3-3.4)\n",
"4. [Run Batch Inference](#s3-4)\n",
"5. [Run FP16 Inference](#s3-5)\n",
"6. [Conclusion](#s3-6)"
]
},
{
"cell_type": "markdown",
"id": "4531dd37-e33c-4d3d-a2a4-6ddc397021bb",
"metadata": {},
"source": [
"<a name='s3-1'></a>\n",
"## Set Up Environment ##"
]
},
{
"cell_type": "markdown",
"id": "458295cd-dc2a-4986-92f4-01f89ede9b25",
"metadata": {},
"source": [
"<a name='s3-1.1'></a>\n",
"### Set Up Environment Variables ###\n",
"We set up a couple of environment variables to help us mount the local directories to the TAO container. Specifically, we want to set paths for the `$LOCAL_TRAINING_DATA`, `$LOCAL_SPEC_DIR`, and `$LOCAL_PROJECT_DIR` for the output of the TAO experiment with their respective paths in the TAO container. In doing so, we can make sure that the TAO experiment generated collaterals such as checkpoints, model files (e.g. `.tlt` or `.etlt`), and logs are output to `$LOCAL_PROJECT_DIR/classification`. \n",
"\n",
"_Note that users will be able to define their own export encryption key when training from a general-purpose model. This is to protect proprietary IP and used to decrypt the `.etlt` model during deployment._"
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1f9cf3fa-d682-4716-b03f-54d7c6d7dbb5",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"env: KEY=my_model_key\n",
"env: LOCAL_PROJECT_DIR=/dli/task/tao_project\n",
"env: LOCAL_SPECS_DIR=/dli/task/tao_project/spec_files\n",
"env: TAO_PROJECT_DIR=/workspace/tao-experiments\n",
"env: TAO_SPECS_DIR=/workspace/tao-experiments/spec_files\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# set environment variables\n",
"import os\n",
"import pandas as pd\n",
"import time\n",
"import shutil\n",
"import json\n",
"import numpy as np\n",
"from PIL import Image\n",
"import warnings\n",
"warnings.filterwarnings(\"ignore\")\n",
"\n",
"%set_env KEY=my_model_key\n",
"\n",
"%set_env LOCAL_PROJECT_DIR=/dli/task/tao_project\n",
"%set_env LOCAL_SPECS_DIR=/dli/task/tao_project/spec_files\n",
"os.environ[\"LOCAL_EXPERIMENT_DIR\"]=os.path.join(os.getenv(\"LOCAL_PROJECT_DIR\"), \"classification\")\n",
"\n",
"%set_env TAO_PROJECT_DIR=/workspace/tao-experiments\n",
"%set_env TAO_SPECS_DIR=/workspace/tao-experiments/spec_files\n",
"os.environ['TAO_EXPERIMENT_DIR']=os.path.join(os.getenv(\"TAO_PROJECT_DIR\"), \"classification\")\n",
"\n",
"# unzip\n",
"!unzip -qq data/viz_BYD_new.zip -d data\n",
"\n",
"# remove zip file\n",
"!rm data/viz_BYD_new.zip"
]
},
{
"cell_type": "markdown",
"id": "04ed95ee-95e4-4c50-bf7f-8124a919fbf7",
"metadata": {},
"source": [
"The cell below maps the project directory on your local host to a workspace directory in the TAO docker instance, so that the data and the results are mapped from in and out of the docker. This is done by creating a `.tao_mounts.json` file. For more information, please refer to the [launcher instance](https://docs.nvidia.com/tao/tao-toolkit/tao_launcher.html) in the user guide. Setting the `DockerOptions` ensures that you don't have permission issues when writing data into folders created by the TAO docker."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "d6439399-3f44-44a2-b39e-f8c6d49601a6",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# mapping up the local directories to the TAO docker\n",
"mounts_file = os.path.expanduser(\"~/.tao_mounts.json\")\n",
"\n",
"drive_map = {\n",
" \"Mounts\": [\n",
" # Mapping the data directory\n",
" {\n",
" \"source\": os.environ[\"LOCAL_PROJECT_DIR\"],\n",
" \"destination\": \"/workspace/tao-experiments\"\n",
" },\n",
" ],\n",
" \"DockerOptions\": {\n",
" \"user\": \"{}:{}\".format(os.getuid(), os.getgid())\n",
" }\n",
"}\n",
"\n",
"# writing the mounts file\n",
"with open(mounts_file, \"w\") as mfile:\n",
" json.dump(drive_map, mfile, indent=4)"
]
},
{
"cell_type": "markdown",
"id": "334e39fd-1075-4905-a461-87962a13168f",
"metadata": {},
"source": [
"<a name='s3-1.2'></a>\n",
"### TAO Toolkit Model Export ###\n",
"Once we are satisfied with our model, we can move to deployment. `classification_tf1` includes an `export` subtask to export and prepare a trained classification model for deployment. Exporting the model decouples the training process from deployment and allows conversion to TensorRT engines outside the TAO environment. TensorRT engines are specific to each hardware configuration and should be generated for each unique inference environment. This may be interchangeably referred to as the `.trt` or `.engine` file. The same exported TAO model may be used universally across training and deployment hardware. This is referred to as the `.etlt` file, or encrypted TAO file. "
]
},
{
"cell_type": "markdown",
"id": "c8ca0740-bb42-4b94-a2d5-1fb91690cf52",
"metadata": {},
"source": [
"<a name='s3-1.3'></a>\n",
"### TensorRT - Programmable Inference Accelerator\n",
"\n",
"NVIDIA [TensorRT](https://developer.nvidia.com/tensorrt) is a platform for high-performance deep learning inference. It includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for deep learning inference applications. TensorRT-based applications perform up to 40x faster than CPU-only platforms during inference. \n",
"\n",
"With TensorRT, you can optimize neural network models trained in all major frameworks, calibrate for lower precision with high accuracy, and finally deploy to hyperscale data centers, embedded, or automotive product platforms.\n",
"\n",
"How does TensorRT enable optimizations on the layer graph: \n",
"1. Elimination of layers whose outputs are not used\n",
"2. Fusion of convolution, bias and ReLU operations\n",
"3. Aggregation of operations with sufficiently similar parameters and the same source tensor \n",
" (for example, the 1x1 convolutions in GoogleNet s inception module)\n",
"4. Merging of concatenation layers by directing layer outputs to the correct eventual destination.\n",
"\n",
"Here are some great resources to learn more about TensorRT:\n",
" \n",
"* Main Page: https://developer.nvidia.com/tensorrt\n",
"* Blogs: https://devblogs.nvidia.com/speed-up-inference-tensorrt/\n",
"* Download: https://developer.nvidia.com/nvidia-tensorrt-download\n",
"* Documentation: https://docs.nvidia.com/deeplearning/sdk/tensorrt-developer-guide/index.html\n",
"* Sample Code: https://docs.nvidia.com/deeplearning/sdk/tensorrt-sample-support-guide/index.html\n",
"* GitHub: https://github.com/NVIDIA/TensorRT\n",
"* NGC Container: https://ngc.nvidia.com/catalog/containers/nvidia:tensorrt"
]
},
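{
"cell_type": "markdown",
"id": "2e7c1a94-5b3d-4f6e-8a2b-9c0d1e3f5a72",
"metadata": {},
"source": [
"As a point of reference, a TensorRT engine can also be built directly from an ONNX model with TensorRT's `trtexec` command-line tool, outside of TAO. The sketch below is illustrative only: the paths assume the export directory used later in this notebook, and `trtexec` must be available on the `PATH`.\n",
"\n",
"```\n",
"ONNX=tao_project/classification/export/resnet50_fp32.onnx\n",
"ENGINE=tao_project/classification/export/resnet50_fp32.engine\n",
"if command -v trtexec >/dev/null 2>&1; then\n",
"    # parse the ONNX model and serialize an engine for the current GPU\n",
"    trtexec --onnx=$ONNX --saveEngine=$ENGINE\n",
"else\n",
"    echo \"trtexec not available on this machine\"\n",
"fi\n",
"```\n",
"\n",
"In this lab, we will instead let the TAO `export` subtask generate the engine for us via its `--engine_file` argument."
]
},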
{
"cell_type": "markdown",
"id": "8b25c136-a7a0-4ece-bcd1-acd4792008c8",
"metadata": {},
"source": [
"<a name='s3-1.4'></a>\n",
"### Export the Trained Model ###"
]
},
{
"cell_type": "raw",
"id": "b26b3cdf-947e-4753-9789-c0dc4915e8de",
"metadata": {},
"source": [
"tao model classification_tf1 export [-h] -m <MODEL>\n",
" -e EXPERIMENT_SPEC\n",
" -o OUTPUT_FILE\n",
" -k <KEY>\n",
" [--engine_file ENGINE_FILE]\n",
" [--gen_ds_config]\n",
" [--gpus GPUS]\n",
" [--gpu_index GPU_INDEX]\n",
" [--classmap_json CLASSMAP_JSON]"
]
},
{
"cell_type": "markdown",
"id": "8fb06391-9332-4e70-9793-fd134037c68f",
"metadata": {},
"source": [
"When using the `export` subtask, the `-m` argument indicates the path to the `.hdf5` model file to be exported, the `-e` argument indicates the path to the spec file, and `-k` argument indicates the key to _load_ the model. There are a few optional arguments, `--gen_ds_config`, `--engine_file`, and `--classmap_json` that are useful for us. The `--gen_ds_config` argument indicates whether to generate a template inference configuration file and requires the `--classmap_json` argument if used. The `--engine_file` indicates the path to the serialized TensorRT engine file. \n",
"<p><img src='images/important.png' width=720></p>\n",
"\n",
"Note that the TensorRT file is hardware specific and cannot be generalized across GPUs. Since a TensorRT engine file is hardware specific, you cannot use an engine file for deployment unless the deployment GPU is identical to the training GPU. This is true in our case since the Triton Inference Server will be deployed from the same hardware. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "42f79b6a-efa8-4c91-bb88-8fb108b43e5b",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# remove any previous exports if exists\n",
"!mkdir -p $LOCAL_EXPERIMENT_DIR/export\n",
"!rm -rf $LOCAL_EXPERIMENT_DIR/export/*"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "083dcfa0-ad8d-48f3-ad52-67ad2ef8bd3e",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 181M\n",
"-rw-rw-rw- 1 root root 181M Sep 5 2024 resnet_010.hdf5\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# show trained model\n",
"!ls -ltrh $LOCAL_EXPERIMENT_DIR/resnet50/weights"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "1342e27b-1968-4b15-ad0d-0ece2257cd22",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"my_model_key\n",
"2026-02-03 10:59:41,381 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']\n",
"2026-02-03 10:59:41,485 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5\n",
"2026-02-03 10:59:41,497 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True\n",
"Using TensorFlow backend.\n",
"2026-02-03 10:59:44.597447: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12\n",
"2026-02-03 10:59:44,649 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.\n",
"2026-02-03 10:59:45,820 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 10:59:45,864 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 10:59:45,868 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:150: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_hue(img, max_delta=10.0):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:173: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_saturation(img, max_shift):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:183: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_contrast(img, center, max_contrast_scale):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:192: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_shift(x_img, shift_stddev):\n",
"2026-02-03 10:59:48.044252: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2500005000 Hz\n",
"2026-02-03 10:59:48.044717: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x856b710 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
"2026-02-03 10:59:48.044756: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n",
"2026-02-03 10:59:48.046103: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1\n",
"2026-02-03 10:59:48.265321: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 10:59:48.267659: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x8386ac0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n",
"2026-02-03 10:59:48.267706: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5\n",
"2026-02-03 10:59:48.268080: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 10:59:48.270181: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Found device 0 with properties: \n",
"name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59\n",
"pciBusID: 0000:00:1e.0\n",
"2026-02-03 10:59:48.270256: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12\n",
"2026-02-03 10:59:48.270404: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12\n",
"2026-02-03 10:59:48.296388: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcufft.so.11\n",
"2026-02-03 10:59:48.296581: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcurand.so.10\n",
"2026-02-03 10:59:48.399104: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusolver.so.11\n",
"2026-02-03 10:59:48.411021: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusparse.so.12\n",
"2026-02-03 10:59:48.411159: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8\n",
"2026-02-03 10:59:48.411324: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 10:59:48.413692: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 10:59:48.415760: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1802] Adding visible gpu devices: 0\n",
"2026-02-03 10:59:48.415829: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12\n",
"2026-02-03 10:59:48.426202: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1214] Device interconnect StreamExecutor with strength 1 edge matrix:\n",
"2026-02-03 10:59:48.426235: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1220] 0 \n",
"2026-02-03 10:59:48.426250: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] 0: N \n",
"2026-02-03 10:59:48.426479: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 10:59:48.428660: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 10:59:48.430734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1359] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 13496 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)\n",
"Using TensorFlow backend.\n",
"WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.\n",
"WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 10:59:50,268 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.\n",
"WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 10:59:50,307 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.\n",
"WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 10:59:50,311 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:150: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_hue(img, max_delta=10.0):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:173: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_saturation(img, max_shift):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:183: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_contrast(img, center, max_contrast_scale):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:192: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_shift(x_img, shift_stddev):\n",
"2026-02-03 10:59:51,297 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.app 264: Saving exported model to /workspace/tao-experiments/classification/export/resnet50_fp32.onnx\n",
"2026-02-03 10:59:51,297 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.keras_exporter 119: Setting the onnx export route to keras2onnx\n",
"2026-02-03 10:59:51,297 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.makenet.export.classification_exporter 90: Setting the onnx export rote to keras2onnx\n",
"2026-02-03 10:59:51,599 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.keras_exporter 429: Using input nodes: ['input_1']\n",
"2026-02-03 10:59:51,599 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.keras_exporter 430: Using output nodes: ['predictions/Softmax']\n",
"Loaded model\n",
"The ONNX operator number change on the optimization: 288 -> 124\n",
"2026-02-03 11:00:14,410 [TAO Toolkit] [INFO] keras2onnx 347: The ONNX operator number change on the optimization: 288 -> 124\n",
"2026-02-03 11:00:14,412 [TAO Toolkit] [WARNING] onnxmltools 71: The maximum opset needed by this model is only 11.\n",
"[02/03/2026-11:00:14] [TRT] [I] [MemUsageChange] Init CUDA: CPU +2, GPU +0, now: CPU 434, GPU 833 (MiB)\n",
"[02/03/2026-11:00:17] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +341, GPU +74, now: CPU 830, GPU 907 (MiB)\n",
"[02/03/2026-11:00:17] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.\n",
"[02/03/2026-11:00:21] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +9, GPU +8, now: CPU 930, GPU 915 (MiB)\n",
"[02/03/2026-11:00:21] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 931, GPU 925 (MiB)\n",
"[02/03/2026-11:00:21] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.\n",
"[02/03/2026-11:00:28] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.\n",
"[02/03/2026-11:00:30] [TRT] [I] Total Activation Memory: 2931164672\n",
"[02/03/2026-11:00:30] [TRT] [I] Detected 1 inputs and 1 output network tensors.\n",
"[02/03/2026-11:00:30] [TRT] [I] Total Host Persistent Memory: 82224\n",
"[02/03/2026-11:00:30] [TRT] [I] Total Device Persistent Memory: 250880\n",
"[02/03/2026-11:00:30] [TRT] [I] Total Scratch Memory: 134217728\n",
"[02/03/2026-11:00:30] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 25 MiB, GPU 1220 MiB\n",
"[02/03/2026-11:00:30] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 69 steps to complete.\n",
"[02/03/2026-11:00:30] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 1.12644ms to assign 4 blocks to 69 nodes requiring 243400704 bytes.\n",
"[02/03/2026-11:00:30] [TRT] [I] Total Activation Memory: 243400704\n",
"[02/03/2026-11:00:30] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1187, GPU 1061 (MiB)\n",
"[02/03/2026-11:00:30] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +0, GPU +124, now: CPU 0, GPU 124 (MiB)\n",
"Telemetry data couldn't be sent, but the command ran successfully.\n",
"[WARNING]: <urlopen error Url for the certificates not found.>\n",
"Execution status: PASS\n",
"2026-02-03 11:00:53,336 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# export model and TensorRT engine\n",
"!echo $KEY\n",
"!tao model classification_tf1 export -m $TAO_EXPERIMENT_DIR/resnet50/weights/resnet_010.hdf5 \\\n",
" -e $TAO_SPECS_DIR/resnet50/combined_config.txt \\\n",
" -o $TAO_EXPERIMENT_DIR/export/resnet50_fp32.onnx \\\n",
" -k $KEY \\\n",
" --classmap_json $TAO_EXPERIMENT_DIR/resnet50/classmap.json \\\n",
" --gen_ds_config \\\n",
" --engine_file $TAO_EXPERIMENT_DIR/export/resnet50_fp32.engine"
]
},
{
"cell_type": "markdown",
"id": "0a4a4479-fab6-462c-9df7-07762613fbcb",
"metadata": {},
"source": [
"<p><img src='images/check.png' width=720></p>\n",
"\n",
"Did you get the below error message? This is likely due to a bad NGC CLI configuration. Please check the NGC CLI and Docker Registry section of the [introduction notebook](00_introduction.ipynb)."
]
},
{
"cell_type": "raw",
"id": "515f6744-89ac-4a19-b042-96b947c408b7",
"metadata": {},
"source": [
"AssertionError: Config path must be a valid unix path. No file found at: /root/.docker/config.json. Did you run docker login?"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "f6211aa1-0191-47b6-ab00-4b7d88422ee5",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 218288\n",
"drwxr-xr-x 2 root root 4096 Feb 3 11:00 .\n",
"drwxrwxrwx 1 root root 4096 Feb 3 10:59 ..\n",
"-rw-r--r-- 1 root root 17 Feb 3 11:00 labels.txt\n",
"-rw-r--r-- 1 root root 204 Feb 3 11:00 nvinfer_config.txt\n",
"-rw-r--r-- 1 root root 129543195 Feb 3 11:00 resnet50_fp32.engine\n",
"-rw-r--r-- 1 root root 93963999 Feb 3 11:00 resnet50_fp32.onnx\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# check that the TensorRT engine was successfully created. \n",
"!ls -al $LOCAL_EXPERIMENT_DIR/export"
]
},
{
"cell_type": "markdown",
"id": "c1945095-75e2-4b72-b47c-d30ba6003d67",
"metadata": {
"tags": []
},
"source": [
"<a name='s3-2'></a>\n",
"## Introduction to Triton Inference Server ##\n",
"NVIDIA [Triton Inference Server](https://developer.nvidia.com/nvidia-triton-inference-server) simplifies the deployment of AI models at scale in production. Triton is an open-source, inference-serving software that lets teams deploy trained AI models from any framework, from local storage, or from Google Cloud Platform or Azure on any GPU or CPU-based infrastructure, cloud, data center, or edge. The below figure shows the Triton Inference Server high-level architecture. The model repository is a _file-system based repository_ of the models that Triton will make available for inferencing. Inference requests arrive at the server via either [HTTP/REST](https://en.wikipedia.org/wiki/Representational_state_transfer), [gRPC](https://en.wikipedia.org/wiki/GRPC), or by the C API and are then routed to the appropriate per-model scheduler. Triton implements multiple scheduling and batching algorithms that can be configured on a model-by-model basis. Each model's scheduler optionally performs batching of inference requests and then passes the requests to the backend corresponding to the model type. The backend performs inferencing using the inputs provided in the batched requests to produce the requested outputs. The outputs are then returned.\n",
"<p><img src='images/triton_server_architecture.png' width='720'/></p>"
]
},
{
"cell_type": "markdown",
"id": "0d6b5277-b33b-4a1e-96bb-1b5570b60647",
"metadata": {},
"source": [
"<a name='s3-2.1'></a>\n",
"### Server ###\n",
"Setting up the Triton Inference Server requires software for the server and the client. One can get started with Triton Inference Server by pulling the [container](https://ngc.nvidia.com/catalog/containers/nvidia:tritonserver) from the NVIDIA NGC catalog. In this lab, we already have Triton Inference Server instance running. The code to run a Triton Server Instance is shown below. More details can be found in the [QuickStart Documentation](https://github.com/triton-inference-server/server/blob/r20.12/docs/quickstart.md) and [Build Documentation](https://github.com/triton-inference-server/server/blob/r20.12/docs/build.md). \n",
"\n",
"```\n",
"docker run \\\n",
" --gpus=1 \\\n",
" --ipc=host --rm \\\n",
" --shm-size=1g \\\n",
" --ulimit memlock=-1 \\\n",
" --ulimit stack=67108864 \\\n",
" -p 8000:8000 -p 8001:8001 -p 8002:8002 \\\n",
" -v /models:/models \\\n",
" nvcr.io/nvidia/tritonserver:23.03-py3 \\\n",
" tritonserver \\\n",
" --model-repository=/models \\\n",
" --exit-on-error=false \\\n",
" --model-control-mode=poll \\\n",
" --repository-poll-secs 30\n",
"```"
]
},
{
"cell_type": "markdown",
"id": "1b5b5615-d578-4fb8-8f61-d4eca4afa891",
"metadata": {},
"source": [
"<a name='s3-2.2'></a>\n",
"### Client ###\n",
"We've also installed the Triton Inference Server Client libraries to provide APIs that make it easy to communicate with Triton from your C++ or Python application. Using these libraries, you can send either HTTP/REST or gRPC requests to Triton to access all its capabilities: inferencing, status and health, statistics and metrics, model repository management, etc. These libraries also support using system and CUDA shared memory for passing inputs to and receiving outputs from Triton. The easiest way to get the Python client library is to use `pip` to install the `tritonclient` module, as detailed below. For more details on how to download or build the Triton Inference Server Client libraries, you can find the documentation [here](https://github.com/triton-inference-server/server/blob/r20.12/docs/client_libraries.md), as well as examples that show the use of both the C++ and Python libraries.\n",
"\n",
"```\n",
"pip install nvidia-pyindex\n",
"pip install tritonclient[all]\n",
"```"
]
},
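{
"cell_type": "markdown",
"id": "6d4f8b21-9a3e-47c5-b1d0-8e2a5c7f9b34",
"metadata": {},
"source": [
"Because the HTTP/REST path speaks plain JSON, it can be useful to see what an inference request body looks like before reaching for the client library. The sketch below assembles (but does not send) a request body following Triton's v2 HTTP inference protocol; the input name `input_1` matches our exported model, while the shape is an illustrative assumption:\n",
"\n",
"```\n",
"import json\n",
"\n",
"def build_infer_request(input_name, shape, data, datatype=\"FP32\"):\n",
"    # body for POST /v2/models/<model_name>/infer (Triton v2 HTTP protocol)\n",
"    return {\n",
"        \"inputs\": [\n",
"            {\n",
"                \"name\": input_name,\n",
"                \"shape\": shape,\n",
"                \"datatype\": datatype,\n",
"                \"data\": data,\n",
"            }\n",
"        ]\n",
"    }\n",
"\n",
"# e.g. one 3x224x224 image, flattened to a list of floats\n",
"payload = build_infer_request(\"input_1\", [1, 3, 224, 224], [0.0] * (3 * 224 * 224))\n",
"body = json.dumps(payload)\n",
"```\n",
"\n",
"The `tritonclient` library builds an equivalent request for us and handles the transport details."
]
},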
{
"cell_type": "markdown",
"id": "9b2a84b9-2dd7-4ddb-8333-3add28b986ed",
"metadata": {},
"source": [
"<a name='s3-2.3'></a>\n",
"### Model Repository ###\n",
"Triton Inference Server serves models within a model repository. When you first run Triton Inference Server, you'll specify the model repository where the models reside:\n",
"\n",
"```\n",
"tritonserver --model-repository=/models\n",
"```\n",
"\n",
"Each model resides in its own model subdirectory within the model repository - i.e., each directory within `/models` represents a unique model. For example, in this notebook we'll be deploying our `classification_model`. All models typically follow a similar directory structure. Within each of these directories, we'll create a configuration file `config.pbtxt` that details information about the model - e.g., _batch size_, _input shapes_, _deployment backend_ (PyTorch, ONNX, TensorFlow, TensorRT, etc.) and more. Additionally, we can create one or more versions of our model. Each version lives in a subdirectory named with the respective version number, starting with `1`. Our model files reside within this version subdirectory. \n",
"\n",
"```\n",
"root@server:/models$ tree\n",
".\n",
"├── defect_classification_model_fp32\n",
"│   ├── 1\n",
"│   │   └── model.plan\n",
"│   └── config.pbtxt\n",
"\n",
"```\n",
"\n",
"We can also add a file representing the names of the outputs. We have omitted this step in this notebook for the sake of brevity. For more details on how to work with model repositories and model directory structures in Triton Inference Server, please see the [documentation](https://github.com/triton-inference-server/server/blob/r20.12/docs/model_repository.md). Below, we'll create the model directory structure for our defect classification model."
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "e6839209-7b45-4972-8eb7-4f2adda8fa76",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# create directory for model\n",
"!mkdir -p models/defect_classification_model_fp32/1\n",
"\n",
"# copy resnet50_fp32.engine from model export to the model repository\n",
"!cp $LOCAL_EXPERIMENT_DIR/export/resnet50_fp32.engine models/defect_classification_model_fp32/1/model.plan"
]
},
{
"cell_type": "markdown",
"id": "105f1991-163c-4e2b-b619-207d1d0dfbc4",
"metadata": {},
"source": [
"<a name='s3-e1'></a>\n",
"### Exercise #1 - Model Configuration ###\n",
"With our model directory set up, we now turn our attention to creating the configuration file for our model. A minimal model configuration must specify the name of the model, the `platform` and/or backend properties, the `max_batch_size` property, and the `input` and `output` tensors of the model (name, data type, and shape). We can get the `output` tensor name from the `nvinfer_config.txt` [file](tao_project/classification/export/nvinfer_config.txt) we generated before under `output-blob-names`. For more details on how to create model configuration files within Triton Inference Server, please see the [documentation](https://github.com/triton-inference-server/server/blob/r20.12/docs/model_configuration.md). \n",
"\n",
"**Instructions**:<br>\n",
"* Modify the `<FIXME>`s only and execute the cell to create the `config.pbtxt` file for the defect classification model. "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9cff7695-fbd8-48cf-b1b8-4a35fcea4815",
"metadata": {},
"outputs": [],
"source": [
"configuration = \"\"\"\n",
"name: \"defect_classification_model_fp32\"\n",
"platform: \"tensorrt_plan\"\n",
"max_batch_size: 0\n",
"input: [\n",
"  {\n",
"    name: \"input_1\"\n",
"    data_type: TYPE_FP32\n",
"    # with max_batch_size: 0, dims give the full tensor shape; -1 is a dynamic batch dim\n",
"    dims: [ -1, 3, 224, 224 ]\n",
"  }\n",
"]\n",
"output: {\n",
"    name: \"predictions\"\n",
"    data_type: TYPE_FP32\n",
"    dims: [ -1, 2 ]\n",
"  }\n",
"\"\"\"\n",
"\n",
"with open('models/defect_classification_model_fp32/config.pbtxt', 'w') as file:\n",
" file.write(configuration)"
]
},
{
"cell_type": "markdown",
"id": "e39ba239-eb7d-4c72-b8c7-9b70074ba545",
"metadata": {},
"source": [
"<a name='s3-3'></a>\n",
"## Run Inference on Triton Inference Server ##\n",
"With our model directory structures created, models defined and exported, and configuration files created, we will now wait for Triton Inference Server to load our models. We have set up this lab to use Triton Inference Server in **polling** mode. This means that Triton Inference Server will continuously poll for modifications to our models or for newly created models - once every 30 seconds. Please run the cell below to allow time for Triton Inference Server to poll for new models/modifications before proceeding. Due to the asynchronous nature of this step, we have added 15 seconds to be safe."
]
},
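{
"cell_type": "markdown",
"id": "d4e5f6a7-poll-readiness-sketch",
"metadata": {},
"source": [
"Instead of sleeping for a fixed interval, you can also poll Triton until it reports the model as ready. A minimal sketch, assuming the `triton:8000` host and the model name used in this notebook:\n",
"\n",
"```python\n",
"import time\n",
"import tritonclient.http as tritonhttpclient\n",
"\n",
"client = tritonhttpclient.InferenceServerClient(url='triton:8000')\n",
"\n",
"# check every 5 seconds until the polled model repository has been loaded\n",
"while not client.is_model_ready('defect_classification_model_fp32'):\n",
"    time.sleep(5)\n",
"```"
]
},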
{
"cell_type": "markdown",
"id": "9d3800be-ce51-4bcc-a353-fdcf03806f21",
"metadata": {},
"source": [
"<a name='s3-3.1'></a>\n",
"### Server Health Status ###"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "a4645417-1f93-4456-b6e3-0c7c2e7b3412",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!sleep 45"
]
},
{
"cell_type": "markdown",
"id": "f49aeb80-4b61-43c9-a194-f4dc0ad00e02",
"metadata": {},
"source": [
"At this point, our models should be deployed and ready to use! To confirm Triton Inference Server is up and running, we can send a `curl` request to the URL below. The HTTP request returns status _200_ if Triton is ready and a _non-200_ status if it is not. We can also send a `curl` request to our model endpoints to confirm our models are deployed and ready to use. Additionally, we will see information about our models, such as:\n",
"* The name of our model\n",
"* The versions available for our model\n",
"* The backend platform (e.g., tensorrt_plan, pytorch_libtorch, onnxruntime_onnx)\n",
"* The inputs and outputs, with their respective names, data types, and shapes"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "9e7eb618-7d6b-48d2-8937-2a8b37ab072f",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Trying 172.18.0.3:8000...\n",
"* Connected to triton (172.18.0.3) port 8000 (#0)\n",
"> GET /v2/health/ready HTTP/1.1\n",
"> Host: triton:8000\n",
"> User-Agent: curl/7.81.0\n",
"> Accept: */*\n",
"> \n",
"* Mark bundle as not supporting multiuse\n",
"< HTTP/1.1 200 OK\n",
"< Content-Length: 0\n",
"< Content-Type: text/plain\n",
"< \n",
"* Connection #0 to host triton left intact\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!curl -v triton:8000/v2/health/ready"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "a626d1a8-ba1c-446c-a53d-891ed78d1949",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Trying 172.18.0.3:8000...\n",
"* Connected to triton (172.18.0.3) port 8000 (#0)\n",
"> GET /v2/models/defect_classification_model_fp32 HTTP/1.1\n",
"> Host: triton:8000\n",
"> User-Agent: curl/7.81.0\n",
"> Accept: */*\n",
"> \n",
"* Mark bundle as not supporting multiuse\n",
"< HTTP/1.1 200 OK\n",
"< Content-Type: application/json\n",
"< Content-Length: 226\n",
"< \n",
"* Connection #0 to host triton left intact\n",
"{\"name\":\"defect_classification_model_fp32\",\"versions\":[\"1\"],\"platform\":\"tensorrt_plan\",\"inputs\":[{\"name\":\"input_1\",\"datatype\":\"FP32\",\"shape\":[-1,3,224,224]}],\"outputs\":[{\"name\":\"predictions\",\"datatype\":\"FP32\",\"shape\":[-1,2]}]}"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!curl -v triton:8000/v2/models/defect_classification_model_fp32"
]
},
{
"cell_type": "markdown",
"id": "2a8c133c-c2b6-42f8-a3bc-62ff749ec5e6",
"metadata": {},
"source": [
"<a name='s3-3.2'></a>\n",
"### Prepare Data ###"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "242da754-d994-4ac9-9c27-0361ac067b0e",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>true_defect</th>\n",
" <th>defect_img_path</th>\n",
" <th>date</th>\n",
" <th>board</th>\n",
" <th>comp_id</th>\n",
" <th>img_shape</th>\n",
" <th>defect_image_name</th>\n",
" <th>comp_type</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>notdefect</td>\n",
" <td>/dli/task/data/AOI_DL_data_0908/0423318026324/...</td>\n",
" <td>908</td>\n",
" <td>423318026324</td>\n",
" <td>C1090</td>\n",
" <td>[54, 27, 3]</td>\n",
" <td>D0_C1090.jpg</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>notdefect</td>\n",
" <td>/dli/task/data/AOI_DL_data_0908/0423318026269/...</td>\n",
" <td>908</td>\n",
" <td>423318026269</td>\n",
" <td>C1090</td>\n",
" <td>[54, 27, 3]</td>\n",
" <td>D1_C1090.jpg</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>notdefect</td>\n",
" <td>/dli/task/data/AOI_DL_data_0908/0423318026523/...</td>\n",
" <td>908</td>\n",
" <td>423318026523</td>\n",
" <td>C1090</td>\n",
" <td>[54, 27, 3]</td>\n",
" <td>D1_C1090.jpg</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>notdefect</td>\n",
" <td>/dli/task/data/AOI_DL_data_0908/0423318026331/...</td>\n",
" <td>908</td>\n",
" <td>423318026331</td>\n",
" <td>C1090</td>\n",
" <td>[54, 27, 3]</td>\n",
" <td>D1_C1090.jpg</td>\n",
" <td>C</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>notdefect</td>\n",
" <td>/dli/task/data/AOI_DL_data_0908/0423318026211/...</td>\n",
" <td>908</td>\n",
" <td>423318026211</td>\n",
" <td>C1090</td>\n",
" <td>[53, 27, 3]</td>\n",
" <td>D1_C1090.jpg</td>\n",
" <td>C</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" true_defect defect_img_path date \\\n",
"0 notdefect /dli/task/data/AOI_DL_data_0908/0423318026324/... 908 \n",
"1 notdefect /dli/task/data/AOI_DL_data_0908/0423318026269/... 908 \n",
"2 notdefect /dli/task/data/AOI_DL_data_0908/0423318026523/... 908 \n",
"3 notdefect /dli/task/data/AOI_DL_data_0908/0423318026331/... 908 \n",
"4 notdefect /dli/task/data/AOI_DL_data_0908/0423318026211/... 908 \n",
"\n",
" board comp_id img_shape defect_image_name comp_type \n",
"0 423318026324 C1090 [54, 27, 3] D0_C1090.jpg C \n",
"1 423318026269 C1090 [54, 27, 3] D1_C1090.jpg C \n",
"2 423318026523 C1090 [54, 27, 3] D1_C1090.jpg C \n",
"3 423318026331 C1090 [54, 27, 3] D1_C1090.jpg C \n",
"4 423318026211 C1090 [53, 27, 3] D1_C1090.jpg C "
]
},
"execution_count": 12,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"capacitor_df=pd.read_csv('capacitor_df.csv', converters={'img_shape': pd.eval})\n",
"capacitor_df.head()"
]
},
{
"cell_type": "markdown",
"id": "62621e9b-92b0-432b-8ef2-eef7dd299432",
"metadata": {},
"source": [
"<a name='s3-e2'></a>\n",
"### Exercise #2 - Pre-process Inputs ###\n",
"Triton itself does not transform your input tensors; it simply feeds them to the model and returns the outputs unmodified. Ensuring that the pre-processing operations used for inference are identical to those used during training is key to achieving high accuracy. In our case, we need to perform normalization and mean subtraction to produce the final float planar data for the TensorRT engine. We can get the `offsets` and `net-scale-factor` from the `nvinfer_config.txt` [file](tao_project/classification/export/nvinfer_config.txt). The pre-processing function is:\n",
"\n",
"<b>y = net-scale-factor * (x - mean)</b>\n",
"\n",
"where: \n",
"* **x** is the input pixel value, an 8-bit unsigned integer in the range [0, 255]. \n",
"* **mean** is the corresponding mean value (a float), read either from the mean file or as offsets[c], where c is the channel to which the input pixel belongs and offsets is the array specified in the configuration file. \n",
"* **net-scale-factor** is the pixel scaling factor specified in the configuration file (a float).\n",
"* **y** is the corresponding output pixel value (a float).\n",
"\n",
"**Instructions**:<br>\n",
"* Execute the below cell to load one random **defect** sample. \n",
"* Modify the `<FIXME>`s only and execute the cell below to pre-process the input image. "
]
},
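{
"cell_type": "markdown",
"id": "e8f9a0b1-preprocess-formula-sketch",
"metadata": {},
"source": [
"As a concrete example of the formula above, a pixel value of x = 128 in channel 0 with offset 103.939 and a net-scale-factor of 1.0 maps to y = 1.0 * (128 - 103.939) = 24.061. A minimal vectorized sketch of the same computation, using the channel offsets from `nvinfer_config.txt` and assuming a net-scale-factor of 1.0:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"net_scale_factor = 1.0\n",
"offsets = np.array([103.939, 116.779, 123.68], dtype=np.float32)  # per-channel means\n",
"\n",
"# x: an H x W x C uint8 image; y: mean-subtracted, scaled float values\n",
"x = np.full((224, 224, 3), 128, dtype=np.uint8)\n",
"y = net_scale_factor * (x.astype(np.float32) - offsets)\n",
"print(y[0, 0])  # per-channel values for a single pixel\n",
"```"
]
},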
{
"cell_type": "code",
"execution_count": 13,
"id": "856f253c-1dff-4784-b202-247065b34b2a",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"sample_img_file=capacitor_df[capacitor_df['true_defect']=='defect'].sample(1)['defect_img_path'].values[0]"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "707aeacd-0c8b-440d-b95f-0598e8bad7ab",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"(3, 224, 224)"
]
},
"execution_count": 14,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def preprocess_image(file_path): \n",
"    # load the image and resize it to the network input resolution\n",
"    image=Image.open(file_path).resize((224, 224))\n",
"    image_ary=np.asarray(image).astype(np.float32)\n",
"\n",
"    # subtract the per-channel mean offsets from nvinfer_config.txt\n",
"    image_ary[:, :, 0]=image_ary[:, :, 0]-103.939\n",
"    image_ary[:, :, 1]=image_ary[:, :, 1]-116.779\n",
"    image_ary[:, :, 2]=image_ary[:, :, 2]-123.68\n",
"\n",
"    # transpose HWC to CHW (planar) layout expected by the engine\n",
"    image_ary=np.transpose(image_ary, [2, 0, 1])\n",
"    return image_ary\n",
"\n",
"sample_image_ary=preprocess_image(sample_img_file)\n",
"sample_image_ary.shape"
]
},
{
"cell_type": "markdown",
"id": "93558b9e-43d7-4d2a-8f9f-3a87129f7ca5",
"metadata": {},
"source": [
"<a name='s3-3.3'></a>\n",
"### Send Inference Request to Server ###\n",
"With our models deployed, it is now time to send inference requests to our models. First, we'll load the `tritonclient.http` module. We will also define the input and output names of our model, the name of our model, the URL where our models are deployed with Triton Inference Server (in this case the host `triton:8000`), and our model version."
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "5c3a18b3-ecdc-45c9-95b2-a5d8d3681646",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"import tritonclient.http as tritonhttpclient\n",
"\n",
"# set parameters\n",
"VERBOSE=False\n",
"input_name='input_1'\n",
"input_shape=(1, 3, 224, 224)\n",
"input_dtype='FP32'\n",
"output_name='predictions'\n",
"model_name='defect_classification_model_fp32'\n",
"url='triton:8000'\n",
"model_version='1'"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "0fe2a261-38c6-4b1e-9f83-d5f533f084c3",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"{0: 'defect', 1: 'notdefect'}"
]
},
"execution_count": 16,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# set output labels\n",
"with open(os.path.join(os.environ['LOCAL_EXPERIMENT_DIR'], 'export', 'labels.txt'), 'r') as f: \n",
" labels=f.readlines()\n",
"labels={v: k.strip() for v, k in enumerate(labels)}\n",
"labels"
]
},
{
"cell_type": "markdown",
"id": "45d03260-a4e7-4668-8cd8-0d88a371562b",
"metadata": {},
"source": [
"We'll instantiate our client `triton_client` using the `tritonhttpclient.InferenceServerClient` class, access the model metadata with the `get_model_metadata()` method, and get our model configuration with the `get_model_config()` method."
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "ac8f84f5-0c7a-4169-aa78-81283c2bf59a",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"triton_client=tritonhttpclient.InferenceServerClient(url=url, verbose=VERBOSE)\n",
"model_metadata=triton_client.get_model_metadata(model_name=model_name, model_version=model_version)\n",
"model_config=triton_client.get_model_config(model_name=model_name, model_version=model_version)"
]
},
{
"cell_type": "markdown",
"id": "548ca85d-f46f-46bc-b0b0-27a965ebb4c8",
"metadata": {},
"source": [
"We'll instantiate a placeholder for our input data using the expected input name, shape, and data type, and set its data to the NumPy array representation of our image. We'll also instantiate a placeholder for our output data using just the output name. Lastly, we'll submit our input to the Triton Inference Server with the `triton_client.infer()` method, specifying our model name, model version, inputs, and outputs, and convert the result to a NumPy array."
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "0ef4884f-3187-4725-972b-f26aa5e80758",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[0.08604407, 0.91395587]], dtype=float32)"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# make input shape [1, num_channels, height, width]\n",
"input_ary=np.expand_dims(sample_image_ary, axis=0)\n",
"\n",
"inference_input=tritonhttpclient.InferInput(input_name, input_shape, input_dtype)\n",
"inference_input.set_data_from_numpy(input_ary)\n",
"\n",
"output=tritonhttpclient.InferRequestedOutput(output_name)\n",
"response=triton_client.infer(model_name, \n",
" model_version=model_version, \n",
" inputs=[inference_input], \n",
" outputs=[output])\n",
"predictions=response.as_numpy(output_name)\n",
"predictions"
]
},
{
"cell_type": "markdown",
"id": "a58e5de1-427f-461f-859f-02a35ffb3a0c",
"metadata": {},
"source": [
"We can iterate through our manifest to see how quickly Triton is able to perform inference. "
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "34f2ee5d-1f6f-42cb-abe3-f1b89e6a6edc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"It took 8.32 seconds to infer 1903 images.\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"time_list=[]\n",
"\n",
"for idx, row in capacitor_df.iterrows(): \n",
" image_ary=preprocess_image(row['defect_img_path'])\n",
" # make input shape [1, num_channels, height, width]\n",
" input_ary=np.expand_dims(image_ary, axis=0)\n",
" inference_input.set_data_from_numpy(input_ary)\n",
" # time the process\n",
" start=time.time()\n",
" response=triton_client.infer(model_name, \n",
" model_version=model_version, \n",
" inputs=[inference_input], \n",
" outputs=[output])\n",
" time_list.append(time.time()-start)\n",
" predictions=response.as_numpy(output_name)\n",
" capacitor_df.loc[idx, 'prediction']=labels[np.argmax(predictions)].strip()\n",
"\n",
"print('It took {} seconds to infer {} images.'.format(round(sum(time_list), 2), len(capacitor_df)))"
]
},
{
"cell_type": "markdown",
"id": "3425b3d9-eefe-4350-8fb5-e2471bbc7d25",
"metadata": {},
"source": [
"<a name='s3-3.4'></a>\n",
"### Measure Performance ###"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "3abbfaac-802b-48be-909f-af89d74b05f4",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>prediction</th>\n",
" <th>defect</th>\n",
" <th>notdefect</th>\n",
" </tr>\n",
" <tr>\n",
" <th>true_defect</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>defect</th>\n",
" <td>6</td>\n",
" <td>93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>notdefect</th>\n",
" <td>0</td>\n",
" <td>1804</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"prediction defect notdefect\n",
"true_defect \n",
"defect 6 93\n",
"notdefect 0 1804"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"confusion_df=pd.crosstab(capacitor_df['true_defect'], capacitor_df['prediction'])\n",
"confusion_df.head()"
]
},
{
"cell_type": "markdown",
"id": "69e23024-4378-468f-8d20-a7f6599d698c",
"metadata": {},
"source": [
"<a name='s3-4'></a>\n",
"## Run Batch Inference ##"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "5ea20517-c171-496e-bde4-da2f4aae58b8",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# create directory for model\n",
"!mkdir -p models/defect_classification_batch_model/1\n",
"\n",
"# copy resnet-50 engine to the model repository\n",
"!cp $LOCAL_EXPERIMENT_DIR/export/resnet50_fp32.engine models/defect_classification_batch_model/1/model.plan"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "88fe8ec0-7c9c-4bb8-a27c-e25e698480ed",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"configuration = \"\"\"\n",
"name: \"defect_classification_batch_model\"\n",
"platform: \"tensorrt_plan\"\n",
"max_batch_size: 16\n",
"input: [\n",
" {\n",
" name: \"input_1\"\n",
" data_type: TYPE_FP32\n",
" format: FORMAT_NCHW\n",
" dims: [ 3, 224, 224 ]\n",
" }\n",
"]\n",
"output: {\n",
" name: \"predictions\"\n",
" data_type: TYPE_FP32\n",
" dims: [ 2 ]\n",
" }\n",
"\"\"\"\n",
"\n",
"with open('models/defect_classification_batch_model/config.pbtxt', 'w') as file:\n",
" file.write(configuration)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "118eab64-4502-4dc5-9b5b-150dc8c8588e",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!sleep 45"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "097f08a2-aa94-4b1e-9166-63d30749c0cc",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Trying 172.18.0.3:8000...\n",
"* Connected to triton (172.18.0.3) port 8000 (#0)\n",
"> GET /v2/models/defect_classification_batch_model HTTP/1.1\n",
"> Host: triton:8000\n",
"> User-Agent: curl/7.81.0\n",
"> Accept: */*\n",
"> \n",
"* Mark bundle as not supporting multiuse\n",
"< HTTP/1.1 200 OK\n",
"< Content-Type: application/json\n",
"< Content-Length: 227\n",
"< \n",
"* Connection #0 to host triton left intact\n",
"{\"name\":\"defect_classification_batch_model\",\"versions\":[\"1\"],\"platform\":\"tensorrt_plan\",\"inputs\":[{\"name\":\"input_1\",\"datatype\":\"FP32\",\"shape\":[-1,3,224,224]}],\"outputs\":[{\"name\":\"predictions\",\"datatype\":\"FP32\",\"shape\":[-1,2]}]}"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!curl -v triton:8000/v2/models/defect_classification_batch_model"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "75b76201-ea98-455a-aabb-79ff6867a8cf",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# set parameters\n",
"VERBOSE=False\n",
"input_name='input_1'\n",
"input_shape=(16, 3, 224, 224)\n",
"input_dtype='FP32'\n",
"output_name='predictions'\n",
"model_name='defect_classification_batch_model'\n",
"url='triton:8000'\n",
"model_version='1'"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "adac4094-5371-4e81-adfe-557d19354204",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"triton_client=tritonhttpclient.InferenceServerClient(url=url, verbose=VERBOSE)\n",
"model_metadata=triton_client.get_model_metadata(model_name=model_name, model_version=model_version)\n",
"model_config=triton_client.get_model_config(model_name=model_name, model_version=model_version)"
]
},
{
"cell_type": "markdown",
"id": "3a265721-1c77-4f6a-a9d5-bfd7b7de78b5",
"metadata": {},
"source": [
"We can iterate through our manifest to see how quickly Triton is able to perform inference. "
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "2be3f56e-a123-4597-8325-0727abc5cb0a",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"It took 7.04 seconds to infer 1903 images.\n",
"On average it took 0.0037 seconds per inference.\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"inference_input=tritonhttpclient.InferInput(input_name, input_shape, input_dtype)\n",
"output=tritonhttpclient.InferRequestedOutput(output_name)\n",
"\n",
"batch_ary=np.empty((16, 3, 224, 224)).astype(np.float32)\n",
"images_list=[]\n",
"\n",
"time_list=[]\n",
"\n",
"for idx, row in capacitor_df.iterrows(): \n",
" image_ary=preprocess_image(row['defect_img_path'])\n",
" batch_ary[len(images_list)]=image_ary\n",
" images_list.append(idx)\n",
" if len(images_list)%16==0: \n",
" inference_input.set_data_from_numpy(batch_ary)\n",
" # time the process\n",
" start=time.time()\n",
" response=triton_client.infer(model_name, \n",
" model_version=model_version, \n",
" inputs=[inference_input], \n",
" outputs=[output])\n",
" time_list.append(time.time()-start)\n",
"        predictions=response.as_numpy(output_name)\n",
"\n",
"        capacitor_df.loc[images_list, 'prediction']=[*map(labels.get, np.argmax(predictions, axis=1).flatten())]\n",
"        batch_ary=np.empty((16, 3, 224, 224)).astype(np.float32)\n",
"        images_list=[]\n",
"\n",
"# infer any remaining images in a final, partially-filled batch\n",
"if images_list: \n",
"    inference_input.set_data_from_numpy(batch_ary)\n",
"    start=time.time()\n",
"    response=triton_client.infer(model_name, \n",
"                                 model_version=model_version, \n",
"                                 inputs=[inference_input], \n",
"                                 outputs=[output])\n",
"    time_list.append(time.time()-start)\n",
"    # keep only the predictions for the valid (non-padding) rows\n",
"    predictions=response.as_numpy(output_name)[:len(images_list)]\n",
"    capacitor_df.loc[images_list, 'prediction']=[*map(labels.get, np.argmax(predictions, axis=1).flatten())]\n",
"\n",
"print('It took {} seconds to infer {} images.'.format(round(sum(time_list), 2), len(capacitor_df)))\n",
"print('On average it took {} seconds per image.'.format(round(np.array(time_list).mean()/16, 4)))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "9d14a174-b566-4742-8489-a6932dbb5126",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>prediction</th>\n",
" <th>defect</th>\n",
" <th>notdefect</th>\n",
" </tr>\n",
" <tr>\n",
" <th>true_defect</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>defect</th>\n",
" <td>6</td>\n",
" <td>93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>notdefect</th>\n",
" <td>0</td>\n",
" <td>1804</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"prediction defect notdefect\n",
"true_defect \n",
"defect 6 93\n",
"notdefect 0 1804"
]
},
"execution_count": 28,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"confusion_df=pd.crosstab(capacitor_df['true_defect'], capacitor_df['prediction'])\n",
"confusion_df.head()"
]
},
{
"cell_type": "markdown",
"id": "6fd806e8-7d7b-41dc-b097-22f951d4760b",
"metadata": {},
"source": [
"<a name='s3-5'></a>\n",
"## Run FP16 Inference ##"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "c232c384-73aa-43d4-b86d-27a629d65fab",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"total 181M\n",
"-rw-rw-rw- 1 root root 181M Sep 5 2024 resnet_010.hdf5\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# show trained model\n",
"!ls -ltrh $LOCAL_EXPERIMENT_DIR/resnet50/weights"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "dfd83995-8908-484c-bc4e-a994cdcab22b",
"metadata": {
"scrolled": true,
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"2026-02-03 11:57:23,513 [TAO Toolkit] [INFO] root 160: Registry: ['nvcr.io']\n",
"2026-02-03 11:57:23,605 [TAO Toolkit] [INFO] nvidia_tao_cli.components.instance_handler.local_instance 360: Running command in container: nvcr.io/nvidia/tao/tao-toolkit:5.0.0-tf1.15.5\n",
"2026-02-03 11:57:23,625 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 301: Printing tty value True\n",
"Using TensorFlow backend.\n",
"2026-02-03 11:57:24.590965: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12\n",
"2026-02-03 11:57:24,641 [TAO Toolkit] [WARNING] tensorflow 40: Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.\n",
"2026-02-03 11:57:26,010 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 11:57:26,095 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 11:57:26,105 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:150: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_hue(img, max_delta=10.0):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:173: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_saturation(img, max_shift):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:183: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_contrast(img, center, max_contrast_scale):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:192: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_shift(x_img, shift_stddev):\n",
"2026-02-03 11:57:28.392244: I tensorflow/core/platform/profile_utils/cpu_utils.cc:109] CPU Frequency: 2500005000 Hz\n",
"2026-02-03 11:57:28.392637: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7bfbba0 initialized for platform Host (this does not guarantee that XLA will be used). Devices:\n",
"2026-02-03 11:57:28.392675: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version\n",
"2026-02-03 11:57:28.394017: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcuda.so.1\n",
"2026-02-03 11:57:28.585372: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 11:57:28.587320: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7a16f50 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:\n",
"2026-02-03 11:57:28.587351: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla T4, Compute Capability 7.5\n",
"2026-02-03 11:57:28.587636: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 11:57:28.589420: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1674] Found device 0 with properties: \n",
"name: Tesla T4 major: 7 minor: 5 memoryClockRate(GHz): 1.59\n",
"pciBusID: 0000:00:1e.0\n",
"2026-02-03 11:57:28.589481: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12\n",
"2026-02-03 11:57:28.589597: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcublas.so.12\n",
"2026-02-03 11:57:28.591746: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcufft.so.11\n",
"2026-02-03 11:57:28.591863: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcurand.so.10\n",
"2026-02-03 11:57:28.595535: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusolver.so.11\n",
"2026-02-03 11:57:28.596726: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcusparse.so.12\n",
"2026-02-03 11:57:28.596809: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudnn.so.8\n",
"2026-02-03 11:57:28.596941: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 11:57:28.598829: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 11:57:28.600668: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1802] Adding visible gpu devices: 0\n",
"2026-02-03 11:57:28.600723: I tensorflow/stream_executor/platform/default/dso_loader.cc:50] Successfully opened dynamic library libcudart.so.12\n",
"2026-02-03 11:57:28.609384: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1214] Device interconnect StreamExecutor with strength 1 edge matrix:\n",
"2026-02-03 11:57:28.609413: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1220] 0 \n",
"2026-02-03 11:57:28.609427: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1233] 0: N \n",
"2026-02-03 11:57:28.609619: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 11:57:28.611475: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1082] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero\n",
"2026-02-03 11:57:28.613257: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1359] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 12736 MB memory) -> physical GPU (device: 0, name: Tesla T4, pci bus id: 0000:00:1e.0, compute capability: 7.5)\n",
"Using TensorFlow backend.\n",
"WARNING:tensorflow:Deprecation warnings have been disabled. Set TF_ENABLE_DEPRECATION_WARNINGS=1 to re-enable them.\n",
"WARNING:tensorflow:TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 11:57:30,219 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use sklearn by default. This improves performance in some cases. To enable sklearn export the environment variable TF_ALLOW_IOLIBS=1.\n",
"WARNING:tensorflow:TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 11:57:30,256 [TAO Toolkit] [WARNING] tensorflow 42: TensorFlow will not use Dask by default. This improves performance in some cases. To enable Dask export the environment variable TF_ALLOW_IOLIBS=1.\n",
"WARNING:tensorflow:TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.\n",
"2026-02-03 11:57:30,260 [TAO Toolkit] [WARNING] tensorflow 43: TensorFlow will not use Pandas by default. This improves performance in some cases. To enable Pandas export the environment variable TF_ALLOW_IOLIBS=1.\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:150: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_hue(img, max_delta=10.0):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:173: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_saturation(img, max_shift):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:183: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_contrast(img, center, max_contrast_scale):\n",
"/usr/local/lib/python3.8/dist-packages/nvidia_tao_tf1/cv/makenet/utils/helper.py:192: NumbaDeprecationWarning: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. The implicit default value for this argument is currently False, but it will be changed to True in Numba 0.59.0. See https://numba.readthedocs.io/en/stable/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit for details.\n",
" def random_shift(x_img, shift_stddev):\n",
"2026-02-03 11:57:31,197 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.app 264: Saving exported model to /workspace/tao-experiments/classification/export/resnet50_fp16.onnx\n",
"2026-02-03 11:57:31,197 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.keras_exporter 119: Setting the onnx export route to keras2onnx\n",
"2026-02-03 11:57:31,197 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.makenet.export.classification_exporter 90: Setting the onnx export rote to keras2onnx\n",
"2026-02-03 11:57:31,442 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.keras_exporter 429: Using input nodes: ['input_1']\n",
"2026-02-03 11:57:31,442 [TAO Toolkit] [INFO] nvidia_tao_tf1.cv.common.export.keras_exporter 430: Using output nodes: ['predictions/Softmax']\n",
"Loaded model\n",
"The ONNX operator number change on the optimization: 288 -> 124\n",
"2026-02-03 11:57:46,197 [TAO Toolkit] [INFO] keras2onnx 347: The ONNX operator number change on the optimization: 288 -> 124\n",
"2026-02-03 11:57:46,199 [TAO Toolkit] [WARNING] onnxmltools 71: The maximum opset needed by this model is only 11.\n",
"[02/03/2026-11:57:46] [TRT] [I] [MemUsageChange] Init CUDA: CPU +3, GPU +0, now: CPU 430, GPU 1463 (MiB)\n",
"[02/03/2026-11:57:49] [TRT] [I] [MemUsageChange] Init builder kernel library: CPU +342, GPU +74, now: CPU 826, GPU 1537 (MiB)\n",
"[02/03/2026-11:57:49] [TRT] [W] onnx2trt_utils.cpp:377: Your ONNX model has been generated with INT64 weights, while TensorRT does not natively support INT64. Attempting to cast down to INT32.\n",
"[02/03/2026-11:57:53] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +9, GPU +8, now: CPU 926, GPU 1545 (MiB)\n",
"[02/03/2026-11:57:53] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +10, now: CPU 927, GPU 1555 (MiB)\n",
"[02/03/2026-11:57:53] [TRT] [I] Local timing cache in use. Profiling results in this builder pass will not be stored.\n",
"[02/03/2026-11:58:16] [TRT] [I] Some tactics do not have sufficient workspace memory to run. Increasing workspace size will enable more tactics, please check verbose output for requested sizes.\n",
"[02/03/2026-11:58:22] [TRT] [I] Total Activation Memory: 2545879040\n",
"[02/03/2026-11:58:22] [TRT] [I] Detected 1 inputs and 1 output network tensors.\n",
"[02/03/2026-11:58:23] [TRT] [I] Total Host Persistent Memory: 170080\n",
"[02/03/2026-11:58:23] [TRT] [I] Total Device Persistent Memory: 11264\n",
"[02/03/2026-11:58:23] [TRT] [I] Total Scratch Memory: 134217728\n",
"[02/03/2026-11:58:23] [TRT] [I] [MemUsageStats] Peak memory usage of TRT CPU/GPU memory allocators: CPU 47 MiB, GPU 1220 MiB\n",
"[02/03/2026-11:58:23] [TRT] [I] [BlockAssignment] Started assigning block shifts. This will take 61 steps to complete.\n",
"[02/03/2026-11:58:23] [TRT] [I] [BlockAssignment] Algorithm ShiftNTopDown took 0.94316ms to assign 4 blocks to 61 nodes requiring 187203584 bytes.\n",
"[02/03/2026-11:58:23] [TRT] [I] Total Activation Memory: 187203584\n",
"[02/03/2026-11:58:23] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 1440, GPU 1621 (MiB)\n",
"[02/03/2026-11:58:23] [TRT] [W] TensorRT encountered issues when converting weights between types and that could affect accuracy.\n",
"[02/03/2026-11:58:23] [TRT] [W] If this is not the desired behavior, please modify the weights or retrain with regularization to adjust the magnitude of the weights.\n",
"[02/03/2026-11:58:23] [TRT] [W] Check verbose logs for the list of affected weights.\n",
"[02/03/2026-11:58:23] [TRT] [W] - 58 weights are affected by this issue: Detected subnormal FP16 values.\n",
"[02/03/2026-11:58:23] [TRT] [W] - 23 weights are affected by this issue: Detected values less than smallest positive FP16 subnormal value and converted them to the FP16 minimum subnormalized value.\n",
"[02/03/2026-11:58:23] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in building engine: CPU +45, GPU +45, now: CPU 45, GPU 45 (MiB)\n",
"Telemetry data couldn't be sent, but the command ran successfully.\n",
"[WARNING]: <urlopen error Url for the certificates not found.>\n",
"Execution status: PASS\n",
"2026-02-03 11:58:45,169 [TAO Toolkit] [INFO] nvidia_tao_cli.components.docker_handler.docker_handler 363: Stopping container.\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# export model and TensorRT engine\n",
"!tao model classification_tf1 export -m $TAO_EXPERIMENT_DIR/resnet50/weights/resnet_010.hdf5 \\\n",
" -e $TAO_SPECS_DIR/resnet50/combined_config.txt \\\n",
" -o $TAO_EXPERIMENT_DIR/export/resnet50_fp16.onnx \\\n",
" -k $KEY \\\n",
" --data_type fp16 \\\n",
" --engine_file $TAO_EXPERIMENT_DIR/export/resnet50_fp16.engine \\\n",
" --classmap_json $TAO_EXPERIMENT_DIR/resnet50/classmap.json \\\n",
" --gen_ds_config"
]
},
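  {
   "cell_type": "markdown",
   "id": "b1f16a01-2c3d-4e5f-8a9b-0c1d2e3f4a5b",
   "metadata": {},
   "source": [
    "The export log above warns that some weights are subnormal in FP16. A quick illustration of what that means, using plain NumPy (independent of TAO and TensorRT): FP16 keeps only a 10-bit mantissa and its smallest positive subnormal value is 2**-24, so very small FP32 weights lose precision or underflow to zero when cast down.\n",
    "\n",
    "```python\n",
    "import numpy as np\n",
    "\n",
    "# Smallest positive FP16 subnormal is 2**-24 (~5.96e-8).\n",
    "tiny = np.float16(2.0 ** -24)\n",
    "assert tiny > 0\n",
    "\n",
    "# Anything much smaller underflows to zero in FP16 -- this is what the\n",
    "# 'subnormal FP16 values' warning in the export log refers to.\n",
    "underflow = np.float16(1e-9)\n",
    "assert underflow == 0.0\n",
    "\n",
    "# Casting FP32 -> FP16 also loses mantissa precision (10 bits vs 23).\n",
    "x32 = np.float32(0.1)\n",
    "x16 = np.float16(x32)\n",
    "print(float(x32), float(x16))\n",
    "```"
   ]
  },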
{
"cell_type": "code",
"execution_count": 31,
"id": "2b69ed0a-7b7e-426b-ae58-5f8b884adfd2",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# create directory for model\n",
"!mkdir -p models/defect_classification_model_fp16/1\n",
"\n",
"# copy resnet-50 engine to the model repository\n",
"!cp $LOCAL_EXPERIMENT_DIR/export/resnet50_fp16.engine models/defect_classification_model_fp16/1/model.plan"
]
},
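  {
   "cell_type": "markdown",
   "id": "a0b1c2d3-4e5f-4a6b-8c7d-9e0f1a2b3c4d",
   "metadata": {},
   "source": [
    "After the copy above, the model repository should match the layout Triton expects: one directory per model containing a `config.pbtxt` (written below) and one numbered subdirectory per model version holding the TensorRT engine as `model.plan`:\n",
    "\n",
    "```\n",
    "models/\n",
    "└── defect_classification_model_fp16/\n",
    "    ├── config.pbtxt\n",
    "    └── 1/\n",
    "        └── model.plan\n",
    "```"
   ]
  },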
{
"cell_type": "markdown",
"id": "f37e346b-e40e-40f8-910a-9029812a25d5",
"metadata": {},
"source": [
"<p><img src='images/important.png' width=720></p>\n",
    "We'll also create a configuration file for the TensorRT FP16 model. Note that the input and output tensors keep their FP32 representation: only the internal layers and activations of the network run in FP16, so clients still send and receive FP32 data."
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "d74b0bfd-836e-448b-949f-de8d757f1444",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"configuration = \"\"\"\n",
"name: \"defect_classification_model_fp16\"\n",
"platform: \"tensorrt_plan\"\n",
"max_batch_size: 0\n",
"input: [\n",
" {\n",
" name: \"input_1\"\n",
" data_type: TYPE_FP32\n",
" format: FORMAT_NCHW\n",
" dims: [ 3, 224, 224 ]\n",
" }\n",
"]\n",
"output: {\n",
" name: \"predictions\"\n",
" data_type: TYPE_FP32\n",
" dims: [ 2 ]\n",
" }\n",
"\"\"\"\n",
"\n",
"with open('models/defect_classification_model_fp16/config.pbtxt', 'w') as file:\n",
" file.write(configuration)"
]
},
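  {
   "cell_type": "markdown",
   "id": "c2a7b3d4-5e6f-4a1b-9c8d-7e6f5a4b3c2d",
   "metadata": {},
   "source": [
    "One way to sanity-check the configuration before handing it to Triton is to parse the declared dims back out. A minimal sketch (the relevant fragment of the configuration string is repeated here, so the cell does not depend on the file written above):\n",
    "\n",
    "```python\n",
    "import re\n",
    "\n",
    "# Fragment of the config.pbtxt content above, repeated for self-containment.\n",
    "config_text = '''\n",
    "input: [\n",
    "  {\n",
    "    name: \"input_1\"\n",
    "    data_type: TYPE_FP32\n",
    "    format: FORMAT_NCHW\n",
    "    dims: [ 3, 224, 224 ]\n",
    "  }\n",
    "]\n",
    "'''\n",
    "\n",
    "# Extract the first dims entry and compare against the expected CHW shape.\n",
    "match = re.search(r'dims:\\s*\\[([^\\]]+)\\]', config_text)\n",
    "dims = [int(d) for d in match.group(1).split(',')]\n",
    "assert dims == [3, 224, 224]\n",
    "print('declared input dims:', dims)\n",
    "```"
   ]
  },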
{
"cell_type": "code",
"execution_count": 33,
"id": "a248669c-16cb-4c74-84dc-f08cdc253aed",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!sleep 45"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "df73f0cd-a2c2-46e6-9593-cfffd9206d28",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"* Trying 172.18.0.3:8000...\n",
"* Connected to triton (172.18.0.3) port 8000 (#0)\n",
"> GET /v2/models/defect_classification_model_fp16 HTTP/1.1\n",
"> Host: triton:8000\n",
"> User-Agent: curl/7.81.0\n",
"> Accept: */*\n",
"> \n",
"* Mark bundle as not supporting multiuse\n",
"< HTTP/1.1 200 OK\n",
"< Content-Type: application/json\n",
"< Content-Length: 226\n",
"< \n",
"* Connection #0 to host triton left intact\n",
"{\"name\":\"defect_classification_model_fp16\",\"versions\":[\"1\"],\"platform\":\"tensorrt_plan\",\"inputs\":[{\"name\":\"input_1\",\"datatype\":\"FP32\",\"shape\":[-1,3,224,224]}],\"outputs\":[{\"name\":\"predictions\",\"datatype\":\"FP32\",\"shape\":[-1,2]}]}"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"!curl -v triton:8000/v2/models/defect_classification_model_fp16"
]
},
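  {
   "cell_type": "markdown",
   "id": "d4e5f6a7-8b9c-4d0e-a1f2-3b4c5d6e7f80",
   "metadata": {},
   "source": [
    "The metadata returned by this endpoint can also be inspected programmatically. A small sketch (the response body is pasted from the curl output above, so this cell runs without a live server):\n",
    "\n",
    "```python\n",
    "import json\n",
    "\n",
    "# Response body from the curl call above, pasted verbatim.\n",
    "metadata = json.loads(\n",
    "    '{\"name\":\"defect_classification_model_fp16\",\"versions\":[\"1\"],'\n",
    "    '\"platform\":\"tensorrt_plan\",'\n",
    "    '\"inputs\":[{\"name\":\"input_1\",\"datatype\":\"FP32\",\"shape\":[-1,3,224,224]}],'\n",
    "    '\"outputs\":[{\"name\":\"predictions\",\"datatype\":\"FP32\",\"shape\":[-1,2]}]}'\n",
    ")\n",
    "\n",
    "# The leading -1 means the batch dimension is dynamic; the remaining\n",
    "# dims match the CHW shape declared in config.pbtxt.\n",
    "input_shape = metadata['inputs'][0]['shape']\n",
    "assert input_shape == [-1, 3, 224, 224]\n",
    "print(metadata['name'], input_shape)\n",
    "```"
   ]
  },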
{
"cell_type": "code",
"execution_count": 35,
"id": "2a8ffe15-457e-4dc0-a9b9-1dc377456bbc",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"# set parameters\n",
"VERBOSE=False\n",
"input_name='input_1'\n",
"input_shape=(1, 3, 224, 224)\n",
"input_dtype='FP32'\n",
"output_name='predictions'\n",
"model_name='defect_classification_model_fp16'\n",
"url='triton:8000'\n",
"model_version='1'"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "06a304cd-fd88-4c3a-a9b4-7adaeb0e7b71",
"metadata": {},
"outputs": [],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"triton_client=tritonhttpclient.InferenceServerClient(url=url, verbose=VERBOSE)\n",
"model_metadata=triton_client.get_model_metadata(model_name=model_name, model_version=model_version)\n",
"model_config=triton_client.get_model_config(model_name=model_name, model_version=model_version)"
]
},
{
"cell_type": "markdown",
"id": "c902e756-9c83-4809-afb0-11c8d5562874",
"metadata": {},
"source": [
    "We can iterate through our manifest of test images and time each request to see how quickly Triton performs inference."
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "33c0fc15-1cb6-44e8-a276-88bde71bd14b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"It took 6.47 seconds to infer 1903 images.\n"
]
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"inference_input=tritonhttpclient.InferInput(input_name, input_shape, input_dtype)\n",
"output=tritonhttpclient.InferRequestedOutput(output_name)\n",
"\n",
"time_list=[]\n",
"\n",
"for idx, row in capacitor_df.iterrows(): \n",
" image_ary=preprocess_image(row['defect_img_path'])\n",
" # make input shape [1, num_channels, height, width]\n",
" input_ary=np.expand_dims(image_ary, axis=0)\n",
" inference_input.set_data_from_numpy(input_ary)\n",
" # time the process\n",
" start=time.time()\n",
" response=triton_client.infer(model_name, \n",
" model_version=model_version, \n",
" inputs=[inference_input], \n",
" outputs=[output])\n",
" time_list.append(time.time()-start)\n",
" predictions=response.as_numpy(output_name)\n",
" capacitor_df.loc[idx, 'prediction']=labels[np.argmax(predictions)].strip()\n",
"\n",
"print('It took {} seconds to infer {} images.'.format(round(sum(time_list), 2), len(capacitor_df)))"
]
},
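  {
   "cell_type": "markdown",
   "id": "e5f6a7b8-9c0d-4e1f-b2a3-4c5d6e7f8091",
   "metadata": {},
   "source": [
    "From the timing above (6.47 seconds for 1903 single-image requests, figures taken from the cell output), average latency and throughput follow directly:\n",
    "\n",
    "```python\n",
    "# Figures from the timed inference loop above.\n",
    "total_seconds = 6.47\n",
    "num_images = 1903\n",
    "\n",
    "avg_latency_ms = total_seconds / num_images * 1000.0\n",
    "throughput = num_images / total_seconds\n",
    "\n",
    "print(f'avg latency: {avg_latency_ms:.2f} ms/image')\n",
    "print(f'throughput:  {throughput:.0f} images/s')\n",
    "```\n",
    "\n",
    "Each request here carries a single image; sending batches of images per request would typically raise throughput further at the cost of per-request latency."
   ]
  },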
{
"cell_type": "code",
"execution_count": 38,
"id": "3bef83d6-e139-4566-8590-0f2110fdba3b",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th>prediction</th>\n",
" <th>defect</th>\n",
" <th>notdefect</th>\n",
" </tr>\n",
" <tr>\n",
" <th>true_defect</th>\n",
" <th></th>\n",
" <th></th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>defect</th>\n",
" <td>6</td>\n",
" <td>93</td>\n",
" </tr>\n",
" <tr>\n",
" <th>notdefect</th>\n",
" <td>0</td>\n",
" <td>1804</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
"prediction defect notdefect\n",
"true_defect \n",
"defect 6 93\n",
"notdefect 0 1804"
]
},
"execution_count": 38,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# DO NOT CHANGE THIS CELL\n",
"confusion_df=pd.crosstab(capacitor_df['true_defect'], capacitor_df['prediction'])\n",
"confusion_df.head()"
]
},
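  {
   "cell_type": "markdown",
   "id": "f6a7b8c9-0d1e-4f2a-83b4-5d6e7f809a1b",
   "metadata": {},
   "source": [
    "The counts in the crosstab above can be summarized as standard metrics (numbers taken from the table, treating 'defect' as the positive class). Note how the high overall accuracy hides a low recall on the minority defect class:\n",
    "\n",
    "```python\n",
    "# Counts from the crosstab above: rows are true labels, columns predictions.\n",
    "tp, fn = 6, 93      # true defects predicted defect / notdefect\n",
    "fp, tn = 0, 1804    # true non-defects predicted defect / notdefect\n",
    "\n",
    "total = tp + fn + fp + tn\n",
    "accuracy = (tp + tn) / total\n",
    "precision = tp / (tp + fp)\n",
    "recall = tp / (tp + fn)\n",
    "\n",
    "print(f'accuracy:  {accuracy:.3f}')\n",
    "print(f'precision: {precision:.3f}')\n",
    "print(f'recall:    {recall:.3f}')\n",
    "```"
   ]
  },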
{
"cell_type": "markdown",
"id": "86195126-f9a5-4a49-afc0-84f4101c66d0",
"metadata": {
"tags": []
},
"source": [
"<a name='s3-6'></a>\n",
"## Conclusion ##\n",
    "Automating the inspection process with highly accurate, fast, and easy-to-use systems helps save time, reduce costs, and improve yields. For manufacturing use cases, AI can deliver advantages for both vendors and users of automated optical inspection equipment: \n",
    "* Simplified algorithm development - unlike traditional rules-based algorithms that require defining every product and acceptance criterion, deep learning-based computer vision can reduce time-to-market for new equipment and ongoing software-support costs. \n",
    "* Better performance - AI-enhanced automated optical inspection can deliver greater accuracy and reliability, with a lower false-positive rate, than traditional systems. \n",
    "* Greater flexibility - deep learning algorithms can be quickly retrained to perform new tasks. "
]
},
{
"cell_type": "markdown",
"id": "ac3cdd1f-5a5c-4d23-a095-c8c46e08efdd",
"metadata": {},
"source": [
    "**Well Done!** When you're finished, please complete the assessment before moving on to the code assessment. "
]
},
{
"cell_type": "markdown",
"id": "210dc84b-ccd8-4295-87cf-897090d19c42",
"metadata": {},
"source": [
"<a href=\"https://www.nvidia.com/dli\"> <img src=\"images/DLI_Header.png\" alt=\"Header\" style=\"width: 400px;\"/> </a>"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 5
}