TFLite ModelRunner User Guide#
Overview#
ModelRunner is a benchmarking tool for running TensorFlow Lite models on NXP microcontrollers. It supports both HTTP and UART communication modes and provides detailed latency profiling for each model layer.
Supported Toolchains#
MCUXpresso IDE
IAR Embedded Workbench for ARM
Keil uVision MDK
ArmGCC (GNU Tools ARM Embedded)
Supported Boards#
FRDM-MCXN947
MCX-N5XX-EVK
MCX-N9XX-EVK
MIMXRT700-EVK
MIMXRT595-EVK
MIMXRT685-EVK
MIMXRT1060-EVK
MIMXRT1170-EVK
Running the Demo#
HTTP Mode (Default for mcxn9xxevk)#
Get Device IP
Connect to the device console to check the IP address:
*************************************************
TFLite Modelrunner
*************************************************
Initializing PHY...
DHCP state : SELECTING
DHCP state : REQUESTING
DHCP state : BOUND
IPv4 Address : 10.193.20.56
IPv4 Subnet mask : 255.255.255.0
IPv4 Gateway : 10.193.20.254
Initialized TFLiteMicro modelrunner server at port 10818
Upload Model
curl -X PUT http://10.193.20.56:10818/v1 -F 'block_content=@<path_to_model>.tflite'
The agent responds as follows:
{ "reply": "success" }
Run Latency Benchmark
curl -s -X POST http://<IP>:10818/v1?run=1
The agent responds with the inference time:
{ "timing": 511 }
Get Model Info
curl http://<IP>:10818/v1/model
The agent returns detailed model information:
{"timing":54065000,"ktensor_arena_size":22820,"inputs":[{"name":"input_1","scale":0.58470290899276733,"zero_points":83,"datatype":"INT8","shape":[1,49,10,1]}],"outputs":[{"name":"Identity","scale":0.00390625,"zero_points":-128,"datatype":"INT8","shape":[1,12]}],"layer_count":13,"layers":[{"name":"functional_1/activation/Relu;functional_1/batch_normalization/FusedBatchNormV3;functional_1/conv2d/BiasAdd/ReadVariableOp/resource;functional_1/conv2d/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d/Conv2D1","type":"CONV_2D","avg_timing":8249000,"tensor":{"timing":8249000}},{"name":"functional_1/activation_1/Relu;functional_1/batch_normalization_1/FusedBatchNormV3;functional_1/depthwise_conv2d/depthwise;functional_1/depthwise_conv2d/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3969000,"tensor":{"timing":3969000}},{"name":"functional_1/activation_2/Relu;functional_1/batch_normalization_2/FusedBatchNormV3;functional_1/conv2d_1/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_1/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_1/Conv2D1","type":"CONV_2D","avg_timing":7373000,"tensor":{"timing":7373000}},{"name":"functional_1/activation_3/Relu;functional_1/batch_normalization_3/FusedBatchNormV3;functional_1/depthwise_conv2d_1/depthwise;functional_1/depthwise_conv2d_1/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_1/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_4/Relu;functional_1/batch_normalization_4/FusedBatchNormV3;functional_1/conv2d_2/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_2/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_2/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/activation_5/Relu;functional_1/batch_normalization_5/FusedBatchNormV3;functional_1/depthwise_conv2d_2/depthwise;functional_1/depthwise_conv2d_2/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_2/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_6/Relu;functional_1/batch_normalization_6/FusedBatchNormV3;functional_1/conv2d_3/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_3/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_3/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/activation_7/Relu;functional_1/batch_normalization_7/FusedBatchNormV3;functional_1/depthwise_conv2d_3/depthwise;functional_1/depthwise_conv2d_3/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_3/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_8/Relu;functional_1/batch_normalization_8/FusedBatchNormV3;functional_1/conv2d_4/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_4/BiasAdd;functional_1/conv2d_4/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/average_pooling2d/AvgPool","type":"AVERAGE_POOL_2D","avg_timing":401000,"tensor":{"timing":401000}},{"name":"functional_1/flatten/Reshape","type":"RESHAPE","avg_timing":6000,"tensor":{"timing":6000}},{"name":"functional_1/dense/BiasAdd","type":"FULLY_CONNECTED","avg_timing":25000,"tensor":{"timing":25000}},{"name":"Identity","type":"SOFTMAX","avg_timing":25000,"tensor":{"timing":25000}}]
}
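To spot the most expensive layers, the layers array can be sorted by avg_timing. A sketch assuming the /v1/model endpoint above and the Python requests package (the avg_timing unit is whatever the agent reports; only the relative ordering matters here):
import requests

MODEL_INFO_URL = "http://10.193.20.56:10818/v1/model"   # replace with your device IP

info = requests.get(MODEL_INFO_URL).json()

# Sort layers by average timing so the hot spots come first.
for layer in sorted(info["layers"], key=lambda l: l["avg_timing"], reverse=True):
    short_name = layer["name"].split(";")[0]   # fused op names can be very long
    print(f"{layer['avg_timing']:>12}  {layer['type']:<20}  {short_name}")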
Upload Input Tensor & Get Output
curl -s -X POST "http://<IP>:10818/v1?run=1&output=Identity" -F 'block_content=@<path_to_input_tensor>.bin' | jq -r .outputs[0].data | base64 -d | hexdump -C
The decoded output data can be hexdumped for further post-processing:
00000000  ff 80 80 80 80 ff 80 80  80 80 80 81              |............|
0000000c
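To interpret the raw INT8 bytes, dequantize them with the output tensor's scale and zero_points from the model info. A sketch assuming the decoded bytes were saved to a local file (output.bin is a placeholder name) and NumPy is available:
import numpy as np

# Quantization parameters taken from the "outputs" entry of /v1/model above.
SCALE = 0.00390625
ZERO_POINT = -128

raw = np.fromfile("output.bin", dtype=np.int8)          # 12 class scores for this model
scores = (raw.astype(np.float32) - ZERO_POINT) * SCALE  # dequantize: real = (q - zp) * scale
print("predicted class index:", int(np.argmax(scores)))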
UART Mode (Default for mimxrt700evk)#
Flash the board with the firmware
Check that ModelRunner runs successfully on the board, then exit the UART console so that the ModelRunner agent can communicate over UART.
*************************************************
TFLite Modelrunner
*************************************************
XSPI Psram Enabled!
=>
Run HTTP-to-UART agent on x86
python3 main.py
The server runs at http://$ip:$port. Users can modify the port according to the IT firewall policy; the default port is 10919.
* Serving Flask app 'main'
* Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
* Running on all addresses (0.0.0.0)
* Running on http://127.0.0.1:10919
* Running on http://10.192.208.139:10919
Connect UART cable
Linux: check /dev/serial/by-id/
Windows: use the COM<x> port
For MCU devices without an Ethernet port, the connection URL takes the form "http://$server_ip:$port/serial/001063836560/v1".
Benchmark. Users can provide a dataset that matches the model's input and prepare benchmark test code for further interpretation; a benchmark driver sketch is shown after step 4.4 below.
4.1. Upload Model
curl -X PUT http://<IP>:<port>/serial/<serial-id>/v1 -d "block_count=1"
curl -X PUT http://<IP>:<port>/serial/<serial-id>/v1 -F "block_content=@${model_path};filename=${model_name}"
The agent responds as follows:
{
"reply": "success"
}
4.2. Run Latency Benchmark
curl -X POST http://<IP>:<port>/serial/<serial-id>/v1?run=1
The agent responds with the inference time:
{
"timing": 511
}
4.3. Get Model Info
curl http://<IP>:<port>/serial/<serial-id>/v1/model
The agent returns detailed model information, layer by layer, in JSON format:
{
"inputs": [
{
"data_type": "INT8",
"name": "input_1",
"scale": 0.584703,
"shape": [
1,
49,
10,
1
],
"zero_points": 83
}
],
"ktensor_arena_size": 17172,
"layer_count": 3,
"layers": [
{
"avg_timing": 485000.0,
"name": "functional_1/dense/BiasAdd",
"timing": 485000.0,
"type": "NeutronGraph"
},
{
"avg_timing": 17000.0,
"name": "Identity",
"timing": 17000.0,
"type": "SOFTMAX"
}
],
"outputs": [
{
"data_type": "INT8",
"name": "Identity",
"scale": 0.003906,
"shape": [
1,
12
],
"zero_points": -128
}
],
"timing": 502
}
4.4. Upload Input Tensor & Get Output Data
curl -X POST "http://<IP>:<port>/serial/<serial-id>/v1?run=1&output=${output_tensor_name}" \
-F 'file=@<path_to_input_tensor>.bin'
The agent returns the output data, which can be post-processed accordingly:
{
"outputs": [
{
"data": "/4CAgID/gICAgICB",
"datatype": "INT8",
"name": "Identity",
"shape": [
1,
12
]
}
],
"timing": 498
}
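As mentioned in the Benchmark step above, a driver script can iterate over a dataset and post each input tensor to the agent. A sketch assuming a directory of pre-quantized input tensors whose file names encode the expected class; the directory layout, serial ID, port, and label encoding are placeholders, not part of the agent:
import base64
from pathlib import Path

import numpy as np
import requests

AGENT_URL = "http://127.0.0.1:10919/serial/001063836560/v1"   # replace serial ID and port
DATA_DIR = Path("dataset")            # placeholder: files named like "<label>_<n>.bin"
OUTPUT_NAME = "Identity"

correct = total = 0
for path in sorted(DATA_DIR.glob("*.bin")):
    expected = int(path.name.split("_")[0])   # assumed label encoding
    with path.open("rb") as f:
        # Multipart field name as in the curl example of step 4.4.
        resp = requests.post(AGENT_URL, params={"run": 1, "output": OUTPUT_NAME},
                             files={"file": f})
    out = resp.json()["outputs"][0]
    scores = np.frombuffer(base64.b64decode(out["data"]), dtype=np.int8)
    correct += int(np.argmax(scores)) == expected
    total += 1

print(f"top-1 accuracy: {correct}/{total}")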
ModelRunner CLI#
Connect to the device's UART console and run the CLI as below. The command sequence mirrors the HTTP workflow: reset the device, load the model binary, query the model info, load the input tensor, and run inference requesting the Identity output tensor.
python cli.py com20
=> reset
=> model_loadb model.tflite
=> model
=> tensor_loadb input_1 tmp.input
=> run output=Identity