tflm_modelrunner
Overview
ModelRunner is a benchmarking tool for running TensorFlow Lite models on NXP microcontrollers. It supports both HTTP and UART communication modes and provides detailed latency profiling for each model layer.
Supported Toolchains
MCUXpresso IDE
IAR Embedded Workbench for ARM
Keil uVision MDK
ArmGCC (GNU Tools ARM Embedded)
Supported Boards
FRDM-MCXN947
MCX-N5XX-EVK
MCX-N9XX-EVK
MIMXRT700-EVK
MIMXRT595-EVK
MIMXRT685-EVK
MIMXRT1060-EVK
MIMXRT1170-EVK
Running the Demo
HTTP Mode (Default for mcxn9xxevk)
Get Device IP
Connect to the device console to check the IP address:
*************************************************
TFLite Modelrunner
*************************************************
Initializing PHY...
DHCP state       : SELECTING
DHCP state       : REQUESTING
DHCP state       : BOUND

 IPv4 Address     : 10.193.20.56
 IPv4 Subnet mask : 255.255.255.0
 IPv4 Gateway     : 10.193.20.254

Initialized TFLiteMicro modelrunner server at port 10818
Upload Model
curl -X PUT http://10.193.20.56:10818/v1 -F 'block_content=@<path_to_model>.tflite'
Users get the agent response as below:
{ "reply": "success" }
Run Latency Benchmark
curl -s -X POST http://<IP>:10818/v1?run=1
Users get the agent's reported inference time:
{ "timing": 511 }
Get Model Info
curl http://<IP>:10818/v1/model
Users get detailed model info:
{"timing":54065000,"ktensor_arena_size":22820,"inputs":[{"name":"input_1","scale":0.58470290899276733,"zero_points":83,"datatype":"INT8","shape":[1,49,10,1]}],"outputs":[{"name":"Identity","scale":0.00390625,"zero_points":-128,"datatype":"INT8","shape":[1,12]}],"layer_count":13,"layers":[{"name":"functional_1/activation/Relu;functional_1/batch_normalization/FusedBatchNormV3;functional_1/conv2d/BiasAdd/ReadVariableOp/resource;functional_1/conv2d/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d/Conv2D1","type":"CONV_2D","avg_timing":8249000,"tensor":{"timing":8249000}},{"name":"functional_1/activation_1/Relu;functional_1/batch_normalization_1/FusedBatchNormV3;functional_1/depthwise_conv2d/depthwise;functional_1/depthwise_conv2d/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3969000,"tensor":{"timing":3969000}},{"name":"functional_1/activation_2/Relu;functional_1/batch_normalization_2/FusedBatchNormV3;functional_1/conv2d_1/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_1/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_1/Conv2D1","type":"CONV_2D","avg_timing":7373000,"tensor":{"timing":7373000}},{"name":"functional_1/activation_3/Relu;functional_1/batch_normalization_3/FusedBatchNormV3;functional_1/depthwise_conv2d_1/depthwise;functional_1/depthwise_conv2d_1/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_1/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_4/Relu;functional_1/batch_normalization_4/FusedBatchNormV3;functional_1/conv2d_2/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_2/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_2/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/activation_5/Relu;functional_1/batch_normalization_5/FusedBatchNormV3;functional_1/depthwise_conv2d_2/depthwise;functional_1/depthwise_conv2d_2/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_2/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_6/Relu;functional_1/batch_normalization_6/FusedBatchNormV3;functional_1/conv2d_3/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_3/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_3/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/activation_7/Relu;functional_1/batch_normalization_7/FusedBatchNormV3;functional_1/depthwise_conv2d_3/depthwise;functional_1/depthwise_conv2d_3/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_3/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_8/Relu;functional_1/batch_normalization_8/FusedBatchNormV3;functional_1/conv2d_4/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_4/BiasAdd;functional_1/conv2d_4/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/average_pooling2d/AvgPool","type":"AVERAGE_POOL_2D","avg_timing":401000,"tensor":{"timing":401000}},{"name":"functional_1/flatten/Reshape","type":"RESHAPE","avg_timing":6000,"tensor":{"timing":6000}},{"name":"functional_1/dense/BiasAdd","type":"FULLY_CONNECTED","avg_timing":25000,"tensor":{"timing":25000}},{"name":"Identity","type":"SOFTMAX","avg_timing":25000,"tensor":{"timing":25000}}]
}
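Since the per-layer timings are the interesting part for profiling, here is a minimal Python sketch that fetches the model info and prints a timing summary per layer (IP and port are example values):

# Fetch /v1/model and print each layer's type and average timing.
import requests

info = requests.get("http://10.193.20.56:10818/v1/model").json()
for layer in info["layers"]:
    print(f'{layer["type"]:<20} {layer["avg_timing"]:>10} {layer["name"][:60]}')
print("layer_count:", info["layer_count"], "total timing:", info["timing"])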
Upload Input Tensor & Get Output
curl -s -X POST "http://<IP>:10818/v1?run=1&output=Identity" -F 'block_content=@<path_to_input_tensor>.bin' |jq .outputs[0].data | xargs -i echo {$i} |base64 -d |hexdump -C
Users can hexdump the output data for further post-processing:
00000000  ff 80 80 80 80 ff 80 80  80 80 80 81              |............|
0000000c
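The same pipeline can be run from Python; a minimal sketch (input path, IP, and port are example values):

# Upload an input tensor, run inference, and hex-print the decoded output.
import base64
import requests

with open("input.bin", "rb") as f:
    resp = requests.post("http://10.193.20.56:10818/v1",
                         params={"run": 1, "output": "Identity"},
                         files={"block_content": f})
raw = base64.b64decode(resp.json()["outputs"][0]["data"])
print(raw.hex(" "))  # e.g. ff 80 80 80 80 ff 80 80 80 80 80 81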
UART Mode (Default for mimxrt700evk)
Flash the board with the firmware
Check that ModelRunner is running successfully on the board, then exit the UART console so the ModelRunner agent can communicate over UART.
*************************************************
TFLite Modelrunner
*************************************************
XSPI Psram Enabled!
=>
Run HTTP-to-UART agent on x86
python3 main.py
The server runs at http://$ip:$port. Users can change the port according to the IT firewall policy; the default port is 10919.
 * Serving Flask app 'main'
 * Debug mode: on
WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:10919
 * Running on http://10.192.208.139:10919
Connect UART cable
Linux: check /dev/serial/by-id/
Windows: use the COM<x> port
For MCU devices without an Ethernet port, the connection URL looks like "http://$server_ip:$port/serial/001063836560/v1".
Benchmark
Users can provide a dataset matching the model's input and prepare benchmark test code for further interpretation.
4.1. Upload Model
curl -X PUT http://<IP>:<port>/serial/<serial-id>/v1 -d "block_count=1"
curl -X PUT http://<IP>:<port>/serial/<serial-id>/v1 -F 'block_content=@"${model_path}";filename="${model_name},name=block_content"'
Users get the agent response as below:
{
"reply": "success"
}
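A minimal Python sketch of the same two-step upload through the HTTP-to-UART agent (server address, serial ID, and model path are example values):

# Step 1: announce the number of blocks to transfer.
import requests

base = "http://10.192.208.139:10919/serial/001063836560/v1"
requests.put(base, data={"block_count": 1})

# Step 2: send the model file itself.
with open("model.tflite", "rb") as f:
    resp = requests.put(base, files={"block_content": ("model.tflite", f)})
print(resp.json())  # expected: {"reply": "success"}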
4.2. Run Latency Benchmark
curl -X POST http://<IP>:<port>/serial/<serial-id>/v1?run=1
Users get the agent's reported inference time:
{
"timing": 511
}
4.3. Get Model Info
curl http://<IP>:<port>/serial/<serial-id>/v1/model
Users get detailed model info, layer by layer, as JSON:
{
"inputs": [
{
"data_type": "INT8",
"name": "input_1",
"scale": 0.584703,
"shape": [
1,
49,
10,
1
],
"zero_points": 83
}
],
"ktensor_arena_size": 17172,
"layer_count": 3,
"layers": [
{
"avg_timing": 485000.0,
"name": "functional_1/dense/BiasAdd",
"timing": 485000.0,
"type": "NeutronGraph"
},
{
"avg_timing": 17000.0,
"name": "Identity",
"timing": 17000.0,
"type": "SOFTMAX"
}
],
"outputs": [
{
"data_type": "INT8",
"name": "Identity",
"scale": 0.003906,
"shape": [
1,
12
],
"zero_points": -128
}
],
"timing": 502
}
4.4. Upload Input Tensor & Get Output Data
curl -X POST "http://<IP>:<port>/serial/<serial-id>/v1?run=1&output=${output_tensor_name}" \
-F 'file=@<path_to_input_tensor>.bin'
Users get the output data and can post-process it accordingly:
{
"outputs": [
{
"data": "/4CAgID/gICAgICB",
"datatype": "INT8",
"name": "Identity",
"shape": [
1,
12
]
}
],
"timing": 498
}
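For post-processing, the base64 "data" field decodes to signed INT8 values, which can be dequantized with the scale and zero_points reported in the model info above; a minimal Python sketch:

# Decode the base64 output and dequantize: real = scale * (q - zero_point).
import base64
import struct

data = "/4CAgID/gICAgICB"           # "data" field from the response above
scale, zero_point = 0.003906, -128  # from the "outputs" entry in model info

raw = base64.b64decode(data)
q = struct.unpack(f"{len(raw)}b", raw)          # 12 signed int8 values
scores = [scale * (v - zero_point) for v in q]
print(scores.index(max(scores)))                # predicted class index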
ModelRunner CLI
Connect to the device's UART console and run the CLI as below:
python cli.py com20
=> reset
=> model_loadb model.tflite
=> model
=> tensor_loadb input_1 tmp.input
=> run output=Identity