tflm_modelrunner

Overview

ModelRunner is a benchmarking tool for running TensorFlow Lite models on NXP microcontrollers. It supports both HTTP and UART communication modes and provides detailed latency profiling for each model layer.

Supported Toolchains

  • MCUXpresso IDE

  • IAR Embedded Workbench for ARM

  • Keil uVision MDK

  • ArmGCC (GNU Tools ARM Embedded)

Supported Boards

  • FRDM-MCXN947

  • MCX-N5XX-EVK

  • MCX-N9XX-EVK

  • MIMXRT700-EVK

  • MIMXRT595-EVK

  • MIMXRT685-EVK

  • MIMXRT1060-EVK

  • MIMXRT1170-EVK

Running the Demo

HTTP Mode (Default for mcxn9xxevk)

  1. Get Device IP

    Connect to the device console to check the IP address:

    *************************************************                                       
               TFLite Modelrunner
    *************************************************
     Initializing PHY... 
     DHCP state : SELECTING 
     DHCP state : REQUESTING 
     DHCP state : BOUND
    
     IPv4 Address : 10.193.20.56 
     IPv4 Subnet mask : 255.255.255.0 
     IPv4 Gateway : 10.193.20.254
     
     Initialized TFLiteMicro modelrunner server at port 10818
    
  2. Upload Model

    curl -X PUT http://10.193.20.56:10818/v1 -F 'block_content=@<path_to_model>.tflite'
    

    Users can get the agent response as below:

    {
       "reply": "success"
    }
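
    For scripted use, the same upload can be done from Python; a minimal sketch using the requests library (the URL and model path are placeholders):

    # Minimal sketch: upload a .tflite model over the ModelRunner HTTP API.
    # Assumes the `requests` library; URL and model path are placeholders.
    import requests

    MODELRUNNER_URL = "http://10.193.20.56:10818/v1"

    with open("model.tflite", "rb") as f:
        resp = requests.put(MODELRUNNER_URL, files={"block_content": f})
    print(resp.json())  # expected: {"reply": "success"}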
    
  3. Run Latency Benchmark

    curl -s -X POST http://<IP>:10818/v1?run=1
    

    Users can get the agent response with the inference time:

    {
       "timing": 511
    }
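
    Since a single measurement can be noisy, the benchmark can be triggered several times and averaged; a minimal sketch along the same lines (the URL and run count are placeholders):

    # Minimal sketch: average the reported timing over several runs.
    # Assumes the `requests` library; URL and run count are placeholders.
    import requests

    MODELRUNNER_URL = "http://10.193.20.56:10818/v1"
    RUNS = 10

    timings = [requests.post(MODELRUNNER_URL, params={"run": 1}).json()["timing"]
               for _ in range(RUNS)]
    print(f"average timing over {RUNS} runs: {sum(timings) / RUNS:.1f}")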
    
  4. Get Model Info

    curl http://<IP>:10818/v1/model
    

    Users can get the detailed model info:

    {"timing":54065000,"ktensor_arena_size":22820,"inputs":[{"name":"input_1","scale":0.58470290899276733,"zero_points":83,"datatype":"INT8","shape":[1,49,10,1]}],"outputs":[{"name":"Identity","scale":0.00390625,"zero_points":-128,"datatype":"INT8","shape":[1,12]}],"layer_count":13,"layers":[{"name":"functional_1/activation/Relu;functional_1/batch_normalization/FusedBatchNormV3;functional_1/conv2d/BiasAdd/ReadVariableOp/resource;functional_1/conv2d/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d/Conv2D1","type":"CONV_2D","avg_timing":8249000,"tensor":{"timing":8249000}},{"name":"functional_1/activation_1/Relu;functional_1/batch_normalization_1/FusedBatchNormV3;functional_1/depthwise_conv2d/depthwise;functional_1/depthwise_conv2d/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3969000,"tensor":{"timing":3969000}},{"name":"functional_1/activation_2/Relu;functional_1/batch_normalization_2/FusedBatchNormV3;functional_1/conv2d_1/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_1/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_1/Conv2D1","type":"CONV_2D","avg_timing":7373000,"tensor":{"timing":7373000}},{"name":"functional_1/activation_3/Relu;functional_1/batch_normalization_3/FusedBatchNormV3;functional_1/depthwise_conv2d_1/depthwise;functional_1/depthwise_conv2d_1/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_1/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_4/Relu;functional_1/batch_normalization_4/FusedBatchNormV3;functional_1/conv2d_2/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_2/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_2/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/activation_5/Relu;functional_1/batch_normalization_5/FusedBatchNormV3;functional_1/depthwise_conv2d_2/depthwise;functional_1/depthwise_conv2d_2/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_2/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_6/Relu;functional_1/batch_normalization_6/FusedBatchNormV3;functional_1/conv2d_3/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_3/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/conv2d_3/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/activation_7/Relu;functional_1/batch_normalization_7/FusedBatchNormV3;functional_1/depthwise_conv2d_3/depthwise;functional_1/depthwise_conv2d_3/BiasAdd;functional_1/conv2d_4/Conv2D;functional_1/depthwise_conv2d_3/BiasAdd/ReadVariableOp/resource1","type":"DEPTHWISE_CONV_2D","avg_timing":3968000,"tensor":{"timing":3968000}},{"name":"functional_1/activation_8/Relu;functional_1/batch_normalization_8/FusedBatchNormV3;functional_1/conv2d_4/BiasAdd/ReadVariableOp/resource;functional_1/conv2d_4/BiasAdd;functional_1/conv2d_4/Conv2D1","type":"CONV_2D","avg_timing":7371000,"tensor":{"timing":7371000}},{"name":"functional_1/average_pooling2d/AvgPool","type":"AVERAGE_POOL_2D","avg_timing":401000,"tensor":{"timing":401000}},{"name":"functional_1/flatten/Reshape","type":"RESHAPE","avg_timing":6000,"tensor":{"timing":6000}},{"name":"functional_1/dense/BiasAdd","type":"FULLY_CONNECTED","avg_timing":25000,"tensor":{"timing":25000}},{"name":"Identity","type":"SOFTMAX","avg_timing":25000,"tensor":{"timing":2500
0}}]}
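
    The per-layer timings in this response are handy for profiling; a minimal sketch that prints each layer's share of the total (field names as in the response above, URL is a placeholder):

    # Minimal sketch: summarize per-layer timings from the model info.
    # Assumes the `requests` library; the URL is a placeholder.
    import requests

    info = requests.get("http://10.193.20.56:10818/v1/model").json()
    total = sum(layer["avg_timing"] for layer in info["layers"])
    for layer in info["layers"]:
        share = 100.0 * layer["avg_timing"] / total
        print(f'{layer["type"]:<20} {layer["avg_timing"]:>12}  {share:5.1f}%')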
    
  5. Upload Input Tensor & Get Output

    curl -s -X POST "http://<IP>:10818/v1?run=1&output=Identity" -F 'block_content=@<path_to_input_tensor>.bin' | jq -r .outputs[0].data | base64 -d | hexdump -C
    

    Users can get the output data as a hexdump for further post-processing:

    00000000  ff 80 80 80 80 ff 80 80  80 80 80 81              |............|
    0000000c
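
    The raw INT8 bytes can then be dequantized with the output scale and zero point reported by /v1/model, using real = scale * (int8 - zero_point); a minimal sketch with the example model's values hard-coded:

    # Minimal sketch: dequantize the raw INT8 output bytes.
    # Scale and zero point come from the /v1/model response; the values
    # below are the example model's output quantization parameters.
    import numpy as np

    SCALE, ZERO_POINT = 0.00390625, -128
    raw = bytes.fromhex("ff 80 80 80 80 ff 80 80 80 80 80 81")  # hexdump above
    q = np.frombuffer(raw, dtype=np.int8).astype(np.float32)
    print(SCALE * (q - ZERO_POINT))  # dequantized softmax scores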
    

UART Mode (Default for mimxrt700evk)

  1. Flash the board with the firmware

Check that ModelRunner runs successfully on the board, then exit the UART console so that the ModelRunner agent can communicate over the UART.

  *************************************************
      	       TFLite Modelrunner
  *************************************************
	XSPI Psram Enabled!

	=>
  2. Run the HTTP-to-UART agent on the x86 host

    python3 main.py
    

    The server runs at http://$ip:$port. Users can modify the port according to their IT firewall policy; the default port is 10919.

    * Serving Flask app 'main'
    * Debug mode: on
    WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
    * Running on all addresses (0.0.0.0)
    * Running on http://127.0.0.1:10919
    * Running on http://10.192.208.139:10919
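
    Conceptually, the agent is an HTTP-to-serial bridge; below is an illustrative sketch of that idea (not the actual main.py; the device path, baud rate, and on-wire framing are all assumptions made for illustration):

    # Illustrative sketch of an HTTP-to-UART bridge (not the shipped main.py).
    # Assumes Flask and pyserial; device path, baud rate, and framing are
    # assumptions.
    from flask import Flask, request
    import serial

    app = Flask(__name__)

    @app.route("/serial/<serial_id>/v1", methods=["GET", "PUT", "POST"])
    def forward(serial_id):
        # Relay the HTTP body to the MCU over UART and return its reply.
        with serial.Serial(f"/dev/serial/by-id/{serial_id}", 115200, timeout=5) as port:
            port.write(request.get_data())
            return port.read_until(b"\n")

    if __name__ == "__main__":
        app.run(host="0.0.0.0", port=10919)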
    
  3. Connect the UART cable

    • Linux: Check /dev/serial/by-id/

    • Windows: Use COM<x> port

For MCU devices without an Ethernet port, the connection URL will look like "http://$server_ip:$port/serial/001063836560/v1".

  4. Benchmark

Users can provide a dataset according to the model's input and prepare benchmark test code for further interpretation, following the sub-steps below.

4.1. Upload Model

curl -X PUT http://<IP>:<port>/serial/<serial-id>/v1 -d "block_count=1"

curl -X PUT http://<IP>:<port>/serial/<serial-id>/v1 -F "block_content=@${model_path};filename=${model_name}"

Users can get the agent response as below:

{
   "reply": "success"
}
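
The two-step upload (announce the block count, then send the model) can also be scripted; a minimal sketch using the requests library, with the URL, serial id, and file names as placeholders:

# Minimal sketch: two-step model upload through the HTTP-to-UART agent.
# Assumes the `requests` library; URL, serial id, and paths are placeholders.
import requests

URL = "http://10.192.208.139:10919/serial/001063836560/v1"

# Step 1: announce how many blocks the model will be sent in.
requests.put(URL, data={"block_count": 1})

# Step 2: send the model itself as the block content.
with open("model.tflite", "rb") as f:
    resp = requests.put(URL, files={"block_content": ("model.tflite", f)})
print(resp.json())  # expected: {"reply": "success"}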

4.2. Run Latency Benchmark

curl -X POST http://<IP>:<port>/serial/<serial-id>/v1?run=1

Users can get the agent response with the inference time:

{
   "timing": 511
}

4.3. Get Model Info

curl http://<IP>:<port>/serial/<serial-id>/v1/model

Users can get the detailed model info, layer by layer, in JSON format:

{
"inputs": [
 {
   "data_type": "INT8",
   "name": "input_1",
   "scale": 0.584703,
   "shape": [
     1,
     49,
     10,
     1
   ],
   "zero_points": 83
 }
],
"ktensor_arena_size": 17172,
"layer_count": 3,
"layers": [
 {
   "avg_timing": 485000.0,
   "name": "functional_1/dense/BiasAdd",
   "timing": 485000.0,
   "type": "NeutronGraph"
 },
 {
   "avg_timing": 17000.0,
   "name": "Identity",
   "timing": 17000.0,
   "type": "SOFTMAX"
 }
],
"outputs": [
 {
   "data_type": "INT8",
   "name": "Identity",
   "scale": 0.003906,
   "shape": [
     1,
     12
   ],
   "zero_points": -128
 }
],
"timing": 502
}

4.4. Upload Input Tensor & Get Output Data

curl -X POST "http://<IP>:<port>/serial/<serial-id>/v1?run=1&output=${output_tensor_name}" \
  -F 'file=@<path_to_input_tensor>.bin'

Users can get the output data and post-process it accordingly:

{
"outputs": [
 {
   "data": "/4CAgID/gICAgICB",
   "datatype": "INT8",
   "name": "Identity",
   "shape": [
     1,
     12
   ]
 }
],
"timing": 498
}
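
The "data" field is the base64-encoded raw output tensor; a minimal decoding sketch (dequantization works as in the HTTP section):

# Minimal sketch: decode the base64 "data" field into INT8 values.
import base64
import numpy as np

data = "/4CAgID/gICAgICB"  # "data" field from the response above
q = np.frombuffer(base64.b64decode(data), dtype=np.int8)
print(q, "-> argmax:", int(np.argmax(q)))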

ModelRunner CLI

Connect to the device's UART console and run the CLI as below:

python cli.py com20
=> reset
=> model_loadb model.tflite
=> model
=> tensor_loadb input_1 tmp.input
=> run output=Identity
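
For simple query commands, such a session can also be scripted over the serial port; an illustrative sketch assuming pyserial and the "=>" prompt shown above (model_loadb and tensor_loadb stream binary data, so cli.py remains the supported client):

# Illustrative sketch: driving the CLI prompt over UART with pyserial.
# The port name and "=>" prompt framing are assumptions.
import serial

with serial.Serial("COM20", 115200, timeout=10) as port:
    for cmd in (b"model", b"run output=Identity"):
        port.write(cmd + b"\r\n")
        print(port.read_until(b"=>").decode(errors="replace"))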