"## Perform the COTS detection inference and tracking."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main tracking loop will perform the following: \n",
"\n",
"1. Load the images in order.\n",
"2. Run the model on the image.\n",
"3. Update the tracker with the new images and detections.\n",
"4. Keep information about each track (id, current index, and length) for analysis or display.\n",
"\n",
"The detection inference has the following four main steps:\n",
"\n",
"1. Read all images in the order of their image indexes and convert them into uint8 TF tensors.\n",
"2. Feed the TF image tensors into the model to get the detection output `detections`. The shape of the input tensor is [batch size, height, width, number of channels]; in this demo project, the input shape is [4, 1080, 1920, 3].\n",
"3. The inference output `detections` contains four variables: `num_detections` (the number of detected objects), `detection_boxes` (the coordinates of each COTS object's bounding box), `detection_classes` (the class label of each detected object), and `detection_scores` (the confidence score of each detected COTS object).\n",
"4. To track the movement of each detected object across frames, the tracker estimates each tracked COTS object's position in frames where it is not detected.\n",
"\n",
"The `parse_image` function, below, will take `(index, filename)` pairs, load the images as tensors, and return `(timestamp_ms, filename, image)` triples, assuming 30 fps. The `TrackAnnotation` class will collect the data about each track."
]
},
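The behaviour described for `parse_image` can be sketched as follows. This is a minimal version, assuming `tf.io.read_file`/`tf.io.decode_jpeg` for loading and the 30 fps rate mentioned above; the notebook's actual cell may differ in detail.

```python
import tensorflow as tf

FPS = 30  # assumed frame rate, per the text above


def parse_image(index, filename):
    # Convert the frame index to a timestamp in milliseconds, assuming 30 fps.
    timestamp_ms = tf.cast(index * 1000 / FPS, tf.int64)
    # Read the jpg file and decode it to a uint8 tensor of shape [height, width, 3].
    image = tf.io.decode_jpeg(tf.io.read_file(filename), channels=3)
    return timestamp_ms, filename, image
```

Batching such triples (here with batch size 4 at 1080x1920 resolution) produces the [4, 1080, 1920, 3] input tensor described above.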
{
"cell_type": "code",
"execution_count": null,
...
"outputs": [],
"source": [
"# Read a jpg image and decode it to a uint8 tf tensor.\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the main tracker loop. Note that initially the saved `TrackAnnotations` don't contain the track lengths; the lengths are collected in the `track_length_for_id` dict."
]
},
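As an illustration of the bookkeeping described above, here is a minimal sketch of a `TrackAnnotation` record (id, current index, and length, per the text) and of filling in the lengths afterwards via `track_length_for_id`. The `collect_track_lengths` helper name and the dataclass layout are assumptions for illustration; only `TrackAnnotation` and `track_length_for_id` come from the text.

```python
from dataclasses import dataclass


@dataclass
class TrackAnnotation:
    # Data kept about each track: its id, the current index within the
    # track, and (filled in after the loop) the total track length.
    track_id: int
    index: int
    length: int = 0


def collect_track_lengths(annotations):
    # A track's length is the number of annotations sharing its track_id;
    # it is only known once the whole loop has run, so it is back-filled here.
    track_length_for_id = {}
    for ann in annotations:
        track_length_for_id[ann.track_id] = track_length_for_id.get(ann.track_id, 0) + 1
    for ann in annotations:
        ann.length = track_length_for_id[ann.track_id]
    return track_length_for_id
```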
{
...
"# Record tracking responses from the tracker\n",
"detection_result = []\n",
"# Record the length of each tracking sequence\n",
"track_length_for_id = {}\n",
"# Output the detection results and play the result video\n",
"\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the inference is done, we use OpenCV to draw the bounding boxes and write the tracked COTS's information (`COTS ID` `(sequence index / sequence length)`) on each frame's image. Finally, we combine all frames into a video for visualisation."