Commit 304ae9d3 authored by Mark Daoust

Force timestamps and text for last section

parent 4dba273d
@@ -1212,13 +1212,21 @@
"id": "OW5gGixy1osE"
},
"source": [
"## Perform the COTS detection inference and tracking.\n",
"## Perform the COTS detection inference and tracking."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The main tracking loop will perform the following: \n",
"\n",
"1. Load the images in order.\n",
"2. Run the model on the image.\n",
"3. Update the tracker with the new images and detections.\n",
"4. Keep information about each track (id, current index and length) analysis or display. \n",
"\n",
"The detection inference has the following four main steps:\n",
"1. Read all images in the order of image indexes and convert them into uint8 TF tensors (Line 45-54).\n",
"2. Feed the TF image tensors into the model (Line 61) and get the detection output `detections`. In particular, the shape of input tensor is [batch size, height, width, number of channels]. In this demo project, the input shape is [4, 1080, 1920, 3].\n",
"3. The inference output `detections` contains four variables: `num_detections` (the number of detected objects), `detection_boxes` (the coordinates of each COTS object's bounding box), `detection_classes` (the class label of each detected object), `detection_scores` (the confidence score of each detected COTS object).\n",
"4. To track the movement of each detected object across frames, in each frame's detection, the tracker will estimate each tracked COTS object's position if COTS is not detected.\n"
"The `TrackAnnotation` class, below, will collect the data about each track:"
]
},
{
@@ -1245,6 +1253,13 @@
" return f\"{self.seq_id} ({self.seq_idx}/{self.seq_length})\"\n"
]
},
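{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration, here is a hedged usage sketch of `TrackAnnotation`; the constructor arguments are assumptions inferred from the class fragment above and the tracking loop below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Hedged sketch: the field values here are illustrative assumptions.\n",
"anno = TrackAnnotation(seq_id=3, seq_idx=1, seq_length=-1)\n",
"print(anno)  # 3 (1/-1)\n",
"\n",
"# `replace` returns an updated copy; the loop below uses it to fill in lengths.\n",
"print(anno.replace(seq_length=10))  # 3 (1/10)"
]
},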
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `parse_image` function, below, will take `(index, filename)` pairs load the images as tensors and return `(timestamp_ms, filename, image)` triples, assuming 30fps"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -1254,10 +1269,18 @@
"outputs": [],
"source": [
"# Read a jpg image and decode it to a uint8 tf tensor.\n",
"def parse_image(filename):\n",
"def parse_image(index, filename):\n",
" image = tf.io.read_file(filename)\n",
" image = tf.io.decode_jpeg(image)\n",
" return (tf.timestamp(), filename, image)"
" timestamp_ms = 1000*index/30 # assuming 30fps\n",
" return (timestamp_ms, filename, image)"
]
},
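{
"cell_type": "markdown",
"metadata": {},
"source": [
"At 30 fps each frame lasts `1000/30 ≈ 33.3` ms, so, for example, frame index 15 maps to a timestamp of `1000*15/30 = 500` ms."
]
},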
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here is the main tracker loop. Note that initially the saved `TrackAnnotations` don't contain the track lengths. The lengths are collected in the `track_length_for_id` dict."
]
},
{
@@ -1273,25 +1296,20 @@
"# Record tracking responses from the tracker\n",
"detection_result = []\n",
"# Record the length of each tracking sequence\n",
"track_length_dict = {}\n",
"\n",
"base_time = tf.timestamp()\n",
"track_length_for_id = {}\n",
"\n",
"# Create a data loader\n",
"file_list = sorted(glob.glob(f\"sample_images/{test_sequence_name}/*.jpg\"))\n",
"list_ds = tf.data.Dataset.from_tensor_slices(file_list)\n",
"list_ds = tf.data.Dataset.from_tensor_slices(file_list).enumerate()\n",
"images_ds = list_ds.map(parse_image)\n",
"\n",
"# Traverse the dataset with batch size = 1, you cannot change the batch size\n",
"for data in tqdm(images_ds.batch(1, drop_remainder=True)):\n",
" # timestamp is used for recording the order of frames\n",
" timestamp, file_path, images = data\n",
" timestamp = (timestamp - base_time) * 1000\n",
"for timestamp_ms, file_path, images in tqdm(images_ds.batch(1, drop_remainder=True)):\n",
" # get detection result\n",
" detections = Detection.process_model_output(images[0], model_fn(images))\n",
"\n",
" # Feed detection results and the corresponding timestamp to the tracker, and then get tracker response\n",
" tracks = tracker.update(images[0].numpy(), detections, timestamp[0])\n",
" tracks = tracker.update(images[0].numpy(), detections, timestamp_ms[0])\n",
" annotations = []\n",
" for track in tracks:\n",
" anno = TrackAnnotation(\n",
@@ -1300,11 +1318,18 @@
" seq_idx = len(track.linked_dets)\n",
" )\n",
" annotations.append(anno)\n",
" track_length_dict[track.id] = len(track.linked_dets)\n",
" track_length_for_id[track.id] = len(track.linked_dets)\n",
" \n",
" detection_result.append((file_path.numpy()[0].decode(), annotations))"
]
},
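{
"cell_type": "markdown",
"metadata": {},
"source": [
"After the loop, `detection_result` holds one `(file_path, annotations)` pair per frame. A quick way to inspect the first frame's tracks (the output depends on the model and images):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Peek at the first frame's result.\n",
"file_path, annotations = detection_result[0]\n",
"print(file_path)\n",
"print([str(anno) for anno in annotations])"
]
},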
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Once the tracking loop has completed you can update the track length (`seq_length`) for each anniotation from the `track_length_for_id` dict:"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -1313,12 +1338,12 @@
},
"outputs": [],
"source": [
"def update_annotation_lengths(detection_result, track_length_dict):\n",
"def update_annotation_lengths(detection_result, track_length_for_id):\n",
" new_result = []\n",
" for file_path, annotations in detection_result:\n",
" new_annotations = []\n",
" for anno in annotations:\n",
" anno = anno.replace(seq_length=track_length_dict[anno.seq_id])\n",
" anno = anno.replace(seq_length=track_length_for_id[anno.seq_id])\n",
" new_annotations.append(anno)\n",
" new_result.append((file_path, new_annotations))\n",
" return new_result"
@@ -1332,7 +1357,7 @@
},
"outputs": [],
"source": [
"detection_result = update_annotation_lengths(detection_result, track_length_dict)"
"detection_result = update_annotation_lengths(detection_result, track_length_for_id)"
]
},
{
@@ -1343,7 +1368,7 @@
"source": [
"# Output the detection results and play the result video\n",
"\n",
"Once the inference is done, we use OpenCV to draw the bounding boxes (Line 9-10) and write the tracked COTS's information (Line 13-20: `COTS ID` `(sequence index/ sequence length)`) on each frame's image. Finally, we combine all frames into a video for visualisation."
"Once the inference is done, we draw the bounding boxes and track information onto each frame's image. Finally, we combine all frames into a video for visualisation."
]
},
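{
"cell_type": "markdown",
"metadata": {},
"source": [
"The notebook's drawing code is collapsed in this diff. As a minimal sketch of the general OpenCV approach (the input path, box corners, label, and output name below are illustrative assumptions, not the notebook's actual values):"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import cv2\n",
"\n",
"# Hedged sketch: draw one labeled box and write a single-frame video.\n",
"frame = cv2.imread(file_list[0])\n",
"x1, y1, x2, y2 = 100, 200, 300, 400  # hypothetical box corners\n",
"cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 0, 255), 2)\n",
"# Label format matches the tracker annotations: \"COTS ID (index/length)\".\n",
"cv2.putText(frame, \"3 (1/10)\", (x1, y1 - 10),\n",
"            cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 0, 255), 2)\n",
"writer = cv2.VideoWriter(\"tracking.mp4\", cv2.VideoWriter_fourcc(*\"mp4v\"),\n",
"                         30, (frame.shape[1], frame.shape[0]))\n",
"writer.write(frame)\n",
"writer.release()"
]
},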
{
......