" <a target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb\"><img src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" />Run in Google Colab</a>\n",
" \u003ca target=\"_blank\" href=\"https://colab.research.google.com/github/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb?force_crab_mode=1\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/colab_logo_32px.png\" /\u003eRun in Google Colab\u003c/a\u003e\n",
" </td>\n",
" \u003c/td\u003e\n",
" <td>\n",
" \u003ctd\u003e\n",
" <a target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb\"><img src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" />View on GitHub</a>\n",
" \u003ca target=\"_blank\" href=\"https://github.com/tensorflow/models/blob/master/official/projects/cots_detector/crown_of_thorns_starfish_detection_pipeline.ipynb\"\u003e\u003cimg src=\"https://www.tensorflow.org/images/GitHub-Mark-32px.png\" /\u003eView on GitHub\u003c/a\u003e\n",
" </td>\n",
" \u003c/td\u003e\n",
"</table>"
"\u003c/table\u003e"
]
]
},
},
{
{
...
@@ -64,11 +64,11 @@
"id": "jDiIX2xawkJw"
},
},
"source": [
"source": [
"## About this notebook\n",
"\n",
"\n",
"This notebook tutorial shows how to detect COTS (crown-of-thorns starfish) using a pre-trained COTS detector implemented in TensorFlow. Beyond running the model on each frame of the video, the tracking code in this notebook aligns detections from frame to frame, creating a consistent track for each COTS. Each track is given an ID and a frame count. Here is an example image from a video of a reef showing labeled COTS.\n",
"We recommend enabling a GPU to accelerate inference: on CPU this notebook takes about 40 minutes to run, but on GPU it takes only 10 minutes. In Colab, check *Runtime > Change runtime type > Hardware accelerator* and select \"GPU\" if it is not already selected."
]
]
},
},
{
{
...
@@ -86,6 +86,8 @@
"id": "a4R2T97u442o"
},
},
"source": [
"source": [
"## Setup \n",
"\n",
"Install all needed packages."
]
]
},
},
...
@@ -99,7 +101,8 @@
"source": [
"source": [
"# remove the existing datascience package to avoid package conflicts in the colab environment\n",
"Re-encode the video and reduce its size (Colab crashes if you try to embed the full-size video)."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "_li0qe-gh1iT"
},
"outputs": [],
"source": [
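"# Re-encode with H.264 (libx264) and scale to 800 px wide; -1 keeps the aspect ratio.\n",
"subprocess.check_call([\n",
"    \"ffmpeg\", \"-y\", \"-i\", tmp_video_path,  # -y overwrites an existing output file\n",
"    \"-vf\", \"scale=800:-1\",\n",
"    \"-crf\", \"18\",  # constant rate factor: lower values mean higher quality\n",
"    \"-preset\", \"veryfast\",  # faster encoding at some cost in compression\n",
"    \"-vcodec\", \"libx264\", preview_video_path])"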
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "2ItoiHyYQGya"
},
"source": [
"The images you downloaded are frames of a movie showing a top view of a coral reef with crown-of-thorns starfish. Use the `base64` data-URL trick to embed the video in this notebook:"
" Your browser does not support the video tag.\n",
" \u003c/video\u003e\n",
" \"\"\").format(mime=mime, b64=b64))\n"
]
},
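The `base64` data-URL trick mentioned above can be sketched as follows: read the video bytes, base64-encode them, and inline them as the `src` of an HTML `<video>` element so the notebook does not need to serve a separate file. This is a minimal sketch, not the notebook's own `embed_video_file`; `make_video_html` is a hypothetical helper, and it assumes the MIME type can be guessed from the file extension (falling back to MP4).

```python
import base64
import mimetypes


def make_video_html(video_bytes: bytes, filename: str = "preview.mp4") -> str:
    """Return an HTML <video> tag that embeds the bytes as a base64 data URL.

    Hypothetical helper for illustration; it is not the notebook's own
    embedding function.
    """
    # Guess the MIME type from the file extension, falling back to MP4.
    mime = mimetypes.guess_type(filename)[0] or "video/mp4"
    # Base64-encode the raw bytes so they can be inlined in the HTML text.
    b64 = base64.b64encode(video_bytes).decode("ascii")
    return (
        '<video controls width="800">\n'
        f'  <source src="data:{mime};base64,{b64}" type="{mime}">\n'
        "  Your browser does not support the video tag.\n"
        "</video>"
    )
```

In a notebook you could render the result with `IPython.display.HTML(...)`. Because base64 inflates the data by roughly a third, this only works well for small, re-encoded previews.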
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "SiOsbr8xePkg"
},
"outputs": [],
"source": [
"embed_video_file(preview_video_path)"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "9Z0DTbWrZMZ-"
},
"source": [
"Can you see them? There are lots. The goal of the model is to put boxes around all of the starfish. Each starfish gets its own ID, and that ID stays stable as the camera passes over it."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "d0iALUwM0g2p"
},
"source": [
"## Load the model"
]
},
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {
"metadata": {
"id": "fVq6vNBTxM62"
},
},
"source": [
"source": [
"Download the trained COTS detection model that matches the preferences you chose earlier."
]
]
},
},
{
{
...
@@ -196,246 +349,736 @@
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {
"metadata": {
"id": "FNwP3s-5xgaF"
"id": "ezyuSHK5ap__"
},
},
"source": [
"source": [
"You also need to retrieve the sample data. This sample data is made up of a series of chronological images."
"Load the trained model from disk and create the inference function `model_fn()`. This might take a little while."
"That works well for one frame, but to count the number of COTS in a video you need to track the detections from frame to frame. The raw detection indices are not stable; they are simply sorted by detection score. Below, both sets of detections are overlaid on the second image, with the first frame's detections in white and the second frame's in orange. The indices are not aligned, and the positions are shifted because of camera motion between the two frames:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "PLtxJFPuLma0"
},
"outputs": [],
"source": [
"image2 = tf.io.read_file(filenames[example_frame_number+5]) # five frames later\n",
"Now keep the white boxes for the initial detections and the orange boxes for the new set of detections, but also add the optical-flow-propagated tracks in green. Using optical flow to propagate the old detections to the new frame gives quite good alignment. It is this alignment between the old and new detections (between the green and orange boxes) that allows the tracker to make a persistent track for each COTS."
"These help track the movement of each COTS object across the video frames.\n",
"\n",
"The tracker collects related detections into `Track` objects.\n",
"\n",
"The class's `__init__` is defined below; its methods are defined in the following cells.\n",
"\n",
"The `__init__` method initializes the track counter (`track_id`) and sets default values for the tracking and optical-flow configurations."
]
},
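The optical-flow propagation described above can be illustrated with a small NumPy sketch. It assumes a dense flow field `flow` of shape `(H, W, 2)` where `flow[y, x]` holds the `(dx, dy)` displacement of that pixel between the two frames (the layout produced by OpenCV's `cv2.calcOpticalFlowFarneback`), and it moves a box by the median flow inside it. `propagate_box` is a hypothetical helper and the median shift is a simplification; the notebook's tracker may estimate box motion differently.

```python
import numpy as np


def propagate_box(box, flow):
    """Shift an (x0, y0, x1, y1) pixel box by the median optical flow inside it.

    `flow` is an (H, W, 2) array where flow[y, x] = (dx, dy).
    Hypothetical helper for illustration only.
    """
    h, w = flow.shape[:2]
    x0, y0, x1, y1 = box
    # Clip the (inclusive) box to the image and convert to integer slice bounds.
    xa, xb = int(max(0, np.floor(x0))), int(min(w, np.ceil(x1) + 1))
    ya, yb = int(max(0, np.floor(y0))), int(min(h, np.ceil(y1) + 1))
    region = flow[ya:yb, xa:xb].reshape(-1, 2)
    if region.size == 0:
        # The box lies entirely outside the frame: leave it unchanged.
        return box
    # The median is robust to a few badly estimated flow vectors.
    dx, dy = np.median(region, axis=0)
    return (x0 + dx, y0 + dy, x1 + dx, y1 + dy)
```

The propagated (green) boxes can then be compared with the new (orange) detections, which is what makes the frame-to-frame association possible.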
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "3j2Ka1uGEoz4"
},
"outputs": [],
"source": [
"class OpticalFlowTracker:\n",
"  \"\"\"Optical flow tracker.\"\"\"\n",
"\n",
"  @classmethod\n",
"  def add_method(cls, fun):\n",
"    \"\"\"Attach a new method to the class.\"\"\"\n",
"    setattr(cls, fun.__name__, fun)\n",
"    return fun\n",
"\n",
"  # The default argument values are assumed placeholders; tune them for your data.\n",
"  def __init__(self, tid=1, ft=3.0, iou=0.5, tt=2.0, bb=32, of_params=None):\n",
"    # The running track count, incremented for each new track.\n",
"    self.track_id = tid\n",
"    self.tracks = []\n",
"    self.prev_image = None\n",
"    self.prev_time = None\n",
"\n",
"    # Configuration for the track cleanup logic.\n",
"    # How long to apply optical flow tracking without getting positive\n",
"    # detections (sec).\n",
"    self.track_flow_time = ft * 1000\n",
"    # Required IoU overlap to link a detection to a track.\n",
"    self.overlap_threshold = iou\n",
"    # Used to detect if the detector needs to be reset.\n",
"    self.time_threshold = tt * 1000\n",
"    self.border = bb\n",
"\n",
"    if of_params is None:\n",
"      of_params = default_of_params()\n",
"    self.of_params = of_params\n"
]
]
},
},
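The `add_method` classmethod lets later cells attach methods to `OpticalFlowTracker` one at a time, so each method can sit next to its explanation. Here is a minimal standalone sketch of the same pattern on a toy class; `Counter` and `increment` are hypothetical names, and the sketch assumes `add_method` attaches the function with `setattr` under its own name and returns it, so it also works as a decorator.

```python
class Counter:
    """Toy class whose methods are attached after the class is defined."""

    @classmethod
    def add_method(cls, fun):
        """Attach `fun` to the class under its own name."""
        setattr(cls, fun.__name__, fun)
        # Returning the function lets add_method be used as a decorator.
        return fun

    def __init__(self, start=0):
        self.value = start


# In a later cell: define a method and attach it to the class.
@Counter.add_method
def increment(self, step=1):
    self.value += step
    return self.value
```

Instances created before or after the attachment both see the new method, because it is looked up on the class at call time.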
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"metadata": {
"metadata": {
"id": "d0iALUwM0g2p"
"id": "yBLSv0Fi_JJD"
},
},
"source": [
"source": [
"# Load the model and perform inference and tracking on sample data\n",
"Internally, the tracker uses small `Track` and `Tracklet` classes to organize the data. A `Tracklet` is just a `Detection` with a timestamp, while a `Track` holds a track ID, the most recent detection, and a list of `Tracklet` objects forming the history of the track."
" def replace(self, **kwargs):\n",
" d = self.__dict__.copy()\n",
" d.update(kwargs)\n",
" return type(self)(**d)"
]
]
},
},
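The `Tracklet`/`Track` relationship described above can be sketched with dataclasses. This is an illustration under assumptions: the field names `timestamp`, `detection`, `track_id`, and `linked_dets` and the `add` method are hypothetical and may not match the notebook's definitions, and `Detection` is redefined here so the block runs on its own.

```python
import dataclasses
from typing import List


@dataclasses.dataclass
class Detection:
    class_id: int
    score: float
    x0: float
    y0: float
    x1: float
    y1: float


@dataclasses.dataclass
class Tracklet:
    """A detection plus the time (ms) at which it was observed."""
    timestamp: float
    detection: Detection


@dataclasses.dataclass
class Track:
    """A track ID, its most recent detection, and its history of tracklets."""
    track_id: int
    detection: Detection
    linked_dets: List[Tracklet] = dataclasses.field(default_factory=list)

    def add(self, timestamp: float, det: Detection) -> None:
        # Record the new observation and make it the track's latest detection.
        self.linked_dets.append(Tracklet(timestamp, det))
        self.detection = det
```

For dataclasses, `dataclasses.replace(d, x0=...)` returns a modified copy, playing the same role as the `replace` method shown above.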
{
{
"cell_type": "code",
"cell_type": "markdown",
"execution_count": null,
"metadata": {
"metadata": {
"id": "tybwY3eaY803"
"id": "Ntl_4oUp_1nD"
},
},
"outputs": [],
"source": [
"source": [
"def box_area(x0, y0, x1, y1):\n",
"The tracker keeps a list of active `Track` objects.\n",
" return (x1 - x0 + 1) * (y1 - y0 + 1)\n",
"\n",
"@dataclasses.dataclass\n",
"class Detection:\n",
" \"\"\"Detection dataclass.\"\"\"\n",
" class_id: int\n",
" score: float\n",
" x0: float\n",
" y0: float\n",
" x1: float\n",
" y1: float\n",
"\n",
"\n",
" def __repr__(self):\n",
"The main `update` method takes an image, along with the list of detections and the timestamp for that image. On each frame step it performs the following sub-tasks:\n",
"The `apply_detections_to_tracks` method compares each detection to the updated bounding box of each track. The detection is added to the best-matching track if that match is better than `overlap_threshold`; if no track exceeds the threshold, the detection starts a new track.\n",
"The `parse_image` function below takes `(index, filename)` pairs, loads the images as tensors, and returns `(timestamp_ms, filename, image)` triples, assuming 30 fps."
]
},
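The matching step described for `apply_detections_to_tracks` can be sketched as a greedy IoU assignment. This is a minimal sketch under assumptions: boxes are `(x0, y0, x1, y1)` pixel tuples with inclusive corners (matching the `+ 1` in `box_area`), each track is represented only by its current box, and `iou` and `assign_detections` are hypothetical helpers. Because the raw detections arrive sorted by score, processing them in order lets higher-scoring detections claim tracks first. Allowing each track at most one detection per frame is also an assumption; the notebook's method may resolve conflicts differently.

```python
def box_area(x0, y0, x1, y1):
    # Inclusive pixel coordinates, hence the + 1.
    return (x1 - x0 + 1) * (y1 - y0 + 1)


def iou(a, b):
    """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    if ix1 < ix0 or iy1 < iy0:
        return 0.0  # the boxes do not overlap
    inter = box_area(ix0, iy0, ix1, iy1)
    return inter / (box_area(*a) + box_area(*b) - inter)


def assign_detections(track_boxes, det_boxes, overlap_threshold=0.5):
    """Greedily link each detection to its best-overlapping track.

    Returns (matches, new_dets): matches maps detection index -> track index;
    new_dets lists detection indices that should start new tracks.
    """
    matches, new_dets, used = {}, [], set()
    for d, det in enumerate(det_boxes):
        best_t, best_iou = None, overlap_threshold
        for t, trk in enumerate(track_boxes):
            if t in used:
                continue  # assumption: one detection per track per frame
            score = iou(det, trk)
            if score > best_iou:
                best_t, best_iou = t, score
        if best_t is None:
            new_dets.append(d)
        else:
            matches[d] = best_t
            used.add(best_t)
    return matches, new_dets
```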
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "Dn7efhr0GBGz"
},
},
"outputs": [],
"source": [
"source": [
"# Read a jpg image and decode it to a uint8 tf tensor.\n",
"## Perform the COTS detection inference and tracking.\n",
"Here is the main tracker loop. Note that initially the saved `TrackAnnotations` don't contain the track lengths. The lengths are collected in the `track_length_for_id` dict."
"\n",
"The detection inference has the following four main steps:\n",
"1. Read all images in the order of their indexes and convert them into uint8 TF tensors (lines 45-54).\n",
"2. Feed the TF image tensors into the model (line 61) and get the detection output `detections`. The input tensor has shape [batch size, height, width, number of channels]; in this demo project, it is [4, 1080, 1920, 3].\n",
"3. The inference output `detections` contains four entries: `num_detections` (the number of detected objects), `detection_boxes` (the bounding-box coordinates of each detected COTS), `detection_classes` (the class label of each detected object), and `detection_scores` (the confidence score of each detected COTS).\n",
"4. To track each detected object across frames, the tracker uses optical flow to estimate the position of every tracked COTS in frames where the detector does not find it.\n"
]
]
},
},
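Steps 1 and 3 above can be illustrated with plain NumPy: converting a frame index into a timestamp, and keeping only detections whose score reaches the threshold. The sketch assumes the detector returns batched arrays in the TensorFlow Object Detection API style, with `detection_boxes` as normalized `[ymin, xmin, ymax, xmax]` values. That layout, the helper names `frame_timestamp_ms` and `filter_detections`, and the 30 fps rate (taken from the `parse_image` description) are assumptions, so check them against the actual model outputs.

```python
import numpy as np

FPS = 30  # frame rate assumed when converting frame indices to timestamps


def frame_timestamp_ms(index, fps=FPS):
    """Timestamp (ms) of a frame, given its index in the sorted sequence."""
    return index * 1000.0 / fps


def filter_detections(detections, batch_index, height, width, threshold=0.4):
    """Return (class_id, score, x0, y0, x1, y1) tuples at or above the threshold.

    Assumes normalized [ymin, xmin, ymax, xmax] boxes (hypothetical layout).
    """
    n = int(detections["num_detections"][batch_index])
    boxes = detections["detection_boxes"][batch_index][:n]
    scores = detections["detection_scores"][batch_index][:n]
    classes = detections["detection_classes"][batch_index][:n]
    results = []
    for (ymin, xmin, ymax, xmax), score, cls in zip(boxes, scores, classes):
        if score < threshold:
            continue  # keep only scores equal to or greater than the threshold
        # Scale the normalized coordinates to pixels.
        results.append((int(cls), float(score),
                        float(xmin * width), float(ymin * height),
                        float(xmax * width), float(ymax * height)))
    return results
```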
{
{
"cell_type": "code",
"cell_type": "code",
"execution_count": null,
"execution_count": null,
"metadata": {
"metadata": {
"colab": {
"id": "cqN8RGBgVbr4"
"background_save": true
},
"id": "vHIarsxH1svL"
},
},
"outputs": [],
"outputs": [],
"source": [
"source": [
"# Record all the detected COTS objects with the scores equal to or greater than the threshold\n",
"threshold = 0.4\n",
"_CLASS_ID_TO_LABEL = ('COTS',)\n",
"# Create a tracker object\n",
"tracker = OpticalFlowTracker(tid=1)\n",
"# Record tracking responses from the tracker\n",
"detection_result = []\n",
"# Record the length of each tracking sequence\n",
"track_length_for_id = {}\n",
"\n",
"base_time = tf.timestamp()\n",
"\n",
"# Format tracker response, and save it into a new object.\n",
"# Output the detection results and play the result video\n",
"Once the tracking loop has completed, you can update the track length (`seq_length`) for each annotation from the `track_length_for_id` dict:"
"## Output the detection results and play the result video\n",
"\n",
"Once the inference is done, we use OpenCV to draw the bounding boxes and write each track's information (`COTS ID` `(sequence index / sequence length)`) on each frame's image. Finally, we combine all frames into a video for visualisation."
]
},
},
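Filling in the track lengths after the loop can be sketched as a second pass over the saved results: count how many frames each track ID appears in, then copy each annotation with its `seq_length` set. `TrackAnnotation` here is a hypothetical, reduced stand-in for the notebook's annotation type, with assumed fields `track_id`, `seq_id`, and `seq_length`. The notebook collects `track_length_for_id` during the loop itself; counting afterwards, as below, is an assumption about how a track's length is defined.

```python
import collections
import dataclasses
from typing import List, Tuple


@dataclasses.dataclass(frozen=True)
class TrackAnnotation:
    track_id: int
    seq_id: int          # index of this frame within the track (1-based)
    seq_length: int = 0  # unknown until the tracking loop finishes


def fill_seq_lengths(detection_result: List[Tuple[str, List[TrackAnnotation]]]):
    """Return a copy of detection_result with seq_length filled in."""
    # First pass: count how many frames each track appears in.
    track_length_for_id = collections.Counter(
        a.track_id for _, annotations in detection_result for a in annotations)
    # Second pass: rebuild each annotation with its final track length.
    return [
        (file_path,
         [dataclasses.replace(a, seq_length=track_length_for_id[a.track_id])
          for a in annotations])
        for file_path, annotations in detection_result
    ]
```

Using frozen dataclasses with `dataclasses.replace` leaves the original results untouched, so the raw tracker output stays available for debugging.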
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"id": "gWMJG7g95MGk"
},
},
"outputs": [],
"outputs": [],
...
@@ -740,26 +1394,10 @@
" fps=15, \n",
" frameSize=size)\n",
"\n",
"for file_path, tracks in tqdm(detection_result):\n",
"for file_path, annotations in tqdm(detection_result):\n",