"# Preprocessing steps for a COCO-format JSON annotation file"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "uXwwz3PlbUX2"
},
"source": [
"Given a single COCO-format JSON annotation file, the goal is to preprocess it to remove noise and convert it into a form suitable for training an ML model. This notebook also checks whether any annotated images are broken or missing."
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "E1SxGZD2bv8E"
},
"source": [
"The COCO annotation file contains the following information:\n",
"\n",
"1. Image file names.\n",
"\n",
"2. Image dimensions (width and height).\n",
"\n",
"3. Class (category) names.\n",
"\n",
"4. The supercategory of each class.\n",
"\n",
"5. The area, in pixels, covered by each segmentation mask.\n",
"\n",
"6. Bounding box coordinates (`[x, y, width, height]`).\n",
"\n",
"7. Segmentation coordinates for each annotation."
]
},
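{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a rough illustration of where each of these fields lives, the snippet below builds a tiny, hypothetical COCO-style dictionary and indexes it by ID. The file name, class names and numbers are made up; real files usually also carry keys such as `info` and `licenses`.\n",
"\n",
"```python\n",
"# A tiny, hypothetical COCO-style annotation dictionary (values are made up).\n",
"coco = {\n",
"    'images': [\n",
"        # 1. file name and 2. dimensions\n",
"        {'id': 1, 'file_name': 'img_0001.jpg', 'width': 640, 'height': 480},\n",
"    ],\n",
"    'categories': [\n",
"        # 3. class name and 4. its supercategory\n",
"        {'id': 7, 'name': 'bottle', 'supercategory': 'plastic'},\n",
"    ],\n",
"    'annotations': [\n",
"        {\n",
"            'id': 100,\n",
"            'image_id': 1,     # links the annotation to an image\n",
"            'category_id': 7,  # links the annotation to a class\n",
"            'area': 1200.0,    # 5. pixel area of the mask\n",
"            'bbox': [10.0, 20.0, 40.0, 30.0],  # 6. [x, y, width, height]\n",
"            # 7. one polygon as a flat list of x, y pairs\n",
"            'segmentation': [[10.0, 20.0, 50.0, 20.0, 50.0, 50.0, 10.0, 50.0]],\n",
"            'iscrowd': 0,\n",
"        },\n",
"    ],\n",
"}\n",
"\n",
"# Index images and categories by ID so annotations can be resolved quickly.\n",
"images_by_id = {img['id']: img for img in coco['images']}\n",
"categories_by_id = {cat['id']: cat for cat in coco['categories']}\n",
"\n",
"for ann in coco['annotations']:\n",
"    img = images_by_id[ann['image_id']]\n",
"    cat = categories_by_id[ann['category_id']]\n",
"    # prints: img_0001.jpg plastic bottle [10.0, 20.0, 40.0, 30.0]\n",
"    print(img['file_name'], cat['supercategory'], cat['name'], ann['bbox'])\n",
"```\n",
"\n",
"The `image_id` and `category_id` fields are what tie the three lists together, so a dangling ID in either field leaves an annotation that cannot be resolved."
]
},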
{
"cell_type": "markdown",
"metadata": {
"id": "j0v31gxTbweO"
},
"source": [
"Real-world annotation files are often noisy. Image file names may be wrong. Images listed in the annotation file may be missing from the image folder, which disrupts model training. Entries within the annotation file may not agree with each other. Even the files that are present in the image folder may be broken or truncated, which causes errors when the images are read. Our goal is to eliminate all of these problems."
]
},
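{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the image checks described above, assuming Pillow (`PIL`) is installed and that each `file_name` is relative to a hypothetical `image_dir` folder. The function name `find_bad_images` is illustrative and not part of this notebook. Besides missing and unreadable files, it flags images whose decoded size differs from the `width` and `height` stored in the annotation file.\n",
"\n",
"```python\n",
"import os\n",
"\n",
"from PIL import Image\n",
"\n",
"\n",
"def find_bad_images(coco, image_dir):\n",
"    # Hypothetical helper: sort each entry of coco['images'] into\n",
"    # missing (not on disk), broken (cannot be decoded) or\n",
"    # mismatched (decoded size differs from the recorded width/height).\n",
"    missing, broken, mismatched = [], [], []\n",
"    for img in coco['images']:\n",
"        name = img['file_name']\n",
"        path = os.path.join(image_dir, name)\n",
"        if not os.path.isfile(path):\n",
"            missing.append(name)\n",
"            continue\n",
"        try:\n",
"            # verify() checks integrity without decoding pixels, but it\n",
"            # leaves the image unusable, so the file is reopened below.\n",
"            with Image.open(path) as im:\n",
"                im.verify()\n",
"            # load() decodes the pixel data; truncated files raise OSError.\n",
"            with Image.open(path) as im:\n",
"                im.load()\n",
"                size = im.size  # (width, height)\n",
"        except Exception:  # any open/decode error marks the file as broken\n",
"            broken.append(name)\n",
"            continue\n",
"        if size != (img['width'], img['height']):\n",
"            mismatched.append(name)\n",
"    return missing, broken, mismatched\n",
"```\n",
"\n",
"Catching `Exception` broadly is deliberate here: a corrupt file can surface as several different exception types, and any of them means the image cannot be used for training."
]
},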
{
"cell_type": "markdown",
"metadata": {
"id": "PyFn96EKb7A-"
},
"source": [
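"Beyond fixing those problems, the values under the different top-level keys must agree with each other: for example, every annotation's `image_id` should match an entry in `images`, and every `category_id` should match an entry in `categories`. This notebook walks through these steps."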
]
},
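{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of that consistency step, assuming the standard COCO keys `images`, `annotations` and `categories`. `make_consistent` and its `bad_files` argument are hypothetical names; `bad_files` could hold, for instance, the file names flagged as missing or broken by an image check.\n",
"\n",
"```python\n",
"def make_consistent(coco, bad_files=()):\n",
"    # Hypothetical helper: keep only entries whose cross-references resolve.\n",
"    bad = set(bad_files)\n",
"    images = [img for img in coco['images'] if img['file_name'] not in bad]\n",
"    image_ids = {img['id'] for img in images}\n",
"    category_ids = {cat['id'] for cat in coco['categories']}\n",
"    # Drop annotations that point at a removed or unknown image,\n",
"    # or at a category that does not exist.\n",
"    annotations = [\n",
"        ann for ann in coco['annotations']\n",
"        if ann['image_id'] in image_ids and ann['category_id'] in category_ids\n",
"    ]\n",
"    # Build a new dict; other top-level keys (info, licenses, ...) are kept.\n",
"    return {**coco, 'images': images, 'annotations': annotations}\n",
"```\n",
"\n",
"The function returns a new dictionary and leaves the input untouched, so the raw and cleaned versions can be compared. Images left without any annotation are kept here; whether to drop them as well depends on the training setup."
]
},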
{
"cell_type": "markdown",
"metadata": {
"id": "W6aXxxox0DDa"
},
"source": [
"## Import labels and a sample JSON file\n",
"To get the full set of classes for `material`, `material_form` and `plastic_type`, we import the label files from the `waste_identification_ml` project in the TensorFlow Model Garden.\n",
"We also import a noisy sample JSON file to use as a worked example."