From WebAnnotations to COCO: Optimize Datasets for Machine Learning

In machine learning, properly formatted and annotated datasets are crucial for training accurate models. The COCO (Common Objects in Context) format is one of the most popular and widely adopted formats for annotating images. However, when you draw polygons on images with a tool like Annotorious in a Remix application, the annotations are stored in the W3C WebAnnotation format. We’ll explore converting W3C WebAnnotation JSON to COCO JSON using a TypeScript script and discuss why the COCO format matters in the machine learning ecosystem.

Understanding W3C WebAnnotation JSON

The W3C WebAnnotation JSON format represents annotations on web resources, including images, in a standardized way. It offers a structured format for describing annotations, such as polygons drawn on an image, along with metadata about each annotation. When you draw polygons on an image with the Annotorious library, the annotations are stored in this format.

Here’s an example of a W3C WebAnnotation JSON object from the app:

[
  {
    "id": "a66b869f-9dc6-4e75-8caf-53cb0dbd3eb6",
    "bodies": [],
    "target": {
      "annotation": "a66b869f-9dc6-4e75-8caf-53cb0dbd3eb6",
      "selector": {
        "type": "POLYGON",
        "geometry": {
          "bounds": {
            "minX": 794.7157592773438,
            "minY": 433.3932800292969,
            "maxX": 894.6519775390625,
            "maxY": 584.2633666992188
          },
          "points": [
            [794.7157592773438, 433.3932800292969],
            [794.7157592773438, 433.3932800292969],
            [863.0336303710938, 584.2633666992188],
            [863.0336303710938, 584.2633666992188],
            [894.6519775390625, 563.5730590820312],
            [894.6519775390625, 563.5730590820312],
            [894.6519775390625, 563.5730590820312]
          ]
        }
      }
    },
    "creator": {
      "isGuest": true,
      "id": "ToUiviSOTPnRffLbW6t5"
    },
    "created": "2024-05-08T12:18:52.149Z"
  }
]

The WebAnnotation JSON object includes an id field for uniquely identifying the annotation and a target field that describes the annotation target (in this case, an image). The selector field details the geometry of the annotation, including the bounding box coordinates and the points defining the polygon. It also includes metadata such as the creator and created timestamp.
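
Later on, the conversion script will refer to this shape through an Annotation interface. The definition below is a minimal sketch based only on the fields shown in the example above; the actual Annotorious payload can carry additional properties:

// Minimal sketch of the WebAnnotation shape produced by Annotorious,
// covering only the fields the converter reads.
interface Annotation {
  id: string;
  bodies: unknown[];
  target: {
    annotation: string;
    selector: {
      type: "POLYGON";
      geometry: {
        bounds: { minX: number; minY: number; maxX: number; maxY: number };
        points: [number, number][];
      };
    };
  };
  creator?: { isGuest: boolean; id: string };
  created?: string;
}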

Understanding COCO JSON Format

The machine learning community widely uses the COCO JSON format for annotating images and training object detection and segmentation models. It provides a standardized structure for representing a dataset’s images, annotations, and categories.

Here’s a simplified example of a COCO JSON file:

{
  "info": {
    "year": 2023,
    "version": "1.0",
    "description": "COCO Annotations",
    "contributor": "Your Name",
    "url": "https://example.com",
    "date_created": "2023-06-07T12:34:56Z"
  },
  "licenses": [
    {
      "id": 1,
      "name": "License Name",
      "url": "https://example.com/license"
    }
  ],
  "images": [
    {
      "id": 1,
      "width": 800,
      "height": 600,
      "file_name": "image1.jpg"
    },
    {
      "id": 2,
      "width": 1024,
      "height": 768,
      "file_name": "image2.jpg"
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "segmentation": [
        [100, 200, 200, 200, 200, 400, 100, 400]
      ],
      "area": 20000,
      "bbox": [100, 200, 100, 200],
      "iscrowd": 0
    },
    {
      "id": 2,
      "image_id": 2,
      "category_id": 1,
      "segmentation": [
        [500, 300, 600, 300, 600, 400, 500, 400]
      ],
      "area": 10000,
      "bbox": [500, 300, 100, 100],
      "iscrowd": 0
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "Object",
      "supercategory": "None"
    }
  ]
}

The COCO JSON file consists of several sections:

info: Contains metadata about the dataset, such as the year, version, description, contributor, URL, and date created.
licenses: Specifies the license information for the dataset.
images: Contains an array of image objects, each with a unique ID, width, height, and file name.
annotations: Contains an array of annotation objects, each with a unique ID, associated image ID, category ID, segmentation coordinates, area, bounding box, and iscrowd flag.
categories: Defines the object categories present in the dataset.
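
The conversion script in the next section types this structure with a COCOFormat interface. The definitions below are a sketch of what those elided interface declarations might look like, covering only the fields used in this article:

// Sketch of the COCO dataset structure used by the conversion script.
interface COCOImage {
  id: number;
  width: number;
  height: number;
  file_name: string;
}

interface COCOAnnotation {
  id: number;
  image_id: number;
  category_id: number;
  segmentation: number[][]; // one flat [x1, y1, x2, y2, ...] list per polygon
  area: number;
  bbox: [number, number, number, number]; // [x, y, width, height]
  iscrowd: 0 | 1;
}

interface COCOCategory {
  id: number;
  name: string;
  supercategory: string;
}

interface COCOFormat {
  info: {
    year: number;
    version: string;
    description: string;
    contributor: string;
    url: string;
    date_created: string;
  };
  licenses: { id: number; name: string; url: string }[];
  images: COCOImage[];
  annotations: COCOAnnotation[];
  categories: COCOCategory[];
}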

Converting WebAnnotation JSON to COCO JSON

Converting W3C WebAnnotation JSON to COCO JSON involves a TypeScript script that processes the WebAnnotation JSON files and generates a COCO JSON file. The script will:

  • Read the WebAnnotation JSON files from a specified directory.
  • Extract the relevant information from each WebAnnotation JSON object, such as the image ID, annotation coordinates, and category.
  • Create corresponding COCO JSON objects for images and annotations.
  • Generate a COCO JSON file with the converted data, including the info, licenses, images, annotations, and categories sections, or, if the file already exists, append the new data to the existing COCO file.

Here’s a simplified version of the TypeScript script:

import * as fs from "fs";
import * as path from "path";

// ... (Interface definitions and base directory)

function processDirectory(subDirectory: string) {
  // Paths derived from the base directory (the file and folder names here are illustrative)
  const directoryPath = path.join(baseDirectory, subDirectory);
  const outputPath = path.join(baseDirectory, "output");
  const cocoFilePath = path.join(outputPath, "annotations.json");

  // Read the existing COCO JSON file if it exists, otherwise create a new one
  // (typed with the COCOFormat interface)
  let cocoDataset: COCOFormat;
  if (fs.existsSync(cocoFilePath)) {
    const existingData = fs.readFileSync(cocoFilePath, { encoding: "utf-8" });
    cocoDataset = JSON.parse(existingData);
  } else {
    cocoDataset = {
      info: {
        // ... (Dataset info)
      },
      licenses: [
        // ... (License info)
      ],
      images: [],
      annotations: [],
      categories: [
        // ... (Category info)
      ],
    };
  }

  // Track files already in the dataset and continue the ID sequences
  const existingImageIds = new Set(cocoDataset.images.map((image) => image.file_name));
  let imageId = cocoDataset.images.length + 1;
  let annotationId = cocoDataset.annotations.length + 1;

  // Read the directory for JSON files
  fs.readdir(directoryPath, (err, files) => {
    if (err) {
      console.error(`Error reading the directory: ${directoryPath}`, err);
      return;
    }

    files.forEach((file) => {
      if (
        path.extname(file).toLowerCase() === ".json" &&
        !existingImageIds.has(file)
      ) {
        const filePath = path.join(directoryPath, file);
        const data = fs.readFileSync(filePath, { encoding: "utf-8" });
        const annotations: Annotation[] = JSON.parse(data);

        // The annotation file name stands in for the image file name,
        // and the image dimensions are hard-coded in this simplified version
        cocoDataset.images.push({
          id: imageId,
          width: 1920,
          height: 1080,
          file_name: file,
        });

        // Map the annotation's geometry bounds to the bbox of the COCO JSON object
        // to accurately represent the polygon/annotation x and y coordinates
        annotations.forEach((annotation) => {
          const { bounds, points } = annotation.target.selector.geometry;

          // Flatten [[x1, y1], [x2, y2], ...] into [x1, y1, x2, y2, ...]
          const segmentation = points.reduce<number[]>(
            (acc, val) => [...acc, ...val],
            [],
          );
          const bboxWidth = bounds.maxX - bounds.minX;
          const bboxHeight = bounds.maxY - bounds.minY;

          cocoDataset.annotations.push({
            id: annotationId,
            image_id: imageId,
            category_id: 1,
            segmentation: [segmentation],
            area: bboxWidth * bboxHeight,
            bbox: [bounds.minX, bounds.minY, bboxWidth, bboxHeight],
            iscrowd: 0,
          });

          annotationId++;
        });

        imageId++;
      }
    });

    // Create the output directory if it doesn't exist
    if (!fs.existsSync(outputPath)) {
      fs.mkdirSync(outputPath, { recursive: true });
    }

    // Write the updated COCO dataset back to the file
    fs.writeFileSync(cocoFilePath, JSON.stringify(cocoDataset, null, 2));
  });
}

This script reads the WebAnnotation JSON files from the specified directories, converts the annotations to COCO format, and generates a COCO JSON file in the output directory or adds to an existing COCO JSON file.
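
To run the conversion, call processDirectory for each sub-directory that holds WebAnnotation files. The snippet below is a hypothetical entry point; the sub-directory names are placeholders for your own folder layout:

// Hypothetical entry point: convert every annotation sub-directory.
// Replace the names below with your actual folder structure.
const subDirectories = ["session-1", "session-2"];

subDirectories.forEach((subDirectory) => {
  processDirectory(subDirectory);
});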

Here’s an example of a converted COCO JSON object based on the WebAnnotation JSON shown earlier:

{
  "info": {
    "year": 2023,
    "version": "1.0",
    "description": "COCO Annotations",
    "contributor": "Your Name",
    "url": "https://example.com",
    "date_created": "2023-06-07T12:34:56Z"
  },
  "licenses": [
    {
      "id": 1,
      "name": "License Name",
      "url": "https://example.com/license"
    }
  ],
  "images": [
    {
      "id": 1,
      "width": 1024,
      "height": 768,
      "file_name": "image1.jpg"
    }
  ],
  "annotations": [
    {
      "id": 1,
      "image_id": 1,
      "category_id": 1,
      "segmentation": [
        [
          794.7157592773438, 433.3932800292969,
          794.7157592773438, 433.3932800292969,
          863.0336303710938, 584.2633666992188,
          863.0336303710938, 584.2633666992188,
          894.6519775390625, 563.5730590820312,
          894.6519775390625, 563.5730590820312,
          894.6519775390625, 563.5730590820312
        ]
      ],
      "area": 15075.951171875,
      "bbox": [
        794.7157592773438, 433.3932800292969,
        100, 150.8701171875
      ],
      "iscrowd": 0
    }
  ],
  "categories": [
    {
      "id": 1,
      "name": "Object",
      "supercategory": "None"
    }
  ]
}

In this example, the WebAnnotation JSON object is converted to a COCO annotation object. The segmentation field contains the polygon points flattened into a single [x1, y1, x2, y2, …] array, the area field is calculated from the bounding box dimensions, and the bbox field holds the bounding box as [x, y, width, height]. The image information is added to the images section, and the category information is included in the categories section.
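
Note that the area here is the area of the bounding box rather than of the polygon itself. As a standalone sketch, the bbox and area derivation from the WebAnnotation bounds looks like this:

// Derive a COCO-style bbox and area from WebAnnotation geometry bounds.
// This uses the bounding-box area, not the true polygon area.
function toBboxAndArea(bounds: {
  minX: number;
  minY: number;
  maxX: number;
  maxY: number;
}): { bbox: [number, number, number, number]; area: number } {
  const width = bounds.maxX - bounds.minX;
  const height = bounds.maxY - bounds.minY;
  return {
    bbox: [bounds.minX, bounds.minY, width, height], // [x, y, width, height]
    area: width * height,
  };
}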

Importance of COCO Format in Machine Learning

The COCO format has become a standard in the machine learning community because it:

  • Provides a consistent structure for image annotations, making it easier to share and reuse datasets across projects and frameworks.
  • Ensures compatibility with popular machine learning frameworks and tools.
  • Serves as a benchmark for evaluating object detection and segmentation models.
  • Has a large and active community that contributes to its development and provides resources.

Key Takeaways

Converting W3C WebAnnotation JSON to COCO JSON format is an essential step in preparing annotated datasets for machine learning tasks. By automating the conversion with a TypeScript script, you can easily transform annotations created with tools like Annotorious into the widely used COCO format. This lets you take advantage of the COCO format’s benefits and supports the development and evaluation of accurate object detection and segmentation models.
