Image annotation refers to adding labels or markers to the objects in an image to describe its content. By labeling each object, the image is transformed into a dataset that a machine can understand, allowing deeper analysis and research with computer vision techniques. Image annotation helps machine learning models better understand images and thus identify the objects in them more accurately.

How to label images?

Figure: example of 2D semantic segmentation (top: input image; bottom: prediction).

Image annotation methods include semantic segmentation, rectangular frame (bounding box) annotation, polygon annotation, key point annotation, point cloud annotation, 3D cube annotation, 2D/3D fusion annotation, target tracking, OCR transcription, attribute discrimination, and more.

1. Semantic Segmentation

Semantic segmentation refers to dividing a complex, irregular image into regions according to the class of object each region belongs to, and labeling every pixel with the corresponding attribute, which helps train image recognition models. It is often used in areas such as autonomous driving, human-computer interaction, and virtual reality.
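
A semantic-segmentation label is simply a per-pixel class map. A minimal sketch of that representation (the class IDs 0 = background, 1 = road, 2 = car are illustrative assumptions, not a standard):

```python
import numpy as np

# A segmentation label is an array with the same height and width as the
# image, where each value is the class ID of that pixel.
mask = np.zeros((4, 6), dtype=np.uint8)  # 4x6 toy image, all background (0)
mask[2:, :] = 1        # bottom two rows labeled "road"
mask[2:4, 1:3] = 2     # a small "car" region on the road

# Per-class pixel counts, a common sanity check on annotation coverage.
ids, counts = np.unique(mask, return_counts=True)
print(dict(zip(ids.tolist(), counts.tolist())))
```

Real datasets store the same structure at full image resolution, often as an indexed PNG whose palette maps class IDs to display colors.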


2. Rectangular frame labeling

Rectangular frame annotation, also known as bounding box annotation, is currently the most widely used image annotation method. It quickly frames a specified target object in image or video data in a relatively simple and convenient way.
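
A bounding box is typically stored as (x, y, width, height) in pixel coordinates plus a category label. A minimal sketch, with a record layout loosely modeled on COCO-style annotations (an assumption, not a specific tool's schema):

```python
def box_to_corners(box):
    """Convert a (x, y, width, height) box to (x_min, y_min, x_max, y_max)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)

# One illustrative annotation record: a category plus its box.
annotation = {"category": "car", "bbox": (48, 60, 120, 80)}
print(box_to_corners(annotation["bbox"]))  # (48, 60, 168, 140)
```

Both conventions (corner pairs vs. corner-plus-size) appear in practice, so converters like this are common glue code in annotation pipelines.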


3. Polygon labeling

Polygon annotation refers to the use of polygonal outlines to mark irregular target objects in static images. Compared with rectangular frames, polygons can enclose targets more accurately and are better suited to irregular objects.
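
A polygon label is an ordered list of vertices. One standard check on such a label is its enclosed area, computed with the shoelace formula; the coordinates below are illustrative:

```python
def polygon_area(points):
    """Area of a simple (non-self-intersecting) polygon via the shoelace formula."""
    n = len(points)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]  # wrap around to close the polygon
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0

# An irregular object outlined with five vertices (illustrative coordinates).
outline = [(0, 0), (6, 0), (6, 2), (3, 5), (0, 2)]
print(polygon_area(outline))  # 21.0
```

Comparing the polygon's area to the area of its enclosing rectangle gives a quick measure of how much tighter the polygon fits the object.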


4. Key point labeling

Key point labeling refers to manually marking key points at specified positions, such as facial feature points or human skeleton joints. It is often used to train facial recognition models and statistical models.
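
Key points are commonly stored as (x, y, v) triplets, where v is a visibility flag; the COCO convention uses 0 = not labeled, 1 = labeled but occluded, 2 = visible. A minimal sketch with illustrative landmark names and coordinates:

```python
# Facial landmarks as (x, y, visibility) triplets; names are assumptions.
keypoints = {
    "left_eye":  (120, 85, 2),
    "right_eye": (160, 84, 2),
    "nose_tip":  (140, 110, 2),
    "chin":      (141, 170, 1),  # labeled but occluded
}

# Collect only the landmarks that are actually visible in the image.
visible = [name for name, (x, y, v) in keypoints.items() if v == 2]
print(visible)
```

The visibility flag matters for training: occluded points can be down-weighted or excluded from the loss.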


5. Point cloud labeling

A point cloud is an important representation of 3D data. Sensors such as lidar collect obstacles and their position coordinates as dense sets of points, and annotators classify these points and label them with different attributes. Point cloud annotation is mainly used in the field of autonomous driving.
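
A labeled point cloud can be represented as an (N, 4) array: x, y, z coordinates plus a per-point class ID. A minimal sketch (the IDs 0 = ground, 1 = vehicle and the coordinates are illustrative assumptions):

```python
import numpy as np

# Five labeled points: columns are x, y, z, class_id.
points = np.array([
    [1.0, 0.2, 0.0, 0],
    [1.1, 0.3, 0.0, 0],
    [5.0, 2.0, 0.8, 1],
    [5.2, 2.1, 1.1, 1],
    [5.1, 1.9, 0.9, 1],
])

# Select all points labeled as vehicle and report the cluster's extent,
# a first step toward fitting a 3D box around the labeled object.
vehicle = points[points[:, 3] == 1]
print(vehicle[:, :3].min(axis=0), vehicle[:, :3].max(axis=0))
```

Real lidar sweeps contain tens of thousands of points per frame, but the labeling structure is the same: a class (and often an instance ID) per point.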


6. 3D Cube Labeling

Unlike point cloud annotation, 3D cube annotation is still performed on two-dimensional planar images: annotators frame the edges of three-dimensional objects, from which vanishing points are obtained and the relative distances between objects can be measured.
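
A cuboid drawn on a 2D image is often stored as its eight projected corner points in pixel coordinates. A minimal sketch (corner ordering and coordinates are illustrative assumptions):

```python
# Eight projected corners of a cuboid, in pixels: front face then rear face.
cuboid = [
    (100, 200), (180, 200), (180, 260), (100, 260),  # front face
    (130, 180), (200, 180), (200, 235), (130, 235),  # rear face
]

# The enclosing 2D rectangle, handy for cross-checking the cuboid against
# a plain bounding-box label of the same object.
xs = [x for x, y in cuboid]
ys = [y for x, y in cuboid]
print((min(xs), min(ys), max(xs), max(ys)))
```

The offset between the front and rear faces encodes the object's orientation in the image, which is what makes depth and distance estimates possible.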


7. 2D/3D fusion labeling

2D/3D fusion annotation refers to annotating and associating image data collected by 2D and 3D sensors at the same time. This method marks the position and size of objects both in the image plane and in 3D space, helping autonomous driving models combine vision and radar perception.
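
The core of fusion annotation is that the same physical object carries both a 2D image box and a 3D box, linked by a shared object ID. A minimal sketch of that association (the record layout is an illustrative assumption, not any specific tool's schema):

```python
# Per-sensor label lists; the shared obj_id ties them together.
labels_2d = [{"obj_id": 7, "bbox": (48, 60, 120, 80)}]
labels_3d = [{"obj_id": 7, "center": (5.1, 2.0, 0.9), "size": (4.5, 1.8, 1.5)}]

# Join the two modalities on obj_id into one fused record per object.
fused = {
    a["obj_id"]: {"bbox_2d": a["bbox"], "box_3d": b}
    for a in labels_2d
    for b in labels_3d
    if a["obj_id"] == b["obj_id"]
}
print(fused[7]["box_3d"]["center"])
```

In production pipelines the join is usually done per frame and validated by projecting the 3D box into the image and checking overlap with the 2D box.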

Figure: illustration of challenges in semantic segmentation ((a) input image; (b) ground truth).

8. Target tracking

Target tracking refers to annotating the target object in every frame of a dynamic image (video) and linking the per-frame annotations to describe the object's trajectory. This type of annotation is often used to train autonomous driving models and video recognition models.
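
Tracking annotations attach a persistent track ID to per-frame boxes; the object's trajectory is then the sequence of box centers for one ID. A minimal sketch with illustrative data:

```python
# Per-frame box annotations; track_id 3 identifies the same object over time.
frames = [
    {"frame": 0, "track_id": 3, "bbox": (10, 10, 20, 20)},
    {"frame": 1, "track_id": 3, "bbox": (14, 11, 20, 20)},
    {"frame": 2, "track_id": 3, "bbox": (18, 12, 20, 20)},
]

def trajectory(annotations, track_id):
    """Return the box centers, in frame order, for one tracked object."""
    centers = []
    for a in sorted(annotations, key=lambda a: a["frame"]):
        if a["track_id"] == track_id:
            x, y, w, h = a["bbox"]
            centers.append((x + w / 2, y + h / 2))
    return centers

print(trajectory(frames, 3))
```

Keeping the ID stable across frames is the hard part of this annotation type: an ID switch corrupts the whole trajectory, not just one frame.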


9. OCR transcription

OCR transcription marks and transcribes the text content in an image to help train and improve text recognition models. Currently, JLW supports transcription of printed or handwritten text in more than ten languages, including Simplified Chinese, Traditional Chinese, English, Japanese, Korean, French, German, Spanish, and Arabic.
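
An OCR transcription label pairs each text region (here a rectangle) with its transcribed string and a language tag. A minimal sketch (the field names and sample lines are illustrative assumptions):

```python
# One record per text line: region, transcription, language tag.
lines = [
    {"bbox": (12, 8, 200, 30), "text": "Hello world", "lang": "en"},
    {"bbox": (12, 44, 200, 30), "text": "你好，世界", "lang": "zh-Hans"},
]

# Rebuild the page text by sorting regions top-to-bottom on their y coordinate.
page_text = "\n".join(l["text"] for l in sorted(lines, key=lambda l: l["bbox"][1]))
print(page_text)
```

Irregular or rotated text is often labeled with a quadrilateral instead of an axis-aligned rectangle, but the text/region pairing stays the same.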


10. Attribute discrimination

Attribute discrimination refers to identifying the target object in an image, manually or with machine assistance, and tagging it with the corresponding attributes.
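
Attributes are usually stored as extra key-value fields on an already identified object. A minimal sketch (the attribute names and values are illustrative assumptions):

```python
# Identified objects, each carrying a dictionary of discriminated attributes.
objects = [
    {"id": 1, "category": "car", "attributes": {"color": "red", "occluded": False}},
    {"id": 2, "category": "car", "attributes": {"color": "blue", "occluded": True}},
]

# Filter on an attribute, e.g. keep only fully visible cars for training.
visible_cars = [o["id"] for o in objects if not o["attributes"]["occluded"]]
print(visible_cars)
```

Because attributes refine an existing label rather than create a new one, this step typically runs after detection or box annotation is complete.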




