2024-06-20: Test server is online, and the final test data has been released (Baidu Drive (Extraction code: VISO)).
2024-04-30: Training and validation data have been released (Google Drive, Baidu Drive (Extraction code: VISO)). Participants can use the released data to develop their algorithms.
Satellite video cameras can provide continuous observation over large-scale areas, making them well suited to several downstream remote sensing applications, including traffic management, ocean monitoring, and smart cities. Recently, the detection and tracking of small moving objects in satellite videos has attracted increasing attention in both academia and industry. However, accurate and robust moving object detection and tracking in satellite videos remains challenging, due to the lack of high-quality, well-annotated public datasets and comprehensive benchmarks for performance evaluation. To this end, we organize this competition based on the recent VISO (Google Drive, Baidu Drive (Extraction code: VISO)) dataset, focusing on the specific challenges and research problems of moving object detection and tracking in satellite videos. We hope this competition will inspire the community to explore these tough problems in satellite video analysis and ultimately drive technological advancement in emerging applications.
The ICPR 2024 Competition on Moving Object Detection and Tracking in Satellite Videos aims to facilitate the development of video object detection and tracking algorithms, and to push forward research on moving object detection and tracking in satellite videos. The competition comprises the following two tracks.
Based on the VISO (Google Drive, Baidu Drive (Extraction code: VISO)) dataset, which contains 95 satellite videos (28,500 frames in total) captured by the Jilin-1 satellite platform, the goal of this task is to achieve moving object detection across whole videos. The organizers will provide a training set (21,000 frames) and a validation set (3,000 frames) with full bounding box annotations. A test set (4,500 frames) will also be provided, but with satellite images only. Participants are expected to train their models on the training set and validate performance on the validation set. The finalized model is then used to generate detection results on the test set. The final performance will be automatically evaluated by the organizers with a set of objective quantitative metrics (see Evaluation Metrics, Track 1).
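For reference, the sketch below shows one plausible way to dump per-frame detection results to a file. The column layout (one comma-separated line per detection: frame, x, y, w, h, score) is a hypothetical example, not the official submission format, which will be specified by the organizers.

```python
def write_detections(path, per_frame_dets):
    """Write detections as CSV lines: frame, x, y, w, h, score.

    per_frame_dets: {frame_number: [(x, y, w, h, score), ...]}
    NOTE: this column layout is a hypothetical example; check the
    organizers' instructions for the required submission format.
    """
    with open(path, "w") as f:
        for frame in sorted(per_frame_dets):
            for x, y, w, h, score in per_frame_dets[frame]:
                f.write(f"{frame},{x:.1f},{y:.1f},{w:.1f},{h:.1f},{score:.4f}\n")
```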
This task aims at locating multiple objects of interest, maintaining their identities, and yielding their individual trajectories across a whole video. For this task, 95 sequences (videos 1 to 95) with a total of 28,500 frames from the VISO (Google Drive, Baidu Drive (Extraction code: VISO)) dataset will be provided. Specifically, videos 1 to 65 will be used as the training set and videos 66 to 75 as the validation set. The bounding box annotations and the instance ID of each object in each frame will be provided. The test set is composed of videos 76 to 95, for which only the annotations of the first frame will be provided for initialization. Participants are expected to train their models on the training set and validate performance on the validation set. The finalized model is then used to generate tracking results on the test set.
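As an illustration, the following minimal Python sketch parses a MOT-style annotation file (one comma-separated line per object: frame, instance ID, x, y, w, h, ...) and extracts the first-frame objects for tracker initialization. The exact column layout is an assumption and should be checked against the released VISO files.

```python
import csv
from collections import defaultdict

def load_annotations(path):
    """Parse a MOT-style annotation file into {frame: [(track_id, x, y, w, h), ...]}.

    Assumes one comma-separated line per object:
        frame, track_id, x, y, w, h, ...
    (an assumption; verify the column order against the released VISO files).
    """
    frames = defaultdict(list)
    with open(path, newline="") as f:
        for row in csv.reader(f):
            frame, track_id = int(row[0]), int(row[1])
            x, y, w, h = map(float, row[2:6])
            frames[frame].append((track_id, x, y, w, h))
    return frames

# e.g., initialize a tracker on a test video from its first-frame annotations:
# first_frame_objects = load_annotations("video_076/gt.txt")[1]
```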
This competition is built upon our recently released VISO (Google Drive, Baidu Drive (Extraction code: VISO)) dataset, the first well-annotated large-scale satellite video dataset for moving object detection and tracking. The dataset was captured by the Jilin-1 satellite constellation at different positions along the satellite orbit. The recorded videos cover several square kilometers of real-world scenes. Each frame has a resolution of 12,000 × 5,000 pixels and contains a large number of objects at different scales. Moreover, four common types of moving objects, including cars and ships, are manually labeled. An example of a labeled video is shown below:
To evaluate the detection performance of the methods submitted to the competition, the commonly used evaluation metric for object detection, mAP, will be used. We report the average results over all satellite videos in the evaluation dataset. Note that the final results are ranked by mAP (IoU = 0.5) computed on the test set.
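The official evaluation is run automatically by the organizers; for local sanity checks, the following single-class Python sketch computes VOC-style AP at IoU = 0.5 via greedy, confidence-ordered matching. The function names and data layout here are illustrative assumptions, not the organizers' evaluation code.

```python
import numpy as np

def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def average_precision(detections, ground_truths, iou_thr=0.5):
    """VOC-style AP at one IoU threshold for a single class.

    detections: list of (image_id, score, box); ground_truths: {image_id: [box, ...]}.
    """
    detections = sorted(detections, key=lambda d: -d[1])  # highest confidence first
    matched = {img: [False] * len(bs) for img, bs in ground_truths.items()}
    n_gt = sum(len(bs) for bs in ground_truths.values())
    tp = np.zeros(len(detections))
    for i, (img, _, box) in enumerate(detections):
        ious = [iou(box, g) for g in ground_truths.get(img, [])]
        if ious:
            j = int(np.argmax(ious))
            if ious[j] >= iou_thr and not matched[img][j]:
                tp[i] = 1.0
                matched[img][j] = True  # each ground-truth box matches at most once
    recall = np.cumsum(tp) / max(n_gt, 1)
    precision = np.cumsum(tp) / (np.arange(len(detections)) + 1)
    # Make precision monotonically decreasing, then integrate over recall.
    mrec = np.concatenate(([0.0], recall, [1.0]))
    mpre = np.concatenate(([0.0], precision, [0.0]))
    mpre = np.maximum.accumulate(mpre[::-1])[::-1]
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```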
The metrics used in generic multiple-object tracking benchmarks will be used for quantitative evaluation. The final results for multi-object tracking will be ranked according to the MOTA and IDF1 values achieved by participants on the test set, combined with equal weights of 50% each.
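For local evaluation, one option is the open-source py-motmetrics package (our suggestion, not necessarily the organizers' exact tooling). The sketch below accumulates IoU-based matches over toy data and combines MOTA and IDF1 with the 50/50 weighting described above; the (x, y, w, h) box layout follows py-motmetrics' iou_matrix convention.

```python
import motmetrics as mm

# Toy data: {frame: {track_id: (x, y, w, h)}} -- stand-ins for real annotations/results.
gt_frames = {1: {1: (10, 10, 20, 20)}, 2: {1: (12, 10, 20, 20)}}
hyp_frames = {1: {7: (11, 11, 20, 20)}, 2: {7: (13, 11, 20, 20)}}

acc = mm.MOTAccumulator(auto_id=True)
for frame in sorted(gt_frames):
    gt_ids = list(gt_frames[frame])
    hyp_ids = list(hyp_frames.get(frame, {}))
    # Pairwise distance = 1 - IoU; pairs with IoU below 0.5 count as unmatched.
    dists = mm.distances.iou_matrix(
        [gt_frames[frame][i] for i in gt_ids],
        [hyp_frames[frame][j] for j in hyp_ids],
        max_iou=0.5,
    )
    acc.update(gt_ids, hyp_ids, dists)

mh = mm.metrics.create()
summary = mh.compute(acc, metrics=["mota", "idf1"], name="seq")
combined = 0.5 * summary.loc["seq", "mota"] + 0.5 * summary.loc["seq", "idf1"]
print(summary)
print(f"combined score: {combined:.4f}")
```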
Over the last few years, several milestone methods have been developed for satellite videos, including DSFNet and CFME. In this competition, DSFNet is used as the detection baseline model, and submitted results should be at least on par with DSFNet. For multi-object tracking, we selected SORT as the baseline model. Note that the inputs to the tracking baseline (i.e., the per-frame detection results) are the detections produced by DSFNet. Solutions whose evaluation metric values are lower than these baselines will not be ranked on the leaderboard.
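To reproduce a baseline-style pipeline locally, one can feed per-frame detections (e.g., from DSFNet) into SORT. The sketch below assumes the reference SORT implementation from https://github.com/abewley/sort (with sort.py on the Python path); the detection arrays and parameter values are illustrative only.

```python
import numpy as np
from sort import Sort  # reference implementation: https://github.com/abewley/sort

# Illustrative per-frame detections: lists of [x1, y1, x2, y2, score]
# (in practice these would come from the detector, e.g., DSFNet).
per_frame_detections = [
    [[100.0, 50.0, 110.0, 58.0, 0.9]],  # frame 1
    [[102.0, 51.0, 112.0, 59.0, 0.8]],  # frame 2
]

tracker = Sort(max_age=3, min_hits=1, iou_threshold=0.3)  # illustrative settings
results = []
for frame_idx, dets in enumerate(per_frame_detections, start=1):
    dets = np.asarray(dets) if len(dets) else np.empty((0, 5))
    for x1, y1, x2, y2, track_id in tracker.update(dets):  # rows: [x1, y1, x2, y2, id]
        results.append((frame_idx, int(track_id), x1, y1, x2 - x1, y2 - y1))

print(results)  # (frame, id, x, y, w, h) tuples for confirmed tracks
```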