Accurately detecting and tracking high-speed, small objects, such as balls in sports videos, is challenging due to factors like motion blur and occlusion. Although recent deep learning frameworks like TrackNetV1, V2, and V3 have advanced tennis ball and shuttlecock tracking, they often struggle in scenarios with partial occlusion or low visibility. This is primarily because these models rely heavily on visual features without explicitly incorporating motion information, which is crucial for precise tracking and trajectory prediction. In this paper, we introduce an enhancement to the TrackNet family by fusing high-level visual features with learnable motion attention maps through a motion-aware fusion mechanism, effectively emphasizing the moving ball's location and improving tracking performance. Our approach leverages frame differencing maps, modulated by a motion prompt layer, to highlight key motion regions over time. Experimental results on the tennis ball and shuttlecock datasets show that our method enhances the tracking performance of both TrackNetV2 and V3. We refer to our lightweight, plug-and-play solution, built on top of the existing TrackNet, as TrackNetV4.
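The core idea is simple enough to sketch. Below is a minimal, hypothetical PyTorch illustration of a motion prompt layer: frame-differencing maps are passed through a sigmoid with a learnable slope and shift, producing soft attention maps that emphasize moving regions over time. The class name and exact parameterization here are illustrative assumptions, not the released implementation.

```python
import torch
import torch.nn as nn

class MotionPromptLayer(nn.Module):
    """Sketch of a motion prompt layer (illustrative, not the official code).

    Converts frame-differencing maps into soft motion attention maps via a
    sigmoid with a learnable slope (sharpness) and shift (motion threshold).
    """
    def __init__(self):
        super().__init__()
        self.slope = nn.Parameter(torch.tensor(1.0))  # learnable sharpness
        self.shift = nn.Parameter(torch.tensor(0.0))  # learnable threshold

    def forward(self, frames):
        # frames: (B, T, H, W) grayscale clip
        diffs = (frames[:, 1:] - frames[:, :-1]).abs()  # (B, T-1, H, W)
        # Modulate raw motion magnitudes into attention values in (0, 1).
        return torch.sigmoid(self.slope * (diffs - self.shift))
```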
TrackNetV4 includes two fusion layer variants, Type A and Type B, which differ in how the motion attention maps are combined with the visual feature maps (see the sketch below).
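Since the exact formulas are given in the paper rather than here, the following is a hedged sketch of two plausible combination rules: purely multiplicative gating for one variant and a gated residual for the other. The function names and the mapping to Type A/Type B are assumptions for illustration only.

```python
import torch

def fuse_type_a(features: torch.Tensor, attention: torch.Tensor) -> torch.Tensor:
    # Assumed variant: purely multiplicative gating, so the attention map
    # directly re-weights the high-level visual features.
    # features: (B, C, H, W); attention: (B, 1, H, W), broadcast over channels.
    return features * attention

def fuse_type_b(features: torch.Tensor, attention: torch.Tensor) -> torch.Tensor:
    # Assumed variant: gated residual, which keeps the original features so
    # appearance cues survive even where motion attention is near zero.
    return features + features * attention
```

A residual-style fusion is a common design choice when attention maps can be near zero over large regions, e.g. when the ball is momentarily static.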
Qualitative video comparison: Original Video, TrackNetV2 Prediction, TrackNetV4 Prediction (Type A), and TrackNetV4 Prediction (Type B).
Comparison of feature maps and heatmaps with and without motion-aware fusion. Four visualization groups are shown: the first row displays the original frames; the second and third rows show feature maps from the baseline (TrackNetV2) and our TrackNetV4, respectively; and the fourth and fifth rows present heatmaps (tracking and prediction results) from the same models. Motion-aware fusion improves the visual representations, yielding clearer and more accurate ball predictions. When combined with high-level features, motion attention further refines ball localization, reducing missed detections compared to the baseline. This demonstrates the effectiveness of motion awareness in tracking fast-moving, small objects.
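Since the heatmaps in the last two rows are the models' tracking outputs, ball coordinates are typically recovered from each heatmap's peak. Below is a minimal, generic decoding sketch; the threshold value is illustrative and not taken from the paper.

```python
import torch

def heatmap_to_xy(heatmap: torch.Tensor, threshold: float = 0.5):
    # heatmap: (H, W) network output in [0, 1] for a single frame.
    if heatmap.max().item() < threshold:
        return None  # peak too weak: treat the ball as not visible
    idx = int(torch.argmax(heatmap))       # index into the flattened map
    y, x = divmod(idx, heatmap.shape[1])   # recover row (y) and column (x)
    return x, y
```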
Please review our terms and conditions, then complete the request form below. Once we’ve reviewed your submission, we will notify you of the outcome via the email address you provided.
Here is the BibTeX entry for referencing our work:
@INPROCEEDINGS{tracknetv4,
  author={Raj, Arjun and Wang, Lei and Gedeon, Tom},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={TrackNetV4: Enhancing Fast Sports Object Tracking with Motion Attention Maps},
  year={2025},
  url={https://arxiv.org/abs/2409.14543}
}
Arjun Raj conducted this research under the supervision of Dr. Lei Wang as part of his research project at ANU. This work was also supported by the NCI National AI Flagship Merit Allocation Scheme and the National Computational Merit Allocation Scheme 2024 (NCMAS 2024), with computational resources provided by NCI Australia.