There are 22 players competing for possession of the ball in football, one of the most popular sports on the planet.
Watching games is an important part of the experience, but the data we can gather from them may help us understand what actually happens on the pitch.
In this contribution to the football analysis problem, I share my experience with the difficulty of interpreting football matches from television-like video feeds. For more information, visit baanstepball.com.
The problem
Extracting positional and semantic information from a moving camera is difficult. Fixed cameras placed around the field would make the task easier, but real stadiums rarely allow that because of budget and permission limitations. Broadcast-style video, on the other hand, can be processed in a variety of ways without leaving your chair, even on a tight budget.
The approach
As any textbook programmer would, we broke this big task into smaller, more manageable, more specific chunks.
This gave the following subtasks:
- Projecting the camera view onto a two-dimensional plane to obtain the players' positions (reference estimation and homography estimation).
- Detecting the players, the ball, and the officials (i.e., where they are in the frame).
- Tracking objects (also known as entities), which is vital for this project.
- Identifying players: is it possible to recognize the same player between frames?
- Identifying teams: which team does each player play for?
Next, we look at how these tasks fit together to produce the positional and semantic data.
For each frame of the sequence, the field is detected (field detection) and the entities on it are detected (object detection). Entities that are detected in nearly consecutive frames are linked together and tracked.
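To make the idea of linking detections across nearly consecutive frames concrete, here is a minimal sketch of one common way to do it: greedy matching by bounding-box overlap (IoU). This is an illustration, not necessarily the tracker used in this project. The names `iou` and `link_detections` and the `min_iou` threshold are assumptions made up for the example.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_detections(tracks: Dict[int, Box], detections: List[Box],
                    next_id: int, min_iou: float = 0.3):
    """Greedily match new detections to the tracks of the previous frame.

    Returns the updated {track_id: box} mapping and the next unused id.
    Tracks with no matching detection are dropped in this simple sketch.
    """
    # Score every (track, detection) pair, best overlap first.
    pairs = sorted(
        ((iou(box, det), tid, j)
         for tid, box in tracks.items()
         for j, det in enumerate(detections)),
        reverse=True,
    )
    updated: Dict[int, Box] = {}
    used = set()
    for score, tid, j in pairs:
        if score < min_iou:
            break  # remaining pairs overlap too little to be the same entity
        if tid in updated or j in used:
            continue  # track or detection already matched
        updated[tid] = detections[j]
        used.add(j)
    # Unmatched detections start new tracks.
    for j, det in enumerate(detections):
        if j not in used:
            updated[next_id] = det
            next_id += 1
    return updated, next_id
```

Calling `link_detections` once per frame, feeding it the tracks returned for the previous frame, keeps the same id on an entity as long as its box overlaps enough from one frame to the next.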
Similarly, we estimate the field's position relative to the camera, which lets us project the position of each entity onto the field. By identifying each player and assigning them to a team, we can also track each player's performance.
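Projecting an entity from the image onto the field comes down to applying a 3x3 homography matrix to a pixel coordinate. The sketch below assumes the homography `H` has already been estimated for the current frame (estimating it is a separate step not shown here). The helper names `project_point` and `foot_point` are hypothetical, as is the choice of the bottom-centre of the bounding box as the point where a player touches the ground.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]  # 3x3 homography, row-major
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels


def project_point(H: Matrix3, x: float, y: float) -> Tuple[float, float]:
    """Map an image point (x, y) to field coordinates using homography H."""
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    # Divide by the homogeneous coordinate to get 2D field coordinates.
    return u / w, v / w


def foot_point(box: Box) -> Tuple[float, float]:
    """Bottom-centre of a bounding box, used as the ground contact point."""
    x1, y1, x2, y2 = box
    return (x1 + x2) / 2, y2


# Usage: a toy homography that only scales pixels to metres (an assumption).
H = [[0.1, 0.0, 0.0],
     [0.0, 0.1, 0.0],
     [0.0, 0.0, 1.0]]
fx, fy = foot_point((90.0, 10.0, 110.0, 50.0))
field_x, field_y = project_point(H, fx, fy)
```

Using the feet rather than the box centre matters because the homography is only valid for points lying on the ground plane.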
We repeat this frame by frame until the video has ended. This is followed by smoothing the data: after collecting the data frame by frame, we perform 'backward adjustments' over the whole sequence, looking for similarities between detected trajectories and trajectory paths.
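The exact 'backward adjustment' procedure is not detailed here, so the following is only a stand-in that shows why smoothing is done after the whole sequence is collected: a centred moving average needs positions from both earlier and later frames. The function name `smooth_trajectory` and the `window` parameter are assumptions for this example.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def smooth_trajectory(points: List[Point], window: int = 5) -> List[Point]:
    """Smooth a per-frame trajectory with a centred moving average.

    `window` must be a positive odd number. Near the ends of the
    sequence the window is truncated to the frames that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed: List[Point] = []
    for i in range(len(points)):
        lo = max(0, i - half)
        hi = min(len(points), i + half + 1)
        n = hi - lo
        # Average x and y over past and future frames in the window.
        sx = sum(p[0] for p in points[lo:hi]) / n
        sy = sum(p[1] for p in points[lo:hi]) / n
        smoothed.append((sx, sy))
    return smoothed
```

A larger `window` removes more jitter but also flattens genuine quick changes of direction, so the value would need tuning against real footage.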
These are the steps performed each time a frame is fed into the system.
Object detection
The first thing one notices when working with machine learning is that good labeled data is hard to find. A popular object detector is YOLOv3.
Cropping the frame and running the pre-trained network on it gives disappointing results. Since accuracy matters more than speed here, the original-resolution image was fed to YOLO instead. With this approach it is possible to tell when the ball is near a player or referee.