TY - GEN
T1 - Semi-automated development of a dataset for baseball pitch type recognition
AU - Siegler, Dylan
AU - Chen, Reed
AU - Fasko, Michael
AU - Yang, Shunkun
AU - Luo, Xiong
AU - Zhao, Wenbing
PY - 2019/1/1
Y1 - 2019/1/1
N2 - In this paper, we report our work on developing a new dataset for baseball pitch type recognition based on YouTube videos of US Major League Baseball games. The core innovation is a largely automated procedure that extracts relevant clips from full-game videos and automatically labels them by aligning the infographic information included in the broadcast with the PitchF/X data. We adopted the Needleman-Wunsch algorithm to address the challenges of aligning the two streams of data based on pitch speed, i.e., to minimize gaps and mismatches between the two streams. Manual inspection is used only to select games that include infographic information for clip extraction and to remove erroneous clips to improve the quality of the dataset.
AB - In this paper, we report our work on developing a new dataset for baseball pitch type recognition based on YouTube videos of US Major League Baseball games. The core innovation is a largely automated procedure that extracts relevant clips from full-game videos and automatically labels them by aligning the infographic information included in the broadcast with the PitchF/X data. We adopted the Needleman-Wunsch algorithm to address the challenges of aligning the two streams of data based on pitch speed, i.e., to minimize gaps and mismatches between the two streams. Manual inspection is used only to select games that include infographic information for clip extraction and to remove erroneous clips to improve the quality of the dataset.
KW - Dataset
KW - Needleman-Wunsch algorithm
KW - Pitch type
KW - PitchF/X
KW - Video-based human activity recognition
UR - https://www.scopus.com/inward/record.uri?partnerID=HzOxMe3b&scp=85076899015&origin=inward
UR - https://www.scopus.com/inward/citedby.uri?partnerID=HzOxMe3b&scp=85076899015&origin=inward
U2 - 10.1007/978-981-15-1925-3_25
DO - 10.1007/978-981-15-1925-3_25
M3 - Conference contribution
SN - 9789811519246
VL - 1138 CCIS
T3 - Communications in Computer and Information Science
SP - 345
EP - 359
BT - Communications in Computer and Information Science
A2 - Ning, Huansheng
PB - Springer
CY - che
T2 - 3rd International Conference on Cyberspace Data and Intelligence, Cyber DI 2019, and the International Conference on Cyber-Living, Cyber-Syndrome, and Cyber-Health, CyberLife 2019
Y2 - 16 December 2019 through 18 December 2019
ER -