Pianist fingers recognition

Can you imagine a camera that watches a pianist's hands and recognizes which hand and which finger plays each note? That is exactly what we have shown to be possible using video recognition techniques, with support from the Piano Pedagogy Lab of the University of Ottawa.

Introductory Video:

Three music pieces of increasing complexity are played. A camera observes from above and the following is performed by the C-MIDI program:

  • Piano and piano keys are automatically detected.
  • Pianist hands are detected and tracked.
  • The edges of the fingers are detected.
  • The fingers and hands that pressed the keys are marked: in colour (hands), by number (fingers).
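As a rough illustration of the key-detection step, the sketch below finds black keys along a single grayscale image row by thresholding it and collecting the dark runs (this page notes further down that a binarized image is used to detect the black keys). It is a minimal sketch under stated assumptions, not the C-MIDI implementation: the function name find_dark_runs, the threshold of 80, the minimum run width and the synthetic row are all hypothetical.

```python
def find_dark_runs(scanline, threshold=80, min_width=3):
    """Return (start, end) pixel ranges of dark runs in one grayscale row.

    Assumption: black keys appear as runs of pixels darker than `threshold`;
    runs narrower than `min_width` pixels are treated as noise and dropped.
    """
    runs = []
    start = None
    for x, value in enumerate(scanline):
        dark = value < threshold
        if dark and start is None:
            start = x                      # a dark run begins here
        elif not dark and start is not None:
            if x - start >= min_width:     # keep only sufficiently wide runs
                runs.append((start, x - 1))
            start = None
    # Close a run that reaches the right edge of the row.
    if start is not None and len(scanline) - start >= min_width:
        runs.append((start, len(scanline) - 1))
    return runs


# Synthetic row: bright white keys (200) with three dark black keys (20).
row = [200] * 5 + [20] * 4 + [200] * 3 + [20] * 4 + [200] * 8 + [20] * 4 + [200] * 2
black_keys = find_dark_runs(row)
print(black_keys)
```

The spacing of the detected runs (black keys come in groups of two and three) is the kind of cue that could then be matched against MIDI note numbers for calibration.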

Motivation:

Video recognition for piano playing: New application

Video-conferencing (VC) for distant piano learning. A conventional session transmits only the video image. Video recognition technology makes it possible to transmit an annotated video image as well.

From [1]: "Current music recording and transmitting technology allows teachers to teach piano remotely. This is in many cases the only way to teach music, especially in rural or distant areas where the ratio of piano teachers to piano students is extremely low. MIDI recording technology allows a teacher to play a piano at one place and to see a piano played by itself, as by an 'invisible teacher', at another place: the piano keys are pressed exactly at the same place, velocity and duration on a remote piano. However, to know how these keys were played by a teacher remains unknown. This includes the knowledge of which hand played a key, which finger was used, and who (in case of a four hand musical piece) was playing. With this technology this knowledge can now also be transmitted."

Piano playing for video recognition: Test-bed

From [1]: "Recognition of hands and fingers using video, which is a very challenging video recognition problem, has been considered so far in the context of such applications as computer-human interaction, automatic sign language recognition, robotic hand posture learning, and multimedia. In all of these applications, the motion of the hand and fingers is limited to a predefined number of states, which often constitute a hand/finger gesture vocabulary that a computer vision system attempts to identify. Furthermore, in all of these applications hands and fingers are manipulated by humans in order to be detected, i.e. they are used to send visual commands or signs to either a computer or a human. Because of that the set of possible hand and finger configurations is such that it makes them easier to be visually distinguished from one another. In the case of detecting pianist fingers playing piano, the situation is very different. Pianists use hands to play music and therefore put all their attention on the acoustic quality that the motion of their hands produce, rather than on how they visually appear to a viewer. Therefore, pianist hand/finger motion can be considered as an example of non-collaborative and unbiased visual data, which can be used as a unique test-bed for hand/finger recognition algorithms."

Summary:

Video recognition problems tackled:

  • Piano detection
  • Piano key recognition
  • Piano-MIDI calibration
  • Hand detection
  • Hand tracking
  • Finger detection

Piano playing applications benefited:

  • Distant and offline learning
  • Storing music pieces in a searchable database
  • Producing music sheets
  • Synthetic (score-driven) hand/finger motion generation

Technology description and results:

Setups developed:

A video camera observes the pianist's hands from above during playing and sends the video data to a computer, where it is processed in real time and displayed back annotated: a) in a home environment with a Yamaha MIDI keyboard, and b, c) in a professional piano studio environment with a MIDI-equipped grand piano.

Promising directions for future work:

  • In the domain of music pedagogy: annotation of guitar and violin playing.
  • In the domain of video recognition: better finger detection by imposing additional constraints on the inter-relationship of the fingers and their temporal coherence.

Closer look at hand and finger detection and annotation: snapshots

When a MIDI signal is received, meaning that a piano key was pressed, the hand and finger believed to have pressed the key are shown: the hand is highlighted in red, and the finger number is shown at the top of the image.
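The annotation of a MIDI event can be sketched as a nearest-fingertip lookup: the MIDI note number is mapped to the image position of its key, and the closest detected fingertip supplies the hand and finger labels. This is a minimal sketch under stated assumptions, not the method used by C-MIDI. The names annotate_event, key_x and fingertips, the pixel coordinates and the max_dist cut-off are hypothetical, and the key positions are assumed to come from a prior video-MIDI calibration. The only standard fact used is that MIDI note numbers repeat every 12 semitones, with pitch classes 1, 3, 6, 8 and 10 being the black keys.

```python
BLACK_PITCH_CLASSES = {1, 3, 6, 8, 10}  # C#, D#, F#, G#, A# within an octave


def is_black_key(note):
    """True if a MIDI note number (60 = middle C) falls on a black key."""
    return note % 12 in BLACK_PITCH_CLASSES


def annotate_event(note, key_x, fingertips, max_dist=15.0):
    """Return the (hand, finger) label for a pressed key, or None.

    key_x:      dict mapping MIDI note -> x centre of the key in pixels
                (assumed to be produced by video-MIDI calibration).
    fingertips: list of (hand, finger, x) tuples from the finger detector,
                with hand in {1, 2} and finger in {1, ..., 5}.
    max_dist:   hypothetical cut-off; a fingertip must be closer than this
                to the key centre, otherwise the annotation is omitted.
    """
    if note not in key_x:
        return None
    target = key_x[note]
    best, best_dist = None, max_dist
    for hand, finger, x in fingertips:
        dist = abs(x - target)
        if dist < best_dist:               # keep the closest fingertip so far
            best, best_dist = (hand, finger), dist
    return best


# Hypothetical calibration for three white keys and three detected fingertips.
key_x = {60: 100.0, 62: 120.0, 64: 140.0}
fingertips = [(1, 1, 98.0), (1, 2, 123.0), (2, 1, 300.0)]

print(annotate_event(60, key_x, fingertips))  # fingertip 2 px away from the key
print(annotate_event(64, key_x, fingertips))  # no fingertip close enough
```

Returning None when no fingertip is close enough mirrors the behaviour described on this page, where the annotation is omitted if the finger cannot be determined.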

Final output (Graphical User Interface) of C-MIDI video annotation program:

Shown are (in clockwise order from the top):

  1) the image captured by the camera;
  2) the computed background image of the keyboard, which is used to detect hands as the foreground;
  3) the binarized image used to detect the black keys of the piano keyboard, which is also used for video-MIDI calibration;
  4) the automatically detected piano keys (highlighted as white rectangles in the bottom right image);
  5) the segmented blobs in the foreground images (coloured by the number of blobs detected in the bottom left image);
  6) the final finger and hand detection results, shown upside down, as the camera sees them (top left), and vertically flipped for convenient viewing by a pianist (bottom middle), where the label of the finger that played a key is shown at the top of the image; and
  7) the result of the vision-based MIDI annotation (in a separate window at bottom right): each received MIDI event receives a visual label for the hand (either 1 or 2, i.e. left or right) and finger (1, 2, 3, 4, or 5, counted from right to left) that played it. When the finger cannot be determined, the annotation is omitted.
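The background and blob views in the interface can be illustrated with a small sketch: pixels that differ from the background image by more than a threshold are marked as foreground, and connected foreground pixels are grouped into numbered blobs. This is only an illustration under stated assumptions, written in plain Python on nested lists. The function names, the threshold of 30, the use of 4-connectivity and the tiny synthetic frame are hypothetical, and the page does not say which segmentation method C-MIDI actually uses.

```python
from collections import deque


def foreground_mask(frame, background, threshold=30):
    """Mark pixels whose grey level differs from the background image."""
    return [[abs(f - b) > threshold for f, b in zip(frow, brow)]
            for frow, brow in zip(frame, background)]


def label_blobs(mask):
    """Label 4-connected foreground regions; returns (labels, blob_count)."""
    height = len(mask)
    width = len(mask[0]) if height else 0
    labels = [[0] * width for _ in range(height)]
    count = 0
    for y in range(height):
        for x in range(width):
            if mask[y][x] and labels[y][x] == 0:
                count += 1                      # start a new blob
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:                    # breadth-first flood fill
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx),
                                   (cy, cx - 1), (cy, cx + 1)):
                        if (0 <= ny < height and 0 <= nx < width
                                and mask[ny][nx] and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Synthetic 4x8 keyboard background with two darker "hand" regions on top.
background = [[200] * 8 for _ in range(4)]
frame = [row[:] for row in background]
for y, x in [(1, 1), (1, 2), (2, 1), (1, 5), (1, 6), (2, 6)]:
    frame[y][x] = 120

labels, num_blobs = label_blobs(foreground_mask(frame, background))
print(num_blobs)
```

In a real setup the background image would have to be updated over time, since lighting changes and the hands cover different parts of the keyboard.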

Publications & Presentations

[1] Talk for the University of Ottawa's Piano Pedagogy Lab's grand opening, October 5, 2005.

[2] Dmitry Gorodnichy, Arjun Yogeswaran and Gilles Comeau. C-MIDI stands for MIDI that "sees" the notes. Video recognition of pianist hands and fingers. Toronto-Montreal Computer Vision Workshop, Ottawa, Canada, May 29-30, 2006. [ Poster ]

[3] Dmitry O. Gorodnichy and Arjun Yogeswaran. Detection and tracking of pianist hands and fingers. In Proc. of the Canadian Conference on Computer and Robot Vision (CRV'06), Quebec, Canada, June 7-9, 2006. [ PDF ]

Abstract: Current midi recording and transmitting technology allows teachers to teach piano playing remotely (or off-line): a teacher plays a midi-keyboard at one place and a student observes the played piano keys on another midi-keyboard at another place. What this technology does not allow is to see how the piano keys are played, namely: which hand and finger was used to play a key. In this paper we present a video recognition tool that makes it possible to provide this information. A video-camera is mounted on top of the piano keyboard and video recognition techniques are then used to calibrate piano image with midi sound, then to detect and track pianist hands and then to annotate the fingers that play the piano. The result of the obtained video annotation of piano playing can then be shown on a computer screen for further perusal by a piano teacher or a student.

The original idea of the above paper first appeared in the following CVPR'06 submission, which has been published as an internal technical report.

[4] Dmitry O. Gorodnichy and Arjun Yogeswaran. Pianist finger detection using the crevice detection operator. IIT-NRC Technical Report, March 2006.

Abstract: Recognition of fingers using video is a very challenging computer vision problem which does not have yet a good solution for a general hand motion. This paper addresses a specific case of hand motion - that of a piano player playing a piano as observed by a camera mounted on top of the piano, and proposes a technique which significantly facilitates detection and tracking of fingers for this type of hand motion. The technique is based on the assumption of the convexity of the finger shapes and the knowledge of the hand positions. In order to retrieve the hands positions, the deformable hand template tracking technique is developed which allows one to track the moving hands as they change their appearances and get occluded, while self-adjusting piano background detection and skin colour segmentation are used to detect hands prior to their tracking. The described approaches are incorporated into a program which can be used by piano teachers as a visual aid and which is also believed to be able to assist in automatic piano score writing.

Acknowledgements