Techniques for detecting voice fundamental frequency in real environments

Universidad del Cauca
Electronics and Telecommunications Engineer. Lecturer
email: mariasilva@unicauca.edu.co

Universidad del Cauca
Master's in Electronics and Telecommunications. Lecturer
email: mariasilva@unicauca.edu.co

Universidad del Cauca
Master's in Electronics and Telecommunications. Lecturer
email: mariasilva@unicauca.edu.co

Universidad del Cauca
General Physician
email: mariasilva@unicauca.edu.co

Introduction: This review article was prepared as part of a graduate thesis at Universidad del Cauca in 2017. It sought to identify the most appropriate methods for detecting the voice fundamental frequency in real environments. This work is part of a broader effort to improve communication for people with hearing disabilities and promote their inclusion in society, since most existing proposals aim only to improve the communication channel in which the hearing-impaired individual is the transmitter.
Methodology: An updated literature review was carried out, based mainly on scientific articles published in the last five years. To select articles for inclusion, a systematic mapping of the different methods for detecting the voice fundamental frequency was performed.
Results: The phenomena that the various algorithms consider when characterizing the environment range from noise and interference to reverberation. An algorithm's performance depends on the quality of the recorded audio, as reflected in the variations observed across the databases used. Some methods can detect up to two distinct fundamental frequencies simultaneously.
Conclusions: Novel methods have been implemented to make the detection of the voice fundamental frequency more efficient; however, much work remains to be done in this area.
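
To make the Results statement concrete, the following minimal Python sketch illustrates the classic time-domain autocorrelation principle on which many pitch detectors build. It assumes a single clean, voiced frame; the function estimate_f0 and its parameters are illustrative assumptions, not a specific method discussed in this review. Methods designed for real environments extend this basic idea with noise-robust features such as harmonic enhancement, comb filtering, or multi-band correlograms.

```python
# Minimal sketch (illustrative only, not a method surveyed in this review):
# time-domain autocorrelation F0 estimation for one clean, voiced frame.
import numpy as np

def estimate_f0(frame, fs, fmin=50.0, fmax=500.0):
    """Estimate the fundamental frequency (Hz) of a voiced frame."""
    frame = frame - np.mean(frame)                # remove DC offset
    corr = np.correlate(frame, frame, mode="full")
    corr = corr[len(corr) // 2:]                  # keep non-negative lags only
    lag_min = int(fs / fmax)                      # shortest plausible period
    lag_max = int(fs / fmin)                      # longest plausible period
    best_lag = lag_min + np.argmax(corr[lag_min:lag_max])
    return fs / best_lag

# Usage: a synthetic 220 Hz tone sampled at 16 kHz
fs = 16000
t = np.arange(0, 0.04, 1.0 / fs)                  # one 40 ms frame
print(estimate_f0(np.sin(2 * np.pi * 220 * t), fs))  # close to 220.0
```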