This project is covered by an NDA between Telefónica, UPC and me. Unfortunately, this means that, although I can share what is written in this description (the thesis abstract is public), I cannot publish the full text, the slides from my defense or the code I wrote.
From October 2015 to June 2016 I worked at Telefónica Research on my Bachelor's Thesis, Speech Activity Detection: Application-Specific Tuning and Context-Based Neural Approaches. I was co-supervised by Jordi Luque, Research Scientist at Telefónica I+D, and Antonio Bonafonte, Associate Professor at UPC.
I worked on the task of speech activity detection, that is, the segmentation of an audio signal into speech and non-speech (e.g. silence, noise, music) fragments. The first half of the project was dedicated to a thorough application-specific benchmark of a baseline system, which included the development of a team-wide GPU cluster for model training and evaluation. In the second half, I developed a novel context-based approach to speech activity detection using both feedforward and recurrent neural networks that achieved a 70% error reduction compared to Telefónica's production model.
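Because the thesis code is under NDA, what follows is not my implementation. It is only a generic, minimal sketch of two ideas mentioned above, written under my own assumptions. The first is "context": each audio frame's features are concatenated with those of its k neighbours on either side before being passed to a frame classifier. The second is segmentation: per-frame speech/non-speech decisions are merged into contiguous segments. The function names, the edge padding strategy and the 10 ms frame hop used in the example are hypothetical choices made for illustration.

```python
from typing import List, Tuple


def stack_context(frames: List[List[float]], k: int) -> List[List[float]]:
    """Concatenate each frame with its k left and k right neighbours.

    Hypothetical helper: frames beyond the signal edges are replaced by
    repeating the first/last frame (one of several possible padding choices).
    """
    n = len(frames)
    stacked = []
    for t in range(n):
        window: List[float] = []
        for offset in range(-k, k + 1):
            idx = min(max(t + offset, 0), n - 1)  # clamp index at the edges
            window.extend(frames[idx])
        stacked.append(window)
    return stacked


def frames_to_segments(labels: List[int], hop_ms: int) -> List[Tuple[int, int, str]]:
    """Merge per-frame labels (1 = speech, 0 = non-speech) into segments.

    Returns (start_ms, end_ms, label) tuples; hop_ms is an assumed frame hop.
    """
    segments = []
    start = 0
    for t in range(1, len(labels) + 1):
        # Close the current segment at the end of the signal or when the label changes.
        if t == len(labels) or labels[t] != labels[start]:
            name = "speech" if labels[start] == 1 else "non-speech"
            segments.append((start * hop_ms, t * hop_ms, name))
            start = t
    return segments


if __name__ == "__main__":
    print(stack_context([[1.0], [2.0], [3.0]], k=1))
    print(frames_to_segments([0, 0, 1, 1, 1, 0], hop_ms=10))
```

In this toy run, three one-dimensional frames with k = 1 become three three-dimensional context vectors, and six frames at a 10 ms hop are merged into a non-speech, a speech and a final non-speech segment. A recurrent network can obtain context from its hidden state rather than from explicit stacking, which is one reason both feedforward and recurrent models are worth comparing.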
This was my first time working as an engineer and researcher in industry, and it was an incredible learning experience, especially in terms of working in a team on real-world problems. It felt great to develop something that is used in production, and I was glad to make a meaningful contribution to the speech processing team at Telefónica Research in return for everything they taught me during my internship.