1:00 PM - 2:00 PM

DSF Lunch & Learn – ML_{doom}: A data science journey into the belly of the rhyming beast

Thursday, June 11th, 2020

Please join us for Session #4 in our webinar series, DSF Lunch & Learn. Every Thursday from 1-2 PM, join DSF to hear from one of our weekly featured speakers. Grab some lunch and join DSF globally as we launch this new series.

Join us virtually in partnership with Ian Ashmore, Senior Data Scientist at Cap-HPI and Charlie Tapsell, Data Scientist at Cap-HPI.

We aim to share, inspire and bring the data community together, so that you can build your industry network and have some exciting interactions with your peers.

Ticket Allocation Process:
Registering here guarantees you a ticket for the Data Science Festival Event with Ian and Charlie on June 11th 2020. Once registered you will be sent your Zoom Link via email.

Registration Link



1.00pm: Intro with David Loughlan – Founder of DSF

1.05pm: Ian Ashmore and Charlie Tapsell

Talk Title: ML_{doom}: A data science journey into the belly of the rhyming beast

Summary: In this talk we introduce our furlough project and the first steps towards creating an algorithm capable of winning a rap battle. While solutions exist to generate poetry and other basic rhymes, simulation of more complex hip-hop styles has remained elusive. Our work first sought to better understand these limitations by re-imagining the basic form of the rhyme data: as phonemes instead of words. The phonetic representation of a word (funny Greek-looking symbols) is an instruction on how to pronounce it properly. This approach separates words that are spelled similarly but pronounced differently: ‘through’ does NOT rhyme with ‘though’, but it DOES rhyme with ‘crew’, ‘few’ and ‘drew’, since they all end with the phoneme ‘uː’, which is an ‘ew’ sound. Furthermore, modern rapping often rhymes at the syllable level rather than the word level, allowing ‘tiramisu’ to rhyme with ‘terror miss you’. This led to the creation of a training corpus consisting of tens of thousands of rap and hip-hop songs, with each lyric reduced to its constituent phonemes and grouped into syllables.
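To make the phoneme idea concrete, here is a minimal sketch of a rhyme check that compares sounds rather than spellings. The tiny `PHONEMES` dictionary, the `VOWELS` set and the function names are all assumptions made for illustration; they are not the speakers' data or code, and a real system would need a full pronunciation dictionary and syllable-level matching.

```python
# Hypothetical mini pronunciation dictionary (IPA phonemes per word).
# Illustrative only: not the dictionary or corpus described in the talk.
PHONEMES = {
    "through": ["θ", "r", "uː"],
    "though": ["ð", "əʊ"],
    "crew": ["k", "r", "uː"],
    "few": ["f", "j", "uː"],
    "drew": ["d", "r", "uː"],
    "rough": ["r", "ʌ", "f"],
}

# Vowel sounds that appear in the toy dictionary above (assumed, not exhaustive).
VOWELS = {"uː", "əʊ", "ʌ"}


def rhyme_tail(word):
    """Return the phonemes from the last vowel sound to the end of the word."""
    phones = PHONEMES[word.lower()]
    for i in range(len(phones) - 1, -1, -1):
        if phones[i] in VOWELS:
            return tuple(phones[i:])
    return tuple(phones)  # no vowel found: fall back to the whole word


def rhymes(a, b):
    """Two different words rhyme here if their final vowel-to-end sounds match."""
    return a.lower() != b.lower() and rhyme_tail(a) == rhyme_tail(b)


print(rhymes("through", "crew"))    # compares 'uː' with 'uː'
print(rhymes("through", "though"))  # compares 'uː' with 'əʊ'
```

Comparing only the tail after the last vowel is a deliberately simple rule; the multi-word, syllable-level rhymes mentioned above would need the comparison to run over syllable sequences instead.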
The first step was to obtain a phonetic representation for every word. We obtained a dictionary of ~50,000 unique English words and their phonetic representations, and used that data to develop a phoneme predictor for all the plural, tense-specific, slang and obscene words used in rap songs. This took the form of a recurrent neural network with bidirectional LSTM layers and a time-distributed output layer (for predicting sequences). After significant experimentation with the training set, encoding/decoding strategies and network architecture modifications, we arrived at a trained network with ~97.5% accuracy. We are satisfied with this, since the English language contains certain ambiguities that are a product of historical usage and regional accents.
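A time-distributed output layer emits one prediction per input time step, so inputs and targets are typically padded to a common fixed length. The sketch below shows one possible encoding/decoding scheme and two ways of scoring predictions. Every name here (`build_vocab`, `encode`, `decode`, `accuracies`) is hypothetical, and the summary does not say which encoding or accuracy measure was actually used.

```python
PAD = "<pad>"  # assumed padding symbol; id 0 is reserved for it


def build_vocab(sequences):
    """Map each distinct symbol to an integer id, keeping 0 for padding."""
    symbols = sorted({s for seq in sequences for s in seq})
    return {PAD: 0, **{s: i + 1 for i, s in enumerate(symbols)}}


def encode(seq, vocab, length):
    """Turn a symbol sequence into a fixed-length id list, right-padded with 0."""
    ids = [vocab[s] for s in seq]
    if len(ids) > length:
        raise ValueError("sequence longer than the fixed length")
    return ids + [0] * (length - len(ids))


def decode(ids, vocab):
    """Turn an id list back into symbols, dropping padding."""
    inverse = {i: s for s, i in vocab.items()}
    return [inverse[i] for i in ids if i != 0]


def accuracies(predicted, expected):
    """Return (per-position accuracy ignoring padded targets, whole-sequence accuracy)."""
    hits = total = exact = 0
    for p, e in zip(predicted, expected):
        pairs = [(a, b) for a, b in zip(p, e) if b != 0]
        hits += sum(a == b for a, b in pairs)
        total += len(pairs)
        exact += p == e
    return hits / total, exact / len(expected)


letter_vocab = build_vocab(["crew", "few"])
encoded = encode("few", letter_vocab, 6)
print(encoded, decode(encoded, letter_vocab))
```

Per-position and whole-sequence accuracy can differ noticeably, since a single wrong phoneme makes the whole word count as incorrect under the stricter measure.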
Our results are promising and pave the way for future work using text-to-speech technologies and lyric generation. They therefore have a valuable place in our larger ongoing project, which aims to create a neural network that can generate rhymes that retain thematic continuity, adhere to a particular rhyming pattern, and do so in real time (potentially against a human opponent). We believe that representing the song lyrics as syllables, expressed as sequences of phonemes, will allow us to limit the future work to generating sentences that rhyme, make sense, and include a response to the incoming data.

1.40pm: Community Q&A

2.00pm: Close

Charlie Tapsell
Bio: Charlie is a physics graduate (BSc from Lancaster, MSc with distinction from Leeds). Most interesting modules included Cosmology, Quantum Computing and Astrophysical Fluid Dynamics (mega space explosions). MSc project centred around the formation of high-mass stars. President of the University of Leeds Science Magazine. Designed the logo for the…
Ian Ashmore
Bio: Ian Ashmore is Senior Data Scientist at Cap-HPI. He earned his PhD from the University of Leeds in theoretical astrophysics (magnetohydrodynamics) and prior to that had become the first physics undergraduate at Leeds to have their master’s project published in reputable academic journals. In a previous life, he was…