EVENT - Tuesday
Location: 120 Holborn, London EC1N 2TD, UK
When: 6:00 PM - 9:00 PM
DSF Startup Showcase with Secret Escapes
Join Data Science Festival – London in partnership with Secret Escapes. On June 11th, we will be featuring 6 new and upcoming companies at our Start-up Showcase. Come and hear how these new companies use data science to solve real-world problems, the issues their teams have encountered, and the mistakes and successes you should look out for when starting your own projects.
Due to the popularity of Data Science Festival events, we now allocate event tickets via a random ballot. Registering here enters you into the ticket ballot for the Data Science Festival event at Secret Escapes on June 11th 2019. The ballot will be drawn on June 4th 2019. Those randomly selected will then be e-mailed a Universe ticket for the event, along with the joining details.
If you are allocated a Universe ticket, please bring a printed copy or have the ticket on your phone so that you can check in with your QR code at the event. Tickets are non-transferable.
Please click here to apply for a ticket: GET TICKETS
6:00pm – Doors open
6:30-7:15pm – 3 Short Sharp Lightning Talks
7:15-7:45pm – Refreshments
7:45-8:30pm – 3 Short Sharp Lightning Talks
8:30-9:00pm – Close
Talk 1 – Secret Escapes: challenges scaling Airflow to run hundreds of dynamically generated DAGs
The jobs in our data pipeline are either self-describing or built dynamically from config files. Because Airflow DAGs are plain Python objects, we thought an elegant solution, loosely coupled with Airflow, would be to generate DAGs dynamically from our job config files and metadata. We now have a couple of hundred jobs running with this implementation, and we’re seeing performance issues where the scheduler is slow to assign new work. We anticipate hundreds, if not thousands, of jobs running in production when we reach maturity, so answering the high-risk questions “Can Airflow scale?” and “Can Airflow scale micro-batch pipelines with jobs running every X mins?” is a priority for us. We’re therefore taking an experiment-driven approach to these questions, and we’d like to share that journey and our findings with the community.
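For readers unfamiliar with the pattern described in this abstract, here is a minimal sketch of generating workflow objects from job configs. It is not Secret Escapes' implementation. To stay runnable without Airflow installed, it uses a stand-in `Dag` dataclass rather than Airflow's own `DAG` class, and the names `JOB_CONFIGS`, `build_dag` and `register_dags`, along with the example jobs, are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Dag:
    """Stand-in for a workflow object; NOT Airflow's DAG class."""
    dag_id: str
    schedule: str
    tasks: list = field(default_factory=list)


# Hypothetical job configs; in practice these might be parsed from config files.
JOB_CONFIGS = [
    {"name": "bookings", "schedule": "*/15 * * * *", "steps": ["extract", "load"]},
    {"name": "payments", "schedule": "@hourly", "steps": ["extract", "transform", "load"]},
]


def build_dag(config):
    """Build one Dag object from a single job config."""
    return Dag(
        dag_id=f"job_{config['name']}",
        schedule=config["schedule"],
        tasks=list(config["steps"]),
    )


def register_dags(configs, namespace):
    """Create one Dag per config and bind it to its dag_id in `namespace`."""
    for config in configs:
        dag = build_dag(config)
        namespace[dag.dag_id] = dag
    return namespace


# A plain dict is used here; see the note below on what a real DAG file would use.
registry = register_dags(JOB_CONFIGS, {})
```

In a real Airflow DAG file, the namespace would typically be the module's `globals()`, because Airflow discovers DAG objects defined at module level. With this approach, every additional config entry means one more object to build each time the file is parsed.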