Title: Secret Escapes' challenges scaling Airflow to run hundreds of dynamically generated DAGs.
Summary: The jobs in our data pipeline are either self-describing or built dynamically from config files. Since Airflow DAGs are plain Python objects, we thought an elegant solution, loosely coupled with Airflow, would be to generate DAGs dynamically from our job config files and metadata. We now have a couple hundred jobs running with this implementation, and we're seeing performance issues where the scheduler is slow to assign new work. We anticipate hundreds, if not thousands, of jobs in production when we reach maturity, so answering the high-risk questions "Can Airflow scale?" and "Can Airflow scale micro-batch pipelines with jobs running every X minutes?" is a priority for us. We're therefore taking an experiment-driven approach to the question, and we'd like to share that journey and our findings with the community.
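The config-driven pattern described above can be sketched as follows. This is an illustrative example, not the authors' actual implementation: the config shape and job names are hypothetical, and a minimal stand-in `DAG` class replaces `airflow.DAG` so the sketch runs without an Airflow install. In a real DAG file you would import `airflow.DAG` and real operators instead; the key idea, registering each generated DAG under a unique name in module `globals()` so Airflow's scanner discovers it, is the commonly documented approach.

```python
from dataclasses import dataclass, field

@dataclass
class DAG:
    """Minimal stand-in for airflow.DAG, for a self-contained sketch."""
    dag_id: str
    schedule: str
    tasks: list = field(default_factory=list)

# Job metadata as it might appear in config files (hypothetical shape).
JOB_CONFIGS = [
    {"name": "extract_bookings", "schedule": "*/5 * * * *"},
    {"name": "load_partners", "schedule": "*/15 * * * *"},
]

def build_dag(cfg: dict) -> DAG:
    """Turn one job config entry into a DAG object."""
    dag = DAG(dag_id=cfg["name"], schedule=cfg["schedule"])
    dag.tasks.append(f"run_{cfg['name']}")  # one task per job, for brevity
    return dag

# Airflow discovers DAGs by scanning a DAG file's module-level globals,
# so each generated DAG is registered under its own top-level name.
for cfg in JOB_CONFIGS:
    dag = build_dag(cfg)
    globals()[dag.dag_id] = dag
```

Note that with this pattern the scheduler must re-parse the DAG file (and re-read the configs) on every scan, which is one reason dynamic generation can strain the scheduler as job counts grow.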