Data Science at Gousto Supply
At Gousto we work in a tribe model, with each tribe aligned to a different area of the business. Within a tribe we work cross-functionally – data scientists are embedded in or across the tribe’s squads, alongside software engineers, product managers, and product analysts. The Supply tribe is responsible for Gousto’s supply chain – we take care of the journey a box makes from its creation through to its delivery.
As data scientists in the Supply tribe, we primarily concern ourselves with optimising throughput, whether that’s through routing orders optimally for the next few hours, or through designing our factories optimally for the years ahead.
I am going to take you through how Data Science and Operational Research help us to route orders down our production lines every day.
Order routing at Gousto
Gousto’s factory (soon to be plural) is where we put together our boxes before they are shipped out to the customer. Central to this operation is a production line, down which boxes are sent to pick up the order’s items, known as Stock Keeping Units (SKUs). The production line consists of a central conveyor flanked by pick stations, with each station containing a subset of our total SKU library. At each station a human picker will put some of the relevant SKUs into a box, before it moves back onto the central conveyor and on to the next required station.
The hard work of our operational colleagues, combined with the intelligence of our algorithms, gives us excellent pick rates in our factory. Our Pick-Face Optimisation (PFO) algorithm is one of our most important. PFO helps us define which SKUs to put on which pick stations. As you might expect, we want to put the most popular SKUs on as many stations as possible, with the less popular ones represented less – if we didn’t do this we could have a large number of orders queuing to get into the single station with a popular SKU such as garlic.
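To make the popularity idea concrete, here is a minimal sketch that gives each SKU a number of stations roughly in proportion to its share of demand. It is not the PFO algorithm itself: the function name station_counts, the total_slots parameter and the demand figures are all hypothetical, and the rounding means the slot budget is only approximately respected.

```python
def station_counts(demand, n_stations, total_slots):
    """Give each SKU a station count roughly proportional to its demand share.

    Illustrative assumption only: every SKU gets at least one station and
    at most one slot on every station.
    """
    total_demand = sum(demand.values())
    counts = {}
    for sku, units in demand.items():
        share = units / total_demand
        counts[sku] = max(1, min(n_stations, round(share * total_slots)))
    return counts


# Hypothetical weekly demand for three SKUs
demand = {"garlic": 600, "onion": 300, "saffron": 100}
counts = station_counts(demand, n_stations=4, total_slots=8)
print(counts)
```

In this toy setup the most popular SKU ends up on every station, while the least popular one sits on a single station.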
While PFO provides the setup for our production line, it is our Order Routing Algorithm (ORA) that decides exactly which stations an order will visit to collect its SKUs, and this is the algorithm that I want to give some more detail on. With many orders flowing down our production lines each day, it is important to have a robust automated solution.
At Gousto, the goal of all our data scientists is to have a meaningful impact on the business, rather than just creating complex algorithms for the sake of it, although of course sometimes these two things overlap. Before I dive into the algorithm, it is worth discussing what we are actually trying to achieve with our order routing. In Gousto Supply there are several low-level metrics we care about, all relating to higher-level metrics such as our operating cost, customer satisfaction, and capacity.
ORA affects our throughput – the number of boxes we can successfully get down the line in an hour, which in turn affects our capacity – the number of orders we can fulfil in a given menu week. Measuring throughput impact as an evaluation metric can be challenging since it is affected by many different decisions that sit outside of ORA, such as how many pickers are on the line or how many unique SKUs go into our week’s recipes. Because of this, we decided to optimise against proxy metrics when developing ORA, as a surrogate for the actual line throughput. Examples of representative proxy metrics include how many stations a box visits, how much time a box spends at each station or the spread of boxes across the pick-line at any given time.
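As an illustration of how two of these proxy metrics could be computed from a set of routes, here is a small sketch. The data layout (each route as a list of station indices) and the function name proxy_metrics are assumptions made for the example. Time spent at each station is left out because it would need timing data.

```python
from collections import Counter
from statistics import mean, pstdev


def proxy_metrics(routes, n_stations):
    """Summarise routes with two proxy metrics.

    routes: one list of visited station indices per box (assumed layout).
    """
    visits = Counter(station for route in routes for station in route)
    per_station = [visits.get(s, 0) for s in range(n_stations)]
    return {
        # How many stations a box visits on average
        "mean_stations_per_box": mean(len(route) for route in routes),
        # How box visits are distributed along the pick-line
        "visits_per_station": per_station,
        # Population standard deviation of visits; lower means a more even spread
        "visit_spread": pstdev(per_station),
    }


# Three hypothetical boxes on a three-station line
routes = [[0, 1], [1, 2], [1]]
metrics = proxy_metrics(routes, n_stations=3)
print(metrics)
```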
Our algorithm helps balance station workload across whole order batches
With these proxy metrics in mind I will describe the algorithm in more depth. First, every station on the production line is given a workload cost, initialised to one. Each order in an order batch is then processed one-by-one.
For each order we solve a discrete optimisation problem that minimises the total cost of routing based on the current station workload costs, while ensuring that the chosen stations to visit contain all the necessary SKUs between them.
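One plausible way to write this single-order problem down is as a weighted set cover. This is a sketch rather than the exact model used in ORA, and all of the notation is an assumption: S is the set of stations, c_s is the current workload cost of station s, K_s is the set of SKUs stocked at station s, K_o is the set of SKUs in the order, and x_s is 1 if the order visits station s and 0 otherwise.

```latex
\begin{aligned}
\min_{x} \quad & \sum_{s \in S} c_s \, x_s \\
\text{subject to} \quad & \sum_{s \in S \,:\, k \in K_s} x_s \ge 1 \quad \forall k \in K_o, \\
& x_s \in \{0, 1\} \quad \forall s \in S.
\end{aligned}
```

The objective charges the order for every station it visits at that station's current workload cost, and the constraint ensures that each SKU in the order is stocked by at least one visited station.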
When our algorithm has decided which stations to visit, it uses a custom heuristic (an approximate, practical method that is not guaranteed to be optimal due to the complexity of the problem) to decide which SKUs to get from which of the allocated stations, then uses this information to update the workload costs for each of the visited stations. In the example here the order has more work assigned at the second station it visits compared to the first and third, hence the higher workload cost associated. The workload costs are then carried over for the next order to optimise against.
We run through this process iteratively for all orders, slowly building up the workload cost across all the stations, allowing us to maintain the balance of work.
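The whole loop can be sketched in a few lines. This is a toy version under stated assumptions, not the production implementation: the single-order problem is solved by brute force over station subsets (a real system would more likely use a dedicated solver), the pick heuristic simply sends each SKU to the chosen station with the fewest picks so far, and the workload cost is assumed to grow by one per pick. The names cheapest_cover, assign_picks and route_batch are hypothetical.

```python
from itertools import combinations


def cheapest_cover(needed, station_skus, cost):
    """Brute-force the lowest-cost set of stations stocking every needed SKU.

    Raises ValueError if no combination of stations covers the order.
    """
    candidates = (
        combo
        for k in range(1, len(station_skus) + 1)
        for combo in combinations(station_skus, k)
        if needed <= set().union(*(station_skus[s] for s in combo))
    )
    return min(candidates, key=lambda combo: sum(cost[s] for s in combo))


def assign_picks(needed, chosen, station_skus):
    """Greedy stand-in heuristic: pick each SKU at the least-loaded chosen station."""
    picks = {s: [] for s in chosen}
    for sku in sorted(needed):
        options = [s for s in chosen if sku in station_skus[s]]
        target = min(options, key=lambda s: len(picks[s]))
        picks[target].append(sku)
    return picks


def route_batch(orders, station_skus):
    """Route orders one by one, carrying workload costs from order to order."""
    cost = {s: 1.0 for s in station_skus}  # every station starts at one
    routes = []
    for order in orders:
        needed = set(order)
        chosen = cheapest_cover(needed, station_skus, cost)
        picks = assign_picks(needed, chosen, station_skus)
        for station, skus in picks.items():
            cost[station] += len(skus)  # assumed update rule: one unit per pick
        routes.append(picks)
    return routes, cost


# Hypothetical three-station line and three-order batch
station_skus = {
    "A": {"garlic", "onion"},
    "B": {"garlic", "onion"},
    "C": {"rice"},
}
orders = [["garlic", "onion"], ["garlic", "onion"], ["garlic", "rice"]]
routes, cost = route_batch(orders, station_skus)
print(routes)
print(cost)
```

In this example the first order goes to station A, which raises A's workload cost, so the identical second order is sent to station B instead. That is the balancing effect in miniature.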
The most interesting aspect of ORA is the iterative optimisation approach that decomposes the problem into a large number of subproblems. At this point you might be wondering why we didn’t just try to optimise every order’s station visits and picks in a single huge optimisation model – after all, wouldn’t that give us a provably optimal solution?
This was how our first iteration of ORA worked, using a Genetic Algorithm to optimise the whole order batch at once. As practitioners of discrete optimisation will know, a general rule of thumb is that the complexity of an optimisation problem increases exponentially with the size of the model, due to the combinatorial nature of these problems. With a large number of orders in a batch, 20 stations, and 40+ SKUs in an order, a model of this size may never return a solution in any reasonable timeframe.
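A rough count shows how quickly the joint problem grows. The sketch below counts only which stations each order could visit, ignoring SKU availability and pick assignment, so the figures are upper bounds on candidate routings rather than measurements. The batch size of 100 is an illustrative assumption, as are the function names.

```python
n_stations = 20

# Non-empty subsets of stations that a single order could visit
subsets_per_order = 2 ** n_stations - 1


def joint_search_space(n_orders):
    """Candidate routings when every order is chosen jointly."""
    return subsets_per_order ** n_orders


def decomposed_search_space(n_orders):
    """Candidates examined when orders are routed one after another."""
    return subsets_per_order * n_orders


n_orders = 100  # illustrative batch size, not a real figure
print(f"per order:  {subsets_per_order:,}")
print(f"decomposed: {decomposed_search_space(n_orders):,}")
print(f"joint:      a number with {len(str(joint_search_space(n_orders)))} digits")
```

The decomposed count grows linearly with the number of orders, while the joint count grows exponentially.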
By decomposing the problem and routing each order individually we get around this size issue nicely, since each single order optimisation problem is small. Iteratively building up the station workload cost distribution allows us to carry over information from all of the previous single order optimisations over the whole algorithm run.
This iterative optimisation approach has several desirable characteristics:
It is fast. Really fast. Since each single order routing optimisation is a tiny problem, they can be solved incredibly quickly, allowing us to route large batches of orders in less than a minute.
We can scale it up – you may have realised that the runtime of the algorithm is linear in the number of orders, meaning it can scale upwards as Gousto grows.
We can scale it down – since each order is optimised individually, in the future we could route orders one at a time based on real-time information from the production lines.
These characteristics have empowered our Operations teams to route orders quickly and efficiently.
The future of order routing
While we are proud of our efforts so far, there is still so much more to do in the order routing domain. The work ahead can broadly be broken down into two categories.
Measurement and experimentation
As mentioned previously, accurately measuring how new algorithm developments affect throughput is a tricky task, given all the external operational decisions that also impact it – this in turn makes it challenging to run robust experiments on these developments. In the future we hope to develop better models of how our algorithms and other decisions affect the throughput of the production line, allowing Gousto to become even more data driven. We’re also exploring the creation of a simulation of our production line, unlocking the possibility of simulated testing of new algorithms.
Being able to model and measure the impact of our algorithms will set us up nicely to make further improvements. Our newer factories will have the ability to surface real-time production line data to ORA. We believe that this ability, combined with the simulation capability described above, will be a game changer that allows us to move towards automated real-time optimisation of order routing.
At Gousto we’re incredibly excited about our future plans for order routing. If you share our enthusiasm then you will be happy to hear that we are recruiting for data science, data engineering and product analyst roles to help drive this work forward.
Hungry for more? Gousto are running a Keynote session at the DSF 2020. Join them on November 24th at 6:30 PM GMT for "A tasty byte: how data science is driving customer choice without sacrificing supply chain efficiency", a talk by Irene Iriarte Carretero, Rita Figueiredo, Tom Shea and Niclas Thomas.