The problem I’ve faced when moving machine learning to production is that a machine learning system usually takes time to compute a result. If your machine learning API runs on a very powerful server, or will only be accessed by a few people, you can ignore this problem. Unfortunately, I can’t.
The first solution that came to mind was to create an asynchronous API. What does that mean? The simplest explanation of asynchronous I’ve found is from Stack Overflow: You are cooking in a restaurant. An order comes in for eggs and toast.
- Synchronous: you cook the eggs, then you cook the toast.
- Asynchronous, single threaded: you start the eggs cooking and set a timer. You start the toast cooking, and set a timer. While they are both cooking, you clean the kitchen. When the timers go off you take the eggs off the heat and the toast out of the toaster and serve them.
- Asynchronous, multithreaded: you hire two more cooks, one to cook eggs and one to cook toast. Now you have the problem of coordinating the cooks so that they do not conflict with each other in the kitchen when sharing resources. And you have to pay them.
So, I need to make my API able to handle many requests at the same time. (Another explanation: from codewala)
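The "asynchronous, single threaded" case from the kitchen analogy can be sketched in a few lines of Python. This is only an illustration using `asyncio` from the standard library, not gevent itself; the names `cook`, `serve_order` and `timed_order`, and the cooking times, are made up for the example.

```python
import asyncio
import time


async def cook(item, seconds):
    """Pretend to cook something: the await is the 'timer' that frees the cook."""
    await asyncio.sleep(seconds)
    return f"{item} done"


async def serve_order():
    # Start the eggs and the toast together and wait until both are ready.
    return await asyncio.gather(cook("eggs", 0.3), cook("toast", 0.2))


def timed_order():
    """Run one order and return the plates plus the elapsed time in seconds."""
    start = time.perf_counter()
    plates = asyncio.run(serve_order())
    return plates, time.perf_counter() - start
```

Cooking the two items one after the other would take about 0.5 seconds. Here the total is close to the longest single item, because the waiting overlaps.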
How to implement it?
There are plenty of ways to implement it, but I think I’ve found the simplest one, especially if you’ve already built your API with Flask: use Gunicorn and Gevent. First, install both via pip:
pip install gunicorn
pip install gevent
To run your API with Gunicorn, the only thing that changes is the command you use to start the app. Usually, a Flask app is started with a command like the one below (assuming the file calls `app.run()` when executed directly):
python myapp.py
With Gunicorn, you start it with the command below instead:
gunicorn myapp:app
`myapp` is your file name (without the `.py` extension), and `app` is your Flask app variable. Once the app is running, you can access it at http://localhost:8000 (notice that the default port changes: Flask’s built-in server uses port 5000, while Gunicorn uses port 8000).
We’re not finished yet: the command above still runs the app synchronously. With Gunicorn, you can run it asynchronously without changing a single line of your code! Just run it with the command below:
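For reference, here is a minimal sketch of what such a `myapp.py` could look like. Everything in it is an assumption for illustration: the `/myapp` route, the hypothetical `predict` function, and the `time.sleep` call that stands in for a slow model or an external service the model has to wait for.

```python
import time

from flask import Flask, jsonify

# `app` is the variable that the `myapp:app` part of the gunicorn command refers to.
app = Flask(__name__)


def predict(value):
    """Hypothetical stand-in for a slow prediction step."""
    time.sleep(0.5)  # simulate waiting on a slow step
    return value * 2


@app.route("/myapp")
def myapp_endpoint():
    # Return the prediction as a JSON response.
    return jsonify(result=predict(21))
```

A wait like this `time.sleep` is the kind of work that async workers handle well, because a worker can serve other requests while one request is waiting. A purely CPU-bound computation still keeps its worker busy until it finishes.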
gunicorn -k gevent -w 5 myapp:app
The -k option selects the worker class; with gevent, each worker handles requests asynchronously (there are other async worker classes, such as eventlet). The -w option sets the number of worker processes, 5 in this example (other reference: Workers).
Congratulations, you are now running your Flask app asynchronously! If you want to test your API, you can use Apache Benchmark (ab), which can send many requests to your app simultaneously. An example command is below:
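If you would rather not type the options every time, the same settings can also live in a Gunicorn configuration file, which is a plain Python file. The sketch below assumes a file named `gunicorn.conf.py` placed next to `myapp.py`; the values simply mirror the command above.

```python
# gunicorn.conf.py: same settings as `gunicorn -k gevent -w 5 myapp:app`
bind = "127.0.0.1:8000"   # address and port to listen on (Gunicorn's default)
worker_class = "gevent"   # equivalent to -k gevent
workers = 5               # equivalent to -w 5
```

You can then start the app with `gunicorn -c gunicorn.conf.py myapp:app`.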
ab -n 200 -c 100 http://localhost:8000/myapp
If you compare the results, you will see the difference in the number of requests per second between the sync and async runs.
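If ab is not available on your machine, a rough comparison can also be made from Python. The script below is only a sketch using the standard library, not a replacement for ab; the function names `fetch` and `measure` are made up for this example, and the numbers it reports will be less precise than ab’s.

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor


def fetch(url, timeout=30):
    """Request the URL once; return True if the server answered with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except Exception:
        # Any network or HTTP error counts as a failed request.
        return False


def measure(url, total=200, concurrency=100):
    """Send `total` requests with up to `concurrency` in flight at once.

    Returns (number of successful requests, requests per second).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(fetch, [url] * total))
    elapsed = time.perf_counter() - start
    return sum(results), total / elapsed


# Example usage, with the app already running under gunicorn:
# ok, rps = measure("http://localhost:8000/myapp", total=200, concurrency=100)
# print(ok, "successful requests,", round(rps, 1), "requests per second")
```

Run it once against the sync setup and once against the gevent setup, then compare the two requests-per-second values.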
Thank you 🙂 I hope this note is helpful for me or anybody.