blog.dbrgn.ch

Heroku Buildpack for Numpy, Scipy and Scikit-Learn

written on Tuesday, June 18, 2013 by

(TLDR: https://github.com/dbrgn/heroku-buildpack-python-sklearn)

Background

At Webrepublic we just launched a Python-based system that, among other things, compares large texts by representing them as tf-idf vectors in a multi-dimensional vector space and measuring the cosine similarity between them (see http://stackoverflow.com/a/8897648/284318). For this, we needed scikit-learn.
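To make the idea concrete, here is a minimal standard-library sketch of tf-idf weighting followed by cosine similarity. It is not the code we run in production (that uses scikit-learn); the whitespace tokenizer, the smoothed idf formula and the function names are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse tf-idf vector (dict: term -> weight) per document."""
    # Naive tokenizer (assumption): lowercase and split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed idf (an assumption of this sketch): ln((1 + n) / (1 + df)) + 1
        vectors.append({
            term: count * (math.log((1 + n_docs) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [
    "the quick brown fox jumps over the lazy dog",
    "the quick brown fox leaps over the lazy dog",
    "heroku buildpacks compile application dependencies",
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # high: the texts share most terms
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no terms in common
```

For texts of realistic size you would want proper tokenization and sparse matrices, which is exactly what scikit-learn provides.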

During the deployment process, I discovered that one does not simply deploy scikit-learn on Heroku. There were several issues. First of all, Scipy needs Numpy to be available at setup.py parse time. If you just list Numpy and Scipy in requirements.txt, Numpy won't yet be installed when the Scipy setup.py is processed. (Note that this has been fixed in current versions of Scipy.)
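One common workaround for this ordering problem (not necessarily what any of the buildpacks mentioned here do) is to install in two passes: Numpy first, everything else afterwards. The sketch below only shows the splitting step. The function names, the BUILD_TIME_DEPS set and the pinned versions are hypothetical.

```python
import re

# Packages that must already be installed before the others are processed.
# Only Numpy is named in the text; the set is an assumption of this sketch.
BUILD_TIME_DEPS = {"numpy"}


def requirement_name(line):
    """Extract the lowercase package name from a requirements.txt line."""
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", line)
    return match.group(1).lower() if match else None


def split_requirements(lines):
    """Split requirement lines into (first pass, second pass)."""
    first, second = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        target = first if requirement_name(stripped) in BUILD_TIME_DEPS else second
        target.append(stripped)
    return first, second


requirements = [
    "# hypothetical pins, for illustration only",
    "scipy==0.11.0",
    "numpy==1.7.0",
    "scikit-learn==0.13.1",
]
first_pass, second_pass = split_requirements(requirements)
print(first_pass)   # ['numpy==1.7.0']
print(second_pass)  # ['scipy==0.11.0', 'scikit-learn==0.13.1']
```

Each list would then be handed to its own pip invocation, so that Numpy is importable by the time the Scipy setup.py runs.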

Another issue was that building Scipy requires a Fortran compiler and several libraries, none of which are available on Heroku.

Problem Solving Attempts

The first thing I found while looking for a solution was wyn/heroku-buildpack-python, but I couldn't quite get it to work. The second thing I found was ToonTimbermont/heroku-buildpack-python, a fork of wyn's fork that solves some of the issues.

(I also played around with Kenneth Reitz's anaconda buildpack, but didn't really get it to work the way I wanted it to.)

By combining the work of both developers, using wyn's precompiled binaries, and adding some code of my own, I managed to rebase all the changes on top of Heroku's current buildpack. Rebasing also resolves some bugs caused by older versions of Pip.

Another change I made was that the dependencies can be stated in requirements.txt as usual, instead of requiring a setup.py file.

You can find the buildpack at https://github.com/dbrgn/heroku-buildpack-python-sklearn. All the changes against the official Heroku buildpack at the time of this writing have been condensed into a single commit.

Usage

The process to use the buildpack is as straightforward as with any other buildpack. For a new app:

heroku create --buildpack https://github.com/dbrgn/heroku-buildpack-python-sklearn/

For an existing app:

heroku config:set BUILDPACK_URL=https://github.com/dbrgn/heroku-buildpack-python-sklearn/

If you have any questions or problems, feel free to leave a comment or open an issue on GitHub.

Update 8/4/13: It is important that you only use versions of Numpy and Scipy that are available as precompiled binaries. For a list of available versions, see the npscipy-binaries repo.
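If you want to catch a mismatch before pushing, a small check along these lines can help. This is only a sketch: the version numbers in AVAILABLE_BINARIES are placeholders (look up the real ones in the npscipy-binaries repo), and the function name is made up for this example.

```python
import re

# Hypothetical list of versions with precompiled binaries; the real list
# lives in the npscipy-binaries repo and is not reproduced here.
AVAILABLE_BINARIES = {
    "numpy": {"1.7.0"},
    "scipy": {"0.11.0"},
}


def unsupported_pins(lines, available=AVAILABLE_BINARIES):
    """Return Numpy/Scipy requirements that have no matching binary."""
    problems = []
    for line in lines:
        spec = line.split("#", 1)[0].strip()  # drop trailing comments
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", spec)
        if not match or match.group(0).lower() not in available:
            continue  # only packages listed in `available` are checked
        name = match.group(0).lower()
        rest = spec[match.end():].strip()
        # Anything other than an exact "==" pin cannot be guaranteed to match.
        version = rest[2:].strip() if rest.startswith("==") else None
        if version not in available[name]:
            problems.append(spec)
    return problems


requirements = ["numpy==1.7.0", "scipy==0.12.0", "scikit-learn==0.13.1"]
print(unsupported_pins(requirements))  # ['scipy==0.12.0']
```

Unpinned or range requirements such as numpy>=1.7 are reported as well, since pip might then pick a version for which no binary exists.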
