Installing Python, Keras, and Tensorflow from source

I found myself in the interesting position recently of needing to compile Python from source. The reasoning behind this is complicated, but it boils down to a need to use Python with Tensorflow / Keras for some natural language processing AI, as Tensorflow.js isn't going to cut it for the next stage of my PhD.

The target upon which I'm aiming to be running things currently is Viper, my University's high-performance computer (HPC). Unfortunately, the version of Python on said HPC is rather old, which necessitated obtaining a later version. Since I obviously don't have sudo permissions on Viper, I couldn't use the default system package manager. Incredibly, pre-compiled Python binaries are not distributed for Linux either, which meant that I ended up compiling from source.

I am going to be assuming that you have a directory at $HOME/software in which we will be working. In there, there should be a number of subdirectories:

bin: For binaries, already added to your PATH
lib: For library files - we'll be configuring this correctly in this guide
repos: For git repositories we clone

Make sure you have your snacks - this was a long ride to figure out and write - and it's an equally long ride to follow. I recommend reading this all the way through before actually executing anything to get an overall idea as to the process you'll be following and the assumptions I've made to keep this post a reasonable length.

Setting up

Before we begin, we need some dependencies:

gcc - The compiler
git - For checking out the cpython git repository
readline - An optional dependency of cpython (presumably for the REPL)

On Viper, we can load these like so:

module load utilities/multi
module load gcc/10.2.0
module load readline/7.0

Compiling openssl

We also need to clone the openssl git repo and build it from source:

cd ~/software/repos
git clone git://git.openssl.org/openssl.git;    # Clone the git repo
cd openssl;                                     # cd into it
git checkout OpenSSL_1_1_1-stable;              # Checkout the latest stable branch (do git branch -a to list all branches; Python will complain at you during build if you choose the wrong one and tell you what versions it supports)
./config;                                       # Configure openssl ready for compilation
make -j "$(nproc)"                              # Build openssl

With openssl compiled, we need to copy the resulting binaries to our ~/software/lib directory:

cp lib*.so* ~/software/lib;
# We're done, cd back to the parent directory
cd ..;

To finish up openssl, we need to update some environment variables to let the C++ compiler and linker know about it, but we'll talk about those after dealing with another dependency that Python requires.

Compiling libffi

libffi is another dependency of Python that's needed if you want to use Tensorflow. To start, go to the libgffi GitHub releases page in your web browser, and copy the URL for the latest release file. It should look something like this:

https://github.com/libffi/libffi/releases/download/v3.3/libffi-3.3.tar.gz

Then, download it to the target system:

cd ~/software/lib
curl -OL URL_HERE

Note that we do it this way, because otherwise we'd have to run the autogen.sh script which requires yet more dependencies that you're unlikely to have installed.

Then extract it and delete the tar.gz file:

tar -xzf libffi-3.3.tar.gz
rm libffi-3.3.tar.gz

Now, we can configure and compile it:

./configure --prefix=$HOME/software
make -j "$(nproc)"

Before we install it, we need to create a quick alias:

cd ~/software;
ln -s lib lib64;
cd -;

libffi for some reason likes to install to the lib64 directory, rather than our pre-existing lib directory, so creating an alias makes it so that it installs to the right place.

Updating the environment

Now that we've dealt with the dependencies, we now need to update our environment so that the compiler knows where to find them. Do that like so:

export LD_LIBRARY_PATH="$HOME/software/lib:${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}";
export LDFLAGS="-L$HOME/software/lib -L$HOME/software/include $LDFLAGS";
export CPPFLAGS="-I$HOME/software/include -I$HOME/software/repos/openssl/include -I$HOME/software/repos/openssl/include/openssl $CPPFLAGS"

It is also advisable to update your ~/.bashrc with these settings, as you may need to come back and recompile a different version of Python in the future.

Personally, I have a file at ~/software/setup.sh which I run with source $HOME/software/setuop.sh in my ~/.bashrc file to keep things neat and tidy.

Compiling Python

Now that we have openssl and libffi compiled, we can turn our attention to Python. First, clone the cpython git repo:

git clone https://github.com/python/cpython.git
cd cpython;

Then, checkout the latest tag. This essentially checks out the latest stable release:

git checkout "$(git tag | grep -ivP '[ab]|rc' | tail -n1)"

Important: If you're intention is to use tensorflow, check the Tensorflow Install page for supported Python versions. It's probable that it doesn't yet support the latest version of Python, so you might need to checkout a different tag here. For some reason, Python is really bad at propagating new versions out to the community quickly.

Before we can start the compilation process, we need to configure it. We're going for performance, so execute the configure script like so:

./configure --with-lto --enable-optimizations --with-openssl=/absolute/path/to/openssl_repo_dir

Replace /absolute/path/to/openssl_repo with the absolute path to the above openssl repo.

Now, we're ready to compile Python. Do that like so:

make -j "$(nproc)"

This will take a while, but once it's done it should have built Python successfully. For a sanity check, we can also test it like so:

make -j "$(nproc)" test

The Python binary compiled should be called simply python, and be located in the root of the git repository. Now that we've compiled it, we need to make a few tweaks to ensure that our shell uses our newly compiled version by default and not the older version from the host system. Personally, I keep my ~/bin folder under version control, so I install host-specific to ~/software, and put ~/software/bin in my PATH like so:

export PATH=$HOME/software/bin

With this in mind, we need to create some symbolic links in ~/software/bin that point to our new Python installation:

cd $HOME/software/bin;
ln -s relative/path/to/python_binary python
ln -s relative/path/to/python_binary python3
ln -s relative/path/to/python_binary python3.9

Replace relative/path/to/python_binary with the relative path tot he Python binary we compiled above.

To finish up the Python installation, we need to get pip up and running, the Python package manager. We can do this using the inbuilt ensurepip module, which can bootstrap a pip installation for us:

python -m ensurepip --user

This bootstraps pip into our local user directory. This is probably what you want, since if you try and install directly the shebang incorrectly points to the system's version of Python, which doesn't exist.

Then, update your ~/.bash_aliases and add the following:

export LD_LIBRARY_PATH=/absolute/path/to/openssl_repo_dir/lib:$LD_LIBRARY_PATH;
alias pip='python -m pip'
alias pip3='python -m pip'

...replacing /absolute/path/to/openssl_repo_dir with the path to the openssl git repo we cloned earlier.

The next stage is to use virtualenv to locally install our Python packages that we want to use for our project. This is good practice, because it keeps our dependencies locally installed to a single project, so they don't clash with different versions in other projects.

Before we can use virtualenv though, we have to install it:

pip install virtualenv

Unfortunately, Python / pip is not very clever at detecting the actual Python installation location, so in order to actually use virtualenv, we have to use a wrapper script - because the [shebang]() in the main ~/.local/bin/virtualenv entrypoint does not use /usr/bin/env to auto-detect the python binary location. Save the following to ~/software/bin (or any other location that's in your PATH ahead of ~/.local/bin):

#!/usr/bin/env bash

exec python ~/.local/bin/virtualenv "$@"

For example:

# Write the script to disk
nano ~/software/bin/virtualenv;
# chmod it to make it executable
chmod +x ~/software/bin/virtualenv

Installing Keras and tensorflow-gpu

With all that out of the way, we can finally use virtualenv to install Keras and tensorflow-gpu. Let's create a new directory and create a virtual environment to install our packages in:

mkdir tensorflow-test
cd tensorflow-test;
virtualenv "$PWD";
source bin/activate;

Now, we can install Tensorflow & Keras:

pip install tensorflow-gpu

It's worth noting here that Keras is a dependency of Tensorflow.

Tensorflow has a number of alternate package names you might want to install instead depending on your situation:

tensorflow: Stable tensorflow without GPU support - i.e. it runs on the CPU instead.
tf-nightly-gpu: Nightly tensorflow for the GPU. Useful if your version of Python is newer than the version of Python supported by Tensorflow

Once you're done in the virtual environment, exit it like this:

deactivate

Phew, that was a huge amount of work! Hopefully this sheds some light on the maddenly complicated process of compiling Python from source. If you run into issues, you're welcome to comment below and I'll try to help you out - but you might be better off asking the Python community instead, as they've likely got more experience with Python than I have.

Type this	To get this	Notes
`bold text`	bold text	-
`_italics text_`	italics text	-
`~~deleted text~~`	~~deleted~~	-
`code text`	`code text`	Inserts some monospaced code. It is preferred that large blocks of code are linked to using a service such as Pastebin, Github Gists or Ideone.
`> Quote`	Quote	-
`[display text](//google.com)`	display text	Inserts a hyperlink. Please use responsibly. `[rel=nofollow]` is in use and spam will be deleted.
`---`		Inserts a horizontal line. The previous line must be blank.

Stardust
Blog