Installing Python, Keras, and Tensorflow from source
I found myself in the interesting position recently of needing to compile Python from source. The reasoning behind this is complicated, but it boils down to a need to use Python with Tensorflow / Keras for some natural language processing AI, as Tensorflow.js isn't going to cut it for the next stage of my PhD.
The machine I'm currently aiming to run things on is Viper, my University's high-performance computer (HPC). Unfortunately, the version of Python on said HPC is rather old, which necessitated obtaining a later version. Since I obviously don't have sudo permissions on Viper, I couldn't use the default system package manager. Incredibly, pre-compiled Python binaries are not distributed for Linux either, which meant that I ended up compiling from source.
I'm going to assume that you have a directory at $HOME/software in which we will be working. In there, there should be a number of subdirectories:
bin
: For binaries, already added to your PATH
lib
: For library files - we'll be configuring this correctly in this guide
repos
: For git repositories we clone
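If you're starting from scratch, the layout above can be created with something like the following minimal sketch (bash). It also checks, without changing anything, whether ~/software/bin really is on your PATH yet:

```shell
# Create the directory layout this guide assumes; mkdir -p is safe to re-run.
mkdir -p "$HOME/software/bin" "$HOME/software/lib" "$HOME/software/repos"

# Check whether ~/software/bin is already on PATH. Wrapping both sides in
# colons lets the pattern match the first and last entries too.
case ":$PATH:" in
    *":$HOME/software/bin:"*) echo "ok: ~/software/bin is on PATH" ;;
    *) echo "warning: ~/software/bin is not on PATH yet" >&2 ;;
esac
```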
Make sure you have your snacks - this was a long ride to figure out and write - and it's an equally long ride to follow. I recommend reading this all the way through before actually executing anything to get an overall idea as to the process you'll be following and the assumptions I've made to keep this post a reasonable length.
Setting up
Before we begin, we need some dependencies:
gcc
- The compiler
git
- For checking out the cpython git repository
readline
- An optional dependency of cpython (presumably for the REPL)
On Viper, we can load these like so:
module load utilities/multi
module load gcc/10.2.0
module load readline/7.0
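Module names and versions differ between clusters, so it can be worth checking that the tools actually ended up on your PATH before starting a long build. A minimal sketch; check_tools is a hypothetical helper, and since readline is a library rather than a command it can't be checked this way:

```shell
# check_tools (hypothetical helper): print every named command that isn't
# on PATH, and return non-zero if any of them are missing.
check_tools() {
    local status=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            status=1
        fi
    done
    return "$status"
}

# Commands used later in this guide (readline is a library, so it isn't listed).
check_tools gcc git make curl tar || echo "load the missing modules before continuing" >&2
```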
Compiling openssl
We also need to clone the openssl git repo and build it from source:
cd ~/software/repos
git clone https://github.com/openssl/openssl.git; # Clone the git repo
cd openssl; # cd into it
git checkout OpenSSL_1_1_1-stable; # Checkout the latest stable branch (do git branch -a to list all branches; Python will complain at you during build if you choose the wrong one and tell you what versions it supports)
./config; # Configure openssl ready for compilation
make -j "$(nproc)" # Build openssl
With openssl compiled, we need to copy the resulting shared libraries to our ~/software/lib directory:
cp lib*.so* ~/software/lib;
# We're done, cd back to the parent directory
cd ..;
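To confirm the copy worked, you can check that both of OpenSSL's shared libraries (libssl and libcrypto) are now in ~/software/lib. A minimal sketch using bash's compgen -G to test whether a glob matches anything; check_shared_libs is a hypothetical helper:

```shell
# check_shared_libs (hypothetical helper): for each library name, check that
# DIR contains at least one matching shared object (e.g. libssl.so.1.1).
check_shared_libs() {
    local dir="$1" status=0 lib
    shift
    for lib in "$@"; do
        # compgen -G succeeds only if the glob matches at least one file.
        if compgen -G "$dir/$lib.so*" >/dev/null; then
            echo "found: $lib"
        else
            echo "missing: $lib in $dir" >&2
            status=1
        fi
    done
    return "$status"
}

check_shared_libs "$HOME/software/lib" libssl libcrypto || echo "re-run the cp step above" >&2
```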
To finish up openssl, we need to update some environment variables to let the C compiler and linker know about it, but we'll talk about those after dealing with another dependency that Python requires.
Compiling libffi
libffi is another dependency of Python that's needed if you want to use Tensorflow. To start, go to the libffi GitHub releases page in your web browser, and copy the URL for the latest release file. It should look something like this:
https://github.com/libffi/libffi/releases/download/v3.3/libffi-3.3.tar.gz
Then, download it to the target system:
cd ~/software/lib
curl -OL URL_HERE
Note that we download a release tarball rather than cloning the git repository, because a git checkout would require running the autogen.sh script, which needs yet more dependencies that you're unlikely to have installed.
Then extract it and delete the tar.gz file:
tar -xzf libffi-3.3.tar.gz
rm libffi-3.3.tar.gz
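Rather than typing the version number out in every command, you could derive the file and directory names from the URL with bash parameter expansion. A small sketch, assuming the release keeps the libffi-X.Y.tar.gz naming shown above:

```shell
# The release URL you copied from the GitHub releases page (3.3 shown here).
url="https://github.com/libffi/libffi/releases/download/v3.3/libffi-3.3.tar.gz"

tarball="${url##*/}"          # strip everything up to the last "/": libffi-3.3.tar.gz
srcdir="${tarball%.tar.gz}"   # strip the extension: libffi-3.3
echo "tarball=$tarball srcdir=$srcdir"
```

With those set, the download and extract steps become `curl -OL "$url" && tar -xzf "$tarball" && rm "$tarball" && cd "$srcdir"`, which keeps working unchanged when you paste in a newer release URL.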
Now, cd into the extracted directory, then configure and compile it:
cd libffi-3.3
./configure --prefix="$HOME/software"
make -j "$(nproc)"
Before we install it, we need to create a quick symbolic link:
cd ~/software;
ln -s lib lib64;
cd -;
libffi for some reason likes to install to the lib64 directory, rather than our pre-existing lib directory, so creating the symbolic link makes sure that it ends up in the right place. With that in place, install it:
make install
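Running `ln -s lib lib64` again later (say, when rebuilding for a different Python version) doesn't fail cleanly: with GNU ln, lib64 is by then a symlink to a directory, so ln creates a stray link inside it instead. A sketch of an idempotent version; link_lib64 is a hypothetical helper:

```shell
# link_lib64 (hypothetical helper): point PREFIX/lib64 at PREFIX/lib, unless
# something named lib64 (including a dangling symlink) already exists there.
link_lib64() {
    local prefix="$1"
    if [ -e "$prefix/lib64" ] || [ -L "$prefix/lib64" ]; then
        echo "$prefix/lib64 already exists; leaving it alone" >&2
    else
        # The relative target "lib" is resolved from inside $prefix.
        ln -s lib "$prefix/lib64"
    fi
}

if [ -d "$HOME/software" ]; then
    link_lib64 "$HOME/software"
fi
```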
Updating the environment
Now that we've dealt with the dependencies, we need to update our environment so that the compiler and linker know where to find them. Do that like so:
export LD_LIBRARY_PATH="$HOME/software/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}";
export LDFLAGS="-L$HOME/software/lib -L$HOME/software/include $LDFLAGS";
export CPPFLAGS="-I$HOME/software/include -I$HOME/software/repos/openssl/include -I$HOME/software/repos/openssl/include/openssl $CPPFLAGS"
It is also advisable to add these settings to your ~/.bashrc, as you may need to come back and recompile a different version of Python in the future.
Personally, I have a file at ~/software/setup.sh which I load with source "$HOME/software/setup.sh" in my ~/.bashrc file to keep things neat and tidy.
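One catch with sourcing a setup file from ~/.bashrc is that every new shell (and every nested one) prepends the same directories again. Below is a sketch of how such a setup.sh might avoid that; prepend_var is a hypothetical helper, and the LDFLAGS and CPPFLAGS exports from above could live in the same file:

```shell
# Sketch of ~/software/setup.sh, meant to be sourced from ~/.bashrc.

# prepend_var (hypothetical helper): put DIR at the front of the
# colon-separated variable NAME, unless it's already in there. Re-sourcing
# this file therefore doesn't keep growing the variable, and an empty
# variable doesn't end up with a stray colon.
prepend_var() {
    local name="$1" dir="$2"
    local current="${!name}"   # indirect expansion: the value of $NAME
    case ":$current:" in
        *":$dir:"*) ;;         # already present, nothing to do
        *) printf -v "$name" '%s' "$dir${current:+:$current}" ;;
    esac
    export "$name"
}

prepend_var PATH "$HOME/software/bin"
prepend_var LD_LIBRARY_PATH "$HOME/software/lib"
```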
Compiling Python
Now that we have openssl and libffi compiled, we can turn our attention to Python. First, clone the cpython git repo into our repos directory:
cd ~/software/repos
git clone https://github.com/python/cpython.git
cd cpython;
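Then, check out the latest tag. This essentially checks out the latest stable release (sort -V orders the tags by version number, so that 3.10 sorts after 3.9 rather than before it):
git checkout "$(git tag | grep -ivP '[ab]|rc' | sort -V | tail -n1)"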
Important: If your intention is to use Tensorflow, check the Tensorflow install page for supported Python versions. It's probable that it doesn't yet support the latest version of Python, so you might need to check out a different tag here. For some reason, new versions of Python are really slow to propagate out to the community.
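If you do need an older minor version, a sketch like the following picks the newest final release of a given minor version from the tag list. latest_patch_tag is a hypothetical helper, and it assumes CPython's vX.Y.Z tag naming for final releases:

```shell
# latest_patch_tag (hypothetical helper): read tag names on stdin and print
# the newest final release of the given minor version (e.g. 3.8).
latest_patch_tag() {
    local minor="$1" tag
    # Keep only final releases such as v3.8.10 (no a1 / b2 / rc1 suffixes).
    grep -E '^v[0-9]+\.[0-9]+\.[0-9]+$' |
        while IFS= read -r tag; do
            # Keep tags whose version starts with "<minor>." exactly.
            if [[ "$tag" == "v$minor".* ]]; then
                printf '%s\n' "$tag"
            fi
        done |
        sort -V | tail -n1   # numeric version sort, so v3.8.10 beats v3.8.9
}
```

Usage, from inside the cpython repository: `git checkout "$(git tag | latest_patch_tag 3.8)"`.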
Before we can start the compilation process, we need to configure it. We're going for performance, so execute the configure script like so:
./configure --with-lto --enable-optimizations --with-openssl=/absolute/path/to/openssl_repo_dir
Replace /absolute/path/to/openssl_repo_dir with the absolute path to the openssl repo we cloned above (if you followed the layout in this guide, that's $HOME/software/repos/openssl).
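A wrong --with-openssl path typically only shows up as a missing ssl module at the very end of a long build, so it may be worth sanity-checking the path first. A minimal sketch; looks_like_openssl_dir is a hypothetical helper, and the path assumes the clone location used earlier in this guide:

```shell
# looks_like_openssl_dir (hypothetical helper): succeed if DIR holds the
# OpenSSL headers, i.e. DIR/include/openssl/ssl.h exists.
looks_like_openssl_dir() {
    [ -f "$1/include/openssl/ssl.h" ]
}

# Assumes the clone location used earlier in this guide.
openssl_dir="$HOME/software/repos/openssl"
if looks_like_openssl_dir "$openssl_dir"; then
    echo "./configure --with-lto --enable-optimizations --with-openssl=$openssl_dir"
else
    echo "no OpenSSL headers under $openssl_dir; check the path" >&2
fi
```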
Now, we're ready to compile Python. Do that like so:
make -j "$(nproc)"
This will take a while, but once it's done it should have built Python successfully. For a sanity check, we can also test it like so:
make -j "$(nproc)" test
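Python's ssl module is built against OpenSSL and its ctypes module against libffi, so a quick import check on the new binary confirms both dependencies were actually picked up. A minimal sketch; check_python_modules is a hypothetical helper:

```shell
# check_python_modules (hypothetical helper): ask the given interpreter to
# import ssl (built against OpenSSL) and ctypes (built against libffi),
# printing the OpenSSL version it was linked with.
check_python_modules() {
    "$1" -c 'import ssl, ctypes; print(ssl.OPENSSL_VERSION)'
}

# Run from the root of the cpython repository after make has finished.
if [ -x ./python ]; then
    check_python_modules ./python
fi
```

If either import fails with a ModuleNotFoundError, the corresponding dependency wasn't found at configure time, and it's worth revisiting the environment variables set earlier before rebuilding.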
The compiled Python binary should be called simply python, and be located in the root of the git repository. Now that we've compiled it, we need to make a few tweaks to ensure that our shell uses our newly compiled version by default and not the older version from the host system. Personally, I keep my ~/bin folder under version control, so I install host-specific software to ~/software, and put ~/software/bin in my PATH like so:
export PATH="$HOME/software/bin:$PATH"
With this in mind, we need to create some symbolic links in ~/software/bin that point to our new Python installation:
cd "$HOME/software/bin";
ln -s relative/path/to/python_binary python
ln -s relative/path/to/python_binary python3
ln -s relative/path/to/python_binary python3.9
Replace relative/path/to/python_binary with the relative path to the Python binary we compiled above, and python3.9 with the version you actually checked out.
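Working out the relative path by hand is easy to get wrong, so here is a sketch that computes it, and the version suffix, automatically. link_python is a hypothetical helper; it relies on GNU realpath's --relative-to option and asks the new interpreter for its own major.minor version:

```shell
# link_python (hypothetical helper): create python, python3 and pythonX.Y
# symlinks in BIN_DIR that point at PYTHON_BINARY via a relative path.
link_python() {
    local python_binary="$1" bin_dir="$2"
    local target version name
    # Path to the binary as seen from bin_dir, e.g. ../repos/cpython/python
    target="$(realpath --relative-to="$bin_dir" "$python_binary")" || return 1
    # Ask the interpreter itself for its major.minor version, e.g. "3.9".
    version="$("$python_binary" -c 'import sys; print("%d.%d" % sys.version_info[:2])')" || return 1
    for name in python python3 "python$version"; do
        # -f replaces an existing link; -n avoids descending into a linked dir.
        ln -sfn "$target" "$bin_dir/$name"
    done
}
```

Assuming cpython was cloned to ~/software/repos/cpython, usage would be `link_python "$HOME/software/repos/cpython/python" "$HOME/software/bin"`. Relative targets keep the links valid if the whole ~/software tree is ever moved.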
To finish up the Python installation, we need to get pip, the Python package manager, up and running. We can do this using the built-in ensurepip module, which can bootstrap a pip installation for us:
python -m ensurepip --user
This bootstraps pip into our local user directory. This is probably what you want: if you try to install it directly instead, the shebang incorrectly points to the system's version of Python, which doesn't exist.
Then, update your ~/.bash_aliases and add the following:
export LD_LIBRARY_PATH="/absolute/path/to/openssl_repo_dir/lib:$LD_LIBRARY_PATH";
alias pip='python -m pip'
alias pip3='python -m pip'
...replacing /absolute/path/to/openssl_repo_dir with the path to the openssl git repo we cloned earlier.
The next stage is to use virtualenv to locally install the Python packages that we want to use for our project. This is good practice, because it keeps our dependencies installed locally to a single project, so they don't clash with different versions in other projects.
Before we can use virtualenv though, we have to install it:
pip install virtualenv
Unfortunately, Python / pip is not very clever at detecting the actual Python installation location, so in order to actually use virtualenv, we have to use a wrapper script: the shebang in the main ~/.local/bin/virtualenv entrypoint does not use /usr/bin/env to auto-detect the python binary location. Save the following to ~/software/bin/virtualenv (or to any other directory that's in your PATH ahead of ~/.local/bin):
#!/usr/bin/env bash
exec python ~/.local/bin/virtualenv "$@"
For example:
# Write the script to disk
nano ~/software/bin/virtualenv;
# chmod it to make it executable
chmod +x ~/software/bin/virtualenv
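If you'd rather not open an editor, the same wrapper can be written and made executable in one go with a heredoc. A sketch; quoting the 'EOF' delimiter stops the shell from expanding ~ and "$@" while the file is being written, so they're only evaluated when the wrapper actually runs:

```shell
# Write the virtualenv wrapper without an editor.
wrapper="$HOME/software/bin/virtualenv"
mkdir -p "$(dirname "$wrapper")"

# The quoted 'EOF' keeps ~ and "$@" literal in the written file.
cat > "$wrapper" <<'EOF'
#!/usr/bin/env bash
exec python ~/.local/bin/virtualenv "$@"
EOF

# Make it executable.
chmod +x "$wrapper"
```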
Installing Keras and tensorflow-gpu
With all that out of the way, we can finally use virtualenv to install Keras and tensorflow-gpu. Let's create a new directory and create a virtual environment to install our packages in:
mkdir tensorflow-test
cd tensorflow-test;
virtualenv "$PWD";
source bin/activate;
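Since installing Tensorflow outside the virtual environment would drop it into your user site-packages instead, it may be worth a guard that checks the environment is really active. The activate script sets VIRTUAL_ENV, which this sketch relies on; require_venv is a hypothetical helper:

```shell
# require_venv (hypothetical helper): fail unless a virtual environment is
# active. The activate script exports VIRTUAL_ENV, so check for it.
require_venv() {
    if [ -z "${VIRTUAL_ENV:-}" ]; then
        echo "no virtual environment is active; run: source bin/activate" >&2
        return 1
    fi
    echo "using virtual environment at $VIRTUAL_ENV"
}
```

Usage: `require_venv && pip install tensorflow-gpu`, so the install only runs when the guard passes.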
Now, we can install Tensorflow & Keras:
pip install tensorflow-gpu
It's worth noting here that Keras is a dependency of Tensorflow.
Tensorflow has a number of alternate package names you might want to install instead depending on your situation:
tensorflow
: Stable tensorflow without GPU support - i.e. it runs on the CPU instead.
tf-nightly-gpu
: Nightly tensorflow for the GPU. Useful if your version of Python is newer than the version of Python supported by Tensorflow
Once you're done in the virtual environment, exit it like this:
deactivate
Phew, that was a huge amount of work! Hopefully this sheds some light on the maddeningly complicated process of compiling Python from source. If you run into issues, you're welcome to comment below and I'll try to help you out - but you might be better off asking the Python community instead, as they've likely got more experience with Python than I have.