Getting started
This project is kept under version control in a git repository. The repository is hosted on GitHub at https://github.com/philip-brohan/ML_ATB2, and the documentation is made with GitHub Pages.
If you are familiar with GitHub, you already know what to do (fork or clone the repository). If you’d prefer not to bother with that, you can download the whole thing as a zip file.
The software in this repository operates on data provided by another repository. You’ll need that too; it has its own installation instructions.
As well as downloading the software, some setup is necessary to run it successfully:
These scripts need to know where to put their output files. They rely on an environment variable SCRATCH - set this variable to a directory with plenty of free disc space.
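As a minimal sketch of how a script might pick up this variable, the helper below reads SCRATCH and returns an output directory beneath it. The function name `scratch_dir` and the `ML_ATB2` subdirectory are hypothetical, chosen for illustration; the scripts in the repository may organise their output differently.

```python
import os
from pathlib import Path


def scratch_dir(subdir="ML_ATB2"):
    """Return an output directory under $SCRATCH, creating it if needed.

    `subdir` is a hypothetical name used for illustration only.
    """
    root = os.environ.get("SCRATCH")
    if not root:
        # Fail early with a clear message rather than writing somewhere unexpected.
        raise RuntimeError(
            "SCRATCH is not set; point it at a directory with plenty of free space"
        )
    out = Path(root) / subdir
    out.mkdir(parents=True, exist_ok=True)
    return out
```

Calling `scratch_dir()` before any output is written makes a missing or unset SCRATCH show up immediately, instead of partway through a long run.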
These scripts will only work in a Python environment with the appropriate Python version and libraries available. I use conda to manage the required environment, which is specified in a YAML file:
name: ml_atb2_gpu
channels:
  - default
  - conda-forge
dependencies:
  - python=3.7
  - tensorflow-gpu=2.2.*
  - pillow=7.1.*
  - matplotlib=3.2.*
  - pdf2image=1.13.*
  # Documentation formatter
  - sphinx=3.1.*
  # Optional, code formatter
  - black
  # Optional, tensorboard and profiler
  - tensorboard=2.2.*
  - pip
  - pip:
      - tensorboard_plugin_profile
Install anaconda or miniconda, create and activate the environment in that yaml file, and all the scripts in this repository should run successfully.
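Once the environment is active, a quick check that the main libraries can be found may save some debugging later. The sketch below is one way to do that; the helper names `missing_modules` and `report` are hypothetical, and the module list covers only the core packages from the environment file above (note that the pillow package is imported as `PIL`).

```python
import importlib.util
import sys

# Import names for the core packages in the environment file
# (the pillow package is imported as PIL).
REQUIRED_MODULES = ["tensorflow", "PIL", "matplotlib", "pdf2image"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that cannot be found by this interpreter."""
    return [n for n in names if importlib.util.find_spec(n) is None]


def report():
    """Print the running Python version and any libraries that are missing."""
    print("Python", ".".join(str(v) for v in sys.version_info[:3]))
    missing = missing_modules()
    print("Missing modules:", ", ".join(missing) if missing else "none")
```

Running `report()` inside the activated environment should show a Python 3.7 version and no missing modules; anything else suggests the environment was not created or activated as intended.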
The environment shown above installs the GPU-enabled version of the TensorFlow libraries, and requires an (NVIDIA) GPU to run. If you have no GPU, replace tensorflow-gpu in the specification with tensorflow and the scripts will still run on a CPU-only system, although model training will be much slower. It can be convenient to run the data preparation and model validation steps on CPU-only systems, but for the model training steps a GPU is very desirable.
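To see whether TensorFlow can actually use a GPU on a given machine, something like the following sketch may help. The helper name `visible_gpus` is hypothetical; it relies on `tf.config.list_physical_devices`, which is available in TensorFlow 2.1 and later (so it should be present in the 2.2 release pinned above).

```python
def visible_gpus():
    """Return the GPUs TensorFlow can see, or None if TensorFlow is not installed."""
    try:
        import tensorflow as tf
    except ImportError:
        return None
    # An empty list means TensorFlow is installed but will run on the CPU only.
    return tf.config.list_physical_devices("GPU")
```

An empty list is not an error: the data preparation and validation steps can still be run, but training is likely to be slow.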