This repository provides an implementation of the Azran-Ghahramani clustering algorithm, as detailed in the paper A New Approach to Data-Driven Clustering (see references; referred to below as The AG-Paper), together with examples in both non-financial and financial settings, as detailed in Market Regime Classification with Signatures.
- Create a new Python virtual environment running Python 3.9 (or check the full instructions below to see if newer Python versions are now supported)
- Run the `setup.sh` installation script (you may need to make it executable first)
- Add the parent of the project's root directory to Path or `PYTHONPATH` (so that `from regimedetection.module import function` runs from any script)
This repository makes use of the signatory package, which must be installed after PyTorch, and whose version must be chosen to match the installed PyTorch version.
At the time of writing, signatory's installation guide states that signatory supports Python 3.6-3.9 and PyTorch 1.6.0-1.9.0. All packages other than PyTorch and signatory may be installed in any order, and later versions of them are unlikely to cause issues.
PyTorch may not be available for the most recent Python release. It is recommended to first install Python 3.9 (this project was built with Python 3.9.9). You may find pyenv a useful tool here, as it allows you to compile legacy Python versions in your user space. Then create a virtual environment that uses this Python 3.9 executable. If you are using virtualenvwrapper, this is as simple as `mkvirtualenv -p /path/to/python3.9 regimedetection`
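The version constraints quoted above can be checked before installing signatory. The following is a minimal sketch, not part of this repository: the names `parse_version`, `in_range` and `check_environment` are hypothetical, and the ranges are copied from this README rather than queried from signatory, so they may be out of date. It also assumes plain release version strings such as `1.9.0` or `1.9.0+cu111`.

```python
# Ranges quoted in this README from signatory's installation guide at the
# time of writing; treat them as assumptions that may have changed since.
SUPPORTED_PYTHON = ((3, 6), (3, 9))
SUPPORTED_TORCH = ((1, 6, 0), (1, 9, 0))


def parse_version(text):
    """Turn '1.9.0+cu111' into (1, 9, 0), dropping any local build suffix."""
    core = text.split("+")[0]
    return tuple(int(part) for part in core.split("."))


def in_range(version, bounds):
    """Return True if version lies within the inclusive (low, high) bounds."""
    low, high = bounds
    return low <= version <= high


def check_environment(python_version, torch_version):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    major_minor = tuple(python_version[:2])
    if not in_range(major_minor, SUPPORTED_PYTHON):
        problems.append("Python %d.%d is outside 3.6-3.9" % major_minor)
    if not in_range(parse_version(torch_version), SUPPORTED_TORCH):
        problems.append("PyTorch %s is outside 1.6.0-1.9.0" % torch_version)
    return problems
```

Once PyTorch is installed, calling `check_environment(sys.version_info, torch.__version__)` inside the virtual environment should return an empty list if the environment falls within the ranges above.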
The following steps may be used to set up the repository on a Linux machine. Instructions for other operating systems will be added shortly.
- `git clone` the repository: `git clone https://github.com/mcindoe/regimedetection.git`
- Make the installation script executable: `chmod +x setup.sh`
- Run the `setup.sh` installation script to install the packages in the required order
- Add the parent directory of this `regimedetection` repository to the `PYTHONPATH` environment variable
  - This allows imports such as `from regimedetection.src.metrics import euclidean_distance` to work from any working directory
  - In macOS / Linux, add `export PYTHONPATH=$PYTHONPATH:/path/to/parent/dir` to your shell's config file, e.g. `~/.bashrc` if using bash, or `~/.zshrc` if using zsh.
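If editing the shell configuration is not convenient, the same effect can be obtained per script by extending `sys.path` at runtime. This is a minimal sketch of that alternative, not something the repository provides: the helper name `add_parent_to_path` is hypothetical, and `/path/to/parent/dir/regimedetection` stands for wherever the repository was cloned.

```python
import os
import sys


def add_parent_to_path(repo_dir):
    """Put the parent directory of repo_dir on sys.path and return it."""
    parent = os.path.dirname(os.path.abspath(repo_dir))
    # Avoid adding the same entry twice if the helper is called repeatedly.
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent
```

After calling `add_parent_to_path("/path/to/parent/dir/regimedetection")` at the top of a script, imports of the form `from regimedetection.src.metrics import euclidean_distance` should resolve in that script only; setting `PYTHONPATH` remains the more permanent option.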
In the multiscale-k-prototypes algorithm, at each iteration the current cluster elements are used to determine the cluster centres for the next iteration. If a given cluster is empty, however, its centre is undefined. The solution implemented here follows the star-shaped initialisation (see Section 4.2 and Algorithm 2 of The AG-Paper): from the collection of prototypes corresponding to all points in the space, we choose the prototype which has maximal KL-divergence from the already-assigned cluster centres. This is repeated until a prototype is assigned to each cluster index for the next iteration.
Note that this means that in the next iteration, the cluster with a manually-assigned cluster centre is guaranteed to have at least one element, since there is an element of the space with zero KL-divergence to the cluster centre.
This approach is only briefly mentioned in our preprint, but it is worth being aware of.
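The empty-cluster handling described above can be sketched as follows. This is an illustration under stated assumptions, not the repository's code: the names `kl_divergence` and `fill_empty_clusters` are hypothetical, prototypes are taken to be discrete probability distributions stored as lists, "maximal KL-divergence from the already-assigned cluster centres" is read as maximising the minimum divergence to any assigned centre, and the divergence is taken in the direction KL(prototype || centre).

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    # Terms with p_i == 0 contribute nothing; eps guards against q_i == 0.
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def fill_empty_clusters(centres, prototypes):
    """Return a copy of centres with every None replaced by a prototype.

    Assumption: each replacement is the prototype whose smallest divergence
    to the centres assigned so far is as large as possible.
    """
    centres = list(centres)
    for index, centre in enumerate(centres):
        if centre is not None:
            continue
        assigned = [c for c in centres if c is not None]
        if not assigned:
            raise ValueError("at least one cluster centre must already be assigned")
        # Score each prototype by its divergence to the nearest assigned centre.
        centres[index] = max(
            prototypes,
            key=lambda proto: min(kl_divergence(proto, c) for c in assigned),
        )
    return centres
```

Because each replacement centre is itself one of the prototypes, the corresponding point has zero divergence to its new centre, which is the property behind the non-emptiness guarantee described above.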