mandrake

Fast visualisation of the population structure of pathogens using Stochastic Cluster Embedding.

Paper:

Lees JA, Tonkin-Hill G, Yang Z, Corander J. Mandrake: visualizing microbial population structure by embedding millions of genomes into a low-dimensional representation. Philosophical Transactions of The Royal Society B. 2022;377: 20210237.

https://doi.org/10.1098/rstb.2021.0237

Documentation available at: https://mandrake.readthedocs.io/en/latest/

Installation (briefly)

See https://mandrake.readthedocs.io/en/latest/installation.html for more details.

1. Install miniconda.
2. Create a clean environment with mandrake installed: conda create -n mandrake_env mandrake
3. Activate the environment: conda activate mandrake_env

Refer to the conda-forge documentation if you want to install a CUDA (GPU) enabled version.

Semi-manual

You will need some dependencies, which you can install through conda:

conda create -n mandrake_env python
conda env update -n mandrake_env --file environment.yml
conda activate mandrake_env

You can then clone this repository, and run:

python setup.py install

GPU acceleration

You will need the CUDA toolkit installed.

If a CUDA compiler (e.g. nvcc) is available when you build, you should see the message:

CUDA found, compiling both GPU and CPU code

otherwise only the CPU version will be compiled:

CUDA not found, compiling CPU code only
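Before building, it can help to check whether a CUDA compiler is visible at all. The sketch below is only a pre-check under the assumption that nvcc would be found on your PATH; it is not how the build itself detects CUDA, and the function name is made up for illustration.

```python
import shutil


def cuda_compiler_available(compiler: str = "nvcc") -> bool:
    """Return True if the named CUDA compiler is found on PATH.

    Hypothetical helper: the real build may look for CUDA in other places,
    so treat the result as a hint rather than a guarantee.
    """
    return shutil.which(compiler) is not None


if __name__ == "__main__":
    if cuda_compiler_available():
        print("nvcc found on PATH; a GPU-enabled build may be possible")
    else:
        print("nvcc not found on PATH; expect a CPU-only build")
```

If the check fails but the CUDA toolkit is installed, the toolkit's bin directory is probably missing from PATH.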

Usage

After installing, an example command would look like this:

mandrake --sketches sketchlib.h5 --kNN 500 --cpus 4 --maxIter 1000000

This would use a file sketchlib.h5 created by pp-sketchlib to calculate accessory distances using 500 nearest neighbours.

Output can be found in numerous files prefixed mandrake.embedding*.
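If you want to launch the example command from a script, a minimal sketch could look like the following. It uses only the flags shown in the command above; the helper names are hypothetical, and the output search assumes the mandrake.embedding prefix mentioned above and that the files are written to the directory you pass in.

```python
import glob
import os
import shlex


def build_mandrake_command(sketches, knn=500, cpus=4, max_iter=1000000):
    """Assemble the example command as an argument list for subprocess.run."""
    return [
        "mandrake",
        "--sketches", str(sketches),
        "--kNN", str(knn),
        "--cpus", str(cpus),
        "--maxIter", str(max_iter),
    ]


def find_embedding_outputs(directory="."):
    """List files in `directory` whose names start with mandrake.embedding."""
    pattern = os.path.join(glob.escape(directory), "mandrake.embedding*")
    return sorted(glob.glob(pattern))


if __name__ == "__main__":
    cmd = build_mandrake_command("sketchlib.h5")
    # Print the command for inspection; to run it, use:
    #   import subprocess; subprocess.run(cmd, check=True)
    print(shlex.join(cmd))
    for path in find_embedding_outputs("."):
        print(path)
```

Passing the command as a list avoids shell quoting problems if the sketch file path contains spaces.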

Other useful arguments include:

- --alignment: use a fasta alignment to calculate distances
- --accessory: use a presence/absence file (Rtab or similar) to calculate distances
- --distances: use a .npz file from a previous run and skip straight to the embedding step
- --labels: give labels to colour the output by
- --perplexity: change the perplexity of the preprocessing (similar to t-SNE)
- --animate: produce a video of the optimisation
- --use-gpu: use a GPU for the run. Make sure to increase --n-workers.
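To give some intuition for the --perplexity option, the sketch below computes perplexity as it is defined for t-SNE: 2 raised to the Shannon entropy (in bits) of a distribution over neighbours. This illustrates the general concept only; it is an assumption that mandrake's preprocessing follows the same definition in detail, and the function is not part of mandrake.

```python
import math


def perplexity(probabilities):
    """Perplexity 2**H(P), where H is the Shannon entropy in bits."""
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


# Equal weight on four neighbours gives a perplexity of 4.
uniform = [0.25] * 4
# Weight concentrated on one neighbour gives a perplexity close to 1.
peaked = [0.97, 0.01, 0.01, 0.01]

print(perplexity(uniform))  # 4.0
print(perplexity(peaked))
```

Perplexity can be read as the effective number of neighbours each sample attends to, so larger values tend to emphasise broader structure in the embedding.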

See the documentation for more details.
