Quickstart

First, check that MotifScan is properly installed by printing its version with the -v/--version option:

$ motifscan --version
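If the command above fails with "command not found", the executable is probably not on your PATH. A minimal sketch of a pre-flight check follows; `check_motifscan` is a hypothetical helper name used here for illustration and is not part of MotifScan.

```shell
# Hypothetical helper: report whether the motifscan executable is on PATH.
check_motifscan() {
    if command -v motifscan >/dev/null 2>&1; then
        # Found: print the version using the option described above.
        motifscan --version
    else
        echo "motifscan not found on PATH" >&2
        return 1
    fi
}

check_motifscan || echo "Install MotifScan before continuing."
```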

Configuration

To detect the binding sites of known motifs, MotifScan requires two kinds of data files: genome sequences and motif PFMs (Position Frequency Matrices). Before scanning, install genome assemblies and motif sets, either from a remote database or from locally prepared files.

Default Installation Location

By default, newly installed genome assemblies are placed under $HOME/.motifscan/genomes/. To change this location:

$ motifscan config --set-default-genome <path>

Motif sets (PFMs/PWMs) are placed under $HOME/.motifscan/motifs/ by default. To change this location:

$ motifscan config --set-default-motif <path>
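For example, you may want both locations under one data directory on a larger disk. The sketch below only prints the commands it would run. `MOTIFSCAN_DATA`, `run`, and `configure_paths` are hypothetical names chosen for this example; MotifScan does not read `MOTIFSCAN_DATA` itself. Creating the directories beforehand is a precaution and may not be required.

```shell
# Assumed layout: genomes and motifs share one data directory.
# MOTIFSCAN_DATA is a hypothetical variable, not one MotifScan reads itself.
MOTIFSCAN_DATA="${MOTIFSCAN_DATA:-$HOME/motifscan_data}"

# Print the command when DRY_RUN=1, execute it otherwise.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

configure_paths() {
    run mkdir -p "$MOTIFSCAN_DATA/genomes" "$MOTIFSCAN_DATA/motifs"
    run motifscan config --set-default-genome "$MOTIFSCAN_DATA/genomes"
    run motifscan config --set-default-motif "$MOTIFSCAN_DATA/motifs"
}

# Preview only; drop DRY_RUN=1 to apply the settings.
DRY_RUN=1 configure_paths
```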

See Config for full details on the configuration options.

Install genome assemblies

Install from a remote database

You can download genome assemblies from the UCSC database.

First, display all available genome assemblies:

$ motifscan genome --list-remote

Then, install a genome assembly (e.g. hg19):

$ motifscan genome --install -n hg19 -r hg19

Install with local files

To install a genome assembly from local files, prepare a FASTA file containing the genome sequences and a gene annotation file (refGene.txt):

$ motifscan genome --install -n hg19 -i <hg19.fa> -a <refGene.txt>
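Before a local install, it can help to confirm that both input files exist and are non-empty. The following is a sketch under assumptions: `require_file`, `GENOME_FA`, and `ANNOTATION` are hypothetical names, and the default file paths are placeholders for your own files.

```shell
# Hypothetical helper: fail early if an input file is missing or empty.
require_file() {
    if [ ! -s "$1" ]; then
        echo "missing or empty file: $1" >&2
        return 1
    fi
}

# Placeholder paths; point them at your own files.
GENOME_FA="${GENOME_FA:-hg19.fa}"
ANNOTATION="${ANNOTATION:-refGene.txt}"

if require_file "$GENOME_FA" && require_file "$ANNOTATION"; then
    # FASTA headers start with '>'; counting them is a quick sanity check.
    echo "sequences in FASTA: $(grep -c '^>' "$GENOME_FA")"
    motifscan genome --install -n hg19 -i "$GENOME_FA" -a "$ANNOTATION"
fi
```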

Install and build motif sets

Install from a remote database

You can install motif PFM sets from the JASPAR 2020 database.

First, display all available motif PFM sets in JASPAR 2020:

$ motifscan motif --list-remote

Then, install a JASPAR motif PFM set (e.g. vertebrates_non-redundant):

$ motifscan motif --install -n <motif_set> -r vertebrates_non-redundant -g hg19

Install with local files

Install a motif set from a local PFM file:

$ motifscan motif --install -n <motif_set> -i <pfms.jaspar> -g hg19

Build PFMs for an additional genome

Build the motif PFM set for another installed genome assembly (e.g. hg38):

$ motifscan motif --build <motif_set> -g hg38
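If you work with several installed assemblies, you can generate one build command per genome. The sketch below prints the commands for review instead of running them. `MOTIF_SET`, `GENOMES`, and `build_commands` are hypothetical names, and `my_motif_set` is a placeholder for the name you chose at install time.

```shell
# Placeholders: each genome listed here must already be installed.
MOTIF_SET="${MOTIF_SET:-my_motif_set}"
GENOMES="${GENOMES:-hg38}"

# Emit one build command per genome in the space-separated GENOMES list.
build_commands() {
    for g in $GENOMES; do
        echo "motifscan motif --build $MOTIF_SET -g $g"
    done
}

# Review the commands first.
build_commands
```

Once the printed commands look right, `build_commands | sh` runs them in order.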

Scanning Motifs

Once the data preparation steps are complete, you can scan a set of genomic regions for occurrences of known motifs:

$ motifscan scan -i regions.bed -g hg19 -m <motif_set> -o <output_dir>
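To try the scan on a small input, you can write a tiny regions file by hand. The sketch below assumes that a plain three-column BED file (tab-separated chromosome, start, end) is accepted as input; the coordinates are made-up placeholders, not real binding sites.

```shell
# Write a tiny three-column BED file (tab-separated: chrom, start, end).
# The coordinates are made-up placeholders, not real binding sites.
printf 'chr1\t1000\t1500\n'   >  regions.bed
printf 'chr2\t20000\t20600\n' >> regions.bed

# Sanity check: every line has at least 3 fields and start < end.
awk -F '\t' 'NF < 3 || $2 >= $3 { bad = 1 } END { exit bad + 0 }' regions.bed \
    && echo "regions.bed looks well-formed"
```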