How-to guides

After building up the environment for COBRA from the Installation, this page will help you explore the abilities of COBRA, and guide you how to train this snake to do its jobs.

Setup

Before you start, please make sure you have all the required environment prepared. If you haven’t done so yet, please check those out!

when you are ready

Cool, now let us have a look what kind of abilities does this snake have.

Main Modules

Before we start, let’s just quickly go through its main modules. Currently, COBRA contains three main modules:

COBRA main modules

Command

Inputs

Outputs

cobra scorepeak

narrowPeaks, custom PWM files

csv file with binding confidence

cobra find

binding confidence, csv file with differential expression

target genes with target probability

cobra plot

outputs from previous steps, labels

dotplots and other plots

Note that: COBRA still have some other side abilities which would be introduced later. If you run cobra --help , you will get all the information you need.


cobra scorepeak

cobra scorepeaks is to calculate binding confidence for each gene given the ChIP-seq peakcalling. For example, let’s say you are interested in human STAT1 direct targets. You’ll need two inputs to run cobra scorepeak.

  • narrowPeak_dir/: a directory contains all the narrowPeak files (>20 files)

  • stat1_human.meme`: custom PWM file.

Our current working folder structure shows below

[users]$ ls
narrowPeak_dir   stat1_human.meme

If you have those two files prepared, you will be able to run cobra scorepeak now!

show an example of cobra scorepeak

1cobra scorepeak --tf STAT1 -i narrowPeak_dir/ --meme stat1_human.meme --human --core 8
Parameters of cobra scorepeaks
cobra scorepeaks cheating sheet

Parameters

Functions

--tf

TF of interest (default: STAT1)

-i

input of narrowPeaks (default: ./)

--meme

input of custom PWM files (default: tf_speicies.meme)

--human or --mouse

given one to select speicies (default: mouse)

--core

num of cores to use (default:1)


cobra find

This module is to find the TF direct target genes by given the binding scores for each gene plus differential expression analysis results. COBRA will require two inputs as well:

  • csv file with binding confidence (human_stat1.csv)

  • Differential expression analysis data (rna_dir/)

show an example of cobra find

1cobra find --tf STAT1 -c chip-init-outs.csv -r rna_dir/ -o output_dir/
Parameters of cobra find
cobra find

Parameters

Functions

--tf

TF of interest (default: STAT1)

-c

csv files of chip confidence binging outputs (default: ./)

-r

input of differential expression data or directory.

-o

output dir. If it is not exist, create one.


cobra plot

The coolest ability that COBRA has is using cobra plot to draw some pretty images showing the TF binding status for every gene given the data. From those plots, bioloists would be able to discover some stories easier. You could set

Here is the magic command to activate cobra plot.

show an example of cobra plot

1cobra plot --tf STAT1 -s mouse --gene STAT1 STAT3 -i output-cobra-find/ -o img/ -t truth.pl
Parameters of cobra plot
cobra plot

Parameters

Functions

--tf

TF of interest (default: STAT1)

-s

species

--gene

list of genes of interest

-i

outputs of cobra find (target genes with probs)

-o

image outputs dir

-t

labels or targets (provided by users)


Test COBRA

a) Load test data

For those new friends, you may not have your data prepared yet to train this snake. Don’t worry, in order to get familiar with users, COBRA itself have a test dataset in its pocket. All you need to do is to do ammunition loading using cobra load which will automatically load the ChIP-seq data into ./narrow/, RNA-seq data into ./rna/. Why not give it a trail now?

# ammunition loading
cobra load

If you check your folder with ls, you will find two folders - narrow and rna from your current working directory (please make sure that COBRA have the permission to mkdir in the current working directory). Also, if you want to load in other folders, you can simply use the parameter --dir or -d following by the folder you want. (COBRA documentation writor assume that you are lazy and will load test data in the current working directory.)

b) Run cobra normally

Once you have your test data prepared, you could simply run cobra scorepeak and cobra find step by step to get the results.

# this step would generate multiple files for future usage (by default --tf: STAT1)
cobra scorepeak -i narrow/ --meme stat1_human.meme --human --core 8

######### Done: after a few seconds #########

# gives input files or folders and output folders (GMM model will be trained)
cobra find -c human_stat1_2000.csv -r rna/ -o output/

######### A few seconds later: you will find your results in output/ #########

c) Generate figures

Let’s do some visualisation now. If you do ls, you will find there are a few files available now. You have fimo.csv which is a intermediate step of motif filtering.