Category: R

Circular Ideogram

Displaying data in circular fashion is increasingly getting attractive and a great way to make art out of data!

Here I show one of the many ways to make Circos style Ideogram plot of point genomic data with few genomic features. The goal here is to create a cheesy visualization of association between HIV or MLV integration sites to genes & GC content. The data and code is available at my Github repository.


The handy cut()

One of the most interesting functions for a bioinformatician in R is cut()

Want to a generate asthetically pleasing window size label given an integer?

getWindowLabel <- function(x) {
    ind <- cut(abs(x), c(0, 1000, 1e+06, 1e+09, 1e+12), include.lowest = TRUE, 
                  right = FALSE, labels = FALSE)
    paste(x/c(1, 1000, 1e+06, 1e+09, 1e+12)[ind], 
          c("bp", "Kb", "Mb", "Gb")[ind], sep = "")

getWindowLabel(c(0, 1e+07, 1000, 1e+06, 2e+09))

[1] "0bp"  "10Mb" "1Kb"  "1Mb"  "2Gb" 

How about those significant p-values?

getStars <- function(x) 
     as.character(cut(x, c(0, 0.001, 0.01, 0.05, 1), 
                  c("***", "**", "*", ""), 
                  include.lowest = TRUE))

getStars(c(0, 0.001, 0.01, 0.05, 1))

[1] "***" "***" "**"  "*"   ""

You can use cut() to get equally sized bins of datapoints or convert a numeric variable into a categorical type.


R Packages

hiAnnotatorcontains set of functions which allow users to annotate a RangedData or GRanges object with custom set of annotations. The basic philosophy of this package is to take two RangedData or GRanges objects (query & subject) with common set of space/seqnames (i.e. chromosomes) and return associated annotation per space/seqname and rows from the query matching space/seqnames and rows from the subject (i.e. genes or cpg islands).

hiReadsProcessor: contains set of functions which allow users to process LM-PCR products sequenced using any platform. Given an excel/txt file containing parameters for demultiplexing and sample metadata, the functions automate trimming of adaptors and identification of the genomic product. Genomic products are further processed for QC and abundance quantification.

R Code

Collection of posts to do neat stuff with R.


Where I stash most of my pleasant looking code

Easy Reproducible Report using RStudio, Markdown, & Knitr

Why am I doing this?

  • Software for doing these sort of things are gaining momentum in the open science community.
  • Presenting your work in this sort of fashion looks unique & saves you time.

Why Reproducible Report (RR)?

  • Code, analysis, and results are in a single file.
  • Automatically renew the report when the code or data changes.
  • Avoid erros by NOT copy-pasteing results.
  • Preserve contextual narrative about why analysis was performed in a certain fashion [1].
  • Documentation for the analytic and computational processes from which conclusions are drawn [1].

Workflow in a nutshell

Combine R with plain text file format to produce documents (e.g., pdfs, HTML documents, etc.)

Look here for all possible instances:

In this document we are going to explore making reports using Knitr & Markdown in RStudio

Why RStudio, Markdown, & Knitr?

  1. Classically RR is made using Sweave (R + LaTeX)
    • Sweave, a function in R to make reports from R code weaved with LaTeX.
      • The problem: Not smart enough to reformat/resize things if unfit for a document.
      • Limited to only understanding LaTeX
    • LaTeX, an elegant markup language
      • The problem: Too many conventions to remember to make a simple report.
      • But if mastered, then fruits are worth it!
  2. Knitr, R package to overcome most limitation of Sweave
    • Allows any input languages (e.g. R, Python and Awk) and any output markup languages (e.g. LaTeX, HTML, Markdown and reStructuredText)
  3. Markdown, a light weight markup language that\’s simple, readable, and intuitive.
    • Notice the irony here: markup languages generally take a piece of text and decorations around it to render the final results (i.e. <strong>BOLD TEXT</strong>), while in markdown you are doing the bare minimum!
  4. RStudio, software with builtin support for Knitr and Markdown….and it\’s simply awesome!

What are we doing essentially?

Weave R code into a document ->
Where text is fashioned by markdown ->
Which Knitr will parse and ->
Write the output into a HTML file.


Two best friends in RStudio


  • MD button is a reference to Markdown documentation
  • Knit HTML button will make the report.

Simplest Example

  1. Open a new R markdown document in RStudio. File -> New -> R Markdown.
  2. This launches a sample document to get you started.
    • All the R code is enclosed in triple tick marks called code chunks
    • Each code chunk by default shows up in the final output. You can add various options to fine tune the behavior of code chunks.
  3. Press Knit HTML button to see what it does.

R Code Chunks

To insert an R code chunk, you can type it manually or just press Chunks - Insert chunks or use the shortcut key. This will produce the following code chunk:

The following R code chunk is as follows:

myData <- data.frame(x = 1:10, y = 20:30)

The code chunk input and output is then displayed as follows:

myData <- data.frame(x = 1:10, y = 21:30)
##     x  y
## 1   1 21
## 2   2 22
## 3   3 23
## 4   4 24
## 5   5 25
## 6   6 26
## 7   7 27
## 8   8 28
## 9   9 29
## 10 10 30


Some example of ggplot!



Pressing tab when inside the braces will bring up code chunk options.

Most frequently used options (bold text denote default options):

  • comment: display # in front of text when printing text from code chunk. Set to “” to display nothing.
  • echo: display the code chunk itself in the output. (T/F)
  • eval: evaluate the code chunk. (T/F)
  • result: how should the results be displayed. (markup,asis,hide)
  • fig.width or fig.height : dimensions of the figure. (7)
  • cache : save the code chunk computation for reuse later. To rerun cached code chunks, just delete the contents of the cache folder. (T/F)

References & Excellent resources to learn more