Beginner’s Guide to Bioinformatics Tools for Analyzing Microbiome Data

Next-generation sequencing technologies have made sequencing fast and inexpensive, and they are increasingly used to study microbial communities. RNA-seq metatranscriptome and WGS metagenome studies aim to investigate microbial communities at the transcriptome and genome levels, respectively. In this article, I will introduce a few tools that I frequently use to analyze metagenomic and metatranscriptomic datasets. Generating microbial community taxonomy profiles: Since a variety of microbes live in the microbial community at Read more…

Three Useful Nextflow Patterns Every Computational Biologist Should Know

In this article I’ll go over three Nextflow patterns I frequently use to make development of Nextflow data processing pipelines easier and faster. I use each of these in most of my workflows, so they really come in handy. I am assuming here that you know what processes, channels, strings, closures, directives and operators are and are somewhat comfortable writing Groovy and Nextflow code. If you want further details on any of the topics I Read more…

HighPrep PCR Beads as an AMPureXP Alternative

Comparing HighPrep PCR and AMPureXP for cleanup and size selection. Written by Hana Husic. High-throughput sequencing requires precise size selection of DNA fragments in order to increase the amount of usable data generated. If the fragments are too small, sequencing reads could be contaminated with adapter sequences. If the fragments are too long, library quantification is less accurate and the run could under-cluster, producing fewer reads. Therefore, size selection is one of the most Read more…

Gene Set Enrichment Analysis in Minutes with the NASQAR Web App

Gene Set Enrichment Analysis (GSEA) is a common method for analyzing RNA-Seq data that determines whether a predefined set of genes (for example, those in a GO term or KEGG pathway) shows statistically significant and concordant differences between two biological phenotypes. There are myriad tools for GSEA, and one that I particularly like is clusterProfiler. Developed as an R-based tool, clusterProfiler has until now been inaccessible to users unfamiliar Read more…

Analyze your Data Faster with NASQAR: Nucleic Acid SeQuence Analysis Resource

The bioinformatics team at the NYU Center for Genomics and Systems Biology in Abu Dhabi and New York has recently developed NASQAR (Nucleic Acid SeQuence Analysis Resource), a web-based platform providing an intuitive interface to popular R-based bioinformatics data analysis and visualization tools including Seurat, DESeq2, Shaman, clusterProfiler, and more. These tools, although powerful, typically require significant computational experience and lack a graphical user interface (GUI), making them inaccessible to many researchers. NASQAR addresses this Read more…

GPU-Accelerated MinION Basecalling On the HPC

I recently helped the Rockman lab basecall their MinION sequencing data on the Prince HPC, leveraging the power of the GPUs available there. This allowed us to bring the total time required for basecalling down to around five hours, from the two weeks(!) it was going to take on the desktop. Since more people are beginning to perform MinION sequencing here at the Center for Genomics and Systems Biology, I thought it would be helpful Read more…

How To Find Out What Barcodes Are In Your Undetermined Reads

Sometimes after demultiplexing there is a high number of undetermined reads, i.e. reads that were not assigned to any library based on the barcodes provided. This is most often the result of incorrect metadata or barcode contamination. Determining which barcodes are present in the undetermined reads can be useful in troubleshooting your run. NOTE: If you’re sequencing at the NYU Genomics Core, we automatically provide undetermined read data for you in your MultiQC report. The Read more…
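As a rough illustration of the idea (not necessarily the approach the full post takes), here is a minimal Python sketch that tallies the most frequent barcodes in an undetermined-reads FASTQ file. It assumes Illumina-style headers in which the index sequence is the last colon-separated field; the function name `count_barcodes` and the `top_n` parameter are hypothetical and chosen only for this example.

```python
import gzip
from collections import Counter
from itertools import islice


def count_barcodes(fastq_path, top_n=10):
    """Return the top_n most common barcodes as (barcode, count) pairs.

    Assumption: headers look like
    @INSTRUMENT:RUN:FLOWCELL:LANE:TILE:X:Y 1:N:0:ACGTACGT+TTGCAACG
    so the barcode is whatever follows the final colon.
    """
    # Read gzipped or plain-text FASTQ depending on the file extension.
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    counts = Counter()
    with opener(fastq_path, "rt") as handle:
        # A FASTQ record spans four lines; the header is the first of each.
        for header in islice(handle, 0, None, 4):
            barcode = header.rstrip().rsplit(":", 1)[-1]
            counts[barcode] += 1
    return counts.most_common(top_n)
```

Calling `count_barcodes("Undetermined_S0_L001_R1_001.fastq.gz")` (an example file name, adjust to your run) returns a list of barcode and count pairs, which you can compare against your sample sheet to spot typos, swapped indexes, or reverse-complemented barcodes.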

Beginner’s Guide: What is OpenStack?

OpenStack, a project originally started by NASA and Rackspace, is an open source cloud computing platform that enables users to access and control pools of compute, storage, and networking resources. TechCrunch calls it “one of the most important and complex open-source projects you’ve never heard of”. Like competitor Amazon Web Services (AWS) and other cloud platforms, OpenStack consists of several interrelated components: compute, storage, image management, networking, etc. Together these components deliver a massively scalable Read more…

reform: Modify Reference Sequence and Annotation Files Quickly and Easily

reform is a Python-based command-line tool that allows for fast, easy, and robust editing of reference genome sequence and annotation files. With the increasing use of genome editing tools such as CRISPR/Cas9 and of reference-genome-based analyses, the ability to edit existing reference genome sequences and annotations to include novel sequences and features (e.g. transgenes, markers) is increasingly necessary. reform provides a fast, easy, reliable, and reproducible solution for creating Read more…

Next-Generation Sequencing Analysis Resources

The NYU Center for Genomics and Systems Biology in New York and Abu Dhabi has developed a new website, https://learn.gencore.bio.nyu.edu/, with resources for mastering NGS analysis. Modules are designed to provide hands-on experience with analyzing next-generation sequencing data. Standard pipelines are presented that provide the user with a step-by-step guide to using state-of-the-art bioinformatics tools. Each module includes sample data sets and scripts that can be accessed on NYU’s HPC Read more…