GPU-Accelerated MinION Basecalling On the HPC

I recently helped the Rockman lab basecall their MinION sequencing data on the Prince HPC, leveraging the power of the GPUs available there. This allowed us to bring the total time required for basecalling down to around five hours, from the two weeks(!) it was going to take on the desktop. Since more people are beginning to perform MinION sequencing here at the Center for Genomics and Systems Biology, I thought it would be helpful Read more…

How To Find Out What Barcodes Are In Your Undetermined Reads

Sometimes after demultiplexing there are a large number of undetermined reads, i.e. reads that were not assigned to any library based on the barcodes provided. This is most often the result of incorrect metadata or barcode contamination. Determining which barcodes are present in the undetermined reads can be useful in troubleshooting your run. NOTE: If you’re sequencing at the NYU Genomics Core, we automatically provide undetermined read data for you in your MultiQC report The Read more…
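As a rough illustration of the idea (not necessarily the method used in the full post), here is a minimal Python sketch that tallies the barcodes found in FASTQ header lines. It assumes Illumina-style headers in which the index sequence is the last colon-separated field, as in typical demultiplexer output; the function names and the file name in the usage comment are hypothetical.

```python
from collections import Counter
import gzip


def count_barcodes(lines, top_n=10):
    """Tally the index sequence at the end of each FASTQ header line.

    Assumes 4-line FASTQ records whose header ends in ':<index>',
    e.g. '@M01:1:FC:1:1101:1000:2000 1:N:0:ACGT+TTGG'.
    """
    counts = Counter()
    for line_number, line in enumerate(lines):
        if line_number % 4 == 0:  # first line of each record is the header
            barcode = line.rstrip().rsplit(":", 1)[-1]
            counts[barcode] += 1
    return counts.most_common(top_n)


def count_barcodes_in_file(fastq_path, top_n=10):
    """Same tally, reading from a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if str(fastq_path).endswith(".gz") else open
    with opener(fastq_path, "rt") as handle:
        return count_barcodes(handle, top_n)


# Hypothetical usage (the file name is a placeholder):
# for barcode, n in count_barcodes_in_file("Undetermined_S0_L001_R1_001.fastq.gz"):
#     print(barcode, n)
```

Comparing the most frequent barcodes against your sample sheet is usually enough to spot a mistyped or reverse-complemented index.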

Beginner's Guide: What is OpenStack?

OpenStack, a project originally started by NASA and Rackspace, is an open source cloud computing platform that enables users to access and control pools of compute, storage, and networking resources. TechCrunch calls it “one of the most important and complex open-source projects you’ve never heard of”. Like competitor Amazon Web Services (AWS) and other cloud platforms, OpenStack consists of several interrelated components: compute, storage, image management, networking, etc. Together these components deliver a massively scalable Read more…

reform: Modify Reference Sequence and Annotation Files Quickly and Reproducibly

Update 7/21/2021: reform has officially been published as an NFT. Read about this experiment in scientific publishing here. Access the reform publication (PDF) here.

Update 11/20/2019: reform is now available as a web app: https://reform.bio.nyu.edu/

reform is a Python-based command line tool that allows for fast, easy, and robust editing of reference genome sequence and annotation files. With the increase in use of genome editing tools such as CRISPR/Cas9, and the use of reference genome Read more…

Next-Generation Sequencing Analysis Resources

The NYU Center for Genomics and Systems Biology in New York and Abu Dhabi has developed a new website with resources for mastering NGS analysis: https://learn.gencore.bio.nyu.edu/ Modules are designed to provide hands-on experience with analyzing next-generation sequencing data. Standard pipelines are presented that provide the user with a step-by-step guide to using state-of-the-art bioinformatics tools. Each module includes sample data sets and scripts that can be accessed on NYU’s HPC Read more…

Building an Analysis Pipeline for HPC using Python

In this post we will build a pipeline for the HPC using Python 3. We will begin by building the foundation for a pipeline in Python in part 1, and then use that to build a simple NGS analysis pipeline in part 2. At NYU, we submit jobs to the HPC using the Slurm Workload Manager. We will therefore implement this pipeline using Slurm, although it could easily be swapped for PBS or another similar Read more…
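To give a flavor of what submitting pipeline steps to Slurm from Python can look like, here is a minimal sketch. It is not the pipeline from the full post: the helper names are hypothetical and the step commands in the usage comment are placeholders. It relies only on standard sbatch options (--parsable, --job-name, --cpus-per-task, --dependency, --wrap).

```python
import subprocess


def sbatch_command(step_command, job_name, depends_on=None, cpus=1):
    """Build the sbatch argument list for one pipeline step.

    depends_on is the job ID of a step that must finish successfully first.
    """
    cmd = [
        "sbatch",
        "--parsable",  # print only the job ID (optionally ';cluster')
        f"--job-name={job_name}",
        f"--cpus-per-task={cpus}",
    ]
    if depends_on:
        cmd.append(f"--dependency=afterok:{depends_on}")
    cmd.append(f"--wrap={step_command}")  # wrap a shell command in a batch script
    return cmd


def submit(cmd):
    """Submit a step and return its job ID (requires a Slurm cluster)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip().split(";")[0]


# Hypothetical usage on a cluster (the commands are placeholders):
# align_id = submit(sbatch_command("echo align", "align", cpus=4))
# sort_id = submit(sbatch_command("echo sort", "sort", depends_on=align_id))
```

Chaining steps with afterok dependencies lets the scheduler, rather than a long-running Python process, enforce the order of the steps.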

Explore the New Shared Genome Resource

Save time and resources with the local CGSB repository of commonly used genomic data sets. Data is obtained from Ensembl and NCBI. New versions/releases will be added periodically or upon request. Previous versions/releases will be preserved. All files are readable from within the shared genome resource. There is no need to copy the file(s) to your local directory. The second table below shows all available data types. All data are stored in a common location Read more…

Salmon & kallisto: Rapid Transcript Quantification for RNA-Seq Data

Salmon and kallisto might sound like a tasty entree from a hip Tribeca restaurant, but the duo are in fact a pair of next-generation applications for rapid transcript quantification. They represent a new approach to transcript quantification using NGS that has a number of advantages over existing alignment-based methods. I’ve tried them both out and provide my thoughts below. We are used to quantification methods that rely on full base-to-base alignment of reads to a Read more…

Node to Joy: Maximize Your Performance on the HPC

In this post we’ll discuss maximizing your performance on the HPC. This entry is aimed at experienced HPC users; for new users, please see Getting Started on the HPC. Recent advances in sequencing technology have made High Performance Computing (HPC) more critical than ever in data-driven biological research. NYU’s HPC resources are available to all NYU faculty, staff, and faculty-sponsored students. Familiarizing yourself with NYU’s HPC resources and optimizing your use of this technology will Read more…