Next-Generation Sequencing Analysis Resources

The NYU Center for Genomics and Systems Biology in New York and Abu Dhabi has developed a new website with resources for mastering NGS analysis: https://learn.gencore.bio.nyu.edu/

Modules are designed to provide hands-on experience with analyzing next-generation sequencing data. Standard pipelines are presented that provide the user with a step-by-step guide to using state-of-the-art bioinformatics tools. Each module includes sample data sets and scripts that can be accessed on NYU's HPC facility.

Before you begin, it’s important to have some foundational skills in Linux and R. Check out the pre-requisites.

Included Modules:

  • File Formats: Learn about the variety of file formats in next generation sequencing data.
  • Alignment: The first step of most NGS analysis is to align the reads against the reference genome. This module describes how to map short DNA sequence reads, assess the quality of the alignment and prepare to visualize the mapping of the reads.
  • Visualization: Learn about the Integrative Genomics Viewer (IGV), a high-performance visualization tool for interactive exploration of large, integrated genomic datasets.
  • Variant Calling: Variant calling entails identifying single nucleotide polymorphisms (SNPs) and small insertions and deletions (indels) from next generation sequencing data. This module will cover SNP & Indel detection in germline cells.
  • RNA-Seq: Learn how to identify differential expression of genes by comparing different samples, or attempt to capture all RNA molecules in a given species.
  • HPC: Resources for analyzing data on NYU's High Performance Computing Cluster.
  • ChIP-seq Analysis: ChIP-seq combines chromatin immunoprecipitation with high-throughput sequencing and is used to analyze protein interactions with DNA and identify their binding sites.
  • De novo genome assembly: De novo (from new) genome assembly refers to the process of reconstructing an organism’s genome from smaller sequenced fragments.
  • Metagenomics: Metagenomics is the study of genetic material recovered directly from environmental samples. This is useful when attempting to understand what microbes are present and what they are doing in a microbiome, such as soil or the human gut.
  • Single cell RNA sequencing: scRNA-seq allows us to understand cellular differences in expression, and hence it is directly applicable to studies of cell heterogeneity, cell population and subpopulation identification, effects of low-copy mRNA distribution, and transcriptional regulation.

Note: New modules are added as they are developed, so this post may not reflect all modules currently available.

In addition to modules, material from the Bioinformatics And Data Analysis Seminars (BADAS) will be published on this site as well.

Building an Analysis Pipeline for HPC using Python

In this post we will build a pipeline for the HPC using Python 3. We will begin by building the foundation for a pipeline in Python in part 1, and then use that to build a simple NGS analysis pipeline in part 2.

At NYU, we submit jobs to the HPC using the Slurm Workload Manager. Therefore, we will implement this pipeline using Slurm; however, it could easily be adapted to PBS or another similar scheduler.

This tutorial assumes basic familiarity with Slurm and Python.

Part 1: Building a template for submitting jobs to Slurm in Python

The first thing we will do is build a function in Python that will be used to submit jobs to Slurm. The function will submit an sbatch job through the command line using the subprocess Python package.

Let's start by taking a look at an example of submitting a simple job to Slurm directly from the command line, one that just prints "Hello World".

sbatch -J my_simple_job -o my_simple_job.out -e my_simple_job.err --wrap='echo Hello World'

If you execute this command on the Prince or Dalma HPC, you should end up with two new files after the job completes (it should only take a second to finish):

my_simple_job.out
my_simple_job.err

If you look in my_simple_job.out you should see the text “Hello World”.

Let’s break down the command we just used. If you’re familiar with submitting Slurm jobs, this should be review:

sbatch: calling the scheduler
-J: job name
-o: standard out filename
-e: standard err filename
--wrap: this is the most important part, the command you wish to run. In this example, we simply asked to echo “Hello World”.

Let's put this same command in a Python function (we'll call it sbatch()), and see how we can use this as a template to submit all our jobs. Note that we make use of the subprocess package to execute a command on the command line from Python. Before we use the subprocess package, we have to import it.

import subprocess

def sbatch():
    sbatch_command = "sbatch -J my_simple_job -o my_simple_job.out -e my_simple_job.err --wrap='echo Hello World'"
    sbatch_response = subprocess.getoutput(sbatch_command)
    print(sbatch_response)

sbatch() # call the sbatch function

Save that as simple.py. If you run this Python script (python3 simple.py), you should see a response from sbatch (indicating the ID of the job submitted), and you should find the same file my_simple_job.out with the text "Hello World". Try changing the job name, output file names, or command, and re-run the script.

Let’s transform the sbatch function above into something more flexible.

def sbatch(job_name, command):
    sbatch_command = "sbatch -J {} -o {}.out -e {}.err --wrap='{}".format(job_name, job_name, job_name, command)
    sbatch_response = subprocess.getoutput(sbatch_command)
    print(sbatch_response)

We introduce two new parameters to the function: job_name and command. We also make use of the string format method in order to build strings easily. Let's take a quick look at a couple of simple examples of the string format method in Python.

name = "Rick"
print( "Hello {}".format(name) )
# prints "Hello Rick"

first_name = "Rick"
last_name = "Sanchez"
print( "Hello {} {}".format(first_name, last_name) )
# prints "Hello Rick Sanchez"

The nice thing about this method is that it automatically formats any field into a string, so you can fill values with integers, floating-point numbers, the results of mathematical expressions, etc. Let's take a look at one final example:

some_integer = 7
some_float = 2.99
print( "{} + {} = {} ".format(some_integer, some_float, some_integer + some_float) )
# prints "7 + 2.99 = 9.99"

Let's go back to our sbatch example. What we have now is a function to which we can send a job_name and a command; the function will build the sbatch command using job_name as the job name and stdout/stderr file names, and will put whatever command we give it in the 'wrap' argument. Let's try the same basic Hello World example using our new template.

import subprocess
def sbatch(job_name, command):
    sbatch_command = "sbatch -J {} -o {}.out -e {}.err --wrap='{}".format(job_name, job_name, job_name, command)
    sbatch_response = subprocess.getoutput(sbatch_command)
    print(sbatch_response)

sbatch("my_simple_job", "echo Hello World") # call the sbatch function, but provide job_name and command parameters.

Let's add some additional parameters that we might want to use when submitting a job to sbatch: time, memory, and tasks. We will give each of these parameters a default value so they don't need to be specified every time you call sbatch(), unless you want to override them. Let's also add a mail option so Slurm will email us if our job ever fails.

def sbatch(job_name, command, time=4, mem=60, tasks=20):
    sbatch_command = "sbatch -J {} -o {}.out -e {}.err --mail-user=$USER@nyu.edu --mail-type=FAIL -t {}:00:00 --mem={}000 --ntasks-per-node={} --wrap='{}'".format(job_name, job_name, job_name, time, mem, tasks, command)
    sbatch_response = subprocess.getoutput(sbatch_command)
    print(sbatch_response)

We now have a function that allows us to quite easily submit any command we want to the Slurm scheduler, control some common job parameters, and notify us by email in case of failure. The final piece required to make this function operable in a pipeline is support for dependencies. We need this function to do two more things:

1) return a job id that can be passed as a dependency to another job
2) allow the sbatch function to accept a dependency, optionally

Let’s tackle #1 first:

When you submit an sbatch job, you typically receive a response like this:
Submitted batch job 123456789

We're interested in just the job number. There are a few different ways to extract this number from the response; we'll use a simple string split and take the last element:

job_id = sbatch_response.split(' ')[-1].strip()

The command above takes the sbatch_response (which is the output of running the sbatch command with subprocess.getoutput()), splits it on the space (' ') delimiter, takes the last element (the job ID), and strips away any spaces or newline characters, leaving just the job ID.
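
To see exactly what that one-liner does, here it is applied to the example response above (the job number is made up):

sbatch_response = "Submitted batch job 123456789"  # example response with a made-up job number
job_id = sbatch_response.split(' ')[-1].strip()
print(job_id)  # prints 123456789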

Now let’s implement support for #2: enabling the function to accept an optional dependency parameter.

We’ll start by adding a parameter called dep to the sbatch function definition and assigning it a default value of nothing (empty string).

def sbatch(job_name, command, time=4, mem=60, tasks=20, dep=''):

Now we'll add a couple of new lines at the top of the function to check whether the dependency is empty, and add the required parameters to the sbatch command if it's not. We also add the lines to parse and return the job_id.

def sbatch(job_name, command, time=4, mem=60, tasks=20, dep=''):
    if dep != '':
        dep = '--dependency=afterok:{} --kill-on-invalid-dep=yes '.format(dep)
 
    sbatch_command = "sbatch -J {} -o {}.out -e {}.err --mail-user=$USER@nyu.edu --mail-type=FAIL -t {}:00:00 --mem={}000 --ntasks-per-node={} --wrap='{}' {}".format(job_name, job_name, job_name, time, mem, tasks, command, dep)
    sbatch_response = subprocess.getoutput(sbatch_command)
    print(sbatch_response) 
    job_id = sbatch_response.split(' ')[-1].strip()
    return job_id

Note that the dep parameter is always included in the sbatch command that we send to the command line. If there is no dependency specified, then dep will be an empty string.

We now have a function that can be used to build a fully functioning pipeline with dependencies!
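
To preview how this will be used in Part 2, the id returned by one call is simply passed as dep to the next (a minimal sketch with made-up job names and commands):

job_1 = sbatch('first_job', 'echo Step one')
job_2 = sbatch('second_job', 'echo Step two', dep=job_1)  # waits for first_job to finish successfully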

Part 2: Building a Pipeline

In part 1 we built a template function to submit jobs to the HPC. In part 2, we will use this function to build a pipeline. For the purpose of this tutorial, we will implement a simple NGS analysis workflow consisting of FastQC, alignment, sorting, indexing, and generating alignment statistics.

Using dependencies in a pipeline allows us to maximize the resources we can use while minimizing the wait time to run our jobs. This also allows individual jobs to run in parallel (where possible), reducing our overall analysis time even further.

The figure below depicts the simple pipeline we will build. Note the dependencies (indicated with arrows) and the steps which run in parallel (same row).

1. Input files

We’ll use a single-end fastq file and we’ll align to the Unicorn genome.

input_1 = 'sample_R1.fastq'

ref = 'Unicorn_v1.fa'

For the purpose of this exercise, we’ll assume the index files for the aligner are already built.
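
(If you did need to build them, a one-off job submitted through our sbatch() function would do it; bowtie2-build is bowtie2's standard indexing command, and the basename Unicorn_v1 is just our choice for this made-up genome.)

# Only needed if the Unicorn_v1.*.bt2 index files don't already exist
sbatch('build_index', 'bowtie2-build Unicorn_v1.fa Unicorn_v1')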

2. Build the individual steps

Here’s the command for running fastqc:

fastqc file.fastq

We’ll build a function for each step of the workflow to keep things organized. These functions will call the sbatch() function we created above.

def fastqc():
    # Build the command for fastqc
    command = "fastqc {}".format(input_1)

    # Pass job name and command to sbatch
    # sbatch() will return the job_id
    job_id = sbatch('fastqc', command)

    # Return the job id
    return job_id

Now let’s build the functions for the remaining steps

Alignment:

def alignment():
    command = "bowtie2 -x {} -U {} > output.sam".format(ref, input_1)
    job_id = sbatch('align', command)
    return job_id

Note: The next three steps require a dependency, i.e. they require a previous job to complete before they can be started. For this reason, we will add the dep parameter to the following functions.

Stats:

def stats(dep=''):
    command = "samtools flagstat output.sam > stats.txt"
    job_id = sbatch('stats', command, dep=dep)
    return job_id

Sort:

def sort(dep=''):
    command = "samtools sort output.sam -o output.sorted.bam"
    job_id = sbatch('sort', command, dep=dep)
    return job_id

Index:

def index(dep=''):
    command = "samtools index output.sorted.bam output.sorted.bai"
    job_id = sbatch('index', command, dep=dep)
    return job_id

3. Build the pipeline

Now that we have all our individual functions for each step built, we can stitch them together in a pipeline. Most of the heavy lifting is already done; this part is the easiest.

fastqc_jobid = fastqc()
alignment_jobid = alignment()

# Stats needs the alignment to complete before it can begin, so we pass 
# alignment_jobid to the stats() function
stats_jobid = stats(alignment_jobid)

# Sort needs the alignment to complete before it can begin, so we pass 
# alignment_jobid to the sort() function
sort_jobid = sort(alignment_jobid)

# Note: Both stats() and sort() will be submitted simultaneously when the alignment job is complete

# Index needs sorting to complete before it can begin, so we pass sort_jobid to index()
index_jobid = index(sort_jobid)

4. Putting it all together

Below you will find the full code for this pipeline. I’ve added three new components:

a) Parse the sample id (from the input file name) and use this to name the outputs of the pipeline
b) Override some of the default sbatch job parameters (ex: tasks, memory, and time) for the alignment job
c) Load the respective modules (required on NYU's HPC). I've added this directly into the command for each step.

 

import subprocess

# Inputs
input_1 = 'sample_R1.fastq'
ref = 'Unicorn_v1.fa'

# Parse the sample id so we can use it to name the outputs
# sample_id in this example will be 'sample_R1'
sample_id = input_1.split('.')[0]

def sbatch(job_name, command, time=4, mem=60, tasks=20, dep=''):
    if dep != '':
        dep = '--dependency=afterok:{} --kill-on-invalid-dep=yes '.format(dep)

    sbatch_command = "sbatch -J {} -o {}.out -e {}.err --mail-user=$USER@nyu.edu --mail-type=FAIL -t {}:00:00 --mem={}000 --ntasks-per-node={} --wrap='{}' {}".format(job_name, job_name, job_name, time, mem, tasks, command, dep)

    sbatch_response = subprocess.getoutput(sbatch_command)
    print(sbatch_response)

    job_id = sbatch_response.split(' ')[-1].strip()
    return job_id

def fastqc():
    # Build the command for fastqc
    command = "module load fastqc/0.11.5; fastqc {}".format(input_1)

    # Pass job name and command to sbatch
    # sbatch() will return the job_id
    job_id = sbatch('fastqc', command)

    # Return the job id
    return job_id

def alignment():
    command = "module load bowtie2/intel/2.3.2; bowtie2 -x {} -U {} > {}.sam".format(ref, input_1, sample_id)
    job_id = sbatch('align', command, time=8, mem=120)
    return job_id

def stats(dep=''):
    command = "module load samtools/intel/1.6; samtools flagstat {}.sam > {}_stats.txt".format(sample_id, sample_id)
    job_id = sbatch('stats', command, dep=dep)
    return job_id

def sort(dep=''):
    command = "module load samtools/intel/1.6; samtools sort {}.sam -o {}.sorted.bam".format(sample_id, sample_id)
    job_id = sbatch('sort', command, dep=dep)
    return job_id

def index(dep=''):
    command = "module load samtools/intel/1.6; samtools index {}.sorted.bam {}.sorted.bai".format(sample_id, sample_id)
    job_id = sbatch('index', command, dep=dep)
    return job_id
 
#Run the pipeline!
fastqc_jobid = fastqc() 
alignment_jobid = alignment() 
stats_jobid = stats(alignment_jobid)
sort_jobid = sort(alignment_jobid)
index_jobid = index(sort_jobid)
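
If you want to keep an eye on the submitted jobs without leaving Python, a quick status check like the one below works (squeue is Slurm's standard queue-listing command; $USER expands because subprocess.getoutput runs the command through a shell):

# Print the current state of our jobs in the Slurm queue
print(subprocess.getoutput("squeue -u $USER"))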

Congratulations! You’ve completed the tutorial. If you were following along, you should have the following output files once the pipeline has finished:

sample_R1_fastqc.zip
sample_R1_fastqc.html
sample_R1.sam
sample_R1_stats.txt
sample_R1.sorted.bam
sample_R1.sorted.bai

Post below if you have any questions and I’ll do my best to answer.

Explore the New Shared Genome Resource

Save time and resources with the local CGSB repository of commonly used genomic data sets. Data is obtained from Ensembl and NCBI. New versions/releases will be added periodically or upon request. Previous versions/releases will be preserved.

All files are readable from within the shared genome resource. There is no need to copy the file(s) to your local directory. The second table below shows all available data types.

All data are stored in a common location with the following naming convention:

/genomics/genomes/Public/<KINGDOM>/<SPECIES>/<SOURCE>/<VERSION>/
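
For example (a hypothetical entry, just to show how the convention reads; use the table below to get the exact path for your genome), a yeast assembly from NCBI would live at a path like:

/genomics/genomes/Public/Fungi/Saccharomyces_cerevisiae/NCBI/GCF_000146045.2_R64/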

Search for your genome of interest in the table below. Clicking on the row containing your genome will generate the specific path (shown below the table) to the data on the HPC.

Explore

The searchable species table lists each genome by Kingdom, Species, Source, and Version; selecting a row generates its path in the form /genomics/genomes/Public/<KINGDOM>/<SPECIES>/<SOURCE>/<VERSION>/.

Available Data Types:

  • Reference Sequence: .fa, .fna

Request Data

To request a new organism, version, or data type, please email mk5636@nyu.edu

Remote Desktop Connection to Prince

Connect to Prince using a remote desktop to analyze your data in RStudio, visualize in IGV, and interact with other GUI applications on the HPC.

Instructions for Mac users

Initial Setup:

  1. Download TigerVNC:
    https://bintray.com/tigervnc/stable/download_file?file_path=TigerVNC-1.8.0.dmg
  2. Set up a VNC password on Prince:
    Log in to Prince and type the command vncpasswd
    If you ever forget your VNC password, you can reset it at any time by typing vncpasswd again (it will ask you to enter a password, then confirm the password)
  3. Copy this script into your home folder:
    prince:/scratch/work/cgsb/scripts/launch_vnc/launch-vnc.sh

Once initial setup is complete, you can launch a VNC session and connect to it at any time by doing the following:

  1. Submit the sbatch script (which you obtained above) to the queue, like this:
    > sbatch launch-vnc.sh
  2.  Once the job is running, look for the output file called VNC.o1234567, where ‘1234567’ is your job ID. Open that file and copy the first ssh command, which looks like this:
    ssh -L 5901:c06-04:5901 netID@prince.hpc.nyu.edu
  3. Open a terminal from your Mac (local terminal) and paste in the command from above and hit enter, you will be asked for your password (your regular NYU password).
  4. Open TigerVNC on your computer and enter localhost:X in the server field, where X is the display number (the last digit of the 590X port from the ssh command above: if the port was 5901, enter 1; if it was 5902, enter 2). Using the command above as an example, the correct entry would be localhost:1
    Press the ‘Connect’ button. You will then be asked for a password; enter the password you set up with vncpasswd (not your NYU password!). Click OK and it should launch the desktop.
  5. The first time you launch your remote desktop, you should disable the screensaver lock. Go to Applications -> System Tools -> Settings -> Power and change the setting for Blank Screen to Never. This will prevent you from being locked out of your desktop. This only needs to be done once.
  6. Now you can run IGV and load any file which you have access to on the HPC. Open a terminal and enter:
    > module load igv
    > igv.sh
  7. Your VNC session will be active for 12 hours. After this period, you will need to re-launch and connect to your remote desktop by redoing steps 1-4.

Instructions for PC users

Initial Setup:

  1. Download Putty:
    32-bit: https://the.earth.li/~sgtatham/putty/latest/w32/putty.exe
    64-bit: https://the.earth.li/~sgtatham/putty/latest/w64/putty.exe
  2. Download and install TightVNC (accept the default settings when installing):
    http://www.tightvnc.com/download.php
  3. Set up a VNC password on Prince:
    Log in to Prince and type the command vncpasswd
    If you ever forget your VNC password, you can reset it at any time by typing vncpasswd again (it will ask you to enter a password, then confirm the password)
  4. Copy this script into your home folder:
    prince:/scratch/work/cgsb/scripts/launch_vnc/launch-vnc.sh

Once initial setup is complete, you can launch a vnc session and connect to it at any time by doing the following:

  1. Submit the sbatch script (which you obtained above) to the queue, like this:
    > sbatch launch-vnc.sh
  2. Once the job is running, look for the output file called VNC.o1234567, where ‘1234567’ is your job ID. Open that file and look at the first ssh command, which looks like this:
    ssh -L 5901:c06-04:5901 netID@prince.hpc.nyu.edu
  3. From the line above, we know the VNC port is 5901 and the compute node is c06-04. You will need these two values.
  4. Open Putty and enter prince.hpc.nyu.edu for Host Name and 22 for Port, as shown below.
  5. On the left, click to expand the options for ‘SSH’, then click ‘Tunnels’ (a). Enter the port number obtained in step 3 above into the ‘Source Port’ field. In the example shown above, the port number is 5901 (b). For ‘Destination’, enter compute-node:port. In the example above, this is c06-04:5901 (c). Click ‘Add’ (d). Then click ‘Open’ (e). When the terminal opens, authenticate with your NYU netID and password.
    Note: this configuration (Source port + Destination) will need to be updated whenever a new VNC session is launched (whenever you submit a new launch-vnc.sh job).
  6. Launch TightVNC Viewer. Enter the VNC Server as follows:
    If your port is 5900, VNC Server: localhost:0
    If your port is 5901, VNC Server: localhost:1
    If your port is 5902, VNC Server: localhost:2
    In the example above, the port was 5901, so we enter localhost:1
  7. Click Connect, then enter the VNC password (not your NYU NetID password!) which you created in the initial setup.
  8. The first time you launch your remote desktop, you should disable the screensaver lock. Go to Applications -> System Tools -> Settings -> Power and change the setting for Blank Screen to Never. This will prevent you from being locked out of your desktop. This only needs to be done once.
  9. Now you can run IGV and load any file which you have access to on the HPC. Open a terminal and enter:
    > module load igv
    > igv.sh
  10. Your VNC session will be active for 12 hours. After this period, you will need to re-launch and connect to your remote desktop by submitting the launch-vnc.sh script and configuring a new putty session.

 

Salmon & kallisto: Rapid Transcript Quantification for RNA-Seq Data

Salmon and kallisto might sound like a tasty entree from a hip Tribeca restaurant, but the duo are in fact a pair of next-generation applications for rapid transcript quantification. They represent a new approach to transcript quantification using NGS that has a number of advantages over existing alignment-based methods. I’ve tried them both out and provide my thoughts below.

We are used to quantification methods that rely on full base-to-base alignment of reads to a reference, but newer methods leverage the idea that when quantifying reads, the necessary information is not where in a transcript a read aligns, but only which transcripts could have generated the read. This idea renders base-level alignment, which has been the computational bottleneck until now, unnecessary for this analysis. Because this approach does not involve actual alignment to the genome, it is sometimes referred to as ‘pseudoalignment’.
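
To make the idea concrete, here is a toy Python sketch of the compatibility test (purely illustrative; it is not kallisto's or Salmon's actual algorithm, which relies on the transcriptome de Bruijn graph and many optimizations): a read is 'pseudoaligned' to the set of transcripts that contain all of its k-mers, without ever computing where it aligns.

k = 5

def kmers(seq, k):
    # All overlapping substrings of length k
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# Toy transcriptome with made-up sequences
transcripts = {
    "tx1": "ATGGCGTACGTTAGC",
    "tx2": "ATGGCGTACGAATCG",
    "tx3": "TTTTTTTTTTTTTTT",
}
transcript_kmers = {name: kmers(seq, k) for name, seq in transcripts.items()}

read = "GCGTACG"

# The read is compatible with every transcript that contains all of its k-mers
compatible = [name for name, tk in transcript_kmers.items() if kmers(read, k) <= tk]
print(compatible)  # ['tx1', 'tx2'] -- positional alignment was never needed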

kallisto

Pros:

  • Extremely Fast & Lightweight – can quantify 20 million reads in under five minutes on a laptop computer
  • Easy to use
  • Sleuth – an interactive R-based companion for exploratory data analysis

Cons:

  • No support for stranded libraries
    Update: kallisto now offers support for strand specific libraries

kallisto, published in April 2016 by Lior Pachter and colleagues, is an innovative new tool for quantifying transcript abundance. kallisto uses the concept of ‘pseudoalignments’, which are essentially relationships between a read and a set of compatible transcripts. kallisto obtains highly accurate pseudoalignments efficiently using k-mers together with the transcriptome de Bruijn graph (T-DBG). Grouping of pseudoalignments belonging to the same set of transcripts into equivalence classes allows for a simpler model and more efficient use of underlying algorithms. The result is a tool which is able to quantify a paired-end library with 22 million reads in 3.5 minutes on a single-core with 8GB of memory.


Both kallisto and Salmon are orders of magnitude faster than TopHat + Cufflinks. Times computed on a 22M-read PE Illumina dataset with 1 CPU core and 8GB memory.

Speed is great, but accuracy is critical. The plot below displays a high correlation (r = 0.941) between results obtained by kallisto and Cufflinks. You can see that the majority of differences are for transcripts with low expression levels.

kallisto is self-described as being ‘near optimal’. In this case, ‘optimal’ refers to two key aspects of the analysis: speed and accuracy. While the benchmark for accuracy can be determined by comparing results to a known “gold standard” data set or to results from other existing tools, the benchmark for speed is typically based on how fast an input file can be read. A good way to measure optimal speed is to calculate how long it takes to read through a file; the Linux word count (wc) utility provides a measure of this. For example, if it takes your machine 45 seconds to perform a word count on a set of reads, 45 seconds could be considered the benchmark for speed in quantifying this set of reads (on your machine). You can see in the plot below that kallisto takes just over twice as long as it takes to read through the file. That's not bad for the complex problem of quantifying RNA-seq data!


The linux word count (wc) utility can serve as a benchmark for ‘optimal’ speed.
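
To get this baseline for your own data, you only need to time a read-through of the file, for example from Python (reads.fastq is a placeholder; substitute your own file):

import subprocess
import time

start = time.time()
subprocess.run(["wc", "-l", "reads.fastq"], stdout=subprocess.DEVNULL)  # just read through the file
print("wc took {:.1f} seconds".format(time.time() - start))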

A companion to kallisto, sleuth is an R-based program for exploratory data analysis powered by Shiny. The tools in sleuth allow you to investigate transcript abundance data or combine results from multiple samples for differential expression analysis. Interactive tools to explore scatterplots, volcano plots, MA plots, heat maps, PCA analysis, and more make analyzing your data faster and easier. For example, when examining a volcano plot, users can highlight a region of interest and immediately see a list of the transcripts selected in a table below.

A Volcano Plot in sleuth
Selecting points on a volcano plot brings up the transcripts corresponding to those points in the table below, in real time.

Using the IDs of the transcripts, users can drill down further and examine boxplots of transcript abundances to see technical variation in each sample and biological variation between conditions.

Transcript View in sleuth
Enter a transcript ID in transcript view, or a gene ID in gene view, to visualize technical and biological variation quickly.

sleuth requires technical replicates for its analysis. In lieu of actual technical replicates, sleuth makes use of bootstrapped values which serve as accurate proxies. Bootstrapping here is the process of repeating the quantification analysis after resampling with replacement from the original data, in order to simulate random sampling associated with sequencing.
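
Here is a minimal sketch of that resampling idea (illustrative only; kallisto and sleuth actually bootstrap over equivalence-class counts, not individual read labels like this):

import random

# Toy data: each "read" is labeled with the transcript it was assigned to
reads = ["tx1", "tx1", "tx2", "tx3", "tx3", "tx3"]

def bootstrap_counts(reads):
    # Resample the reads with replacement, then re-count per transcript
    resampled = [random.choice(reads) for _ in reads]
    counts = {}
    for tx in resampled:
        counts[tx] = counts.get(tx, 0) + 1
    return counts

for _ in range(3):  # three bootstrap replicates
    print(bootstrap_counts(reads))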

sleuth computes differential expression using a statistical model that combines variance information from biological replicates as well as bootstrapped technical replicates to estimate true biological variance. A test of sleuth on data simulated according to the DESeq2 model found that sleuth significantly outperforms other methods (ex: DESeq2, edgeR). A preprint is forthcoming.

The power of kallisto and sleuth lies in their speed and ease of use. A user can go from raw reads to analysis in minutes. As impressive as kallisto is, one major drawback is that its simplified model makes it unable to account for strandedness in reads. This seems like a major limitation given that most RNA-seq protocols generate stranded information. If support for strandedness is a requirement for your analysis, check out Salmon below. Update: kallisto now offers support for strand-specific libraries

Salmon

Pros:

  • Fast & Lightweight – can quantify 20 million reads in under eight minutes on a desktop computer
  • Support for strand-specific libraries
  • Accepts BAM and FASTQ files as input

Cons:

  • No automatic support for compressed input files

From the creators of Sailfish, Salmon (released but unpublished) is another rapid quantification tool which leverages k-mer based counting to produce, in Salmon terms, ‘quasi-alignments’. Although Salmon is not quite as fast as kallisto and lacks some features such as seamless support for compressed files, its strong point is its ability to learn and account for the effects of experiment-specific parameters and biases, including support for strand-specific RNA-seq data.

In the data set used, the correlation between results obtained by Salmon and Cufflinks (r = 0.939) is nearly identical to that between kallisto and Cufflinks (r = 0.941).

Another feature Salmon offers is the ability to quantify pre-existing alignments (from BAM files). This means that if you already have BAM alignments (or will need BAMs for other reasons), you can provide Salmon with these. If you don’t have BAM files, you can provide your reads in fastq format and Salmon will create the quasi-alignments ‘wicked-fast’, and then quantify them.

Finally, if you’re quantifying your data with Salmon and would like to use sleuth for downstream analysis, you’re in luck. The creators of Salmon have created an R library called wasabi which prepares Salmon (and Sailfish) output for use with sleuth.

The latest versions of Salmon, kallisto, sleuth, and wasabi are available for use on Mercer.

Getting Started (on Mercer)

kallisto

module load kallisto/intel/0.42.5
kallisto quant -i <kallisto_index> -o <output_dir> <read_1.fastq> <read_2.fastq>

To produce bootstrap values for downstream analysis with sleuth (in this example, 100 bootstraps):

kallisto quant -i <kallisto_index> -o <output_dir> -b 100 <read_1.fastq> <read_2.fastq>

 

sleuth

# If your data is on Mercer, run sleuth in a VNC session for web browser functionality
# On Mercer, it is necessary to load Firefox or a browser of your choice prior to launching R
module load r/intel/3.2.2
module load firefox
R
> library("sleuth")
> base_dir <- "/scratch/<netID>/kallisto-results/"

The next step requires describing the experimental design and the relationship between the kallisto directories and the samples, which is variable depending on your experimental design and kallisto output structure. The final steps involve creating the sleuth object and performing the differential expression calculations, which take only a couple minutes in total to complete. Once these steps are complete, an interactive browser session can be launched. Please see Introduction to Sleuth for more.

Salmon

module load salmon/gnu/0.6.0
salmon quant -i <salmon_index> -l <lib_type> -1 <read_1.fastq> -2 <read_2.fastq> -o <output_dir>

In this example, we quantify a paired-end, stranded library (where read 1 comes from the reverse strand) from Arabidopsis thaliana, with 100 bootstraps:

salmon quant -i A_thaliana_salmon_index -l ISR --numBootstraps 100 -1 read_1.fastq -2 read_2.fastq -o salmon_out

 

Resources

kallisto Paper:

http://www.nature.com/nbt/journal/vaop/ncurrent/full/nbt.3519.html

kallisto Manual:

http://pachterlab.github.io/kallisto/manual.html

Sleuth Blogpost:

https://liorpachter.wordpress.com/2015/08/17/a-sleuth-for-rna-seq/

Sleuth Tutorial:

https://rawgit.com/pachterlab/sleuth/master/inst/doc/intro.html

Salmon Preprint:

http://biorxiv.org/content/early/2015/06/27/021592

Salmon Manual:

http://salmon.readthedocs.io/en/latest/salmon.html

Node to Joy: Maximize Your Performance on the HPC

In this post we’ll discuss maximizing your performance on the HPC. This entry is aimed towards experienced HPC users; for new users, please see Getting Started on the HPC.

Recent advances in sequencing technology have made High Performance Computing (HPC) more critical than ever in data-driven biological research. NYU’s HPC resources are available to all NYU faculty, staff, and faculty-sponsored students.

Familiarizing yourself with NYU's HPC resources and optimizing your use of this technology will enable maximum speed and efficiency in your analyses.

Get to Know Your Nodes

Mercer, the primary NYU HPC cluster in New York, contains 400 nodes, 16TB of total memory, and over 3200 CPU cores. When a job is submitted to the cluster, one or more nodes are allocated to the job based on the resources requested when submitting. Typically, a single node is sufficient, and the node will be allocated based on amount of memory and CPU cores requested.

For example, of the 400 nodes available on Mercer, 112 are 64GB nodes with 20 CPU cores available per node. Note that 64GB is the total memory per node, but 62GB is the amount of memory available to jobs. Therefore, if 64GB of memory is requested, the job will run on the next-largest memory node. In this case, the node will be allocated based on how many processors per node (cores) are requested. There are twelve 96GB nodes with 12 cores on Mercer, so if 12 or fewer cores are requested, a 96GB node will be allocated. If 20 cores are requested, the next-highest memory nodes with 20 cores are the forty-eight 192GB nodes (189GB available for jobs), so one of these nodes will be allocated. See the list of Mercer nodes. There are also four new nodes with 500GB available memory and 20 CPU cores, and two new nodes with 1.5TB memory and 48 CPU cores, that are not included in this list.

Generally speaking, if 20 CPU cores on a single compute node is requested, it is ok to declare 62GB memory. If more than 62GB memory with 20 CPU cores on a single compute node is needed, it is ok to declare 189GB memory. In the above cases, since 20 cores are being requested, no one would be able to share the compute node (since all CPU cores are being used) and so it would be efficient to request all the available memory there.

Which Node is Right for your Analysis?

Choosing the right node for your analysis requires a little benchmarking and testing. Over-requesting resources when they are not needed might result in your analyses taking longer because of queues for the higher-resource nodes, since there are fewer of these. For example, a job which may have started almost immediately after submission on a lower-resource node and taken 30 minutes to run might have taken less time to complete on a 1TB node, but the queue time for this job might have been longer since there are only 3 nodes with 1TB memory available.

It can be difficult to know what resources are required for a particular job, and this knowledge only comes with experience. Enabling email notifications in your PBS scripts (with the directive #PBS -M NetID@nyu.edu) is very helpful as well because it allows you to get notifications about the compute resources used after a job is complete. These reports can guide resource requests for future jobs.

Finally, if you plan on running an application with multiple CPU cores, for example:

bowtie2 -p 20 -x <ref> -U <input.fq>

Ensure that you request this number of CPU cores when submitting your PBS job, as such:

#PBS -l nodes=1:ppn=20,mem=62GB,walltime=12:00:00

The bottom line: the time to complete a job on the HPC is a function of BOTH the queue time and the compute time. So, consider both when setting up your jobs.

To learn how to see which jobs are currently running on which nodes and cores visit: https://wikis.nyu.edu/display/NYUHPC/Interpreting+pbstop

For more information on running jobs on the NYU HPC cluster visit: https://wikis.nyu.edu/display/NYUHPC/Running+jobs+on+the+NYU+HPC+clusters

Variant Calling Pipeline: FastQ to Annotated SNPs in Hours

Identifying genomic variants, such as single nucleotide polymorphisms (SNPs) and DNA insertions and deletions (indels), can play an important role in scientific discovery. To this end, a pipeline has been developed to allow researchers at the CGSB to rapidly identify and annotate variants. The pipeline employs the Genome Analysis Toolkit (GATK) to perform variant calling and is based on the best practices for variant discovery analysis outlined by the Broad Institute. Once SNPs have been identified, SnpEff is utilized to annotate and predict the effects of the variants.

Full List of Tools Used in this Pipeline:

  • GATK
  • BWA
  • Picard
  • Samtools
  • Bedtools
  • SnpEff
  • R (dependency for some GATK and Picard steps)

Script Location:

prince:/scratch/work/cgsb/scripts/variant_calling/pipeline.sh

https://github.com/gencorefacility/variant-calling-pipeline

How to use this script:

  1. Create a new directory for this project in your home folder on scratch
    cd /scratch/<netID>/
    mkdir variant-calling
    cd variant-calling
  2. Copy pipeline.sh into your project directory
    cp /scratch/work/cgsb/scripts/variant_calling/pipeline.sh .
  3. The pipeline requires six input values. The values can be hard coded into the script, or provided as command line arguments. They are found at the very top of the script.
    1. DIR: The directory with your FastQ libraries to be processed. This should be located in /scratch/cgsb/gencore/out/<PI>/
    2. REF: Reference genome to use. This should be located in the Shared Genome Resource (/scratch/work/cgsb/reference_genomes/). Selecting a reference genome from within the Shared Genome Resource ensures that all index files and reference dictionaries required by BWA, Picard, GATK, etc. are available.
    3. SNPEFF_DB: The name of the SnpEff database to use. To determine the name of the SnpEff DB to use, issue the following command:
      java -jar /share/apps/snpeff/4.1g/snpEff.jar databases | grep -i SPECIES_NAME
      From the list of returned results, select the database you wish to use. The value of the first column is the value to input for SNPEFF_DB. Example:
      java -jar /share/apps/snpeff/4.1g/snpEff.jar databases | grep -i thaliana

      Output:

      athalianaTair10   Arabidopsis_Thaliana    http://downloads.sourceforge.net/...

      Then:

      SNPEFF_DB='athalianaTair10'

      Note: If your organism is not available in SnpEff, it is possible to create a custom SnpEff Database if a reference fasta and gff file are available.

    4. PL: Platform. Ex: illumina
    5. PM: Machine. Ex: nextseq
    6. EMAIL: Your email address. Used to report PBS job failures

    In the example below, I have hard coded the parameters in the script:

    DIR='/data/cgsb/gencore/out/Gresham/2015-10-23_HK5NHBGXX/lib1-26/'
    REF='/scratch/work/cgsb/reference_genomes/Public/Fungi/Saccharomyces_cerevisiae/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna'
    SNPEFF_DB='GCF_000146045.2_R64'
    PL='illumina'
    PM='nextseq'
    EMAIL='some1@nyu.edu' # Not a real email address

  4. Run the script. If you hard coded the parameters in the script as shown in the example above, no additional parameters need to be provided. Simply run:
    sh pipeline.sh
  5. Optional: Monitor the jobs
    watch -d qstat -u <netID>

Overview of the pipeline

Variant Calling Workflow

Important Notes:

  • The pipeline assumes no known variants are available for the Base Quality Score Recalibration step, and as such bootstraps a set of SNPs and Indels to provide as input at this step. Contact me if a list of high-quality known variants is available for your organism, as this data can improve the quality of the output
  • The current script is designed to work with Paired End data
  • Due to HPC queue limits on Mercer, the pipeline can run on a maximum of 25 libraries at a time
  • If you need to run the pipeline on more than 25 libraries or on Single End reads, or have any other questions, please contact me

Walk through

Step 1 Alignment – Map to Reference
Tool BWA MEM
Input .fastq files, reference genome
Output

aligned_reads.sam*

*Intermediary file, removed from final output

Notes

Need to provide the -M flag to BWA; this tells it to mark split reads as secondary, which is needed for GATK variant calling/Picard support. Alternate alignment tools: Bowtie2, Novoalign

Readgroup info is provided with the -R flag. This information is key for downstream GATK functionality. GATK will not work without a read group tag.

Command bwa mem -M -R '@RG\tID:sample_1\tLB:sample_1\tPL:ILLUMINA\tPM:HISEQ\tSM:sample_1' ref input_1 input_2 > aligned_reads.sam

 

Step 2 Sort SAM file by coordinate, convert to BAM
Tool Picard Tools
Input aligned_reads.sam
Output

sorted_reads.bam*

*Intermediary file, removed from final output

Command java -jar picard.jar SortSam INPUT=aligned_reads.sam OUTPUT=sorted_reads.bam SORT_ORDER=coordinate

 

Step 3 Collect Alignment & Insert Size Metrics
Tool Picard Tools, R, Samtools
Input sorted_reads.bam, reference genome
Output alignment_metrics.txt,
insert_metrics.txt,
insert_size_histogram.pdf,
depth_out.txt
Command

java -jar picard.jar CollectAlignmentSummaryMetrics R=ref I=sorted_reads.bam O=alignment_metrics.txt

java -jar picard.jar CollectInsertSizeMetrics INPUT=sorted_reads.bam OUTPUT=insert_metrics.txt HISTOGRAM_FILE=insert_size_histogram.pdf

samtools depth -a sorted_reads.bam > depth_out.txt

 

Step 4 Mark Duplicates
Tool Picard Tools
Input sorted_reads.bam
Output

dedup_reads.bam*

metrics.txt

*Intermediary file, removed from final output

Command java -jar picard.jar MarkDuplicates INPUT=sorted_reads.bam OUTPUT=dedup_reads.bam METRICS_FILE=metrics.txt

 

Step 5 Build BAM Index
Tool Picard Tools
Input dedup_reads.bam
Output

dedup_reads.bai*

*Intermediary file, removed from final output

Command java -jar picard.jar BuildBamIndex INPUT=dedup_reads.bam

 

Step 6 Create Realignment Targets
Tool GATK
Input dedup_reads.bam,
reference genome
Output realignment_targets.list
Notes This is the first step in a two-step process of realigning around indels
Command java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref -I dedup_reads.bam -o realignment_targets.list

 

Step 7 Realign Indels
Tool GATK
Input dedup_reads.bam,
realignment_targets.list,
reference genome
Output realigned_reads.bam
Notes This step performs the realignment around the indels which were identified in the previous step (the ‘realignment targets’)
Command java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref -I dedup_reads.bam -targetIntervals realignment_targets.list -o realigned_reads.bam

 

Step 8 Call Variants
Tool GATK
Input realigned_reads.bam,
reference genome
Output raw_variants.vcf
Notes First round of variant calling. The variants identified in this step will be filtered and provided as input for Base Quality Score Recalibration
Command java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref -I realigned_reads.bam -o raw_variants.vcf

 

Step 9 Extract SNPs & Indels
Tool GATK
Input raw_variants.vcf,
reference genome
Output raw_indels.vcf, raw_snps.vcf
Notes This step separates SNPs and Indels so they can be processed and used independently
Command java -jar GenomeAnalysisTK.jar -T SelectVariants -R ref -V raw_variants.vcf -selectType SNP -o raw_snps.vcf
java -jar GenomeAnalysisTK.jar -T SelectVariants -R ref -V raw_variants.vcf -selectType INDEL -o raw_indels.vcf

 

Step 10 Filter SNPs
Tool GATK
Input raw_snps.vcf,
reference genome
Output filtered_snps.vcf
Notes

The filtering criteria for SNPs are as follows:

QD < 2.0
FS > 60.0
MQ < 40.0
MQRankSum < -12.5
ReadPosRankSum < -8.0
SOR > 4.0

Note: SNPs which are ‘filtered out’ at this step will remain in the filtered_snps.vcf file, however they will be marked as ‘basic_snp_filter’, while SNPs which passed the filter will be marked as ‘PASS’

Command java -jar GenomeAnalysisTK.jar -T VariantFiltration -R ref -V raw_snps.vcf --filterExpression 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 4.0' --filterName "basic_snp_filter" -o filtered_snps.vcf

 

Step 11 Filter Indels
Tool GATK
Input raw_indels.vcf,
reference genome
Output filtered_indels.vcf
Notes

The filtering criteria for Indels are as follows:

QD < 2.0
FS > 200.0
ReadPosRankSum < -20.0
SOR > 10.0

Note: Indels which are ‘filtered out’ at this step will remain in the filtered_indels.vcf file, however they will be marked as ‘basic_indel_filter’, while Indels which passed the filter will be marked as ‘PASS’

Command java -jar GenomeAnalysisTK.jar -T VariantFiltration -R ref -V raw_indels.vcf --filterExpression 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0 || SOR > 10.0' --filterName "basic_indel_filter" -o filtered_indels.vcf

 

Step 12 Base Quality Score Recalibration (BQSR) #1
Tool GATK
Input realigned_reads.bam,
filtered_snps.vcf,
filtered_indels.vcf,
reference genome
Output

recal_data.table*

*Intermediary file, removed from final output

Notes BQSR is performed twice. The second pass is optional, but is required to produce a recalibration report.
Command java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ref -I realigned_reads.bam -knownSites filtered_snps.vcf -knownSites filtered_indels.vcf -o recal_data.table

 

Step 13 Base Quality Score Recalibration (BQSR) #2
Tool GATK
Input recal_data.table,
realigned_reads.bam,
filtered_snps.vcf,
filtered_indels.vcf,
reference genome
Output

post_recal_data.table*

*Intermediary file, removed from final output

Notes The second time BQSR is run, it takes the output from the first run (recal_data.table) as input
Command java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ref -I realigned_reads.bam -knownSites filtered_snps.vcf -knownSites filtered_indels.vcf -BQSR recal_data.table -o post_recal_data.table

 

Step 14 Analyze Covariates
Tool GATK
Input recal_data.table,
post_recal_data.table,
reference genome
Output recalibration_plots.pdf
Notes This step produces a recalibration report based on the output from the two BQSR runs
Command java -jar GenomeAnalysisTK.jar -T AnalyzeCovariates -R ref -before recal_data.table -after post_recal_data.table -plots recalibration_plots.pdf

 

Step 15 Apply BQSR
Tool GATK
Input recal_data.table,
realigned_reads.bam,
reference genome
Output recal_reads.bam
Notes This step applies the recalibration computed in the first BQSR step to the bam file.
Command java -jar GenomeAnalysisTK.jar -T PrintReads -R ref -I realigned_reads.bam -BQSR recal_data.table -o recal_reads.bam

 

Step 16 Call Variants
Tool GATK
Input recal_reads.bam,
reference genome
Output raw_variants_recal.vcf
Notes Second round of variant calling performed on recalibrated bam
Command java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R ref -I recal_reads.bam -o raw_variants_recal.vcf

 

Step 17 Extract SNPs & Indels
Tool GATK
Input raw_variants_recal.vcf,
reference genome
Output raw_indels_recal.vcf, raw_snps_recal.vcf
Notes This step separates SNPs and Indels so they can be processed and analyzed independently
Command java -jar GenomeAnalysisTK.jar -T SelectVariants -R ref -V raw_variants_recal.vcf -selectType SNP -o raw_snps_recal.vcf
java -jar GenomeAnalysisTK.jar -T SelectVariants -R ref -V raw_variants_recal.vcf -selectType INDEL -o raw_indels_recal.vcf

 

Step 18 Filter SNPs
Tool GATK
Input raw_snps_recal.vcf,
reference genome
Output filtered_snps_final.vcf
Notes

The filtering criteria for SNPs are as follows:

QD < 2.0
FS > 60.0
MQ < 40.0
MQRankSum < -12.5
ReadPosRankSum < -8.0
SOR > 4.0

Note: SNPs which are ‘filtered out’ at this step will remain in the filtered_snps_final.vcf file, however they will be marked as ‘basic_snp_filter’, while SNPs which passed the filter will be marked as ‘PASS’

Command java -jar GenomeAnalysisTK.jar -T VariantFiltration -R ref -V raw_snps_recal.vcf --filterExpression 'QD < 2.0 || FS > 60.0 || MQ < 40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0 || SOR > 4.0' --filterName "basic_snp_filter" -o filtered_snps_final.vcf

 

Step 19 Filter Indels
Tool GATK
Input raw_indels_recal.vcf,
reference genome
Output filtered_indels_final.vcf
Notes

The filtering criteria for Indels are as follows:

QD < 2.0
FS > 200.0
ReadPosRankSum < -20.0
SOR > 10.0

Note: Indels which are ‘filtered out’ at this step will remain in the filtered_indels_recal.vcf file, however they will be marked as ‘basic_indel_filter’, while Indels which passed the filter will be marked as ‘PASS’

Command java -jar GenomeAnalysisTK.jar -T VariantFiltration -R ref -V raw_indels_recal.vcf --filterExpression 'QD < 2.0 || FS > 200.0 || ReadPosRankSum < -20.0 || SOR > 10.0' --filterName "basic_indel_filter" -o filtered_indels_recal.vcf

 

Step 20 Annotate SNPs and Predict Effects
Tool SnpEff
Input filtered_snps_final.vcf
Output filtered_snps_final.ann.vcf,
snpeff_summary.html,
snpEff_genes.txt
Command java -jar snpEff.jar -v snpeff_db filtered_snps_final.vcf > filtered_snps_final.ann.vcf

 

Step 21 Compute Coverage Statistics
Tool Bedtools
Input recal_reads.bam
Output genomecov.bedgraph
Notes Load the genomecov.bedgraph file into IGV to view a coverage map at the entire genome or chromosome level
Command bedtools genomecov -bga -ibam recal_reads.bam > genomecov.bedgraph

 

Step 22 Compile Statistics
Tool parse_metrics.sh (in house)
Input alignment_metrics.txt,
insert_metrics.txt,
raw_snps.vcf,
filtered_snps.vcf,
raw_snps_recal.vcf,
filtered_snps_final.vcf,
depth_out.txt
Output report.csv
Notes A single report file is generated with summary statistics for all libraries processed containing the following pieces of information:

  • # of Reads
  • # of Aligned Reads
  • % Aligned
  • # Aligned Bases
  • Read Length
  • % Paired
  • Mean Insert Size
  • # SNPs, # Filtered SNPs
  • # SNPs after BQSR, # Filtered SNPs after BQSR
  • Average Coverage

This pipeline was presented at the NYU-HiTS seminar on March 03, 2016. Slides available here.