GPU-Accelerated MinION Basecalling On the HPC

I recently helped the Rockman lab basecall their MinION sequencing data on the HPC, leveraging the power of the GPUs available there. This allowed us to bring the total time required for basecalling down to around five hours, from the two weeks(!) it was going to take on the desktop.

Since more people are beginning to perform MinION sequencing here at the Center for Genomics and Systems Biology, I thought it would be helpful to share the procedure for basecalling with GPUs on the HPC.

First, you’ll need to transfer your data to the HPC. I recommend rsync for this if you’re on a Mac; if you’re using Windows, I suggest WinSCP. You’ll also need to know the flowcell and kit that were used for sequencing (see the table at the end of this post for the full list of options), and the output path where you want the basecalled fastq files to go.
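For the rsync route, a transfer can be wrapped in a small helper like the sketch below. The function name `push_fast5`, the example paths, and the `<hpc_login_host>` placeholder are illustrative assumptions, not values from this post; substitute your own login host and scratch directory.

```shell
# Copy the contents of a local fast5 directory into a destination directory.
# -a preserves timestamps and permissions, -v lists each file, and -P shows
# progress and keeps partial files so an interrupted transfer can be resumed.
push_fast5() {
  local src="$1" dest="$2"
  # The trailing slash on the source copies the directory's contents
  # rather than nesting the directory itself inside the destination.
  rsync -avP "${src%/}/" "${dest%/}/"
}

# Example call (placeholders, not real paths):
# push_fast5 ~/minion/run1/fast5_pass "<your_netID>@<hpc_login_host>:/scratch/<your_netID>/run1/fast5_pass"
```

Re-running the same command after an interruption only transfers what is missing, which is handy for large fast5 directories.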

Copy the script below to your local directory, modify the first four parameters (the input path, the output path, the flowcell, and the kit; leave --device "auto" intact), then submit it to Slurm like this: sbatch script-name.s

If you have multiple fast5 directories (for example: fast5_pass and fast5_skip), you can combine the fast5 files into one directory, or you can run the script twice, providing a different input path each time.
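As a rough idea of what such a submission script can look like, here is a minimal sketch. It is not the exact script used for the run described above. The resource requests, the `module load guppy` line, the example paths, the `<your_email>` placeholder, and the `basecall` helper are all assumptions to adapt to your cluster; the flowcell and kit values are just one combination from the table at the end of this post.

```shell
#!/bin/bash
#SBATCH --job-name=guppy_basecall
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4        # assumption: adjust to your cluster
#SBATCH --gres=gpu:1             # request one GPU
#SBATCH --mem=16GB               # assumption
#SBATCH --time=12:00:00          # assumption
#SBATCH --mail-type=ALL          # email on begin, end, and failure
#SBATCH --mail-user=<your_email>

# The four parameters to edit (all values here are placeholders).
INPUT_PATH="/scratch/<your_netID>/run1/fast5_pass"
SAVE_PATH="/scratch/<your_netID>/run1/fastq"
FLOWCELL="FLO-MIN106"
KIT="SQK-LSK109"

# Run guppy_basecaller with the settings above; extra arguments are
# passed through unchanged.
basecall() {
  guppy_basecaller \
    --input_path "$INPUT_PATH" \
    --save_path "$SAVE_PATH" \
    --flowcell "$FLOWCELL" \
    --kit "$KIT" \
    --device "auto" \
    "$@"
}

# Only start work inside a Slurm job (sbatch sets SLURM_JOB_ID), so the
# script does nothing if it is run by accident on a login node.
if [[ -n "${SLURM_JOB_ID:-}" ]]; then
  module load guppy   # assumption: the module name differs between clusters
  basecall
fi
```

To process a second fast5 directory, change INPUT_PATH (and ideally SAVE_PATH) and submit the script again.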


If you’re doing RNA sequencing, you need to provide the --reverse_sequence argument as well.

The script should notify you via email when it begins, when it ends, or if there are any problems, but you can also track its status using:

watch squeue -u <your_netID>

Questions? E-mail me (mk5636) or post in the comments below.

Available Flowcell + Kit Combinations

Flowcell Kit
FLO-MIN106 SQK-RNA001
FLO-MIN106 SQK-RNA002
FLO-MIN107 SQK-RNA001
FLO-MIN107 SQK-RNA002
FLO-PRO001 SQK-LSK109
FLO-PRO001 SQK-DCS109
FLO-PRO001 SQK-PCS109
FLO-PRO002 SQK-LSK109
FLO-PRO002 SQK-DCS109
FLO-PRO002 SQK-PCS109
FLO-MIN107 SQK-DCS108
FLO-MIN107 SQK-DCS109
FLO-MIN107 SQK-LRK001
FLO-MIN107 SQK-LSK108
FLO-MIN107 SQK-LSK109
FLO-MIN107 SQK-LSK308
FLO-MIN107 SQK-LSK309
FLO-MIN107 SQK-LSK319
FLO-MIN107 SQK-LWP001
FLO-MIN107 SQK-PCS108
FLO-MIN107 SQK-PCS109
FLO-MIN107 SQK-PSK004
FLO-MIN107 SQK-RAD002
FLO-MIN107 SQK-RAD003
FLO-MIN107 SQK-RAD004
FLO-MIN107 SQK-RAS201
FLO-MIN107 SQK-RLI001
FLO-MIN107 VSK-VBK001
FLO-MIN107 VSK-VSK001
FLO-MIN107 SQK-LWB001
FLO-MIN107 SQK-PBK004
FLO-MIN107 SQK-RAB201
FLO-MIN107 SQK-RAB204
FLO-MIN107 SQK-RBK001
FLO-MIN107 SQK-RBK004
FLO-MIN107 SQK-RLB001
FLO-MIN107 SQK-RPB004
FLO-MIN107 VSK-VMK001
FLO-PRO001 SQK-RNA002
FLO-PRO002 SQK-RNA002
FLO-MIN106 SQK-DCS108
FLO-MIN106 SQK-DCS109
FLO-MIN106 SQK-LRK001
FLO-MIN106 SQK-LSK108
FLO-MIN106 SQK-LSK109
FLO-MIN106 SQK-LWP001
FLO-MIN106 SQK-PCS108
FLO-MIN106 SQK-PCS109
FLO-MIN106 SQK-PSK004
FLO-MIN106 SQK-RAD002
FLO-MIN106 SQK-RAD003
FLO-MIN106 SQK-RAD004
FLO-MIN106 SQK-RAS201
FLO-MIN106 SQK-RLI001
FLO-MIN106 VSK-VBK001
FLO-MIN106 VSK-VSK001
FLO-MIN106 SQK-RBK001
FLO-MIN106 SQK-RBK004
FLO-MIN106 SQK-RLB001
FLO-MIN106 SQK-LWB001
FLO-MIN106 SQK-PBK004
FLO-MIN106 SQK-RAB201
FLO-MIN106 SQK-RAB204
FLO-MIN106 SQK-RPB004
FLO-MIN106 VSK-VMK001

 

Start the New Year Off Right: How to Choose the Right Sequencer

A new year means new sequencing projects, but how do you know which sequencer is right for your project? Many factors go into choosing the sequencing platform and machine that fit your specific project, chief among them time, cost, and depth. Making the wrong decision can cost you additional time and money.

The three main things to consider when choosing a run configuration, and subsequently an instrument, are:

  1. Read Length
    The read length refers to the number of bases (nucleotides) the sequencer reads into each DNA fragment. For example, a 1×50 run reads 50 bases in from the start of the fragment, and a 2×50 (50×50) run reads 50 bases from one end of the fragment and then 50 bases from the opposite end. Increasing the read length increases the number of bases sequenced per fragment, which increases the overall coverage. Longer reads are also extremely helpful for read mapping and assembly.
  2. Single End or Paired End
    There is also the option of reading the fragment in one direction (single end) or two (paired end). Single end reads from the beginning of the fragment and stops after the number of bases specified. Paired end reads from both ends of the fragment inwards, stopping after the number of bases specified. This allows for more complete coverage of the fragment.
  3. Depth of Coverage
    Depth of coverage is defined as the average number of reads that cover each position in a reference. There are no concrete standards for how much coverage one needs; the recommended guidelines vary by project and application. Figuring out the depth of coverage needed for your specific experiment will help determine your ideal run configuration, which sequencer to use, and how many samples you can potentially multiplex. Illumina provides a coverage calculator that is very helpful in determining the ideal run configuration and multiplexing options for the coverage desired, as well as a database of publications:
    Illumina Coverage Calculator
    Illumina Publication Database
    Also, GenoHub has a great chart on what sequencing coverage is suggested in the literature for a wide variety of applications and it is complete with references and updated regularly: GenoHub Guide.
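The three factors above combine in a standard back-of-the-envelope estimate: coverage is roughly read length × number of ends read × number of fragments sequenced, divided by the genome size. The sketch below only illustrates that arithmetic; the function names and example numbers are made up for illustration, and it is not a substitute for the Illumina calculator.

```shell
# Estimated depth of coverage.
# usage: coverage READ_LENGTH FRAGMENTS ENDS GENOME_SIZE
#   ENDS is 1 for single end, 2 for paired end.
coverage() {
  awk -v len="$1" -v n="$2" -v ends="$3" -v g="$4" \
    'BEGIN { printf "%.1f\n", len * n * ends / g }'
}

# Fragments needed to reach a target coverage (rounded up).
# usage: fragments_needed TARGET_COVERAGE READ_LENGTH ENDS GENOME_SIZE
fragments_needed() {
  awk -v c="$1" -v len="$2" -v ends="$3" -v g="$4" \
    'BEGIN { x = c * g / (len * ends); n = int(x); if (n < x) n++; printf "%d\n", n }'
}

# A 2x50 run with 20 million fragments on a 100 Mb genome:
coverage 50 20000000 2 100000000          # prints 20.0

# Fragments needed for 30x on the same genome with a 2x150 run:
fragments_needed 30 150 2 100000000       # prints 10000000
```

Dividing a run's expected output by the fragments needed per sample gives a first estimate of how many samples can be multiplexed.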

Below are charts to help you choose which sequencer available in GenCore is best for your study.

[Chart: Instruments and Keynotes]

[Chart: Suggested Applications]