I recently helped the Rockman lab basecall their MinION sequencing data on the HPC using its GPUs. This brought the total basecalling time down to around five hours, compared with the two weeks(!) it would have taken on a desktop computer.
Since more people are beginning to perform MinION sequencing here at the Center for Genomics and Systems Biology, I thought it would be helpful to share the procedure for basecalling with GPUs on the HPC.
First, you’ll need to transfer your data to the HPC. I recommend rsync for this if you’re on a Mac; if you’re using Windows, I suggest WinSCP. You’ll also need to know which flowcell and kit were used for sequencing (see the Available Flowcell + Kit Combinations table below for the full list of options), and the output path where you want the basecalled fastq files to go.
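On a Mac, the rsync step could look something like the sketch below. The login hostname, netID, and destination directory are hypothetical placeholders, not the HPC's real values; replace them with your own before using it.

```bash
#!/usr/bin/env bash
# Sketch: copy a MinION run directory from your computer to the HPC with rsync.
# HPC_HOST, NETID and DEST_DIR are hypothetical placeholders; replace them with
# your cluster's login node, your netID, and a directory you own on the HPC.
HPC_HOST="hpc-login.example.edu"
NETID="your_netID"
DEST_DIR="/scratch/${NETID}/minion_run"

upload_run() {
  local src_dir="$1"
  # -a preserves permissions and timestamps, -v lists files as they are sent,
  # -P shows progress and keeps partial files so an interrupted copy can resume.
  # Stripping any trailing slash copies the directory itself (e.g. fast5_pass)
  # into DEST_DIR instead of spilling its contents there.
  rsync -avP "${src_dir%/}" "${NETID}@${HPC_HOST}:${DEST_DIR}/"
}

# Usage: upload_run ./fast5_pass
```

Because rsync only sends what has changed, rerunning the same command after a dropped connection picks up where it left off.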
Copy the script below to your own directory on the HPC, edit the first four parameters (leave --device "auto" as it is), and then submit it to Slurm like this:
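If you need a starting point, a Slurm job script for this kind of run might look roughly like the minimal sketch below. It assumes the basecaller is Oxford Nanopore's guppy_basecaller (whose command line includes the --device and --reverse_sequence options mentioned in this post). The module name, resource requests, email address, paths, and the example flowcell/kit pair (FLO-MIN106 with SQK-LSK109) are placeholders, not values taken from the original script.

```bash
#!/bin/bash
# Sketch of a GPU basecalling job; every value here is a placeholder.
#SBATCH --job-name=guppy_basecall
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=32G
#SBATCH --time=12:00:00
# One GPU; the exact --gres syntax (and GPU type) varies between clusters.
#SBATCH --gres=gpu:1
# Email when the job begins, ends, or fails (hypothetical address).
#SBATCH --mail-type=BEGIN,END,FAIL
#SBATCH --mail-user=your_netID@example.edu

# --- The four parameters to edit (example values) ---
INPUT_DIR="/scratch/your_netID/minion_run/fast5_pass"
OUTPUT_DIR="/scratch/your_netID/minion_run/fastq"
FLOWCELL="FLO-MIN106"
KIT="SQK-LSK109"

# Hypothetical module name; run "module avail" to find the real one.
module load guppy

# Build the command as an array so paths containing spaces survive intact.
# --device "auto" (leave unchanged) uses whatever GPU the job was given.
cmd=(guppy_basecaller
     --input_path "$INPUT_DIR"
     --save_path "$OUTPUT_DIR"
     --flowcell "$FLOWCELL"
     --kit "$KIT"
     --device "auto")

"${cmd[@]}"
```

Saved under a name such as basecall.sbatch (hypothetical), it would be submitted with sbatch basecall.sbatch, and Slurm replies with the ID of the queued job. For RNA runs, --reverse_sequence would be appended to the cmd array; check guppy_basecaller --help for the exact form your version expects.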
If you have multiple fast5 directories (for example: fast5_pass and fast5_skip), you can combine the fast5 files into one directory, or you can run the script twice, providing a different input path each time.
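One way to combine the directories without duplicating data is to symlink every .fast5 file into a single directory and use that directory as the input path. This is a sketch: the function name and directory names are illustrative, and it assumes your basecaller reads symlinked files like regular ones (opening a link opens its target). If yours does not, swap ln -s for cp. The function stops with an error rather than overwrite a file when two source directories contain the same file name.

```bash
#!/usr/bin/env bash
# Sketch: gather .fast5 files from several directories into one directory of
# symbolic links, so the basecalling job can read them from a single input path.
combine_fast5() {
  local dest="$1"; shift
  mkdir -p "$dest"
  local src f
  for src in "$@"; do
    for f in "$src"/*.fast5; do
      [ -e "$f" ] || continue   # skip when the glob matched no files
      # Link to the absolute path so the link works from any directory.
      # Without -f, ln refuses to overwrite an existing name, so clashes fail.
      ln -s "$(cd "$(dirname "$f")" && pwd)/$(basename "$f")" "$dest/" || return 1
    done
  done
}

# Usage: combine_fast5 fast5_all fast5_pass fast5_skip
```

The original files stay where MinKNOW wrote them, so deleting fast5_all afterwards removes only the links.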
If you’re doing RNA sequencing, you also need to provide the --reverse_sequence argument.
The script above should email you when the job begins, when it ends, and if it fails, but you can also track its status with:
watch squeue -u <your_netID>
Available Flowcell + Kit Combinations