HPC Data Analysis Best Practices

Running Jobs:

  • Run your job in, and write its output to, your personal directory on scratch. Ex: /scratch/<netID>/my-project/job-xyz/
  • Your PBS scripts should live in your job or project directory
  • Keeping a copy of the job script in its run directory is good practice, as it lets you check later which parameters you used for the job
  • All other scripts (ex: Python scripts, other executables) should live in your home folder (i.e. /home/<netID>)
  • If you need to run a script that you created (a Python script, for example), call it from your home directory (accessible via the $HOME variable) in your PBS script
  • If you are loading a module, always include the full module name (including version) in your scripts. This ensures reproducibility.
  • If you need a software package which is not available on the HPC, please email the HPC team at hpc@nyu.edu with your request.
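
The run-directory conventions above can be sketched as a small shell helper. This is only an illustration, not an official HPC tool: the function name prepare_run_dir, the project and job names, the script name analysis.py, and the module name in the comments are all hypothetical.

```shell
#!/bin/bash
# Sketch of a run-directory helper for use inside a PBS job script.
# All names here (prepare_run_dir, my-project, job-xyz) are hypothetical.

# Create <scratch_root>/<project>/<job_name> and keep a copy of the job
# script beside the output, so the parameters can be checked later.
# Usage: prepare_run_dir <scratch_root> <project> <job_name> <job_script>
prepare_run_dir() {
    local scratch_root="$1"
    local project="$2"
    local job_name="$3"
    local job_script="$4"
    local run_dir="${scratch_root}/${project}/${job_name}"

    mkdir -p "$run_dir" || return 1
    cp "$job_script" "$run_dir/" || return 1
    # Print the run directory so the caller can cd into it
    printf '%s\n' "$run_dir"
}

# Inside a job script this might be used as follows (illustrative only):
#   run_dir=$(prepare_run_dir "/scratch/$USER" my-project job-xyz \
#             "$PBS_O_WORKDIR/job-xyz.pbs") && cd "$run_dir"
#   module load samtools/1.9      # hypothetical module; note the explicit version
#   python "$HOME/scripts/analysis.py" > analysis.log
```

The function prints the run directory instead of changing into it, so the job script decides when to cd. PBS_O_WORKDIR is the directory from which the job was submitted; the example assumes the job script was saved there.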

Fastq Delivery from GenCore

Location: /scratch/cgsb/gencore/out/<PI>

  • Files here are not subject to flushing
  • Files here are backed up

Lab Share Directory

Location: /scratch/cgsb/<PI>

  • 5TB Quota
  • Files here are not subject to flushing
  • Files here are backed up to tape
  • Use this directory to share your analysis and results with other members of your lab
  • Reference and input data (ex: GenCore delivered fastq files, reference genome, indexes, etc.) should not live here!
  • To establish a lab share directory on /scratch/cgsb, or to add members to your lab share directory, submit a request using this form

Personal Directory on scratch

Location: /scratch/<netID>

  • 5TB Quota
  • Files here that have not been used for 60 days are subject to flushing.
  • Use this directory to run your analysis and store analysis results as you are working on them
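
To see which files are approaching the 60-day limit, something like the following sketch may help. It assumes that "not used" is measured by last access time, which this page does not confirm, and the 50-day default is an arbitrary warning margin. The function name list_stale_files is hypothetical.

```shell
#!/bin/bash
# List regular files under a directory whose last access is more than
# N days old (counted by find in whole 24-hour periods). Default N is 50.
# Assumption: the flushing policy is based on access time (find's -atime).
# Usage: list_stale_files <directory> [days]
list_stale_files() {
    local dir="$1"
    local days="${2:-50}"
    find "$dir" -type f -atime +"$days" -print
}

# Example (illustrative): list_stale_files "/scratch/$USER" 50
```

Access times can be affected by filesystem mount options, so treat the output as a hint, not as a definitive list of what will be flushed.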

Personal Directory on home

Location: /home/<netID>

  • 20GB Quota
  • Not subject to flushing
  • Backed up
  • Your custom scripts (Python scripts or other executables) should live here

Personal Directory on archive

Location: /archive/<netID>

  • 2TB Quota
  • Not subject to flushing
  • Data will be retained for 5 years
  • Data will be backed up to tape
  • Completed analyses and results that you want to keep should be packed into a single archive file (using tar) and then stored here
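
The archiving step can be sketched as below. The function name archive_analysis and the example paths are hypothetical, and gzip compression is an assumption; the page only says to use tar.

```shell
#!/bin/bash
# Pack a finished analysis directory into one gzip-compressed tarball and
# check that the tarball can be listed before relying on it.
# Usage: archive_analysis <analysis_dir> <archive_root>
archive_analysis() {
    local src="$1"
    local dest_root="$2"
    local name
    name=$(basename "$src")
    local tarball="${dest_root}/${name}.tar.gz"

    mkdir -p "$dest_root" || return 1
    # -C makes paths inside the tarball relative to the parent directory
    tar -czf "$tarball" -C "$(dirname "$src")" "$name" || return 1
    # Listing the contents is a cheap readability check
    tar -tzf "$tarball" > /dev/null || return 1
    printf '%s\n' "$tarball"
}

# Example (illustrative):
#   archive_analysis "/scratch/$USER/my-project/job-xyz" "/archive/$USER/my-project"
```

The function leaves the original directory in place; remove it from scratch only after confirming that the tarball is complete.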

Shared Genome Resource

Location: /scratch/work/cgsb/reference_genomes/

  • Local CGSB repository of commonly used genomic data sets
  • New organisms/versions/releases will be made available periodically or upon request (contact Mohammed Khalfan, mkhalfan@nyu.edu)
  • Previous versions/releases will be preserved
  • More about the Shared Genome Resource…