Commit c6291721 authored by rpeckner-broad, committed by GitHub


parent ff87c4bf
Specter currently requires the cluster-computing framework Apache Spark; a cloud framework and an accompanying website will appear in the future. See `SpecterUserGuide.pdf` for detailed instructions on setting up and using Specter.
## Running a Specter job
To run an analysis job with Specter, use the following syntax from the command line of the Spark-enabled cluster:
```
spark-submit --driver-memory <mem> <mzMLname> <blibName> <index> <StartOrEnd> <numPartitions> <instrumentType> <tol>
```
where the bracketed arguments are as follows:
* mem: The amount of memory to be provisioned to the Spark driver node.
* mzMLname: The full path to the mzML file containing the DIA data to be analyzed, without the mzML extension.
* blibName: The full path to the blib file containing the spectral library, without the blib extension.
* index: The first or last index of the subset of MS2 spectra to be analyzed.
* StartOrEnd: Whether index is the first (StartOrEnd = "start") or the last (StartOrEnd = "end") index of the spectra to be analyzed. Restricting a job to a subset of spectra in this way is useful for breaking an analysis into smaller pieces that respect cluster memory constraints.
* numPartitions: The number of partitions Spark will use to parallelize the MS2 spectra. A reasonable starting choice is five times the number of cluster CPUs.
* instrumentType: One of 'orbitrap', 'tof', or 'other'. Specifying 'orbitrap' or 'tof' helps avoid certain known issues with mzML files converted from data acquired on these instrument types.
* tol: The instrument mass accuracy, in parts per million (ppm).
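When many jobs are submitted, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of Specter: the helpers `build_specter_command`, `strip_ext`, and `suggest_num_partitions` are hypothetical names. The sketch reproduces the argument order of the general syntax above and checks only the constraints stated in the list (the two allowed StartOrEnd values, the three allowed instrumentType values, file names without extensions, and the five-partitions-per-CPU starting point).

```python
import os

VALID_START_OR_END = ("start", "end")
VALID_INSTRUMENTS = ("orbitrap", "tof", "other")


def strip_ext(path, ext):
    """Hypothetical helper: drop a trailing extension only if it matches ext exactly."""
    root, found = os.path.splitext(path)
    return root if found == ext else path


def suggest_num_partitions(total_cluster_cpus):
    """Hypothetical helper: the suggested starting point of five partitions per cluster CPU."""
    return 5 * int(total_cluster_cpus)


def build_specter_command(mem, mzml_name, blib_name, index, start_or_end,
                          num_partitions, instrument_type, tol_ppm):
    """Hypothetical helper: return spark-submit arguments in the documented order."""
    if start_or_end not in VALID_START_OR_END:
        raise ValueError("StartOrEnd must be 'start' or 'end', got %r" % (start_or_end,))
    if instrument_type not in VALID_INSTRUMENTS:
        raise ValueError("instrumentType must be one of %s, got %r"
                         % (", ".join(VALID_INSTRUMENTS), instrument_type))
    return ["spark-submit", "--driver-memory", str(mem),
            strip_ext(mzml_name, ".mzML"),  # Specter expects the path without .mzML
            strip_ext(blib_name, ".blib"),  # and the library path without .blib
            str(index), start_or_end, str(num_partitions),
            instrument_type, str(tol_ppm)]


if __name__ == "__main__":
    # Rebuilds the example command given in this section.
    cmd = build_specter_command("10g", "/rpeckner/data/20170501_PhosphoDIARun1.mzML",
                                "/rpeckner/libs/HumanPhosphoLib", 100000, "end",
                                200, "orbitrap", 10)
    print(" ".join(cmd))
```

The returned list could then be passed to Python's `subprocess.check_call` from the directory containing the Specter scripts; building a list rather than a single string avoids shell-quoting problems with unusual file paths.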
For example, the command
```
spark-submit --driver-memory 10g /rpeckner/data/20170501_PhosphoDIARun1 /rpeckner/libs/HumanPhosphoLib 100000 end 200 orbitrap 10
```
would tell Specter to analyze the first 100,000 MS2 spectra in the Orbitrap DIA experiment file /rpeckner/data/20170501_PhosphoDIARun1.mzML against the spectral library /rpeckner/libs/HumanPhosphoLib.blib, using a mass tolerance of 10 ppm, 200 partitions of the associated Spark RDD (each element of this RDD is an individual MS2 spectrum), and 10 GB of RAM for the driver node. Specter will create a subdirectory 'SpecterResults' in the working directory and write the results file there.
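Because tol is given in parts per million, the absolute m/z window it corresponds to grows with m/z: a tolerance of tol ppm around a value m/z spans m/z × tol × 10⁻⁶ on each side. The sketch below (the function name `ppm_window` is illustrative, not part of Specter) shows what the 10 ppm setting in this example means at a few m/z values. How Specter applies the tolerance internally is not described here, so the symmetric ± window is an assumption.

```python
def ppm_window(mz, tol_ppm):
    """Return (low, high) bounds of a symmetric +/- tol_ppm window around mz.

    Assumption: the tolerance is applied symmetrically around the target m/z.
    """
    delta = mz * tol_ppm * 1e-6  # absolute half-width, in m/z units
    return mz - delta, mz + delta


if __name__ == "__main__":
    for mz in (200.0, 500.0, 1000.0):
        low, high = ppm_window(mz, 10)
        print("m/z %.1f at 10 ppm: %.4f to %.4f" % (mz, low, high))
```

For instance, at m/z 500 a 10 ppm tolerance is ±0.005, while at m/z 1000 it is ±0.01.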
## System requirements
Specter requires Apache Spark (with the PySpark API) and Python >= 2.7.9; it has not been tested with Python 3. The Python packages pymzml and cvxopt are also required; depending on your administrative permissions on the cluster, installing them may require an Anaconda environment. The Specter job commands above must be run from the directory containing the scripts from this repository.
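Before submitting a job, it can be worth confirming that the Python interpreter used by your Spark jobs meets these requirements. A minimal sketch follows; the function name `check_specter_requirements` is hypothetical, and it checks only the version floor and the two packages named above, assuming they are importable as `pymzml` and `cvxopt`.

```python
import importlib
import sys

MIN_PYTHON = (2, 7, 9)                    # version floor stated above
REQUIRED_PACKAGES = ("pymzml", "cvxopt")  # packages named above


def check_specter_requirements(packages=REQUIRED_PACKAGES, version_info=None):
    """Hypothetical helper: return a list of problems (empty if all checks pass).

    version_info defaults to the running interpreter's sys.version_info.
    """
    if version_info is None:
        version_info = sys.version_info
    version = tuple(version_info[:3])
    problems = []
    if version < MIN_PYTHON:
        problems.append("Python %s found; >= 2.7.9 required"
                        % ".".join(str(part) for part in version))
    for name in packages:
        try:
            importlib.import_module(name)  # succeeds only if the package is installed
        except ImportError:
            problems.append("missing package: %s" % name)
    return problems


if __name__ == "__main__":
    if sys.version_info[0] >= 3:
        print("Note: Specter has not been tested with Python 3.")
    issues = check_specter_requirements()
    for issue in issues:
        print(issue)
    if not issues:
        print("All requirement checks passed.")
```

Running it with the interpreter from your Anaconda environment (if you use one) checks the same installation that the Spark job will use.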