Illumina Instrument Integrations FAQ
The BaseSpace Clarity LIMS integration with Illumina instrument software provides the following features:
• | Instrument monitoring |
• | Automated reporting |
• | Capturing and parsing run statistics |
This section answers questions that are often asked about Illumina integration packages.
To ensure that your Illumina instrument warranty remains valid, the instrument integration must be performed and maintained by the Clarity LIMS Support team.

The Clarity LIMS instrument integrations are designed to communicate with Illumina instruments. To ensure that your Illumina instrument warranty remains valid, the instrument integration must be performed and maintained by the Clarity LIMS Support team. Under this agreement, the LIMS software is guaranteed to not void or otherwise impact your instrument warranty in any way.

Clarity LIMS requires a minimum version of the controller software, and access to sequencer instrument workstations for the installation of a batch file script that is triggered at select sequencing run events (start run, cycle complete, and run complete). Requirements are as follows:
• | GAII, GAIIx running SCS 2.9 (RTA 1.9), SCS 2.10 (RTA 1.13) |
• | HiSeq 2000/1000 running HCS 1.4 (RTA 1.12), HCS 1.5 (RTA 1.13) |
• | HiSeq 2500/1500 running HCS 2.0.3 (RTA 1.17) |
• | CASAVA 1.8.0, 1.8.1, or 1.8.2 for Bcl conversion / demultiplexing |
• | HiSeq X running HCSX 3.1 (RTA 2.3) |
• | MiSeq running MCS 1.1 (RTA 1.13), MCS 2.0 (RTA 1.16), MCS 2.2 (RTA 1.17) |
• | NextSeq running NCS 1.2 (RTA 2.1) |

Illumina provides a supported mechanism for invoking custom scripts on key events during a sequencing run, and batch files that plug into these events.
The LIMS local script takes the context information passed to it from the Illumina-supported event infrastructure and writes to a text file in a dedicated folder (not the run directory). This folder is located on the same run data network share and is consumed by the Illumina NGS Packages event-processing software running on the LIMS server.
This specific methodology protects both the Illumina instrument and Clarity LIMS by:
• | Using only the supported external script integration points supplied by Illumina. |
• | Using a simple batch file invoked by an Illumina-supported mechanism. |
• | Not conducting any read or write of any data on the instrument itself. |
• | Not conducting any write of any data to the instrument run directory. |
• | Not requiring any software to be installed on the instrument machines (no installer to run, no changes to system registry, no changes to environment variables, etc.). |
• | Not requiring an install of Automation Worker (AW) node on the Illumina computer. |

The central point of network contact and connectivity between Clarity LIMS and the Illumina instrumentation is the Network Attached Storage (NAS).
The NAS is used as the hub and interprocess communication (IPC) mechanism to ensure that:
• | No additional network paths, ports, or connections need to be configured on-site. The Clarity LIMS server only has the same access to the NAS that the Illumina instrument (Windows®) and Bcl conversion server (Linux) already have. By not requiring direct network connectivity between the Illumina instruments and the LIMS server, the amount of work required by your IT department to enable this integration is minimized. |
• | No additional file access protocols need to be configured. The Windows-based Illumina instrument writes to the NAS using a standard file share. The Linux-based Bcl conversion server requires similar file-share-based access to operate. |
• | Robustness of the NAS is implied because: |
– | For the Illumina Instrument to run, it must be able to read/write its data to this NAS location. |
– | For the Bcl conversion to run, it must already be able to read/write its data to this NAS location. |

The NAS is mounted using a standard Linux mount using NFS or CIFS.

The general paradigm is one of consumption. This means that when an event file produced by the Clarity LIMS NGS integration has been properly processed, it is moved to an archive directory sorted by year and month.
If a Begin Run or End Run event file is found, and there is no corresponding flow cell/reagent cartridge in the system (or there is insufficient data in the run folder at this time to process it), that event file is left for processing the next time the daemon looks for events.
If a Begin Run or End Run event file is not processed within 14 days, the event is archived to the directories sorted by year and month. When an event is archived, the event file is read and the container name is prepended to the event file name, making it easier to find all events for a container.
If a Cycle Complete event file is found and there is no corresponding flow cell in the system, that event is consumed and removed.
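The consumption rules above can be summarized as a small decision function. This is only a sketch of the described behavior, not the integration's actual code: the event type strings, the function names, the underscore that joins the container name to the file name, and the exact year/month folder layout are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from pathlib import PurePosixPath

# Begin Run / End Run events that stay unprocessed this long are archived.
STALE_AFTER = timedelta(days=14)


def next_action(event_type: str, container_known: bool, age: timedelta) -> str:
    """Decide what happens to an event file on one pass of the daemon.

    event_type is a hypothetical label: "begin_run", "end_run",
    or "cycle_complete".
    """
    if container_known:
        # Properly processed event files are then moved to the archive.
        return "process"
    if event_type == "cycle_complete":
        # No matching flow cell: the event is consumed and removed.
        return "remove"
    if age >= STALE_AFTER:
        # Unprocessed for 14 days: archive it anyway.
        return "archive"
    # Leave the file in place for the next pass.
    return "retry_later"


def archive_path(archive_root: str, file_name: str,
                 container_name: str, when: datetime) -> PurePosixPath:
    """Build an archive location sorted by year and month.

    The container name is prepended to the file name; the "_" separator
    is an assumption.
    """
    return (PurePosixPath(archive_root) / f"{when:%Y}" / f"{when:%m}"
            / f"{container_name}_{file_name}")
```

For example, a Begin Run event for an unknown flow cell that is two days old yields "retry_later", and the same event at 14 days yields "archive".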

The Clarity LIMS Illumina NGS integration package itself does not support this scenario. The LIMS interface operates without problems while the instrument is running (capturing run parameters, updating status, capturing and parsing statistics, and generating reports). However, after sequencing run completion, if you move the data to another location, the process fields and the generated link files will no longer be valid.
If you want to subsequently run the Bcl Conversion and Demultiplexing step, it will attempt to locate the network run data (via the custom field configured on the sequencing process), and will fail since the data is no longer there.
The minimum manual update required is to change the Output Folder field on the Illumina Sequencing (Illumina SBS) step from the original location of the run data directory to the location of the data that will be used for the Bcl Conversion and Demultiplexing (Illumina SBS) step run. Finally, update the Output Folder field value to the location of the permanent home of the run data on the network storage.
If you move the data after both Sequencing and Bcl Conversion and Demultiplexing steps are complete, the links to the run data and demultiplexed sample directories and calculated metrics will no longer be valid.

If the flow cell is unknown, the Begin Run and End Run events build up in the gls_events folder and the Cycle Complete events are removed. As there is no corresponding Flow Cell ID in the LIMS, no information regarding the run is recorded.

When the Bcl Conversion and Demultiplexing step completes, it attaches HTML link files to the FASTQ result file output placeholders in the LIMS. These files contain links to the demultiplexed sample directories and calculated metrics. The Demultiplex_Stats.htm and log file are attached, and the LIMS parses Flowcell_demux_summary.xml.

There is no direct indication at the file system level. Run complete events created in gls_events are consumed and the link to the run directory is created in Clarity LIMS.
Other files are created (reports, for example) and are attached directly to the LIMS. These files do not appear in the file system, so there is no clear indication when the LIMS has finished from this level.
However, a user can tell when a sequencing run is complete, or what stage it has reached, through the Clarity LIMS interface:
• | The Sequencing step displays a Finish Date and Cycle Status. |
• | The system also generates a link back to the run folder where the data was captured. |
• | The RunReport.pdf is generated. This is the last step of the Sequencing process. Once this is associated in the LIMS, the step is complete. |
While it is not possible to determine that the LIMS has finished processing at the file system level directly, it is possible to determine that processing has finished by querying the REST API. This requires custom scripting by the customer, but may provide a mechanism to detect when to initiate the next stage of the analysis pipeline, in the scenario that the Bcl Conversion and Demultiplexing integration with CASAVA does not meet requirements.
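A custom script of this kind usually amounts to a polling loop. The following is a minimal sketch only: it takes a caller-supplied check function and does not name any REST endpoint or response format, because those are not specified here. The function name, parameters, and default intervals are hypothetical.

```python
import time
from typing import Callable


def wait_until(check: Callable[[], bool],
               interval_s: float = 60.0,
               timeout_s: float = 3600.0,
               sleep: Callable[[float], None] = time.sleep,
               clock: Callable[[], float] = time.monotonic) -> bool:
    """Call check() repeatedly until it returns True or the timeout passes.

    check is supplied by the caller and would wrap whatever REST query
    reports that the step has finished (not shown here).
    Returns True if check() succeeded, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The caller would pass a function that queries the LIMS and returns True once the step is reported complete, then start the next pipeline stage when wait_until returns True.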

If you require configuration of multiple NAS drives, you will need to provide the Clarity LIMS Support team with the details of all of your drive locations.

The Illumina NGS Package sequencer instrument interface running on the LIMS server was designed to require a Read/Write mount to the network run data folder root for the instruments. This allows for maximum automation and convenience.
In addition, it allows for automation of subsequent Bcl conversion and demultiplexing with a supplied SampleSheet.csv file. The file is required by the CASAVA/Bcl Conversion and Demultiplexing step.
Note the following:
• | For HiSeq/GAIIx and HiSeq X, the SampleSheet.csv file is generated by and attached to the Cluster Generation step in Clarity LIMS |
• | For MiSeq and NextSeq, the SampleSheet.csv file is generated by and attached to the Denature, Dilute and Load Samples step. |
• | The file may be used with CASAVA for HiSeq/GAIIx, so is placed in the \<Run>\Data\Intensities\BaseCalls\ directory. This is where CASAVA/Bcl Conversion and Demultiplexing looks for the file. |
NextSeq and HiSeq X deliver a file for use with Illumina's bcl2fastq v2 tool.

If necessary, you can mount the run data directory with ‘Read Only’ access.
The following actions still occur and function correctly:
• | The sequencing master step fields are populated with key run information. |
• | The per-lane statistic custom fields on the flow cell are populated with run summary statistics. |
• | The SampleSheet.csv file is created and attached in the LIMS. |
• | The runreport.pdf is created and attached in the LIMS. |
The following action no longer occurs, and the failure is logged:
• | The SampleSheet.csv file is not written to <runFolderRoot>/Data/Intensities/BaseCalls/ |
• | A log file message records that the SampleSheet.csv failed to be written. |
If this alternative option is used, the instrument must be able to write to the network location, and the Clarity LIMS server must have Read/Write/Delete permissions to the network location. This folder location is defined in the hiseqga.seqservice.eventFileDirectory database property, and is used for the monitoring and consumption of instrument-generated event files.

Illumina instruments write locally, with a Real-Time Analysis (RTA) worker process. This service is responsible for moving data from the local drive to the network drive that is the actual target directory.
Both Clarity LIMS and the Illumina instrument generally assume that the network drive is a reliable resource. The Illumina software tolerates outages only to a limited degree, and much depends on their duration (seconds, minutes, hours, days, or weeks).
If there is an outage, the following may occur:
• | If a network resource is unavailable, there is no adverse effect upon the instrument processing. Specifically, if the batch file attempts to write the event file, and the network drive cannot be accessed, it does not block or pause normal instrument operation. |
• | If a network resource is unavailable when the batch file attempts to write the event text file, that event is abandoned. It is never written if the first attempt fails. |
In practice, these scenarios are rare and seldom cause problems.
• | If the events do fail to be written, the Begin Run and End Run events can be created manually. The run is still recorded accurately in Clarity LIMS. Note that there is no specific real-time dependency on when that run is registered in the LIMS. |

Illumina RTA software operates by first writing to a local disk. RTA2 software instead operates in memory and then uses an HTTP interface to transfer information to the HCSX software. During the run, it analyzes the run data and transfers it to the Illumina network run directory, where it is available for access once the run completes. Clarity LIMS does not interact with or monitor the local drive directly; it only monitors the network run directory to which RTA streams the data during the run.

Clarity LIMS only captures and records data if there is a match between Flow Cell IDs in the LIMS and on the instrument run. Flow Cell IDs are recorded differently, depending on the instrument type.
HiSeq and HiSeq X—Recorded in the runParameters.xml <Barcode /> property
MiSeq—Recorded in the runParameters.xml <ReagentKitRFIDTag><SerialNumber /> property
NextSeq—Recorded in RunParameters.xml <ReagentKitSerial> property
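As an illustration, the three lookups above could be performed with Python's standard XML parser. This is a sketch under assumptions: the instrument keys and function name are hypothetical, and the elements are searched for anywhere in the file because the full layout of runParameters.xml is not described here.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Where each instrument type records its Flow Cell ID, per the list above.
# The dictionary keys are hypothetical labels chosen for this sketch.
FLOW_CELL_ID_PATHS = {
    "hiseq": ".//Barcode",
    "hiseqx": ".//Barcode",
    "miseq": ".//ReagentKitRFIDTag/SerialNumber",
    "nextseq": ".//ReagentKitSerial",
}


def flow_cell_id(run_parameters_xml: str, instrument: str) -> Optional[str]:
    """Return the Flow Cell ID from runParameters.xml text, or None if absent."""
    root = ET.fromstring(run_parameters_xml)
    node = root.find(FLOW_CELL_ID_PATHS[instrument])
    if node is None or not (node.text or "").strip():
        return None
    return node.text.strip()


# Made-up fragment for demonstration; real files contain many more elements.
sample = "<RunParameters><Setup><Barcode>FC0001</Barcode></Setup></RunParameters>"
print(flow_cell_id(sample, "hiseq"))  # FC0001
```

A return value of None corresponds to the case where no ID can be read, so no match against the LIMS is possible.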
If there is no match, no data are captured or recorded. If there is a match, the following types of data are recorded:
• | Statistical data associated to Cluster Generation (Yield PF, Avg Q Score, % Bases etc.) |
• | Status data associated to Sequencing (Cycle Status, Flow Cell ID, Flow Cell Position, Read 1/2 Cycles, Finished Date etc.) |
• | Generated reports (Run Report, which includes Clarity LIMS sample/project names, run and read cycle summaries etc.) |
• | Clarity LIMS does not capture the raw data. Instead, it provides HTML links to the locations in which this data is stored on the NAS Share—specifically, to the following: |
– | Run Data directory |
– | Demultiplexed sample directories and calculated metrics |
Avg Q Score is not present for HiSeq X and is being phased out of other instrument integrations.

To automate the dispatch and invocation of Bcl conversion and demultiplexing jobs, Clarity LIMS NGS Packages require our lightweight EPP/automation program to be installed and run on the CASAVA (Bcl) server. Some customers are uncomfortable with this, do not desire this automation, or are moving the data pre and post Bcl conversion and demultiplexing.
Normal Operation
The automation/EPP program receives dispatches from the LIMS server and runs a locally-deployed script (embedded inside the ngs-extensions.jar) to do the following:
• | Access the LIMS server API to gather the information required to generate command lines for the configure and make commands. |
• | Invoke the configure and make commands to perform Bcl conversion and demultiplexing. |
• | Wait for Bcl conversion and demultiplexing completion. |
• | Parse Flowcell_demux_summary.xml to get sample names and statistics to store in the LIMS. |
• | Traverse the Unaligned directory for Project/Sample folder names to create the HTML link files. |
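The last step above can be pictured with a short sketch. It assumes the Unaligned directory contains Project_<name> folders that each contain Sample_<name> folders, and it writes a deliberately minimal HTML link; the real link file format produced by the integration is not described here, and both function names are hypothetical.

```python
from html import escape
from pathlib import Path
from typing import Iterator, Tuple


def find_sample_dirs(unaligned: Path) -> Iterator[Tuple[str, str, Path]]:
    """Yield (project, sample, directory) for each Project_*/Sample_* folder.

    The Project_/Sample_ prefixes are an assumption about the layout.
    """
    for project_dir in sorted(unaligned.glob("Project_*")):
        if not project_dir.is_dir():
            continue
        for sample_dir in sorted(project_dir.glob("Sample_*")):
            if sample_dir.is_dir():
                yield (project_dir.name[len("Project_"):],
                       sample_dir.name[len("Sample_"):],
                       sample_dir)


def link_file_html(target: Path, label: str) -> str:
    """Return a one-link HTML page pointing at a sample directory."""
    return (f'<html><body><a href="{escape(str(target))}">'
            f'{escape(label)}</a></body></html>')
```

Each tuple yielded by find_sample_dirs could be passed to link_file_html to produce the content of one link file per sample.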
Alternative Options
Customers who are unable or unwilling to install EPP on their CASAVA (Bcl) server will not be able to automate CASAVA (Bcl) runs. However, they can still benefit from two elements of the integration:
• | Generation of command lines for CASAVA (Bcl). |
• | Storage of the post-demultiplexing sample statistics and file locations in Clarity LIMS. |

Modify the configuration of the Bcl Conversion and Demultiplexing (Illumina SBS) Clarity LIMS step as follows.
1. | On the automation, update the channel to limsserver. |
2. | Update the EPP/automation command line string in such a way that: |
• | It would work properly if it was installed on the CASAVA (Bcl) server. |
• | The java path (/usr/bin/java) references the local LIMS server. |
• | The path to the ngs-extensions.jar file (/opt/gls/hiseqgaii-extensions.jar) references the local LIMS server (/opt/gls/clarity/extensions/Illumina_HiSeq/v5/EPP/hiseqgaii-extensions.jar). |
• | -m {udf:MODE} is appended to the command line. |
3. | Add a master step field/process UDF called MODE and assign it two values: |
• | SIMULATE |
• | HARVEST |
4. | Ensure that you have completed the standard configuration for the following properties: |
• | hiseqga.bcl.netPathPrefixSearch.1 |
• | hiseqga.bcl.netPathPrefixReplace.1 |
• | hiseqga.bcl.replacementSuffixes.limsserver |
The hiseqga.bcl.replacementSuffixes.limsserver property is not created by default. Add this property manually. You must do one of the following:
• | Ensure that the mount name and path through which the LIMS server 'sees' the network run data directories is the same as that through which the CASAVA (Bcl) server 'sees' these directories. |
• | Ensure that additional netPathPrefixSearch and netPathPrefixReplace properties are configured and that the hiseqga.bcl.replacementSuffixes.limsserver maps to those numeric indices. |
The bcl.replacementSuffixes property formerly shared a set of search/replace paths with the sequencing netPathPrefixSearchReplaceSuffixes. This is no longer the case; it now references its own bcl.netPathPrefixSearch and bcl.netPathPrefixReplace properties.
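To illustrate how a search/replace prefix pair can translate a run folder path from one server's view of the network share to another's, here is a minimal sketch. It assumes that the first rule whose search prefix matches is applied and that the rest of the path is kept unchanged; the actual matching behavior of the netPathPrefixSearch and netPathPrefixReplace properties is not described here. The function name and the example paths are hypothetical.

```python
from typing import List, Tuple


def map_run_path(path: str, rules: List[Tuple[str, str]]) -> str:
    """Swap the first matching search prefix for its replacement.

    rules is an ordered list of (search_prefix, replace_prefix) pairs,
    standing in for numbered search/replace property pairs.
    Paths that match no rule are returned unchanged.
    """
    for search, replace in rules:
        if path.startswith(search):
            return replace + path[len(search):]
    return path


# Hypothetical mount points on two servers that see the same share.
rules = [("/mnt/bclserver/runs", "/mnt/limsserver/runs")]
print(map_run_path("/mnt/bclserver/runs/150101_RUN", rules))
# /mnt/limsserver/runs/150101_RUN
```

If both servers mount the share at the same path, no rule is needed and the path passes through unchanged.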

SIMULATE Mode
• | (Optional) To run CASAVA (Bcl), run the Bcl Conversion and Demultiplexing (Illumina SBS) step and select SIMULATE. This generates all the standard command lines, which are recorded in the Script Log Details log file attached to the step. You can then use these command lines to set up the run manually on the CASAVA (Bcl) server at the command line. In SIMULATE mode, the script does everything it would have done in terms of generating command lines, but does not actually invoke anything. |
HARVEST Mode
• | (Optional) When the CASAVA (Bcl) run completes, to capture statistics, run the Bcl Conversion and Demultiplexing (Illumina SBS) step and select HARVEST. In HARVEST mode, the script looks at the run folder, finds the Flowcell_demux_summary.xml file, parses the relevant data and stores everything it normally would in the LIMS. HARVEST mode also traverses the Unaligned directory for the necessary information to create the HTML link files. |

Event files take up very little space on disk. See the following calculation example for 6 HiSeqs running at full capacity. MiSeqs and NextSeqs use even less space for their event files. It is up to the customer to determine how much space is needed for the actual sequencing data; this discussion covers only the incremental space required for LIMS event files.
• | Each HiSeq run creates two event files that are archived: one Begin Run and one End Run event file. The End Imaging (or Cycle Complete) event files are consumed in real time and not archived. |
• | Each event file is between 180 and 220 bytes in size. |
Suppose each HiSeq runs in Rapid mode and completes a run every day, and assume the lab has 6 HiSeqs.
• | 6 runs completed each day X 365 days = 2190 runs per year |
• | 2190 runs per year X 2 event files/run = 4380 event files produced annually |
• | 4380 event files/year X 220 bytes = 963,600 bytes, which converts to approximately 0.92 MB (1,048,576 bytes in a megabyte) |
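The same estimate can be reproduced for other lab sizes with a few lines of Python. The function and parameter names are hypothetical; the default values are the worst-case figures from the example above (one run per instrument per day, two archived event files per run, 220 bytes per file).

```python
def annual_event_file_bytes(instruments: int,
                            runs_per_instrument_per_day: int = 1,
                            archived_files_per_run: int = 2,
                            bytes_per_file: int = 220,
                            days_per_year: int = 365) -> int:
    """Upper-bound estimate of archived event file bytes produced per year."""
    runs_per_year = instruments * runs_per_instrument_per_day * days_per_year
    return runs_per_year * archived_files_per_run * bytes_per_file


total = annual_event_file_bytes(instruments=6)
# 1,048,576 bytes per megabyte, as in the example above.
print(total, round(total / 1_048_576, 2))  # 963600 0.92
```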