Name Matching XML Parser

Compatibility: BaseSpace Clarity LIMS v2.1.0 or later; NGS Extensions Package v4.3.0 or later

Data from an instrument result file in XML format can often be parsed into BaseSpace Clarity LIMS for QC purposes.

For example, a TapeStation instrument run produces an XML result file, which the user imports into the LIMS. The file includes information of interest for each sample, which should be parsed and stored to support capabilities such as QC threshold checking, searching, and visibility in the LIMS interface.

The XmlSampleNameParser tool parses sample data into UDFs on result files (measurement records) that map directly to the derived samples being measured.

The XmlSampleNameParser tool is installed as a standalone jar file as part of the NGS Extensions Package. Currently, it contains one script, parseXmlBySampleName.

Provided the result file is in XML format, this script can be used to match data in the file to samples in the LIMS using the measurement record LIMSID.

Values are mapped to UDFs in the LIMS using a configuration file that contains XPath mappings for the result file. (External resources, such as w3schools, can be used to learn more about XPath, and many XML viewing tools generate XPath expressions automatically for elements of interest.)
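The idea of mapping XPath expressions to UDF names can be illustrated with a minimal Python sketch. This is not the tool's configuration format (see Configuration File Format for that); the XML layout, the element names, and the UDF names below are all hypothetical. It uses Python's standard xml.etree.ElementTree module, which supports only a limited subset of XPath.

```python
import xml.etree.ElementTree as ET

# Hypothetical result file layout; a real instrument file will differ.
RESULT_XML = """
<Run>
  <Sample>
    <Name>92-1001_SampleA</Name>
    <Concentration>12.5</Concentration>
    <PeakSize>350</PeakSize>
  </Sample>
  <Sample>
    <Name>92-1002_SampleB</Name>
    <Concentration>8.1</Concentration>
    <PeakSize>410</PeakSize>
  </Sample>
</Run>
"""

# Hypothetical mapping of UDF names to XPath expressions,
# each evaluated relative to a single <Sample> element.
UDF_XPATHS = {
    "Concentration": "./Concentration",
    "Peak Size": "./PeakSize",
}


def extract_values(xml_text):
    """Return {sample identifier: {UDF name: text value}} for each sample."""
    root = ET.fromstring(xml_text)
    results = {}
    for sample in root.findall("./Sample"):
        identifier = sample.findtext("./Name")
        results[identifier] = {
            udf: sample.findtext(xpath) for udf, xpath in UDF_XPATHS.items()
        }
    return results


values = extract_values(RESULT_XML)
```

Each value is returned as text; converting it to a number for threshold checks would be a separate step.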

The format for the data needed to make the association between the file contents and the sample in the LIMS is: LIMSID_NAME.

The name is optional and is supported for readability, so it may come from the input sample on which the step is being run.
The LIMSID must come from the output result file, which is also where the parsed information will be stored in UDFs.
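As a rough illustration of the LIMSID_NAME format, the following Python sketch splits such a value into its two parts. It assumes that the LIMSID itself contains no underscore, so the split happens at the first underscore and any later underscores stay in the name. The function name split_identifier is hypothetical, and the tool's own parsing rules may differ.

```python
def split_identifier(value):
    """Split 'LIMSID_NAME' into (limsid, name); name is None when absent.

    Assumption: the LIMSID never contains an underscore, so everything
    after the first underscore is treated as the optional name.
    """
    limsid, sep, name = value.strip().partition("_")
    if not limsid:
        raise ValueError(f"no LIMSID found in {value!r}")
    return limsid, (name if sep and name else None)
```

For example, split_identifier("92-1001_Sample_A") yields the LIMSID "92-1001" and the name "Sample_A", while split_identifier("92-1001") yields the LIMSID alone.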

Typically, it is ideal to set up the instrument run with the sample and result file information, so that this information appears in the same format in the XML result file. To automate setup, you can use a tool such as the generic driver file generator.

The LIMSID_NAME can be provided to the instrument as the sample name, or as a comment or other field on the sample. The only conditions are that:

The sample field that you want to use for the LIMSID_NAME must be passed into the result file (e.g., via a driver file).
The configuration file must be set up such that it can access this field from the correct location. (See Configuration File Format.)
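The two conditions above can be sketched in Python as follows. Here the identifier is carried in a comment field rather than the sample name, and the location of that field is passed in as an XPath expression. The XML layout, the function name match_samples, and the set of known LIMSIDs are all hypothetical; this is an illustration of the matching idea, not the tool's implementation.

```python
import xml.etree.ElementTree as ET

# Hypothetical result file in which the LIMSID_NAME value was passed
# to the instrument as a comment rather than as the sample name.
RESULT_XML = """
<Run>
  <Sample><Name>A1</Name><Comment>92-1001_SampleA</Comment></Sample>
  <Sample><Name>B1</Name><Comment>92-1002_SampleB</Comment></Sample>
  <Sample><Name>L1</Name><Comment>Ladder</Comment></Sample>
</Run>
"""


def match_samples(xml_text, identifier_xpath, known_limsids):
    """Match each <Sample> to a known LIMSID via the configured field.

    Returns (matched, unmatched): matched maps LIMSID to its XML element,
    unmatched lists the raw field values that matched no known LIMSID.
    """
    root = ET.fromstring(xml_text)
    matched, unmatched = {}, []
    for sample in root.findall("./Sample"):
        raw = (sample.findtext(identifier_xpath) or "").strip()
        # Assumption: the LIMSID is everything before the first underscore.
        limsid = raw.partition("_")[0]
        if limsid in known_limsids:
            matched[limsid] = sample
        else:
            unmatched.append(raw)
    return matched, unmatched


matched, unmatched = match_samples(
    RESULT_XML, "./Comment", {"92-1001", "92-1002"}
)
```

Entries such as a ladder, which carry no LIMSID, end up in the unmatched list rather than being written to any measurement record.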