This white paper presents an overview of AnIML (Analytical Information Markup Language). It describes the format’s features and highlights a number of possible applications. It is intended to assist decision makers in the user and vendor communities when evaluating the use of AnIML as a data format.
Welcome to AnIML
Meet AnIML – Analytical Information Markup Language. AnIML is a standardized data format that allows for storing and sharing of experiment data in a single format. It is suitable for a wide range of analytical measurement techniques. Using AnIML, you can accurately record and document laboratory workflows and results, no matter which instruments or measurement techniques were used.
To achieve this, AnIML provides a generic data container—the AnIML Core—that permits the storage of arbitrary analytical data. This includes
- Sample information
- Method information
- Measurement results
- Instruments and software used
- Workflow information that ties experiments and samples together
Technique Definitions permit the formal specification of constraints for using this data container. Such a definition can prescribe how the data for specific measurement techniques should be captured in an AnIML document. This way, AnIML can be applied to many different analytical techniques. It grows gracefully as new techniques are developed, allowing continued use of existing software components.
AnIML is based on XML. This has two interesting consequences. First, many tools for XML manipulation are readily available off the shelf, making implementation easier. Second, as XML is a text-based format, AnIML documents are human readable—which is important for long-term storage.
While AnIML has its roots in analytical chemistry, efforts have been made to apply the standard to many other scientific domains. AnIML was developed by the ASTM E13.15 Subcommittee on Analytical Data, which consists of volunteers from the industrial, academic, governmental and vendor communities.
The Need for a New Data Standard
One of the challenges in laboratory data management is the handling of experimental data. There are excellent instruments available from many different vendors, but most instruments produce data in their own proprietary formats. This leads to major difficulties in data processing, sharing and archival. There is little choice in software, as users are often tied to the tools that came with the instrument.
By its very nature, every electronic record carries an implicit expiration date. The reason for this is that the system components required for data access may degrade or become unavailable over time. Archiving data in a proprietary format means that the original application is required to open the data in the future. Accordingly, organizations need to retain this software for the desired lifetime of the data. As hardware and operating systems evolve, such software will no longer run. Access to the data is lost. Users are forced to buy software upgrades for every type of instrument data they need to maintain. This is costly and works only as long as the chosen data formats are still supported—and the vendors are still in business. Through the use of techniques like virtualization and hardware emulation, the lifetime of software can be extended, reducing the frequency of upgrades. However, this only defers the problem.
The solution is to move the data into an open, standardized, multi-technique format. This greatly reduces the number of software tools that must be maintained to retain access to the data.
Due to the proprietary nature of instrument data files, it is difficult to share data with other scientists. A number of standards already exist to address this issue, including SpectroML, JCAMP-DX and ANDI. However, these standards apply only to particular measurement methods; they are not intended for cross-technique applications. This makes data sharing, analysis, collaboration, instrument integration and interfacing with other software such as laboratory information management systems (LIMS) difficult.
Having a universally accepted standard that supports multiple techniques makes data exchange much easier.
Experiments in an automated laboratory are becoming more and more complex, involving the use of multiple analytical techniques. These techniques are often applied in combination or sequence on the same sample. Workflows may also involve sample acquisition, subsampling and sample preparation. At the moment, no other file format can represent such experiments. Before AnIML, laboratories had to devise and implement a home-grown custom data representation.
A Look at the Architecture
Let’s explore the high-level architecture behind AnIML. The standard is divided into two logical layers, the AnIML Core and the AnIML Technique Definitions. Each of these layers is described by an XML Schema, the Core Schema and the Technique Schema.
The AnIML Core provides a universal container for arbitrary analytical data. This container is very flexible: it accepts name–value pairs, hierarchies and multidimensional data sets to represent the actual data. Additionally, mechanisms for organizing the data into samples and experiments are provided. Every AnIML document is governed by the Core Schema.
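To make the three building blocks more concrete, here is a minimal Python sketch of a container holding name–value pairs, a hierarchy and a two-dimensional data set. The element and attribute names (Document, Category, Parameter, SeriesSet, Series, Value) are illustrative placeholders chosen for this example; they are not copied from the Core Schema.

```python
import xml.etree.ElementTree as ET

# Element and attribute names are illustrative placeholders,
# not taken from the AnIML Core Schema.
def build_container():
    root = ET.Element("Document")
    # A hierarchy: a category groups related name-value pairs.
    category = ET.SubElement(root, "Category", name="Instrument Settings")
    ET.SubElement(category, "Parameter", name="Resolution", unit="1/cm").text = "4"
    ET.SubElement(category, "Parameter", name="Scans").text = "32"
    # A two-dimensional data set: two series of equal length.
    data = ET.SubElement(root, "SeriesSet", name="Spectrum", length="3")
    columns = (
        ("Wavenumber", [4000.0, 3998.0, 3996.0]),
        ("Absorbance", [0.012, 0.015, 0.011]),
    )
    for name, values in columns:
        series = ET.SubElement(data, "Series", name=name)
        for value in values:
            ET.SubElement(series, "Value").text = repr(value)
    return root

container = build_container()
xml_text = ET.tostring(container, encoding="unicode")
```

Because the result is ordinary XML, any off-the-shelf XML tool can read or transform it.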
AnIML Technique Definitions
To allow for data interchangeability, it is necessary to constrain the usage of the Core. That’s where Technique Definitions come in. A Technique Definition describes how to use the Core for experiments involving a particular measurement technique. Like a digital blueprint, it defines how the data need to be structured and labeled. This description is machine readable: Technique Definitions are simple XML documents, governed by the Technique Schema.
AnIML Technique Extensions
Sometimes, the fields defined in a Technique Definition are not sufficient to characterize an experiment. This can be the case if an instrument can measure additional parameters, or when users need to store additional sample information. To accommodate such additional fields, a Technique Extension can be used. A Technique Extension defines which fields should be added to a technique and how they should be structured. Because this description is a machine-readable XML document, software can discover such additional fields programmatically. This way, the standard can be extended without breaking compatibility with existing applications.
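As a rough illustration of programmatic discovery, the following Python sketch reads a made-up extension document and lists the fields it adds. The XML layout (TechniqueExtension, Field, the required attribute) is an assumption invented for this example and does not reproduce the Technique Schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension document; the structure is illustrative only.
EXTENSION_XML = """
<TechniqueExtension name="Humidity Logging" extends="Infrared">
  <Field name="Humidity" unit="%" required="false"/>
  <Field name="Chamber Temperature" unit="degC" required="true"/>
</TechniqueExtension>
"""

def discover_fields(extension_xml):
    """Return the additional fields declared by an extension document."""
    root = ET.fromstring(extension_xml)
    return [
        {
            "name": field.get("name"),
            "unit": field.get("unit"),
            "required": field.get("required") == "true",
        }
        for field in root.findall("Field")
    ]

fields = discover_fields(EXTENSION_XML)
```

An application that does not know the extension in advance can still display or validate these fields, because their description travels with the extension itself.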
Inside the Universal Data Container
The AnIML Core can be considered the heart of the standard. With its flexibility, it provides a universal data container that is used to document our experiments. This section will highlight some of the concepts of the Core. Each AnIML document can contain Samples, Experiment Steps, an Audit Trail, and Digital Signatures.
Each sample used in an AnIML document is declared at the top of the document. It is identified by a sample ID, a bar code and a name. Samples can carry any number of attributes, which can be organized in hierarchical categories, allowing for complete representation of sample characteristics.
The attributes that describe a sample vary depending on the experiment performed. To be precise, they are specific to the analytical technique. A Technique Definition can therefore prescribe the structure of a sample definition.
If required, the relationship between a sample and its container can be described as well. This is useful if your samples are located in microtiter plates, racks or vials, or on 2D gels. Each sample is declared only once per document. It can then be referenced by its sample ID.
An Experiment Step documents how a particular analytical technique has been applied to a technique-specific set of samples. It can be thought of as an instance of a Technique Definition, or as a basic building block of an analytical workflow. An Experiment Step always represents a single action, such as the acquisition of a single infrared spectrum.
For each Experiment Step, the AnIML Core can store the measured results, a description of the method, references to the samples involved and the Technique used. It also provides a number of metadata fields that further describe how the experiment was performed. The exact content is technique specific and prescribed by the corresponding Technique Definition.
Experiment Steps can both consume and produce samples. Analytical techniques typically consume samples and analyze them. However, sample preparation or separation techniques also produce new materials. An example would be liquid chromatography coupled with a mass spectrometry (MS) detector. A chromatography Experiment Step consumes the sample, separates it and produces fractions. Each time a mass spectrum is acquired, a corresponding MS Experiment Step is created that consumes and analyzes the fraction as it exits the column. With this mechanism, material flow between experiments can be tracked.
An Experiment Step also can consume data produced by other Experiment Steps. Let’s consider an MS Experiment Step containing a mass spectrum result. The spectrum is processed and generates a peak table. Accordingly, a new peak table Experiment Step is created that consumes the data measured by the MS Step. This illustrates the data flow between experiments.
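The material and data flow described above can be pictured as a small graph. The Python sketch below models the chromatography, MS and peak table steps and traces a result back to its origin. The Step class, the "sample:" and "data:" prefixes and all identifiers are hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, field

@dataclass
class Step:
    step_id: str
    technique: str
    consumes: list = field(default_factory=list)
    produces: list = field(default_factory=list)

# Hypothetical workflow: one LC run, two MS acquisitions, one peak table.
steps = [
    Step("LC1", "Chromatography", ["sample:S1"], ["sample:F1", "sample:F2"]),
    Step("MS1", "Mass Spectrometry", ["sample:F1"], ["data:MS1-spectrum"]),
    Step("MS2", "Mass Spectrometry", ["sample:F2"], ["data:MS2-spectrum"]),
    Step("PK1", "Peak Table", ["data:MS1-spectrum"], ["data:PK1-peaks"]),
]

def lineage(item, all_steps):
    """Return the IDs of the steps that led to an item, most recent first."""
    producers = {out: s for s in all_steps for out in s.produces}
    chain = []
    frontier = [item]
    while frontier:
        step = producers.get(frontier.pop())
        if step is None or step.step_id in chain:
            continue  # original material, or a step already visited
        chain.append(step.step_id)
        frontier.extend(step.consumes)
    return chain

history = lineage("data:PK1-peaks", steps)
```

Here the peak table traces back through the MS step to the chromatography run that produced the fraction.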
Representing Laboratory Workflows
AnIML expresses laboratory workflows by assembling and connecting Samples and Experiment Steps. This way, both the material and the data flow in the process are captured. This may sound complex, but a document becomes only as complex as the workflow it represents. Simple experiments are straightforward to represent using AnIML.
A simple infrared experiment
Analyzing a sample with an infrared spectrophotometer generates two entries in an AnIML document. The first is a Sample Definition containing a unique sample ID and the sample attributes. The second is a single Experiment Step containing the infrared spectrum and the instrument parameters.
Enriching the workflow with other techniques
Having started with a simple experiment, we can now expand the workflow. Suppose the same sample is now analyzed using UV/Vis spectroscopy. A new Experiment Step is then created, containing the UV/Vis trace. It is configured to reference the same sample ID as the infrared Experiment Step above.
If desired, additional Experiment Steps can be added to describe the sample preparation.
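The two-step workflow above could be sketched in Python as follows. The sample is declared once and both Experiment Steps point to it by ID. Element and attribute names, the sample name and the helper functions are simplified assumptions for illustration and should not be read as the actual Core Schema.

```python
import xml.etree.ElementTree as ET

# Names are simplified placeholders, not the actual Core Schema.
def add_step(step_set, step_id, technique, sample_id):
    """Append an experiment step that references an existing sample."""
    step = ET.SubElement(step_set, "ExperimentStep", id=step_id, technique=technique)
    ET.SubElement(step, "SampleReference", sampleID=sample_id)
    return step

doc = ET.Element("Document")
samples = ET.SubElement(doc, "SampleSet")
ET.SubElement(samples, "Sample", sampleID="S1", name="Example sample")

step_set = ET.SubElement(doc, "ExperimentStepSet")
add_step(step_set, "E1", "Infrared", "S1")
add_step(step_set, "E2", "UV/Vis", "S1")

def steps_for_sample(document, sample_id):
    """List the IDs of all steps that reference the given sample."""
    query = f"SampleReference[@sampleID='{sample_id}']"
    return [
        step.get("id")
        for step in document.iter("ExperimentStep")
        if step.find(query) is not None
    ]

linked = steps_for_sample(doc, "S1")
```

Adding a sample preparation step would simply mean one more call to add_step.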
Making AnIML Work with Any Measurement Technique
The fact that the AnIML Core is independent of measurement techniques makes AnIML very powerful. With such a generic architecture, AnIML can handle data from all well-known and frequently used techniques, including spectroscopy, chromatography and imaging. However, it can also be used for custom or one-off experiments, micro-fluidic chips or special sensors, making these techniques first-class citizens in the data system. Over time, new analytical techniques and their corresponding Technique Definitions will be developed. This generic approach allows the system to use them without requiring modifications or software upgrades.
As discussed above, Technique Definitions and Technique Extensions provide machine-readable definitions of the layout of Experiment Steps and Samples. Not only does this allow software to discover how to create data files for a given technique, it also permits validating the content of an AnIML document. Here, the Technique Definitions serve as a checklist of data fields that must be present. Technique Definitions and Extensions can mark these fields as optional or required, enabling a validator tool to verify that all required information about an experiment has been captured properly. This opens new possibilities for improving data quality and integrity.
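The checklist idea can be sketched in a few lines of Python. The field names, the FieldSpec class and the validate function are hypothetical; a real validator would read the required and optional fields from the Technique Definition and Extension documents rather than from hard-coded lists.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FieldSpec:
    name: str
    required: bool

# Hypothetical definition and extension, hard-coded for illustration.
IR_DEFINITION = [
    FieldSpec("Resolution", True),
    FieldSpec("Scans", True),
    FieldSpec("Operator", False),
]
HUMIDITY_EXTENSION = [FieldSpec("Humidity", False)]

def validate(step_fields, *spec_lists):
    """Check captured fields against a definition plus optional extensions."""
    allowed = {spec.name: spec for specs in spec_lists for spec in specs}
    problems = []
    for spec in allowed.values():
        if spec.required and spec.name not in step_fields:
            problems.append(f"missing required field: {spec.name}")
    for name in step_fields:
        if name not in allowed:
            problems.append(f"unknown field: {name}")
    return problems

report = validate({"Resolution": 4, "Humidity": 40}, IR_DEFINITION)
```

Passing HUMIDITY_EXTENSION as an additional argument makes the Humidity field acceptable, which mirrors how an extension widens a technique without changing the validator.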
Laboratories involved in certain aspects of pharmaceutical or environmental research are subject to rules and regulations such as 21 CFR Part 11, EPA CROMERR, Good Laboratory Practices (GLP) and Good Manufacturing Practices (GMP). AnIML provides mechanisms to help organizations address these requirements.
Certain regulations require the implementation of electronic signatures on data records. AnIML supports this by providing a standardized way of applying digital signatures to scientific data. Like all data formats, AnIML is not in a position to prevent unauthorized modification of a document. This is the job of the underlying storage system, where appropriate access controls must be enforced. However, using digital signatures, such unauthorized changes can be detected.
In addition to detecting modifications, it is also possible to use signatures to implement sign-off workflows and multi-level approval processes in a paperless laboratory.
Users can apply multiple signatures to an AnIML document. Each signature can have a different scope, covering different parts of a document. This proves quite useful in practice: each staff member can sign for those parts of the experiment he or she conducted. Eventually, one could envision instruments directly signing off on the results they produce, proving data authenticity. Such a mechanism allows for total traceability of scientific data, from long-term archival all the way back to the original instrument.
The AnIML signature mechanism leverages the established W3C XML Digital Signature standard. This ensures compatibility with existing public key infrastructures, certificates, key tokens and smart cards. In many countries, it is possible to create legally binding signatures this way.
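The principle of scoped signatures and tamper detection can be demonstrated with a simplified Python sketch. Note the assumptions: it uses a keyed hash (HMAC) from the standard library as a stand-in, not the W3C XML Digital Signature standard, it skips XML canonicalization, and the document layout and keys are invented for the example.

```python
import hashlib
import hmac
import xml.etree.ElementTree as ET

def sign(element, key):
    """Compute a keyed hash over one part of the document (stand-in only)."""
    # Real XML signatures canonicalize the XML first; this sketch does not.
    payload = ET.tostring(element, encoding="unicode").encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify(element, key, signature):
    return hmac.compare_digest(sign(element, key), signature)

doc = ET.fromstring(
    "<Document>"
    '<Step id="E1"><Value>0.012</Value></Step>'
    '<Step id="E2"><Value>0.530</Value></Step>'
    "</Document>"
)
step1, step2 = doc.findall("Step")

# Each analyst signs only the part of the experiment he or she conducted.
key_a, key_b = b"key-of-analyst-a", b"key-of-analyst-b"
signature_a = sign(step1, key_a)
signature_b = sign(step2, key_b)

# A later modification of step E2 invalidates only the second signature.
step2.find("Value").text = "0.531"
step1_intact = verify(step1, key_a, signature_a)
step2_intact = verify(step2, key_b, signature_b)
```

The change is not prevented, but it is detected, and the untouched part of the document still verifies.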
Changes to AnIML documents can be recorded in the built-in audit trail. Each audit trail entry accurately records all aspects of a change. This includes the old and the new values, the person responsible, the reason for the change and a time stamp. Using the audit trail, it is possible to examine the changes and revert the document to previous versions. To increase security, audit trail entries can be digitally signed.
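A minimal Python sketch of such an audit trail is shown below. The class and attribute names are hypothetical, and the sketch assumes that every changed field already exists in the record; it illustrates the concept and is not the AnIML audit trail format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class AuditEntry:
    field_name: str
    old_value: object
    new_value: object
    author: str
    reason: str
    timestamp: datetime

class AuditedRecord:
    def __init__(self, values):
        self.values = dict(values)
        self.trail = []

    def change(self, field_name, new_value, author, reason):
        """Apply a change and record who made it, why and when."""
        entry = AuditEntry(
            field_name, self.values[field_name], new_value,
            author, reason, datetime.now(timezone.utc),
        )
        self.trail.append(entry)
        self.values[field_name] = new_value

    def version(self, n):
        """Reconstruct the values as they were after the first n changes."""
        snapshot = dict(self.values)
        for entry in reversed(self.trail[n:]):
            snapshot[entry.field_name] = entry.old_value
        return snapshot

record = AuditedRecord({"Scans": 32})
record.change("Scans", 64, "jdoe", "transcription error")
record.change("Scans", 128, "asmith", "method update")
```

Earlier versions are reconstructed by undoing the recorded changes in reverse order, so the trail itself is never altered.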
Let’s look at a number of application areas where the deployment of AnIML promises significant advantages.
Depending on the use case, electronic data records must be preserved for several decades. Today, many laboratories choose to archive their data in PDF format. However, PDF only captures an image of the scientific data, not the actual raw numbers. This makes it difficult to perform operations such as reprocessing, recalculation and re-integration of the data in such files. AnIML ensures that the actual values are preserved and remain readable and available for reprocessing in the future.
It is no longer necessary to retain the original instrument software. Instead, a single generic AnIML viewer tool can be used to access any document in the archive. This results in significant cost savings in archive maintenance.
At its core, AnIML stores the experimental data in XML. This ensures that the data are structured and tightly constrained, but human readable at the same time. Therefore, even if the software required to access the archived data is lost, full reconstruction of any archived record remains possible. In addition, the syntactic (XML) and semantic (technique) integrity of archived records can be verified with a validation tool to maintain data quality and archival integrity.
Cross-technique data mining
Having data of multiple techniques in the same format allows for cross-technique data analysis and provides a solid foundation for data mining tools.
Instrument software today offers many features for analysis and post-processing of result data. However, every vendor implements algorithms slightly differently, making it hard to compare results from different instruments. To solve this problem, organizations can convert results to AnIML as soon as the results are produced by an instrument. Any processing and analysis can then be performed on the AnIML document using a standard tool, improving consistency.
AnIML can be used to publish scientific data on the internet or an intranet. Journal publishers and authors can make supplementary materials for articles available for download in AnIML format. Some publicly funded research programs require submission or public dissemination of results; AnIML can serve as a valuable tool in such efforts.
By design, AnIML allows for the creation of generic software components that work with any analytical technique. Such software is independent of the measurement technique, the instruments used and the software needed to drive the instruments.
Accordingly, it is no longer necessary to install expensive proprietary instrument software on all laboratory PCs to permit the simple viewing of data: we need only deploy and maintain a very compact set of software tools on a user’s PC. Only the generic components are required. This also yields benefits to instrument vendors, who can reuse the same set of software components across their product lines, resulting in significantly reduced implementation times.
Organizations seeking to store certain parts of the results data in LIMS can also benefit from AnIML. Rather than implementing a LIMS interface for every instrument used, the organization could establish a single AnIML interface. The LIMS would then extract any required information from the AnIML document, no matter which instrument was used.
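A single interface of this kind might look like the following Python sketch, which flattens the parameters of every Experiment Step into rows that a LIMS could import. The document layout, the attribute names and the extract_for_lims function are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Illustrative document; element and attribute names are placeholders,
# not taken from the AnIML Core Schema.
DOCUMENT = """
<Document>
  <ExperimentStep id="E1" technique="Infrared" sampleID="S1">
    <Parameter name="Resolution">4</Parameter>
  </ExperimentStep>
  <ExperimentStep id="E2" technique="UV/Vis" sampleID="S1">
    <Parameter name="Path Length">10</Parameter>
  </ExperimentStep>
</Document>
"""

def extract_for_lims(xml_text):
    """Flatten every parameter into (sample, technique, name, value) rows."""
    root = ET.fromstring(xml_text)
    return [
        (step.get("sampleID"), step.get("technique"), param.get("name"), param.text)
        for step in root.iter("ExperimentStep")
        for param in step.iter("Parameter")
    ]

rows = extract_for_lims(DOCUMENT)
```

The same function handles the infrared and the UV/Vis step, since it relies only on the generic structure and not on anything instrument specific.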
With AnIML, laboratories have an open format at their disposal to deliver experimental results to end users. This is especially beneficial for service labs that perform analyses on behalf of outside customers or other departments. The ability to deliver raw data in addition to PDF reports increases confidence in the results and allows for reprocessing by the recipient.
BSSN Software and AnIML
BSSN Software has developed a comprehensive offering of products and services to help organizations of all sizes introduce AnIML into their processes. In addition, we support instrument manufacturers in implementing AnIML in their products.
Due to our constructive involvement and technical leadership in the AnIML development process, BSSN Software is uniquely positioned to deliver AnIML-based solutions very efficiently. Today.
View, manipulate and validate AnIML documents, Technique Definitions and Technique Extensions.
- Provides a 360° view of a sample
- Graphical workflow representation
- Dynamic Document Streaming (DDS) for large files
- Flexible cross-technique reporting
- Easy LIMS and ELN integration
Efficiently store and organize all your laboratory data for quick and easy retrieval.
- Sophisticated metadata management
- DDS for large data sets
- Quick, easy retrieval
- Designed with long-term archival in mind
- Easy integration with open and well-documented interfaces
- Works as a stand-alone application or together with your existing scientific data management system (SDMS)
Easily convert your proprietary instrument data to AnIML. Converters are available for many popular instruments and standard data formats.
Leverage our expertise for your next AnIML project. The following services are available:
- Data management strategy development
- AnIML implementation and rollout
- Legacy data migration
- Custom converter development
- Validation of AnIML implementations
- Technique Definition / Extension development
- LIMS and ELN integration
- Training and coaching
We realize that many instrument and software vendors are looking for strategies for adopting AnIML. BSSN Software’s professional services can help with planning and implementation. Most of our tools are available for OEM licensing and have various branding options. Additionally, we offer software components that make AnIML implementation painless.
Interested in learning more about AnIML? Contact us today for a consultation. Lab Informatics Specialists from BSSN Software can deliver remote or onsite workshops designed to familiarize you with the details of the AnIML standard and help you evaluate it. Together, we examine where it makes sense to integrate AnIML into your laboratory processes. After this workshop, you will know how AnIML works and what it will take to adopt it in your organization.
Top Ten Reasons for Using AnIML
- Open. Independent of individual vendors.
- Technique agnostic. A single format captures data from any measurement technique.
- Extensible. User-, vendor- and instrument-specific content can easily be accommodated without hindering compatibility.
- Engineered with long-term archival in mind. Data are human readable. Proprietary software no longer needs to be maintained.
- Complex experiments. Workflows of arbitrary complexity can be captured accurately.
- Regulatory compliance. Audit trails, digital signatures and validation are available.
- Easy data exchange. Get the right data to the right people at the right time.
- Cross-technique data analysis. Analyze your data in new ways.
- Less software to deploy. A single tool can handle arbitrary experiments.
- Great tools and services. BSSN Software provides everything you need to deploy AnIML today.