ambitcli | AMBIT2

A command-line Java application for processing chemical files, standardising structures, importing into an AMBIT database and processing AMBIT database entries.

Chemical structure standardization. Available since AMBIT 3.0.0.

Download

Latest release ambitcli-3.1.0
All releases
Pre-release ambitcli-3.2.0-{buildnumber}.jar
Development Maven repository

Usage

java -Xmx1536m -jar ambitcli-{version}.jar -a standardize -i <inputfile> -m post -d page=pagenum -d pagesize=-1|page_size -o <output> -d tautomers=true -d splitfragments=true -d implicith=true -d smiles=false -d smilescanonical=true -d inchi=true -d neutralise=true -d isotopes=true

Or, to rename the default SMILES and InChI fields:

java -Xmx1536m -jar ambitcli-{version}.jar -a standardize -i <inputfile> -m post -d page=pagenum -d pagesize=-1|page_size -o <output> -d tautomers=true -d splitfragments=true -d implicith=true -d smiles=false -d smilescanonical=true -d inchi=true -d neutralise=true -d isotopes=true -d tag_inchi=AMBIT_InChI -d tag_inchikey=AMBIT_InChIKey -d tag_smiles=AMBIT_SMILES -d tag_rank=TAUTOMER_RANK

Logging

The logging configuration can be set with the -Djava.util.logging.config.file option, which takes the path to a logging.properties file. If the option is not specified, the default logging.properties is used.

$ java -Djava.util.logging.config.file=myLoggingConfigFilePath -jar ambitcli-{version}.jar .... other options ....
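
As an illustration of what such a configuration file might contain, the sketch below writes a minimal java.util.logging setup that logs to the console at INFO level. The file name logging.properties and the chosen levels are assumptions for this example, not the defaults shipped with ambitcli.

```shell
# Write a minimal java.util.logging configuration.
# The file name and the INFO levels are arbitrary choices for this sketch.
cat > logging.properties <<'EOF'
# Send log records to the console only
handlers=java.util.logging.ConsoleHandler
# Root logger level
.level=INFO
# Console handler level and plain-text formatter
java.util.logging.ConsoleHandler.level=INFO
java.util.logging.ConsoleHandler.formatter=java.util.logging.SimpleFormatter
EOF

echo "wrote logging.properties"
```

The resulting file can then be passed as -Djava.util.logging.config.file=logging.properties.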

For options other than standardisation see the main ambitcli page.

Standardization specific help:

    $ java -jar ambitcli-{version}.jar -a standardize -m post -d <parameters>

    "Chemical structure standardization (-i inputfile.sdf -o outputfile.sdf , recognized by extensions .sdf , .csv, .cml , .txt)"
    -a standardize -m post
    -d smirks=null	// JSON file with SMIRKS transformations	[type:String, mandatory:false]
    -d neutralise=false	// If true neutralises the molecule via set of predefined SMIRKS	[type:Boolean, mandatory:false]
    -d splitfragments=false	// If true keeps the largest fragment	[type:Boolean, mandatory:false]
    -d implicith=false	// If true converts hydrogens to implicit	[type:Boolean, mandatory:false]
    -d generatestereofrom2d=false	// If true uses org.openscience.cdk.stereo.StereoElementFactory to generate the stereochemistry from 2D coordinates	[type:Boolean, mandatory:false]
    -d isotopes=false	// If true clears isotopes	[type:Boolean, mandatory:false]
    -d generate2D=false	// Generate 2d coordinates (if no any)	[type:Boolean, mandatory:false]
    -d tautomers=false	// If true generates the top ranked tautomer	[type:Boolean, mandatory:false]
    -d inchi=true	// Generates InChIs. If -d tautomers=true InChI FixedH=true, otherwise generates standard InChI	[type:Boolean, mandatory:false]
    -d smiles=true	// Generates SMILES (isomeric, kekule).  Uses CDK SmilesGenerator.isomeric()	[type:Boolean, mandatory:false]
    -d smilescanonical=false	// Generates SMILES (canonical).  Uses CDK SmilesGenerator.absolute()	[type:Boolean, mandatory:false]
    -d smilesaromatic=false	// Generates aromatic SMILES.  Uses CDK SmilesGenerator.aromatic()	[type:Boolean, mandatory:false]
    -d page=0	// Start page (first page = 0)	[type:Integer, mandatory:false]
    -d pagesize=20000	// Page size (in number of records). Set to -1 to read all records.	[type:Integer, mandatory:false]
    -d inputtag_smiles=SMILES	// Specifies the name of the column, containing SMILES in the input file	[type:String, mandatory:false]
    -d inputtag_inchi=InChI	// Specifies the name of the column, containing InChI in the input file	[type:String, mandatory:false]
    -d inputtag_inchikey=InChIKey	// Specifies the name of the column, containing InChIKey in the input file	[type:String, mandatory:false]
    -d tag_inchi=InChI	// Specifies the tag to store the generated InChI	[type:String, mandatory:false]
    -d tag_inchikey=InChIKey	// Specifies the tag to store the generated InChIKey	[type:String, mandatory:false]
    -d tag_smiles=SMILES	// Specifies the tag to store the generated SMILES	[type:String, mandatory:false]
    -d tag_rank=RANK	// Specifies the tag to store the tautomer rank (energy based, less is better)	[type:String, mandatory:false]
    -d tag_tokeep=	// Specifies which tags to keep, comma delimited list. Everything else will be removed. To keep all the tags, leave this empty.	[type:String, mandatory:false]
    -d sdftitle=null	// Specifies which field to write in the first SDF line null|inchikey|inchi|smiles|any-existing-field	[type:String, mandatory:false]
    -d debugatomtypes=false	// Writes only structures with AtomTypes property set. For debug purposes	[type:boolean, mandatory:false]

Examples:

Generate SMILES, InChI and InChIKey, retaining only the PUBCHEM_CID field from the input file:

    java -jar ambitcli-3.1.0-release.jar -a standardize -m post -d page=0 -d pagesize=-1 -d tautomers=false -d tag_tokeep=PUBCHEM_CID -d smilescanonical=false -d smiles=true -d inchi=true -i inputfile -o outputfile

Full standardisation, writing SMILES, InChI and InChIKey into AMBIT_SMILES, AMBIT_InChI and AMBIT_InChIKey, and retaining only the PUBCHEM_CID field from the input file:

    java -jar ambitcli-3.1.0-release.jar -a standardize -m post -d page=0 -d pagesize=-1 -d tag_smiles=AMBIT_SMILES -d tag_inchi=AMBIT_InChI -d tag_inchikey=AMBIT_InChIKey -d tautomers=true -d splitfragments=true -d implicith=true -d smiles=true -d smilescanonical=false -d inchi=true -d neutralise=true -d isotopes=true -d tag_tokeep=PUBCHEM_CID -i inputfile -o outputfile

Options

1. Transformation

 -d smirks=null|file.json

Chemical structure transformation by SMIRKS, implemented by the ambit2-smirks package. The option expects either null (default) or a JSON file defining SMIRKS in the following format. Any number of transformations can be specified.

{
    "REACTIONS": [
        {
            "NAME": "Nitro group uncharged -> charged",
            "CLASS": "standardization",
            "SMIRKS": "[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]",
            "USE": true
        },
        {
            "NAME": "Nitro group charged -> uncharged",
            "CLASS": "standardization",
            "SMIRKS": "[*:1][N+:2](=[O:3])[O-:4]>>[*:1][N:2](=[O:3])=[O:4]",
            "USE": false
        }    
    ]
}

With the example above, nitro groups are converted from the uncharged form to the charged one. The second transformation (charged to uncharged) is ignored because its USE flag is false; only transformations with USE set to true are applied.
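
A minimal sketch of preparing such a file from the command line follows. It writes the two example transformations to a file, counts how many are switched on, and prints the standardisation command rather than running it. The file names smirks.json, inputfile.sdf and outputfile.sdf are placeholders chosen for this example, and the grep check relies on the exact "USE": true spelling used in the file.

```shell
# Write a SMIRKS definition file in the format shown above.
# The file name smirks.json is an arbitrary choice for this sketch.
cat > smirks.json <<'EOF'
{
    "REACTIONS": [
        {
            "NAME": "Nitro group uncharged -> charged",
            "CLASS": "standardization",
            "SMIRKS": "[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]",
            "USE": true
        },
        {
            "NAME": "Nitro group charged -> uncharged",
            "CLASS": "standardization",
            "SMIRKS": "[*:1][N+:2](=[O:3])[O-:4]>>[*:1][N:2](=[O:3])=[O:4]",
            "USE": false
        }
    ]
}
EOF

# Count the transformations that are switched on (lines with "USE": true).
enabled=$(grep -c '"USE": true' smirks.json)
echo "enabled transformations: $enabled"

# Print the command that would apply the file; remove 'echo' to actually run it.
echo java -jar ambitcli-3.1.0-release.jar -a standardize -m post \
    -d smirks=smirks.json -i inputfile.sdf -o outputfile.sdf
```

Printing the command first makes it easy to check the options before starting a long run.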

2. Fragments

 -d splitfragments=true|false

If true keeps the largest fragment. If false keeps the entire molecule, even if disconnected. Default is false.

3. Isotopes

 -d isotopes=true|false

If true, clears isotopes. Default is false.

4. Neutralisation

 -d neutralise=true|false

If true, neutralises the molecule via a set of predefined SMIRKS. This option is for convenience only: using the transformation option -d smirks with the same SMIRKS file has the same effect. Default is false.

5. Implicit hydrogens

 -d implicith=true|false

If true converts hydrogens to implicit. If false leaves the structure as it is. Default is false.

6. Stereochemistry

 -d generatestereofrom2d=true|false

If true, uses org.openscience.cdk.stereo.StereoElementFactory to generate the stereochemistry (stereo elements) from 2D coordinates. Default is false.

7. Tautomers

 -d tautomers=true|false
 -d tag_rank=RANK

If true, generates the top-ranked tautomer using the ambit-tautomers package (doi:10.1002/minf.201200133). Default is false. The tag_rank option specifies the tag in which to store the tautomer rank (energy based, lower is better).

Note: this is the slowest of the standardisation options, because it generates all tautomers and then selects the top-ranked one. Typical processing time is 200-500 ms per chemical structure.
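
To get a feel for what the per-structure figure means for a whole file, the sketch below does the arithmetic for one page of records. The record count of 20000 is just the default page size used as an example, and the result is only a rough estimate based on the range quoted above.

```shell
# Rough wall-clock estimate for tautomer generation, using the 200-500 ms
# per-structure range quoted above. The record count is an example value.
records=20000
low_ms=200
high_ms=500

# Integer arithmetic: total milliseconds -> seconds -> whole minutes (rounded down).
low_min=$(( records * low_ms / 1000 / 60 ))
high_min=$(( records * high_ms / 1000 / 60 ))

echo "Estimated time for $records structures: ${low_min}-${high_min} minutes"
```

For large inputs it may therefore be worth splitting the work into pages with -d page and -d pagesize.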

8. InChI generation

 -d inchi=true|false
 -d tag_inchi=InChI	// Specifies the InChI tag	[type:String, mandatory:false]
 -d tag_inchikey=InChIKey	// Specifies the InChIKey tag	[type:String, mandatory:false]

Generates InChIs. If -d tautomers=true uses InChI option FixedH=true, otherwise generates standard InChI. If false does not generate InChI. Default is true.

9. SMILES generation

    -d smiles=true	// Generates SMILES (isomeric, kekule).  Uses CDK SmilesGenerator.isomeric()	[type:Boolean, mandatory:false]
    -d smilescanonical=false	// Generates SMILES (canonical).  Uses CDK SmilesGenerator.absolute()	[type:Boolean, mandatory:false]
    -d smilesaromatic=false	// Generates aromatic SMILES.  Uses CDK SmilesGenerator.aromatic()	[type:Boolean, mandatory:false]

10. Page/Pagesize

 -d page=0	// Start page (first page = 0)	[type:Integer]
 -d pagesize=20000	// Page size (in number of records)	[type:Integer]

Used to process a specific part of the file (e.g. -d page=2 -d pagesize=100 will skip the first 200 records).
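
The paging options can be combined with a simple loop to work through a large file one page at a time. The following is a sketch only: the total record count, the file names and the per-page output naming are hypothetical, and the commands are printed rather than executed.

```shell
# Process a large file page by page. The record count, page size and
# file names are hypothetical values chosen for this sketch.
total=45000
pagesize=20000

# Number of pages, rounded up so the last partial page is included.
pages=$(( (total + pagesize - 1) / pagesize ))

page=0
while [ "$page" -lt "$pages" ]; do
    # Page N starts after the first N * pagesize records.
    skipped=$(( page * pagesize ))
    echo "page $page skips the first $skipped records"

    # Print the command for this page; remove 'echo' to actually run it.
    echo java -jar ambitcli-3.1.0-release.jar -a standardize -m post \
        -d page="$page" -d pagesize="$pagesize" \
        -i inputfile.sdf -o "outputfile_${page}.sdf"

    page=$(( page + 1 ))
done
```

Writing each page to its own output file keeps a failed page from affecting the others.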

11. SDF file molecule name

 -d "sdftitle=InChIKey"

If the output is an SDF file, writes the specified property in the first line of each record.

12. Input tags

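 -d inputtag_smiles=SMILES // Specifies the name of the column containing SMILES in the input file [type:String, mandatory:false]
 -d inputtag_inchi=InChI // Specifies the name of the column containing InChI in the input file [type:String, mandatory:false]
 -d inputtag_inchikey=InChIKey // Specifies the name of the column containing InChIKey in the input file [type:String, mandatory:false]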