A command line Java application used for processing chemical files, structure standardisation, import into AMBIT database and processing AMBIT database entries.

Chemical structure standardization. Available since AMBIT 3.0.0.

Download

Usage

java -Xmx1536m -jar ambitcli{version}.jar -a standardize -i <inputfile> -m post -d page=page num -d pagesize=-1|page_size -o <output> -d tautomers=true -d splitfragments=true -d implicith=true -d smiles=false -d smilescanonical=true -d inchi=true -d neutralise=true -d isotopes=true

or in order to rename the default SMILES and InChI fields:

java -Xmx1536m -jar ambitcli.jar -a standardize -i <inputfile> -m post -d page=pagenum -d pagesize=-1|page_size -o <output> -d tautomers=true -d splitfragments=true -d implicith=true -d smiles=false -d smilescanonical=true -d inchi=true -d neutralise=true -d isotopes=true  -d tag_inchi=AMBIT_InChI -d tag_inchikey=AMBIT_InChIKey -d tag_smiles=AMBIT_SMILES -d tag_rank=TAUTOMER_RANK

Logging

Logging configuration can be specified via -Djava.util.logging.config.file option, specifying logging.properties file. If not specified, the default logging.properties is used.

$java -jar -Djava.util.logging.config.file=myLoggingConfigFilePath .... other options ....

For options other than standardisation see the main ambitcli page.

Standardization specific help:

    $java -jar ambitcli-{version}.jar - a standardize -m post -d <parameters>

    "Chemical structure standardization (-i inputfile.sdf -o outputfile.sdf , recognized by extensions .sdf , .csv, .cml , .txt)"
    -a standardize -m post
    -d smirks=null	// JSON file with SMIRKS transformations	[type:String, mandatory:false]
    -d neutralise=false	// If true neutralises the molecule via set of predefined SMIRKS	[type:Boolean, mandatory:false]
    -d splitfragments=false	// If true keeps the largest fragment	[type:Boolean, mandatory:false]
    -d implicith=false	// If true converts hydrogens to implicit	[type:Boolean, mandatory:false]
    -d generatestereofrom2d=false	// If true uses org.openscience.cdk.stereo.StereoElementFactory to generate the stereochemistry from 2D coordinates	[type:Boolean, mandatory:false]
    -d isotopes=false	// If true clears isotopes	[type:Boolean, mandatory:false]
    -d generate2D=false	// Generate 2d coordinates (if no any)	[type:Boolean, mandatory:false]
    -d tautomers=false	// If true generates the top ranked tautomer	[type:Boolean, mandatory:false]
    -d inchi=true	// Generates InChIs. If -d tautomers=true InChI FixedH=true, otherwise generates standard InChI	[type:Boolean, mandatory:false]
    -d smiles=true	// Generates SMILES (isomeric, kekule).  Uses CDK SmilesGenerator.isomeric()	[type:Boolean, mandatory:false]
    -d smilescanonical=false	// Generates SMILES (canonical).  Uses CDK SmilesGenerator.absolute()	[type:Boolean, mandatory:false]
    -d smilesaromatic=false	// Generates aromatic SMILES.  Uses CDK SmilesGenerator.aromatic()	[type:Boolean, mandatory:false]
    -d page=0	// Start page (first page = 0)	[type:Integer, mandatory:false]
    -d pagesize=20000	// Page size (in number of records). Set to -1 to read all records.	[type:Integer, mandatory:false]
    -d inputtag_smiles=SMILES	// Specifies the name of the column, containing SMILES in the input file	[type:String, mandatory:false]
    -d inputtag_inchi=InChI	// Specifies the name of the column, containing InChI in the input file	[type:String, mandatory:false]
    -d inputtag_inchikey=InChIKey	// Specifies the name of the column, containing InChIKey in the input file	[type:String, mandatory:false]
    -d tag_inchi=InChI	// Specifies the tag to store the generated InChI	[type:String, mandatory:false]
    -d tag_inchikey=InChIKey	// Specifies the tag to store the generated InChIKey	[type:String, mandatory:false]
    -d tag_smiles=SMILES	// Specifies the tag to store the generated SMILES	[type:String, mandatory:false]
    -d tag_rank=RANK	// Specifies the tag to store the tautomer rank (energy based, less is better)	[type:String, mandatory:false]
    -d tag_tokeep=	// Specifies which tags to keep, comma delimited list. Everything else will be removed. To keep all the tags, leave this empty.	[type:String, mandatory:false]
    -d sdftitle=null	// Specifies which field to write in the first SDF line null|inchikey|inchi|smiles|any-existing-field	[type:String, mandatory:false]
    -d debugatomtypes=false	// Writes only structures with AtomTypes property set. For debug purposes	[type:boolean, mandatory:false]

Examples:

  • Generate SMILES, InChI and InChI key, retain only the PUBCHEM_CID from the fields in the input file
    java -jar ambitcli-3.1.0-release.jar -a standardize -m post -a standardize -m post -d page=0 -d pagesize=-1 -d tautomers=false -d tag_tokeep=PUBCHEM_CID -d smilescanonical=false -d smiles=true -d inchi=true -i inputfile -o outputfile
  • Full standardisation, write SMILES, InChI and InChI key into AMBIT_SMILES, AMBIT_InChI, AMBIT_InChIKey, retain only the PUBCHEM_CID from the fields in the input file
    java -jar ambitcli-3.1.0-release.jar -a standardize -m post -a standardize -m post -d page=0 -d pagesize=-1 -d tag_smiles=AMBIT_SMILES -d tag_inchi=AMBIT_InChI -d tag_inchikey=AMBIT_InChIKey -d tautomers=true -d splitfragments=true -d implicith=true -d smiles=true -d smilescanonical=false -d inchi=true -d neutralise=true -d isotopes=true -d tag_tokeep=PUBCHEM_CID

Options

1.Transformation

 -d smirks=null|file.json

Chemical structure transformation by SMIRKS, implemented by ambit2-smirks package. The option expects either null (default) or a JSON file defining SMIRKS in the following format. Any number of transformations could be specified.

{
    "REACTIONS": [
        {
            "NAME": "Nitro group uncharged -> charged",
            "CLASS": "standardization",
            "SMIRKS": "[*:1][N:2](=[O:3])=[O:4]>>[*:1][N+:2](=[O:3])[O-:4]",
            "USE": true
        },
        {
            "NAME": "Nitro group charged -> uncharged",
            "CLASS": "standardization",
            "SMIRKS": "[*:1][N+:2](=[O:3])[O-:4]>>[*:1][N:2](=[O:3])=[O:4]",
            "USE": false
        }    
    ]
}

The example transformations above will convert the nitro groups from uncharged form to the charged one (if USE:false the transformation will be ignored).

2.Fragments

 -d splitfragments=true|false	

If true keeps the largest fragment. If false keeps the entire molecule, even if disconnected. Default is false.

3.Isotopes

 -d isotopes=true|false	

If true clears isotopes.

4.Neutralisation

 -d neutralise=true|false	

If true neutralises the molecule via set of predefined SMIRKS. This is an option for convenience only. Using the transformation option -d smirks with the same SMIRKS file will have the same effect.

5.Implicit hydrogens

 -d implicith=true|false

If true converts hydrogens to implicit. If false leaves the structure as it is. Default is false.

6.Stereochemistry

 -d generatestereofrom2d=true|false	

If true uses org.openscience.cdk.stereo.StereoElementFactory to generate the stereochemistry from 2D (stereo elements derived from 2D coordinates).

7.Tautomers

 -d tautomers=true|false		
 -d tag_rank=RANK	 

If true generates the top ranked tautomer via ambit-tautomers package doi:10.1002/minf.201200133. Default is false. The tag_rank option specifies the tag to store the tautomer rank (energy based, less is better).

Note: this is the slowest operation within the standardisation options, as it generates all tautomers and selects the top ranked one. Typical processing time is 200-500 msec per chemical structure.

8.InChI generation

 -d inchi=true|false
 -d tag_inchi=InChI	// Specifies the InChI tag	[type:String, mandatory:false]
 -d tag_inchikey=InChIKey	// Specifies the InChIKey tag	[type:String, mandatory:false]

Generates InChIs. If -d tautomers=true uses InChI option FixedH=true, otherwise generates standard InChI. If false does not generate InChI. Default is true.

9.SMILES generation

    -d smiles=true	// Generates SMILES (isomeric, kekule).  Uses CDK SmilesGenerator.isomeric()	[type:Boolean, mandatory:false]
    -d smilescanonical=false	// Generates SMILES (canonical).  Uses CDK SmilesGenerator.absolute()	[type:Boolean, mandatory:false]
    -d smilesaromatic=false	// Generates aromatic SMILES.  Uses CDK SmilesGenerator.aromatic()	[type:Boolean, mandatory:false]

10.Page/Pagesize

 -d page=0	// Start page (first page = 0)	[type:Integer]
 -d pagesize=20000	// Page size (in number of records)	[type:Integer]

Used to process specific part of the file (e.g. -d page=2 -d page=100 will skip the first 200 records).

11. SDF file molecule name

 -d "sdftitle=InChIKey"	

If the output is SDF file, will write the specified property in the first line

12. Input tags

-d inputtag_smiles=SMILES // Specifies the name of the column, containing SMILES [type:String, mandatory:false]

Back to top

Last Published: 2018-05-16.