Study of Diversity and Similarity of Large Chemical Databases Using Tanimoto Measure

dc.contributor.author Sankara Rao, A.
dc.contributor.author Durga Bhavani, S.
dc.contributor.author Sobha Rani, T.
dc.contributor.author Bapi, Raju S.
dc.contributor.author Narahari Sastry, G.
dc.date.accessioned 2022-03-27T05:50:50Z
dc.date.available 2022-03-27T05:50:50Z
dc.date.issued 2011-12-16
dc.description.abstract ZINC is a freely available chemical database containing 27 million compounds, organized into subsets such as Drug-like, Natural Products and FDA, along with 9 molecular features. In this paper we first compute 49 additional molecular features and represent the entire chemical space in a 58-length fingerprint space. The Tanimoto metric, a popular similarity measure, is used to mine the chemical space for similar and diverse fingerprints. An important issue is the choice of a proper reference string. Experiments with different reference strings are carried out to assess the appropriateness of each. A fingerprint constituted by mandating non-trivial presence of each feature is found to be the best. Further, a method that is independent of the reference string is proposed using the pairwise distribution, but this raises the time complexity from linear to quadratic. A subgoal of this paper is to propose a scheme that extracts a small sample data set reflecting the similarity and diversity of the population. To this end, we conduct stratified sampling of the Natural Products Database (NPD), which has 90,000 chemical compounds, by dividing the space along strata representing distinct structures (rings), and then compute the pairwise similarity profile. This scheme can be extended to other databases that reside in ZINC. © Springer-Verlag Berlin Heidelberg 2011.
dc.identifier.citation Communications in Computer and Information Science. v.157 CCIS
dc.identifier.issn 18650929
dc.identifier.uri 10.1007/978-3-642-22786-8_5
dc.identifier.uri http://link.springer.com/10.1007/978-3-642-22786-8_5
dc.identifier.uri https://dspace.uohyd.ac.in/handle/1/8261
dc.subject Chemical space
dc.subject Functional groups
dc.subject Molecular finger print
dc.subject Representative set
dc.subject Stratified sampling
dc.title Study of Diversity and Similarity of Large Chemical Databases Using Tanimoto Measure
dc.type Book Series. Conference Paper
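
The abstract contrasts two ways of profiling similarity: comparing every fingerprint against one reference string (linear in the number of compounds) and comparing all pairs (quadratic). The following is a minimal sketch of that contrast, not the authors' implementation. It assumes binary fingerprints, for which the Tanimoto coefficient is the number of features set in both fingerprints divided by the number set in either; the paper's 58 features may not be binary. The function names and the 4-bit toy fingerprints are hypothetical.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length binary fingerprints."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)   # features set in both
    either = sum(1 for x, y in zip(a, b) if x or y)  # features set in either
    # Assumption: two all-zero fingerprints are treated as identical.
    return both / either if either else 1.0


def reference_profile(fingerprints, reference):
    """Similarity of each fingerprint to one reference string: O(n)."""
    return [tanimoto(fp, reference) for fp in fingerprints]


def pairwise_profile(fingerprints):
    """Similarity of every unordered pair of fingerprints: O(n^2)."""
    return [tanimoto(a, b) for a, b in combinations(fingerprints, 2)]


# Toy data (hypothetical): three 4-bit fingerprints and an all-ones
# reference, loosely mirroring "presence of each feature".
fps = [[1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
ref = [1, 1, 1, 1]

print(reference_profile(fps, ref))  # one value per compound
print(pairwise_profile(fps))        # one value per pair, n*(n-1)/2 in total
```

For n compounds the reference-based profile has n entries, while the pairwise profile has n(n-1)/2. That growth is the reason for drawing a small stratified sample before computing the pairwise profile on a set the size of NPD.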