Keywords:
Chemical data
Chemical patents
Information storage
Abstract:
The generation of topological fragments from generic structures for full structure and substructure searching is described; these include fragments from components described in either specific or generic terms, and those which overlap them. Fragments derived wholly from within partial structures (PSs) are termed intra-PS fragments, while those which span partial structures are termed inter-PS fragments. Where fragments are generated from generic radicals, the methods depend on searching the members of the fragment set against the intensional description of the generic radical term (the Homologous Series Identifier, HSI), represented as default or explicit parameter values, to determine which fragments contribute to the description of the potential specific members of the homologous radical described by the HSI. High efficiency in screen generation is necessary because generic structures give rise to many more screens than specific structures and because the data structures involved are complex. To achieve this efficiency, a tree-structured dictionary of fragment types (augmented atoms and atom sequences) is created, against which potential fragments are matched, so that the appropriate level of description is selected. Both of these fragment types are considered in this and the following paper. The interrelation of the fragments is facilitated by the Extended Connection Table Representation (ECTR) and by the bubble-up process, which uses a logical matrix to represent the inter-PS relationships. Fragment screens are organized as a two-part vector: one part (the MUST screens) indicates the presence of a fragment in an invariant feature of the structure, and the second (the POSS screens) is formed from the union of the MUST fragments with those which are optional in the structure (the MAY screens).
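
The two-part screen vector described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ScreenVector`, `build_screens` and `may_contain` and the fragment strings are hypothetical, and the screening rule (a specific query can match only if all of its fragments appear in the POSS part) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreenVector:
    """Two-part screen vector for one generic structure (hypothetical type)."""
    must: frozenset  # fragments present in invariant features of the structure
    poss: frozenset  # union of the MUST fragments and the optional (MAY) fragments


def build_screens(must_fragments, may_fragments):
    """Form the POSS part as the union of MUST and MAY, as the abstract states."""
    must = frozenset(must_fragments)
    poss = must | frozenset(may_fragments)
    return ScreenVector(must=must, poss=poss)


def may_contain(screens, query_fragments):
    """Assumed screening rule: every query fragment must be possible in the structure."""
    return frozenset(query_fragments) <= screens.poss


# Hypothetical fragment labels, used only to exercise the functions.
screens = build_screens(
    must_fragments={"C-C-O", "C(=O)O"},
    may_fragments={"C-C-C", "C-N"},
)
```

By construction the MUST part is always a subset of the POSS part, so a fragment set in the MUST part never has to be stored separately to answer a POSS query.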