1
MARPAT Basics (Markush Structures)
2
Where did the term Markush come from?
In 1923 Dr Eugene A Markush filed a patent application in the United States concerning a method of preparing pyrazolone dyes that could be used for wool or silk. Claim 1 of this application read:
…coupling with a halogen substituted pyrazolone a diazotized unsulphonated material selected from a group consisting of aniline, homologues of aniline and halogen substitution products of aniline.
The claim was challenged as being too unspecific. On appeal the US Commissioner of Patents ruled on the propriety of such claims. The patent was granted in 1924 as US 1,506,316.
Introduction
3
Markush Structures
Markush structures in patents condense an entire set of implied substances into a single representation using real atoms and R groups.
R1 = alkyl of 1-6 carbon atoms or alkenyl of 2-6 carbon atoms
R2 = alkyl of 1-6 carbon atoms
R3 and R4 = H or alkyl of 1-6 carbon atoms
[Structure diagram: core structure bearing substituents R1, R2, R3, and R4]
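To give a feel for how much a single Markush structure condenses, the sketch below enumerates the R-group combinations defined on this slide. It is only an illustration: it counts each substituent by carbon number and ignores branching and double-bond position isomers (an assumption made here for simplicity), so the true number of implied substances is larger. The helper names `alkyl` and `alkenyl` are hypothetical.

```python
from itertools import product


def alkyl(lo, hi):
    """Return one label per carbon count, e.g. 'C1 alkyl' .. 'C6 alkyl' (isomers ignored)."""
    return [f"C{n} alkyl" for n in range(lo, hi + 1)]


def alkenyl(lo, hi):
    """Return one label per carbon count for alkenyl groups (isomers ignored)."""
    return [f"C{n} alkenyl" for n in range(lo, hi + 1)]


# R-group definitions taken from the slide.
R1 = alkyl(1, 6) + alkenyl(2, 6)   # alkyl of 1-6 C or alkenyl of 2-6 C
R2 = alkyl(1, 6)                   # alkyl of 1-6 C
R3 = ["H"] + alkyl(1, 6)           # H or alkyl of 1-6 C
R4 = ["H"] + alkyl(1, 6)           # H or alkyl of 1-6 C

# Every tuple is one implied (prophetic) substance at this level of detail.
combinations = list(product(R1, R2, R3, R4))
print(len(R1), len(R2), len(R3), len(R4))  # 11 6 7 7
print(len(combinations))                   # 3234
```

Even under this coarse counting, one drawing with four short definitions stands for several thousand substances, which is why such substances are searched through the Markush representation rather than one by one.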
Introduction
4
Introduction
Prophetic substances, e.g., those encompassed by the Markush structures in patent claims, are not indexed in CAPLUS, and are not added to REGISTRY if they are missing from it.
5
Introduction
Prophetic substances, e.g., those encompassed by the Markush structures in patent claims, are not indexed in CAPLUS with RNs, even if they are indexed in REGISTRY.
6
Introduction
But any patent containing Markush structures has examples correlated to the Markush structures.
7
What has been reported on compounds with the following structural characteristics?
R = pyridyl ring (isolated/embedded)
R' = carbon chain of any length
No additional fusion on the polycyclic ring
Additional substitution allowed at all open sites
[Structure diagram: polycyclic ring system bearing R and R']
Introduction
9
=> l1 full
FULL SEARCH INITIATED 04:51:57
FULL SCREEN SEARCH COMPLETED - 100.0% PROCESSED 4 ITERATIONS
SEARCH TIME: 00.00.01
L3 0 SEA SSS FUL L1
=> fil marpat
Introduction
10
=> l3 full
FULL SEARCH INITIATED 04:52:43
FULL SCREEN SEARCH COMPLETED - 100.0% PROCESSED 139 ITERATIONS
SEARCH TIME: 00.00.02
L5 3 SEA SSS FUL L1
Introduction
12
• The MARPAT database
– is produced by CAS and available only on STN,
– contains structural representations of the Markush structures that appear in patent claims.
• A Markush structure condenses a set of implied substances into a single representation.
• Only Markush structures are searchable in MARPAT.
MARPAT Overview
13
• The same types of structure queries can be searched in REGISTRY and MARPAT.
• MARPAT queries may contain:
– Specific atoms and shortcuts
– Variable groups
– G-groups
– Specific bonds
– Unspecified bonds
– Isolated rings
– Ring/chain nodes
– Not ring/chain bonds
MARPAT Overview
14
The following structure searches are available in MARPAT:
Search types: SSS, CSS
Search scopes: Sample, Full, Range, Subset
MARPAT Overview
15
Search   Scope    Iterations   Answers   Minutes
Online   SAMPLE        2,000        50         5
Subset   SAMPLE        2,000        50         5
Online   FULL        100,000   100,000        30
Subset   FULL        100,000   100,000        30
BATCH    FULL        150,000   150,000       180
Online   RANGE       100,000   100,000        30
Subset   RANGE       100,000   100,000        30
BATCH    RANGE       150,000   150,000       180
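The limits in this table can be used to judge, from the projections of a SAMPLE search, whether a FULL search is likely to run to completion. The following is a minimal sketch of that check. It assumes that a search fits when both the upper projected iterations and the upper projected answers stay within the tabulated limits; how STN actually decides is not stated in the slides. The names `LIMITS` and `fits_limits` are hypothetical.

```python
# (search mode, scope) -> (max iterations, max answers, max minutes), copied from the table.
LIMITS = {
    ("ONLINE", "SAMPLE"): (2_000, 50, 5),
    ("SUBSET", "SAMPLE"): (2_000, 50, 5),
    ("ONLINE", "FULL"): (100_000, 100_000, 30),
    ("SUBSET", "FULL"): (100_000, 100_000, 30),
    ("BATCH", "FULL"): (150_000, 150_000, 180),
    ("ONLINE", "RANGE"): (100_000, 100_000, 30),
    ("SUBSET", "RANGE"): (100_000, 100_000, 30),
    ("BATCH", "RANGE"): (150_000, 150_000, 180),
}


def fits_limits(mode, scope, projected_iterations_max, projected_answers_max):
    """Return True if the upper projections stay within the tabulated limits (assumed rule)."""
    max_iterations, max_answers, _minutes = LIMITS[(mode.upper(), scope.upper())]
    return (projected_iterations_max <= max_iterations
            and projected_answers_max <= max_answers)


# Projections of the kind reported by a SAMPLE search:
# PROJECTED ITERATIONS: 9195 TO 11845, PROJECTED ANSWERS: 3 TO 164
print(fits_limits("Online", "FULL", 11_845, 164))    # True
print(fits_limits("Online", "SAMPLE", 11_845, 164))  # False
```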
MARPAT Overview
16
MARPAT Overview
Years of Coverage: Currently 1988+; when addition is complete, 1961+
Update Frequency: Weekly
Markush Structures: Over 640,000 Markush structures in MARPAT; over 27M small molecule structures in REGISTRY
Source of Data: 50 patent-granting authorities
Patent Records: 247,000 (in MARPAT); 5.1M (in CA)
Bibliographic Data: Displayable; searchable in CAPLUS
SDI Availability: Yes
17
MARPAT Enhancements
Approximately 40,000 Markush structures derived from Institut National de la Propriété Industrielle (INPI) data have been added to MARPAT from the pre-1988 time period. Additional records back to the early 1960s will continue to be added during 2006.
MARPAT does not have File Segments, but each INPI Markush structure includes the following note: “Record may include structures from disclosure.”
18
MARPAT Enhancements
=> fil marpat
=> Record may include structures from disclosure
L1 40359 RECORD MAY INCLUDE STRUCTURES FROM DISCLOSURE (RECORD(W)MAY(W)INCLUDE(W)STRUCTURES(1W)DISCLOSURE)
=> sel l1 1-10000 an
L2 SEL L1 1-10000 AN : 10047 TERMS
=> fil hcaplus
=> l2/dn
L5 10047 L2/DN
19
MARPAT Enhancements
=> l5 and polymers/cc
L6 293 L5 AND POLYMERS/CC
TI Preparation of amides from nitriles and amines
TI Olefinic benzocyclobutene polymers and processes for their preparation
TI N-tert-alkyl-n-sec-alkyl secondary amine compounds
TI N-substituted carbamoyl lactams
TI Catalysts for the polymerization of olefins
20
MARPAT Enhancements
=> l5 and alloys/cc
L7 46 L5 AND ALLOYS/CC
TI Benzoylalamine as corrosion inhibitor for aqueous systems
TI Dicyclopentadiene dicarboxylic acid salts as corrosion inhibitors
TI Copper etching process and solution
TI Alkaline cleaning bath for aluminum
21
MARPAT Enhancements
=> d l1 1, 40359 an
AN 2003:785279 HCAPLUS
DN 139:283282
Correction of: 104:177615
AN 1912:8178 CAPLUS
DN 6:8178
OREF 6:1274g-i
22
MARPAT Enhancements
Collective Index   Year   Volume
1CI                1907   v1
                   1908   v2
                   1909   v3
                   1910   v4
                   1911   v5
. . .
4CI                1937   v31
                   1938   v32
                   1939   v33
                   1940   v34
. . .
6CI                1957   V51
                   1958   V52
                   1959   V53
                   1960   V54
                   1961   V55
. . .
12CI               1987   V106 and V107
                   1988   V108 and V109
                   1989   V110 and V111
23
Feature: Notes
Source of the Markush structures: Structure from the claims, or from the disclosure if there is no Markush structure in the claims; additional details from the disclosure
Search access points: Substructure searching; text terms from MPL, NTE, and other text fields associated with the Markush structure
CAplus information: Displayable, not searchable
MARPAT Overview
24
• All MARPAT records are also in CAplus.
• They have the same Accession Number (DN).
• CAplus information can be displayed in MARPAT.
MARPAT Overview
25
AN 133:290336 MARPAT
TI Coordination compounds with ligands of a nitrogen heterocycle and Organic electroluminescent device using these complexes
IN Kim, Kong-Kyeom; Son, Se-Hwan; Kim, Ok-Hee; Yoon, Seok-Hee; Bae, Jae-Soon; Lee, Youn-Gu; Kim, Hyo-Seok
PA LG Chemical, Ltd., S. Korea
SO PCT Int. Appl., 47 pp. CODEN: PIXXD2
DT Patent
LA English
IC ICM C07F001-00 ICS C07F003-00; C09K011-06; H05B033-14
CC 78-7 (Inorganic Chemicals and Reactions) Section cross-reference(s): 28, 73
o o o
BIB information.
MARPAT Overview
/DN in CAPlus
26
o o o
FAN.CNT 1
     PATENT NO.      KIND  DATE      APPLICATION NO.  DATE
PI   WO 2000058315   A1    20001005  WO 2000-KR289    20000330
     W: CA, CN, JP
     RW: AT, BE, CH, CY, DE, DK, ES, FI, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE
PRAI KR 1999-11160 19990331
AB Disclosed are new coordination compds. having light-emitting and electron-transporting
o o o
ST zinc benzothiazole deriv complex prepn electroluminescent device
IT Electroluminescent devices (zinc benzothiazole deriv. complex as electron-transporting material for)
IT 7429-90-5D, Aluminum, complexes with nitrogen-contg. heterocycles
o o o
(continued on next page)
Caplus subject and CAS RN indexing.
CAplus abstract.
BIB information includes patent family members.
MARPAT Overview
27
IT 103-72-0, Phenyl isothiocyanate 105-53-3, Diethyl malonate 122-39-4, Diphenylamine, reactions 557-34-6, Zinc acetate 1076-38-6, 4-Hydroxycoumarin 1677-46-9
RL: RCT (Reactant)
(reactant for prepn. of zinc benzothiazole deriv. complexes as electron-transporting material for electroluminescent devices)
RE.CNT 6
RE
(1) Chen, C; US 6020078 A 2000 CAPLUS
(2) Lg Chemical Ltd; WO 9837736 A1 1998 CAPLUS
(3) Lg Chemical Ltd; WO 9963023 A1 1999 CAPLUS
(4) Sanyo Electric Co Ltd; EP 0743809 A2 1996 CAPLUS
(5) Shinko Electric Industries Co Ltd; EP 0801518 A2 1997 CAPLUS
(6) Xerox Corporation; EP 0862353 A2 1998 CAPLUS
o o o
Citations from the patent.
MARPAT Overview
RNs, not searchable.
28
MSTR 1
G1 = 7 / (SC 26 / 40 / 57 / 74 / 95 / 104 / 113 / 122 / 131)
(continued on next page)
[Structure diagrams: base structure containing G5, G1, and G8, and the nitrogen-containing ring fragments at nodes 7, 26, 40, 57, 74, 95, 104, and 113]
Markush structure from the patent claims.
MARPAT Overview
29
G2 = CH2 (SO) / O / S / Se / 8
G3 = alkyl (SO) / aryl (SO)
G4 = R<TX "moiety to form aromatic or heterocyclic ring">
G5 = 15 / (SC 140 / 153 / 166 / 182 / 195 / 211 / 227 / 253 / 265 / 277 / 289 / 301 / 314)
o o o
G6 = O / S / 20
G7 = R<TX "moiety to form aromatic or heterocyclic ring">
G8 = R<TX "metal"> / (SC Li / Be / Zn / Mg / Ga / In / Al)
MPL: claim 1
NTE: as complex with G8
[Structure diagrams: fragments at nodes 122, 131, 8, 15, 140, 153, 166, and 20]
MARPAT Overview
30
• Searching both MARPAT and REGISTRY enhances substructure search recall of the patent literature by retrieving both
– Specific compound matches, from REGISTRY
– Prophetic compound matches, from MARPAT
• Techniques for conducting a substructure search in MARPAT are similar to those used in REGISTRY.
MARPAT Overview
31
Specific substances from a patent are indexed in REGISTRY and in CAplus with an RN.
MARPAT Overview
For patent documents, the following substances are indexed with an RN:
32
From examples
From tables (disclosure)
From claims when a Markush structure is exactly defined (1980 only)
MARPAT Overview
35
AN 127:62046 MARPAT
TI Preparation of arylthioalkyl- and arylthioalkenylphosphonic acids and derivatives as herbicides
IT 191411-58-2P 191411-61-7P 191411-63-9P 191411-65-1P 191411-67-3P 191411-69-5P 191411-71-9P 191411-73-1P 191411-74-2P 191411-75-3P 191411-76-4P 191411-77-5P 191411-78-6P 191411-79-7P 191411-80-0P 191411-81-1P 191411-82-2P 191411-83-3P 191411-84-4P 191411-85-5P 191411-86-6P 191411-87-7P 191411-88-8P 191411-89-9P 191411-90-2P 191411-91-3P 191411-92-4P 191411-93-5P 191411-94-6P 191411-95-7P 191411-96-8P 191411-97-9P 191411-98-0P 191411-99-1P 191412-00-7P
RL: AGR (Agricultural use); SPN (Synthetic preparation); BIOL (Biological study); PREP (Preparation); USES (Uses)
(prepn. as herbicide)
CAS RN 191411-58-2 from Example 1
[Structure diagram for CAS RN 191411-58-2]
CAS RN 191411-86-6 from Claim 4
[Structure diagram for CAS RN 191411-86-6]
MARPAT Overview
36
• Prophetic substances, e.g., those represented by Markush structures in patent claims, are not generally indexed in CAplus.
• Markush structures provide the ONLY structure-searchable access to the substances covered by patent claims and disclosures, such as the prophetic substances.
• Only patents with Markush structures are indexed in MARPAT.
MARPAT Overview
37
The specific compounds that can be generated from the Markush variables are not individually indexed in CAplus.
MARPAT Overview
38
MSTR 1
G1 = H / F / Cl / Br / I
G2 = NH2 / OH / alkylcarbonyloxy<(1-4)> / OCHO / OCOPh
G3 = S / S(O) / SO2
[Structure diagram: base structure with variable groups G1, G2, G3, G4, G5, and G9]
MARPAT Overview
39
G5 = OH / 26 / 32
G6 = alkyl<(1-4)> / 28
G8 = alkali metal atom / NH3 / R<TX "organic ammonium cation">
G4 = CH2CH2CH2CH2 / 13-9 16-11 / 17-9 20-11 / 21-9 24-11
[Structure diagrams: four-carbon chain fragments at nodes 13-16, 17-20, and 21-24, and the fragments at nodes 26 and 32]
MARPAT Overview
40
• Consider searching both REGISTRY and MARPAT when
– It is important to comprehensively search all structural possibilities covered in patent claims
– REGISTRY searches turn up no hits
– It is important to cover "incompletely defined" compounds (/ IDS) that might not match your query specifications in REGISTRY
MARPAT Overview
41
Search Question:
What has been reported on compounds with the following structural characteristics?
R1 = pyridyl ring (isolated/embedded)
R2 = carbon chain of any length
No additional fusion on the polycyclic ring
Additional substitution allowed at all open sites
MARPAT Overview
43
chain nodes: 20 21
ring nodes: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19
chain bonds: 2-21 3-20
ring bonds: 1-2 1-6 2-3 3-4 4-5 5-6 5-7 6-9 7-8 7-10 8-9 8-13 10-11 11-12 12-13 14-15 14-19 15-16 16-17 17-18 18-19
exact/norm bonds: 2-21 3-20 5-6 5-7 6-9 7-8 7-10 8-9 8-13 10-11 11-12 12-13
exact bonds: 1-2 1-6 2-3 3-4 4-5
normalized bonds: 14-15 14-19 15-16 16-17 17-18 18-19
Match level: 1:Atom 2:Atom 3:Atom 4:Atom 5:Atom 6:Atom 7:Atom 8:Atom 9:Atom 10:Atom 11:Atom 12:Atom 13:Atom 14:Atom 15:Atom 16:Atom 17:Atom 18:Atom 19:Atom 20:CLASS 21:CLASS 22:CLASS
MARPAT Overview
44
=> FILE REGISTRY
=>
Uploading mar1.str
L1 STRUCTURE UPLOADED
=> D L1
L1 HAS NO ANSWERS
L1 STR
[Query structure diagram]
(continued on next page)
MARPAT Overview
45
=> S L1 SSS SAM
L2 0 SEA SSS SAM L1
=> S L1 SSS FULL
L3 0 SEA SSS FUL L1
The REGISTRY search turns up no substances.
MARPAT Overview
46
• "No hits" in REGISTRY means that no substances represented by the structure query were indexed in CAplus from 1967-present (or in CAOLD from the 1957-1966 Molecular Formula Indexes).
• The substance of interest could be encompassed in the prophetic substances covered by the Markush structure in a patent claim.
MARPAT Overview
47
=> FILE MARPAT
=> S L3 SSS SAM
SAMPLE SEARCH INITIATED 13:55:20 FILE 'MARPAT'
SAMPLE SCREEN SEARCH COMPLETED - 7 TO ITERATE
100.0% PROCESSED 7 ITERATIONS 0 ANSWERS
SEARCH TIME: 00.00.03
FULL FILE PROJECTIONS: ONLINE **COMPLETE**
BATCH **COMPLETE**
PROJECTED ITERATIONS: 7 TO 299
PROJECTED ANSWERS: 0 TO 0
L5 0 SEA SSS SAM L1
(continued on next page)
Searching MARPAT for prophetic substance matches
Use a SAMPLE search in MARPAT to verify the search will run to completion.
MARPAT Overview
48
=> S L3 SSS FULL
FULL SEARCH INITIATED 13:54:52 FILE 'MARPAT'
FULL SCREEN SEARCH COMPLETED - 89 TO ITERATE
100.0% PROCESSED 89 ITERATIONS 3 ANSWERS
SEARCH TIME: 00.00.09
L4 3 SEA SSS FUL L1
=> D 1-3 BIB ABS
L6 ANSWER 1 OF 3 MARPAT COPYRIGHT 2000 ACS
AN 127:293247 MARPAT
TI Preparation of pyrrolopyrazines as GABAa receptor ligands
IN Blum, Charles; Hutchison, Alan
PA Neurogen Corp., USA
o o o
The MARPAT search retrieves 3 patents.
HELPFUL HINT: Search the L-number resulting from the REGISTRY substructure search to take advantage of a lower, “extended” search fee in MARPAT.
For additional information on file-specific charges see HELP COST in the file.
MARPAT Overview
49
L6 ANSWER 2 OF 3 MARPAT COPYRIGHT 2000 ACS
AN 126:199581 MARPAT
TI aryl substituted pyrrolopyrazines as a new class of GABA brain Receptor ligands
IN Blum, Charles; Hutchison, Alan
PA Neurogen Corporation, USA
SO U.S., 16 pp. Cont.-in-part of U.S. 5,286,860. CODEN: USXXAM
o o o
L6 ANSWER 3 OF 3 MARPAT COPYRIGHT 2000 ACS
AN 120:245160 MARPAT
TI Preparation of indolopyrazines and related compounds as brain GABAa agonists, antagonists, or inverse agonists
IN Blum, Charles; Hutchison, Alan
PA Neurogen Corp., USA
SO U.S., 17 pp.
o o o
(continued on next page)
MARPAT Overview
50
o o o
FAN.CNT 3
     PATENT NO.    KIND  DATE      APPLICATION NO.  DATE
PI   US 5286860    A     19940215  US 92-975409     19921112
     WO 9411374    A1    19940526  WO 93-US10870    19931110
     W: AT, AU, BB, BG, BR, BY, CA, CH, CZ, DE, DK, ES, FI, GB, HU, JP, KP, KR, KZ, LK, LU, MG, MN, MW, NL, NO, NZ, PL, RO, RU, SD, SE, SK, UA, US, VN
     RW: AT, BE, CH, DE, DK, ES, FR, GB, GR, IE, IT, LU, MC, NL, PT, SE, BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, NE, SN, TD, TG
     AU 9455526    A1    19940608  AU 94-55526      19931110
     US 5606059    A     19970225  US 95-436252     19950512
     US 5668283    A     19970916  US 95-486595     19950607
o o o
MARPAT Overview
51
Formats showing structures:
Use this format: If you want to display
HITSTR: In CAplus, the structure(s) of the specific compound(s) that caused the answer to be retrieved
FQHIT: In MARPAT, the hit portion only for the first hit Markush structure in the answer
FHIT: In MARPAT, the hit Markush structure in the answer with the hit portions highlighted
MSTR: In MARPAT, all the Markush structures associated with an answer
Note: This display can be very long for complex Markush structures
CASLINK
52
=> d an fhit
AN 127:293247 MARPAT
MSTR 2A
G1 = Ph (SO (1-2) G2) / thienyl (SO (1-2) G2) / pyridyl (SO (1-2) G2)
[Structure diagram: base structure containing G3 (nodes 1 and 3) and G1]
MARPAT Overview
[Query structure diagram]
53
G3 = 14-3 15-1 / 74-3 75-1
[Structure diagrams: base structure containing G3 and G1, and the G3 fragments at nodes 14-15 and 74-75]
MARPAT Overview
[Query structure diagram]
54
[Structure diagrams: the G3 fragments at nodes 14-15 and 74-75]
G4 = (0-2) CH2
G5 = H / alkyl<(1-6)>
G6 = 19 / C(O) / 22
. . .
MARPAT Overview
[Query structure diagram]
55
REGISTRY and MARPAT, Retrieval Differences:
• In REGISTRY, only specific substituents are present in the database compounds. Substructures match against specific substituents.
• In MARPAT, substructures may match against
– Specific substituents: H, CF3, CN, etc.
– Generic substituents: X, Hy, alkoxycarbonyl, etc.
MARPAT Overview
56
• Answers in each database are different:
– Answers in REGISTRY are compounds. The bibliographic references and abstracts associated with each compound (from patents and journals) are available in CAplus.
– Answers in MARPAT are references to patents.
MARPAT Overview
REGISTRY and MARPAT, Retrieval Differences:
58
Incompletely defined substances are assigned CAS RNs in REGISTRY when all of the atoms in the structure are defined and the only uncertainty is one or more of the following:
Attachment position for a substituent
Site of saturation/unsaturation
Site of esterification/etherification
Branching of an alkyl group
Generally it is not possible to anticipate and allow for all of these possibilities in a structure query for REGISTRY. As a result, some potentially interesting substances may be missed in a substructure search in REGISTRY.
Incompletely defined substances
59
MARPAT can effectively encompass an incompletely defined portion of the molecule in the Markush structure with its ability to support:
Variable attachment points for substituents
G-group lists of substituent possibilities
Incompletely defined substances
60
Find all non-polymeric glycerides containing oleic and/or stearic fatty acid. Show that MARPAT can find some patents on IDS compounds that are not retrieved by a structure search.
Incompletely defined substances
61
Incompletely defined substances
=> fil marpat
L1 STRUCTURE UPLOADED
=> l1
FULL FILE PROJECTIONS: ONLINE **INCOMPLETE**
BATCH **INCOMPLETE**
[Query structure diagram]
62
Incompletely defined substances
=> fil reg
=> c h o/elf(p)21-57/c(p)4-6/o not (pms/ci or rsd/fa)
L3 17741 C H O/ELF(P)21-57/C(P)4-6/O NOT . . . .
=> fil hcaplus
=> l3 and marpat/os
L4 2860 L3 AND MARPAT/OS
=> fil marpat
=> l4
L5 2860 L4
=> l1 full subset=l5
L7 870 SEA SUB=L5 SSS FUL L1 (Patents in Marpat)
63
Incompletely defined substances
=> fil reg
=> l1 full subset=l3
L9 1517 SEA SUB=L3 SSS FUL L1
=> l3 and (propan? or glycer?)/cns(l)(octadecen? or oleic? or octadecan? or stear?)
L10 1330 L3 AND (PROPAN? OR GLYCER?)/CNS(L) . .
=> l10 and ids/ci
L11 909 L10 AND IDS/CI
=> l11 not l9
L12 885 L11 NOT L9 (Extra IDS compounds from dictionary)
64
Incompletely defined substances
=> fil hcaplus
=> l9 and p/dt
L13 1805 L9 AND P/DT (Patents from structure)
=> l11 and p/dt
L14 6344 L11 AND P/DT (Patents from extra IDS)
=> l14 not l13
L15 5906 L14 NOT L13 (Patents only from extra IDS)
=> l7
L16 870 L7 (Patents from Marpat)
=> l15 and l16
L17 234 L15 AND L16 (Patents, only extra IDS, retrieved by Marpat)
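The L-number logic in this session is plain set arithmetic, and a small sketch with toy data may make it easier to follow. The patent identifiers below are invented placeholders, not real answers. Only the operators mirror the session: NOT is set difference and AND is set intersection. From the reported counts one can also infer that 6344 - 5906 = 438 of the extra-IDS patents were already found by the structure search.

```python
# Toy answer sets keyed by hypothetical patent identifiers.
patents_from_structure = {"P1", "P2", "P3"}        # plays the role of L13
patents_from_extra_ids = {"P3", "P4", "P5", "P6"}  # plays the role of L14
patents_from_marpat = {"P2", "P5", "P6", "P7"}     # plays the role of L16

# L15 = L14 NOT L13: patents reachable only through the extra IDS compounds.
only_from_extra_ids = patents_from_extra_ids - patents_from_structure

# L17 = L15 AND L16: of those, the ones a MARPAT structure search also retrieves.
recovered_by_marpat = only_from_extra_ids & patents_from_marpat

print(sorted(only_from_extra_ids))  # ['P4', 'P5', 'P6']
print(sorted(recovered_by_marpat))  # ['P5', 'P6']

# Overlap implied by the counts reported in the session (L14 minus L15).
overlap_l13_l14 = 6344 - 5906
print(overlap_l13_l14)  # 438
```

In the session itself, L17 shows that 234 of the patents reachable in REGISTRY only through the extra IDS compounds are also retrieved by the MARPAT structure search.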
65
Incompletely defined substances
RN 25496-72-4 HCAPLUS
CN 9-Octadecenoic acid (9Z)-, monoester with 1,2,3-propanetriol
CM 1
CRN 112-80-1
CMF C18 H34 O2
Double bond geometry as shown.
[Structure diagram: component 1, 9-octadecenoic acid with Z double bond]
CM 2
CRN 56-81-5
CMF C3 H8 O3
[Structure diagram: component 2, 1,2,3-propanetriol]
66
Locate information on substances with the following structural characteristics:
Any substitution at all open sites
Hy = heterocycle containing exactly 3 N and no O, S, or P
All rings are isolated/embedded
-SO3H groups attached at any position on the benzene rings
[Query structure diagram]
Incompletely defined substances
67
=> 5/nrrs(s)c18n2o2/rf(p)cnrs>=3(p)c6/ea(p)3 n/rel and cl>=3(p)s>=2(p)o>=8(p)n>=8
L1 509 5/NRRS(S)C18N2O2/RF . . .
=> l1 and ids/ci
234090 IDS/CI
L2 160 L1 AND IDS/CI
Incompletely defined substances
=> FILE REGISTRY
L3 STRUCTURE UPLOADED
=> D L3
=> l3 full
L4 115 SEA SSS FUL L3
Incompletely defined substances
[Query structure diagram]
Incompletely defined substances
=> fil caplus
=> d his
(FILE 'HOME' ENTERED AT 06:29:23 ON 08 MAR 2004)
FILE 'REGISTRY' ENTERED AT 06:29:41 ON 08 MAR 2004
L1 505 5/NRRS(S)C18N2O2/RF(P)CNRS>=3(P)C6/EA(P)3 N/REL AND CL>=3(P)S>=
L2 160 L1 AND IDS/CI
L3 STRUCTURE UPLOADED
L4 115 L3 FULL
FILE 'MARPAT' ENTERED AT 06:32:31 ON 08 MAR 2004
L5 51 L4 FULL
72
Incompletely defined substances
=> l2 and p/dt
L6 21 L2 AND P/DT (Reg. IDS/CAPlus)
=> l4 and p/dt
L7 28 L4 AND P/DT (Reg. Str./CAPlus)
=> l5
L8 51 L5 (Marpat)
73
Incompletely defined substances
=> l6 and l7
L9 3 L6 AND L7
=> l6 not l9
L10 18 L6 NOT L9
=> l10 and l5
51 L5
L11 2 L10 AND L5 (IDS structures recovered in MARPAT)
75
• The CASLINK tool provides one-step searching of both the
– Specific substances in REGISTRY
– Prophetic substances in MARPAT
• CASLINK also retrieves the references to the specific substances from CAplus and eliminates any duplicate hits between CAplus and MARPAT.
CASLINK
76
Search Question:
Locate references on the following substances:
R1 = Any non-hydrogen substituent ring or chain
R2 = any ring or chain carbon atom
No additional fusion on the ring
Additional substitution allowed at all open sites
CASLINK
77
1 Enter the CASLINK cluster of files
=> FILE CASLINK
FILE 'REGISTRY' ENTERED
FILE 'MARPAT' ENTERED
FILE 'CAPLUS' ENTERED
Predefined command sequences will be executed in REGISTRY, MARPAT, MARPATPREV, and CAPLUS.
2 Perform the upload
=>
Uploading mar1.str
L1 STRUCTURE UPLOADED
CASLINK
79
Run a SAMPLE search
=> S L1 SSS SAM
PROJECTED ITERATIONS: 4669 TO 6691
PROJECTED ANSWERS: 68 TO 532
L2 15 SEA SSS SAM L1
CASLINK
80
S L2 SSS SAM FILE=MARPAT
FULL FILE PROJECTIONS: ONLINE **COMPLETE**
BATCH **COMPLETE**
PROJECTED ITERATIONS: 9195 TO 11845
PROJECTED ANSWERS: 3 TO 164
L3 3 SEA SSS SAM L1
CASLINK
81
• Each answer from REGISTRY is a specific compound that matches the structure query.
• Each answer from MARPAT is a document record (patent) in which the Markush structure contains fragments that match the structure query.
• D SCAN is used to determine if the REGISTRY and MARPAT answers are on target. D SCAN content is file-specific.
CASLINK
82
=> D SCAN L2
L2 15 ANSWERS REGISTRY COPYRIGHT 2000 ACS
IN 3-Pyrrolidinecarboxylic acid, 4-ethenyl-2-oxo-1-(phenylmethyl)-, Methyl ester, (3R,4R)-rel- (9CI)
MF C15 H17 N O3
Relative stereochemistry.
[Structure diagram]
HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):0
(continued on next page)
Note: Substance identification information from REGISTRY.
CASLINK
83
=> D SCAN L3
MSTR 1
G5 = CHO
G6 = 22
MPL: claim 1
STE: diastereoisomers and mixtures
HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):0
[Structure diagram: Markush base structure with G1, G4, G5, and G6 (node 22)]
CASLINK
[Query structure diagram]
84
• When a FULL structure search is requested in CASLINK, STN:
1 Runs FULL structure searches in REGISTRY and MARPAT
2 Searches the REGISTRY answers in CAplus
3 Removes duplicate answers between CAplus and MARPAT and creates a single answer set
CASLINK
85
=> S L1 SSS FULL
S L1 SSS FUL FILE=REGISTRY
L4 411 SEA SSS FUL L1
S L4 SSS FUL FILE=MARPAT
L5 54 SEA SSS FUL L1
S L4 FILE=CAPLUS
L7 161 FILE CAPLUS
CASLINK
86
DUP REM L5 L7 (this dup rem is useless)
PROCESSING COMPLETED FOR L5
PROCESSING COMPLETED FOR L7
L8 209 DUP REM L6 L5 L7 (6 DUPLICATES REMOVED)
ANSWERS '1-54' FROM FILE MARPAT
ANSWERS '55-209' FROM FILE CAPLUS
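The duplicate-removal step can be pictured as a merge keyed on the document number that MARPAT and CAplus records share. The sketch below is a simplified model, not the actual STN DUP REM logic: the function name `dup_rem` and the toy document numbers are assumptions, and real duplicate detection may compare more than one field. It keeps the first copy of each document, which is why the MARPAT answers come first in the merged set.

```python
def dup_rem(*answer_sets):
    """Merge (file_name, document_numbers) pairs, keeping the first copy of each document."""
    seen = set()
    merged = []
    removed = 0
    for file_name, document_numbers in answer_sets:
        for dn in document_numbers:
            if dn in seen:
                removed += 1          # same patent already present from an earlier file
                continue
            seen.add(dn)
            merged.append((file_name, dn))
    return merged, removed


# Toy answer sets; the document numbers are used purely as illustrative keys.
marpat_hits = ["127:293247", "126:199581", "120:245160"]
caplus_hits = ["126:199581", "108:112204"]

merged, removed = dup_rem(("MARPAT", marpat_hits), ("CAPLUS", caplus_hits))
print(len(merged), removed)  # 4 1

# Counts reported in the session: 54 MARPAT + 161 CAPLUS - 6 duplicates.
print(54 + 161 - 6)  # 209
```

The arithmetic matches the display: answers 1-54 come from MARPAT and the remaining 155 (161 minus the 6 duplicates) from CAPLUS.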
CASLINK
87
Illustration: FQHIT vs. FHIT in MARPAT
With FQHIT display format, only the portions of the Markush structure that caused it to be a "hit" are displayed. The format is useful to
Get a general idea of the context in which a structure hit
Determine if a structure query is too broad
Display large numbers of MARPAT hits
CASLINK
88
Step 6: Display Results
=> D 5 FQHIT
L8 ANSWER 5 OF 209 MARPAT COPYRIGHT 2000 ACS DUPLICATE 6
MSTR 1
G7 = alkoxycarbonyl<(1-8)> (SO (1-) G8)
G11 = O
MPL: claim 1
[Structure diagram: nitrogen-containing base structure with G1, G7, G9, G10, G11, and G12]
89
Illustration: FQHIT vs. FHIT in MARPAT
With FHIT display format, all G-group definitions are shown. It is useful when you need to see the entire context in which a structure hit.
CASLINK
90
=> D 5 FHIT
L8 ANSWER 5 OF 209 MARPAT COPYRIGHT 2000 ACS DUPLICATE 6
MSTR 1
G1 = H / X / CF3 / CN / Me / 13 / 16 / SMe / 20 / SO2Me / 23 / 30 / (SC Cl)
G2 = 13 / 18 / CF3
G3 = Me / COPh
G4 = H / Me
G5 = pyridyl (SO (1-) G6)
G6 = X / CF3
G7 = X / CN / CO2H / alkoxycarbonyl<(1-8)> (SO (1-) G8) / cycloalkyloxycarbonyl<(3-8)> (SO (1-) G13) / (SC Cl / CO2Et) / (EX 54 / 43)
(continued on next page)
[Structure diagrams: nitrogen-containing base structure with G1, G7, G9, G10, G11, and G12, and the fragments at nodes 13, 16, 18, 20, 23, and 30]
The parts that caused the answer to be a "hit" are highlighted.
CASLINK
91
G8 = X / CF3 / Ph / cycloalkyl<(3-7)>
G9 = alkyl<(1-4)> (SO (1-) cyclopropyl) / cycloalkyl<(3-4)> (SO (1-) Me) / alkenyl<(2-4)> / (SC Et / CH=CH2) / (EX 48)
G10 = H / X / (SC F)
G11 = O / S
G12 = H / X / (SC Cl)
G13 = X / CF3 / Ph / alkyl<(1-5)>
MPL: claim 1
[Structure diagrams: fragments at nodes 43, 54, and 48]
The parts that caused the answer to be a "hit" are highlighted.
Helpful Hint: Markush displays can be very complicated.
Option: Consider displaying the associated CAplus abstract. Many contain structural summaries of the claimed substances.
CASLINK
92
Markush displays in MARPAT consist of
• Base structure
• G-groups defining the variability in the structure
G-groups in MARPAT displays may contain: Example
Real atoms: G1 = O / S / NH / CH2
Variable groups: G2 = H / X / CF3 / CN / Me
Variable groups with deeper levels of definition: G3 = OH / / Hy<EC (3-7) A (1-2) Q (1-2) N (0) OTHERQ, RC (1), RS (1) X7> (SO) / NMe2
CASLINK
93
G-groups in MARPAT displays may contain: Example
Structural fragments identified by node numbers: G4 = 13 / 18 / CF3
13 F2C CF2 H
18 F2C H
Generic text shortcuts: G5 = OH / alkoxy / X / NH2 / alkoxycarbonyl / CO2H /
Textual information describing a non-structural entity: G6 = R<TX "protecting group"> / CH2Ph / CH2CH=CH2
CASLINK
95
Search Question:
Locate references discussing compounds with the following structure:
R1 = heterocyclic ring with at least one =O attached
R2, R3 = any type of carbon chain (substituted or unsubstituted)
The oxygen-containing ring may be isolated or embedded in a larger ring system
Any substitution at all open sites
All of the atoms in the structure, except for the benzopyran ring, may match real atoms or generic groups in Markush structures.
The benzopyran ring may match only real atoms.
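The difference between the two match levels described here can be modeled in a few lines. This is a deliberately simplified sketch under stated assumptions: each Markush node is reduced to a class label (for example "Ak" or "Hy") plus a flag saying whether it is drawn with real atoms or given as a generic term, and the names `MarkushNode` and `node_can_match` are hypothetical. The actual matching rules in MARPAT are richer than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MarkushNode:
    node_class: str    # e.g. "Ak" (carbon chain), "Hy" (heterocycle), "Cb" (carbocycle)
    is_generic: bool   # True for a generic term such as "alkyl", False for real atoms


def node_can_match(query_class, match_level, target):
    """Return True if a query node of the given class and match level may match the target."""
    if target.node_class != query_class:
        return False
    if match_level == "ATOM":
        return not target.is_generic       # ATOM: only real atoms qualify
    if match_level == "CLASS":
        return True                        # CLASS: real atoms or generic groups qualify
    raise ValueError(f"unknown match level: {match_level}")


drawn_propyl = MarkushNode("Ak", is_generic=False)   # chain drawn atom by atom
generic_alkyl = MarkushNode("Ak", is_generic=True)   # generic term such as alkyl<(1-6)>

print(node_can_match("Ak", "ATOM", drawn_propyl))    # True
print(node_can_match("Ak", "ATOM", generic_alkyl))   # False
print(node_can_match("Ak", "CLASS", generic_alkyl))  # True
```

In this model, setting a node to CLASS widens retrieval to patents that only name a generic group, while ATOM restricts it to explicitly drawn atoms, as required for the benzopyran ring in this search question.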
Levels in Marpat
97
• Match Levels may be changed on a single node or a group of nodes.
• To change match level of a single node
1 Right click on the node. A pop-up menu appears. Select Markush Attributes.
Levels in Marpat
98
• Match Levels may be changed on a single node or a group of nodes.
• To change match level of a single node
2 The Markush Attributes dialog box appears. Do the following:– Click the radio button associated with the Match Level of interest.
– Click OK.
Levels in Marpat
99
• To change match level on multiple nodes all at once,
1 Highlight the nodes using the highlighting tool.
Levels in Marpat
100
• To change match level on multiple nodes all at once,
2 From the Query Def pull-down menu, select Markush Attributes. An alert dialog box appears. Click OK.
The Markush Attributes dialog box appears. Do the following:
a Click the radio button associated with the Match Level of interest.
b Click OK.
Levels in Marpat
By default STN Express assigns the level CLASS to all chains, and level ATOM to all rings. But you can (must) change it.
101
• Verifying Match Levels Assignments
1 From the Query Def pull-down menu, select Query Verification.
2 The Query Verification dialog box appears.
a Click the Select radio button.
b Click in the match level box.
c Click OK.
Levels in Marpat
102
• Verifying Match Levels Assignments
3 A Query Verification pop-up dialog appears and Match Levels for all atoms are displayed. Click OK.
(You can verify more easily by clicking the Q button.)
Levels in Marpat
103
=> FILE CASLINK
=>
Uploading mar2.str
L1 STRUCTURE UPLOADED
=> D L1
L1 HAS NO ANSWERS
L1 STR
CLASS match level on:Cl, Ak, O, Hy
ATOM match level on:All atoms in the benzopyran ring.
Levels in Marpat
104
=> S L1 SSS SAM
S L1 SSS SAM FILE=REGISTRY
o o o
L2 0 SEA SSS SAM L1
o o o
S L2 SSS SAM FILE=MARPAT
o o o
L3 4 SEA SSS SAM L1
o o o
=> D SCAN L3 FQHIT
Levels in Marpat
105
L3 4 ANSWERS MARPAT COPYRIGHT 2000 ACS
MSTR 1A
G1 = 11
G2 = alkyl<(1-6)>
G3 = alkyl<(1-6)>
G7 = Hy<EC (4-5) C (1-2) N (0) OTHERQ, AN (1-) N (1-) C, BD (1) D (0) T, RC (1), RS (1) E6 (0) OTHER> (SO)
o o o
HOW MANY MORE ANSWERS DO YOU WISH TO SCAN? (1):0
Note that FQHIT shows only the fragments that caused the structure to hit. For example, the complete definition of G6 is not shown - only the part that caused the answer to be a hit (G7) is shown.
Levels in Marpat
106
=> S L1 SSS FULL
S L1 SSS FUL FILE=REGISTRY
o o o
L4 6 SEA SSS FUL L1
o o o
S L4 SSS FUL FILE=MARPAT
o o o
L5 42 SEA SSS FUL L1
o o o
S L4 FILE=CAPLUS
L7 14 FILE CAPLUS
o o o
L8 51 DUP REM L6 L5 L7 (5 DUPLICATES REMOVED)
ANSWERS '1-42' FROM FILE MARPAT
ANSWERS '43-51' FROM FILE CAPLUS
Levels in Marpat
107
=> D 1 BIB ABS FQHIT
L8 ANSWER 1 OF 51 MARPAT COPYRIGHT 2000 ACS DUPLICATE 1
AN 121:255814
MSTR 1
G1 = X
G3 = alkyl<(1-6)>
MPL: claim
Answers from MARPAT.
Levels in Marpat
108
Levels in Marpat
=> D 45 51 BIB ABS HITSTR
L8 ANSWER 45 OF 51 CAPLUS COPYRIGHT 2000 ACS
AN 1988:112204 CAPLUS
DN 108:112204
Answers from CAplus/REGISTRY.
109
Search REGISTRY/CAplus and MARPAT to locate references discussing substances with the following structure
R1 = nitrogen in a ring or chain
R2 = anything, including hydrogen
R3, R4 = alkyl chain
Nitrogen-containing ring may be isolated or embedded in a larger ring system
Any substitution is allowed at all open sites
Set the rings to match level CLASS and the chains to match level ATOM
Display the final answer set using the following formats:
For MARPAT answers use BIB ABS FQHIT
For CAplus answers use BIB ABS HITSTR
Skills Practice
111
=> l1
FULL FILE PROJECTIONS: ONLINE **COMPLETE**
BATCH **COMPLETE**
PROJECTED ITERATIONS: 1864 TO 3216
PROJECTED ANSWERS: 8 TO 329
L2 8 SEA SSS SAM L1
=> d scan
[Structure diagram]
113
=> d l5 bib abs hitstr
L5 ANSWER 1 OF 1132 CAPLUS COPYRIGHT 2006 ACS on STN
AN 2006:151208 CAPLUS Full-text
TI Transnasal composition having immediate action and high absorbability
. . .
AB Disclosed is a powdery composition for transnasal administration which contains a nonpeptidic nonproteinaceous drug and crystalline cellulose masses having a specific mesh-size as a carrier therefor. This composition can exert an immediate action of the drug and a high absorbability. For example, morphine hydrochloride 65 mg and Avicel PH-F20 (crystalline cellulose) 135 mg were blended and nasally administered to monkeys for the determination of pharmacokinetic parameters of morphine.
114
IT 103628-46-2, Sumatriptan
RN 103628-46-2 CAPLUS
CN 1H-Indole-5-methanesulfonamide, 3-[2-(dimethylamino)ethyl]-N-methyl- (9CI) (CA INDEX NAME)
NMe2MeNH CH2 CH2
NH
CH2S
O
O
115
=> fil marpat
=> l4
FULL FILE PROJECTIONS: ONLINE **COMPLETE** BATCH **COMPLETE**
PROJECTED ITERATIONS: 16297 TO 19783
PROJECTED ANSWERS: 2 TO 125
L7 2 SEA SSS SAM L1
=> d scan
116
L7 2 ANSWERS MARPAT COPYRIGHT 2006 ACS on STN
. . . . . . . . . . . . . . . . . . . . . . . .
TI Preparation of indole derivatives as antagonists of gonadotropin releasing hormone.
. . . . . . . . . . . . . . . . . . . . . . . .
MSTR 1
N9
G8G15
G16G17
G16G16
G48 12N 13G1 14G49
G47
991G9
117
=> l4 full
L9 63 SEA SSS FUL L1
=> d bib abs fqhit
L10 ANSWER 1 OF 44 MARPAT COPYRIGHT 2006 ACS on STN
AN 143:341070 MARPAT Full-text
TI Synergistic broad-spectrum microbicide compositions containing sulfamoyl compounds and dipeptides or basic copper chloride.
. . . . . . . . . . . . . . . . . . . . . . . . . . .
GI
NSO2
N
NN SO2
O
O
NH
Pr-iNH
O O
NR1R2
R3R4R5
R6
R7
R8
R1 OR3
R2
I
II
118
AB The microbicide compns. contain (A) sulfamoyl compds. I [R1, R2 = C1-4 alkyl; R1R2 may form C4-6 alkylene; Y = H, halo, C1-8 alkyl, C1-6 alkoxy, C1-10 alkylthio, C1-6 haloalkyl, C1-6 haloalkylthio, (un)substituted benzylthio, (un)substituted Ph, (un)substituted benzyl; R3-R8 = H, C1-8 alkyl, C3-8 cycloalkyl, C2-8 alkenyl, C5-8 cycloalkenyl, C2-8 alkynyl, C1-8 alkoxy, etc.] and/or their agrochem. acceptable salts and (B) dipeptides II (R1 = iso-Pr, Ph; R2 = Me; R3 = Ph substituted with R4 at the 4-position, 2-benzothiazolyl which may be substituted with R5; R4, R5 = F, Cl, Me, Et, MeO, cyano) or (C) basic copper chloride (copper oxychloride) (III). Concomitant application of 1-(N,N-dimethylsulfamoyl)- 3-(3-bromo-6-fluoro-2-methylindol-1-yl)sulfonyl-1,2,4-triazole (at 0.625 g/ha) and Me ()-RS-[3-(N-isopropoxycarbonyl-S-valinyl)amino]-3-(4- chlorophenyl)propanoate (at 2.5 g/ha) showed 80% control of disease caused by Phytophthora infestans in potato.
NSO2
N
NN SO2 G1
G2G5
G5G5G5
G5G5
G5 = alkyl <containing 1-4 C> (substd. by G7)
G7 = dialkylamino <each alkyl containing 1-6 C> / dialkylaminosulfonyl <each alkyl containing 1-6 C>
120
• Techniques for modifying structure queries for Markush searching are available.
• These structure drawing tools either expand or reduce the number of answers retrieved in the MARPAT file.
Precision Tools
121
• Match level (limited, unlimited)
• Generic definitions (Generic Groups)
• Element count (Generic Groups)
Precision Tools
122
• Each atom in a query structure is assigned a default Match Level.
• Match Level controls how query atoms match specific atoms and generic groups in the MARPAT database.
• Three Match Level options are possible.
Database    Retrieval possibilities
REGISTRY    Specific atoms
MARPAT      Specific atoms
            Generic groups that match the query definition
Match Level is ignored in REGISTRY.
Precision Tools
123
Match Level
• Markush structures include both real atoms and generic nodes, both of which may be matched against the real atoms and generic nodes (Ak, Cb, Hy, Cy) of the search query.
• Match level determines the degree to which query nodes match with nodes in the candidate answers.
• Changing the degree of matching will increase or decrease the number of answers retrieved.
Precision Tools
125
Match Level Atom
• Match level atom is the most restrictive match level.
• It retrieves the most precise set of answers:
  Specific atoms in the query match only specific atoms in candidate answers.
  Generic groups in the query match only specific atoms in candidate answers.
Precision Tools
126
Match   Retrieval     Example query atom   Database hit
Atom    Real atoms    Br                   Br
                      X                    Br, Cl, F, etc.
                      Pyridine ring        Pyridine ring
                      Hy                   Pyridine, thiophene, benzofuran, etc.
Precision Tools
Match Level Atom
127
Match Level Atom
Query node   Candidate answer node       Match?
Cl           Cl                          Yes
Cl           Br                          No
Cl           X                           No
X            Cl or Br or F or I or At    Yes
X            X                           No
Precision Tools
128
Match Level Class
Match level class causes more answers to be retrieved than match level atom:
  Specific atoms in the query match specific atoms and "generic nodes" in candidate answers.
  Generic groups in the query match specific atoms and generic groups in candidate answers.
HINT: the Class level is the most important level, and you should always run searches in MARPAT at this level.
Precision Tools
129
Match   Retrieval                                        Example query atom   Database hit
Class   Real atoms,                                      Br                   Br, X
        generic groups (Q, X, M, Ak, Hy, Cb, Cy)         X                    Br, Cl, F, etc., X
                                                         Pyridine ring        Pyridine ring, "Hy"
                                                         Hy                   Pyridine, thiophene, benzofuran, etc., Hy
Precision Tools
Match Level Class
130
Query node   Candidate answer node       Match?
Cl           Cl                          Yes
Cl           Br                          No
Cl           X                           Yes
X            Cl or Br or F or I or At    Yes
X            X                           Yes
Precision Tools
Match Level Class
131
Match Level Any
• Match level any is the least restrictive match level option.
• In addition to specific atoms and generic nodes, candidate answers also include R-nodes.
• R-nodes are indefinite substituents described with text terms such as:
  Organic group
  Group to form ring
  Anion
  Protecting group
Precision Tools
132
Match   Retrieval                              Example query atom   Database hit
Any     Real atoms, generic groups, any R      Br                   Br, X, R
                                               Pyridine ring        Pyridine ring, Hy, R
Precision Tools
Match Level Any
133
Query node   Candidate answer node       Match?
Cl           Cl                          Yes
Cl           Br                          No
Cl           X                           Yes
Cl           R-node                      Yes
X            Cl or Br or F or I or At    Yes
X            X                           Yes
X            R-node                      Yes
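The three match tables for halogen nodes can be condensed into one small function. This is only an illustrative sketch: the function name `node_matches` and the string labels "atom", "class", and "any" are assumptions made for the example, not part of STN or MARPAT, and only the halogen cases shown in the tables are modelled.

```python
# Illustrative model of the Atom / Class / Any tables for halogen nodes.
# Names and labels are hypothetical; this is not an STN interface.
HALOGENS = {"F", "Cl", "Br", "I", "At"}

def node_matches(query: str, candidate: str, level: str) -> bool:
    """Return True if a query node matches a candidate answer node."""
    if level not in ("atom", "class", "any"):
        raise ValueError(f"unknown match level: {level}")
    if candidate == "R":
        # R-nodes in the file are matched only at match level Any.
        return level == "any"
    if candidate == "X":
        # A generic halogen node in the file is matched at Class or Any.
        return level in ("class", "any") and (query == "X" or query in HALOGENS)
    # The candidate is a specific atom.
    if query == "X":
        return candidate in HALOGENS
    return query == candidate

print(node_matches("Cl", "X", "atom"))   # False
print(node_matches("Cl", "X", "class"))  # True
print(node_matches("X", "R", "any"))     # True
```

A specific query atom never matches a different specific atom at any level (Cl against Br stays "No"); widening the match level only adds generic nodes and R-nodes as possible hits.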
Match Level Any
Precision Tools
134
Helpful HINT
Using match level any for an entire query is not sensible: it retrieves far too many irrelevant answers and greatly extends the search time. Instead, assign match level any to selected nodes in the query to:
• Broaden the query
• Increase the recall
• Match R-nodes such as "protecting group"
Precision Tools
Match Level Any
135
In MARPAT, all structure fragments (in the query or in the file) are also converted into Generic Groups.
Precision Tools
137
CH
CHN
CHCH2
CH2Me
Cl
In MARPAT, the following string is therefore always associated with the structure above:
Ak*-Hy*(or Cy*)-X*
* Indicates generated Generic Groups, as opposed to original Generic Groups
Precision Tools
138
In searches at atom level, the query matches neither generated generic groups nor original generic groups, but only specific atoms or groups (e.g., phenyl, ethyl, etc.).
In searches at class level, the query also matches generated or original generic groups, but generic groups generated from the query never match generic groups generated in the file.
Precision Tools
139
O ||C=C-C-O-Ak-X
G1 O | ||CH2=C-C-O-CH2-ClG1 = H / Me
Hit (atom)
O ||CH2=CH-CH2-C-O-Ak-Br No Hit
O ||G2-C-O-G1G1 = alkyl (SO X) / PhG2 = alkenyl <2-6> / loweralkyl
Hit (class)
Q*
||
Ak*-Q*-Ak-X
O
||
Ak-C-O-Ak-BrHit (class)
Precision ToolsQUERY FILE
140
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
CH
CHN
CH
Ak
CH2 CH2 NH2
Match level, all nodes Class: Yes
Match level, all nodes Atom: No
Precision Tools
FILE
141
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
CH
CHN
CH
Me
CH2 CH2 NH2
Match level, all nodes Class: Yes
Match level, all nodes Atom: Yes
Precision Tools
FILE
142
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
CH
CHN
CH
Ak
Ak NH2
Match level, all nodes Class: Yes
Match level, all nodes Atom: No
Precision Tools
FILE
143
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
CH
CHN
CH
Ak
Ak CH2 NH2
Match level, all nodes Class: Yes
Match level, all nodes Atom: No
Precision Tools
FILE
144
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
CH
CHN
CH
Ak
CH2 CH2 NH2
Match level, all nodes Class: No
Match level, all nodes Atom: No
Ak
Precision Tools
FILE
147
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
Match level, all nodes Class: Yes
Match level, all nodes Atom: No
Ak-Hy-Ak-NH2
Precision Tools
FILE
148
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
Match level, all nodes Class: No
Match level, all nodes Atom: No
Ak-Cb-Ak-NH2
Precision Tools
FILE
149
CH
CHN
CH
Ak
CH2 CH2 N
QUERY
Match level, all nodes Class: Yes
Match level, all nodes Atom: No
Ak-Cy-Ak-NH2
Precision Tools
FILE
150
Default Match Level in STN Express
• Structure queries automatically include a default match level assignment.
• This assignment is only taken into account when searching MARPAT or MARPATprev.
• The default settings for match levels are:
  Atom, for ring atoms and the ring generic groups Cy, Cb, and Hy
  Class, for chain atoms and the chain generic group Ak
Hint: change this default in Preferences and set all atoms to Class level.
Precision Tools
151
Match Level Assumptions in STN Express
By default, Match Level is set as    For the following parts of a structure
Atom                                 All atoms in a ring system
                                     Hy, Cb, Cy
Class                                All chain atoms in a structure
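The default assignment described above can be written as a simple rule. The sketch below is illustrative only; `default_match_level` is a hypothetical helper, not an STN Express function, and it assumes chain atoms include the chain generic group Ak.

```python
# Hypothetical helper mirroring the STN Express default described above.
RING_GENERIC_GROUPS = {"Cy", "Cb", "Hy"}

def default_match_level(symbol: str, in_ring: bool) -> str:
    """Return the default match level for one query node."""
    if in_ring or symbol in RING_GENERIC_GROUPS:
        return "ATOM"    # ring atoms and the ring generic groups
    return "CLASS"       # chain atoms and the chain generic group Ak

print(default_match_level("N", in_ring=True))    # ATOM
print(default_match_level("Ak", in_ring=False))  # CLASS
```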
Precision Tools
152
If in the query there is: a chain spacer of a certain length that must be present
Consider this Match Level option: change the Match Level for the chain to ATOM
Precision Tools
Tips for setting ATOM Match Levels
153
Tips for setting CLASS Match Levels
If in the query there is: a specific ring system
Consider this Match Level option: when you want to retrieve the real atom ring system, plus generic groups that encompass the ring definition, change Match Level for all atoms in the ring to CLASS
Change Match Level for all ring atoms to CLASS.
Precision Tools
154
If in the query there is: Hy, Cb, Cy
Consider this Match Level option: when you want to retrieve real atom rings matching the ring generic group, plus the generic group, change Match Level on the generic group to CLASS
Change Match Level to CLASS.
Precision Tools
Tips for setting CLASS Match Levels
155
If in the query there is: M, Q, X, or A as part of a ring system
Consider this Match Level option: when you want to match specific elements, as well as the corresponding generic groups, change the Match Level on those atoms to CLASS.
Tips for setting CLASS Match Levels
Precision Tools
Change Match Level to CLASS. Rest of ring atoms are ATOM.
156
If in the query there is: a substituent that is often described generically in patent claims, e.g., an electron withdrawing group
Consider this Match Level option: assign Match Level ANY to those types of substituents.
Change Match Level for the G1 substituents to ANY to also match R, which might be defined as an "electron withdrawing group."
Precision Tools
Tips for setting ANY Match Levels
157
Consider match level any if the query contains:
• A substituent that should also match that substituent when it appears with "substitution optional" (SO) or "substitution required" (SR) in the textual information
• A portion that could match "acyl" as a textual description of the attachment
For example:
Change match level for Ak to any to retrieve hits such as G1=X/NO2/phenyl (SR)
Change match level for X to any to match the textual phrase "acyl".
Precision Tools
Tips for setting ANY Match Levels
158
1 Use MARPAT to locate patents on compounds with the following structure:
R1, R3, R4, R5 = an alkyl chain of any length with no substitutions
R2 = any ring system with no substitutions
Nitrogen-containing ring is not isolated
All the ring systems may match real atom rings or generic groups
The -CH2-Si-CH2-Si- chain may match only real atoms
The alkyl groups may match real atoms or generic groups
(follows in the next slide)
Skills Practice
159
1 Use MARPAT to locate patents on compounds with the following structure:
R1, R3, R4, R5 = an alkyl chain of any length with no substitutions
R2 = any ring system with no substitutions
Nitrogen-containing ring is not isolated
All the ring systems may match only real atom rings
The -CH2-Si-CH2-Si- chain may match only real atoms
The alkyl groups may match only real atoms
(Look at the differences between the results in this search and the previous one)
Skills Practice
160
=> fil marpat
=> Uploading C:\Program Files\stnexp\Queries\19a.str
L1 STRUCTURE UPLOADED
isolated ring systems : containing 1 :
Connectivity : 10:1 E exact RC ring/chain 11:1 E exact RC ring/chain 12:1 E exact RC ring/chain 13:1 E exact RC ring/chain 14:1 E exact RC ring/chain
Match level : 1:CLASS 2:CLASS 3:CLASS 4:CLASS 5:CLASS 6:Atom 7:Atom 8:Atom 9:Atom 10:CLASS 11:CLASS 12:CLASS 13:CLASS 14:CLASS
161
=> l1 full
L3 1 SEA SSS FUL L1
=> d l3 fhit
L3 ANSWER 1 OF 1 MARPAT COPYRIGHT 2003 ACS
MSTR 1
G1 = loweralkyl / CH=CH2 / CH2CH=CH2 / CH2Ph / Ph (SO) / 20 / biphenylyl
NN
NCH2SiCH2SiG1
G1
G1
G1
G1
162
=> Uploading C:\Program Files\stnexp\Queries\19abis.str
L4 STRUCTURE UPLOADED
Match level : 1:Atom 2:Atom 3:Atom 4:Atom 5:Atom 6:Atom 7:Atom 8:Atom 9:Atom 10:Atom 11:Atom 12:Atom 13:Atom 14:Atom
163
=> l4 full
L5 1 SEA SSS FUL L4
=> d l5 fhit
L5 ANSWER 1 OF 1 MARPAT COPYRIGHT 2003 ACS
MSTR 1
G1 = loweralkyl / CH=CH2 / CH2CH=CH2 / CH2Ph / Ph (SO) / 20 / biphenylyl
NN
NCH2SiCH2SiG1
G1
G1
G1
G1
164
Run two searches in Marpat on the previous structure.
R1 = unsubstituted carbon chain (Level: atom, class)
R2 = O, S, or N, no other substitutions on this atom (Level: atom)
R3 = Any type of ring system (Level: atom, class)
R4 = Nitrogen in a chain (Level: atom)
The nitrogen-containing ring is isolated.
Look at the differences
Skills Practice
166
N N
NG8
G1
G2010CH
G4012G33
G3824G7
G41
G7 = alkyl<(1-6)> (SO) Ak
G41 = 13 / aryl (SO (1-) G28) / Cy cycloalkyl<(3-9)> (SO (1-) G28) / cycloalkenyl<(4-9)> (SO (1-) G28) / Hy (SO (1-) G28)
Ak, Cy Class
169
Suggestions:
Always start a structure search in MARPAT with all nodes at Class Level and save it (results in Lx); then, if necessary, run a subset structure search on Lx, changing some nodes to Atom Level.
Remember that if you draw a specific group in your query (e.g., pyridine), then even at Class Level (Limited) you find only Generic Groups which encompass the drawn group (e.g., … a heterocycle with N …), not other Generic Groups.
To get the best results, as a general rule run the structure search in MARPAT on a set previously created from either REGISTRY or CAplus.
170
Skills Practice
Find patents on cytomegalovirus, and Markush structures that match the structure query:
O Hy
Hy
171
=> fil caplus
=> (cytomegalovir? or cmv? or cytamegalo?(s)virus) and p/dt
L1 2433 (CYTOMEGALOVIR? OR CMV? OR . . . . . . .
=> sel rn
SmartSELECT INITIATED
TERM LIMIT EXCEEDED: 1458 ANSWERS PROCESSED
L2 SEL L1 1- RN : 50359 TERMS
=> sel l1 rn 1459-
SmartSELECT INITIATED
L3 SEL L1 1459- RN : 12961 TERMS
172
=> fil reg
=> l2 or l3
L6 61198 L4 OR L5
=> Uploading C:\Program Files\stnexp\Queries\virus.str
=> d
L7 HAS NO ANSWERS
L7 STR
O Hy
Hy
173
=> l7 full subset=l6
FULL SUBSET SEARCH INITIATED 10:15:52
FULL SUBSET SCREEN SEARCH COMPLETED - 30485 TO ITERATE
100.0% PROCESSED 30485 ITERATIONS 85 ANSWERS
SEARCH TIME: 00.00.03
L8 85 SEA SUB=L6 SSS FUL L7
174
FILE 'CAPLUS' ENTERED AT . . . .
L1 2433 (CYTOMEGALOVIR? . . . . . . . . .
L2 SEL L1 1- RN : 50359 TERMS
L3 SEL L1 1459- RN : 12961 TERMS
FILE 'REGISTRY' ENTERED AT 10:10:05 ON 27 OCT 2003
L4 50359 S L2
L5 12961 S L3
L6 61198 S L4 OR L5
L7 STRUCTURE UPLOADED
L8 85 L7 FULL SUB=L6
175
=> fil caplus
=> l8 and l1
11188 L8
L9 15 L8 AND L1
=> d hitstr
MeMe
HOO
O
S
S
NO
+
S
R
S
R
O Hy
Hy
176
FILE 'CAPLUS' ENTERED AT 10:04:43 ON 27 OCT 2003
L1 2433 (CYTOMEGALOVIR? OR CMV? OR . . . . .
L2 SEL L1 1- RN : 50359 TERMS
L3 SEL L1 1459- RN : 12961 TERMS
FILE 'REGISTRY' ENTERED AT 10:10:05 ON 27 OCT 2003
L4 50359 S L2
L5 12961 S L3
L6 61198 S L4 OR L5
L7 STRUCTURE UPLOADED
L8 85 L7 FULL SUB=L6
FILE 'CAPLUS' ENTERED AT 10:16:17 ON 27 OCT 2003
L9 15 L8 AND L1
177
=> fil marpat
=> l1
L10 325 L1
=> l8 full subset=l10
FULL SUBSET SEARCH INITIATED 10:19:16
FULL SUBSET SCREEN SEARCH COMPLETED - 278 TO ITERATE
100.0% PROCESSED 278 ITERATIONS 34 ANSWERS
SEARCH TIME: 00.00.02
L11 34 SEA SUB=L10 SSS FUL L7
178
FILE 'CAPLUS' ENTERED AT 10:04:43 ON 27 OCT 2003
L1 2433 (CYTOMEGALOVIR? OR CMV? OR . . . . .
L2 SEL L1 1- RN : 50359 TERMS
L3 SEL L1 1459- RN : 12961 TERMS
FILE 'REGISTRY' ENTERED AT 10:10:05 ON 27 OCT 2003
L4 50359 S L2
L5 12961 S L3
L6 61198 S L4 OR L5
L7 STRUCTURE UPLOADED
L8 85 L7 FULL SUB=L6
FILE 'CAPLUS' ENTERED AT 10:16:17 ON 27 OCT 2003
L9 15 L8 AND L1
FILE 'MARPAT' ENTERED AT 10:17:40 ON 27 OCT 2003
L10 325 L1
L11 34 L8 FULL SUB=L10
179
=> file caplus
=> l11
L12 34 L11
=> l12 not l9
L13 31 L12 NOT L9
=> sel l13 rn
SmartSELECT INITIATED
New TRANSFER and ANALYZE Commands Now Available
See HELP TRANSFER and HELP ANALYZE for Details
L14 SEL L13 1- RN : 2630 TERMS
180
=> fil reg
=> l14
L15 2630 L14
=> l7 full subset=l15
FULL SUBSET SEARCH INITIATED 10:26:22
FULL SUBSET SCREEN SEARCH COMPLETED - 2444 TO ITERATE
100.0% PROCESSED 2444 ITERATIONS 0 ANSWERS
SEARCH TIME: 00.00.01
L16 0 SEA SUB=L15 SSS FUL L7
181
FILE 'CAPLUS' ENTERED AT 10:16:17 ON 27 OCT 2003
L9 15 L8 AND L1
FILE 'MARPAT' ENTERED AT 10:17:40 ON 27 OCT 2003
L10 325 L1
L11 34 L8 FULL SUB=L10
FILE 'CAPLUS' ENTERED AT 10:19:35 ON 27 OCT 2003
L12 34 L11
L13 31 L12 NOT L9
L14 SEL L13 1- RN : 2630 TERMS
FILE 'REGISTRY' ENTERED AT 10:25:22 ON 27 OCT 2003
L15 2630 L14
L16 0 L7 FULL SUB=L15
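The key step in this session, L13 = L12 NOT L9, is a set difference: it isolates the patents found only through their Markush structures. A minimal sketch with made-up record identifiers (the names below are hypothetical, not real accession numbers):

```python
# Hypothetical record identifiers; the real answer sets live on STN.
def only_in_marpat(marpat_hits: set, caplus_hits: set) -> set:
    """Patents retrieved via Markush structures but not via indexed substances."""
    return marpat_hits - caplus_hits   # analogous to L12 NOT L9

caplus_hits = {"pat-A", "pat-B", "pat-C"}           # plays the role of L9
marpat_hits = {"pat-B", "pat-C", "pat-D", "pat-E"}  # plays the role of L12
print(sorted(only_in_marpat(marpat_hits, caplus_hits)))  # ['pat-D', 'pat-E']
```

In the session above the same operation leaves 31 of the 34 MARPAT answers, and the final REGISTRY subset search (L16, 0 answers) confirms that none of the substances indexed for those 31 patents matches the query structure.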
183
NN
CH2NO
OG2
OC(O) NH CH2 G1
G2 = alkyl<(1-6)> (SO (1-3) G3) / (SC Me / Et / 129 / 139 / 144 / 148 / 154 / 161)
G3 = OH / NH2 / 37 / morpholino /
O Hy
Hy
185
Generic Groups are very important in MARPAT because, at Class level, they are always involved (generated or original).
Generic Groups
186
• Ak  Any carbon chain (only first atom need be carbon); any bond value allowed
• Cy  Any cyclic group
• Hy  Any cyclic group with one (1) or more non-carbon atoms
• Cb  Any cyclic group with all carbon atoms
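The ring definitions above imply that every ring belongs to Cy and additionally to either Cb or Hy. A minimal sketch, assuming a ring is given as a non-empty list of element symbols (the helper name `ring_generic_groups` is hypothetical):

```python
# Hypothetical helper: which ring generic groups encompass a given ring?
def ring_generic_groups(ring_atoms: list) -> set:
    """Return the ring generic groups covering a ring of the given elements."""
    if not ring_atoms:
        raise ValueError("a ring needs at least one atom")
    groups = {"Cy"}                        # Cy: any cyclic group
    if all(atom == "C" for atom in ring_atoms):
        groups.add("Cb")                   # Cb: all ring atoms are carbon
    else:
        groups.add("Hy")                   # Hy: one or more non-carbon atoms
    return groups

print(sorted(ring_generic_groups(["C", "C", "C", "C", "C", "N"])))  # ['Cy', 'Hy']
print(sorted(ring_generic_groups(["C"] * 6)))                       # ['Cb', 'Cy']
```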
Generic Groups
187
Attribute: Saturation (generic groups: Hy, Cb, Cy, Ak)
  Saturated: all of the bonds are single exact
  Unsaturated: at least one bond is double, triple or normalized
Attribute: No. of C atoms (generic groups: Hy, Cb, Cy, Ak)
  <7
  ≥7
Attribute: No. of hetero atoms (generic groups: Hy, Cy)
  1
  >1
Attribute: Type of ring system (generic groups: Hy, Cb, Cy)
  Monocyclic
  Polycyclic
Attribute: Type of chain (generic groups: Ak)
  Linear: all of the Ak atoms have only one or two attachments to other non-hydrogen atoms
  Branched: at least one of the Ak atoms has more than two attachments to a non-hydrogen atom
Generic Groups
188
Generic Groups
Ak Cy Hy Cb
Remember that you can use the Element Count (Minimum, Maximum, Exact, Range) to establish the number of atoms.
189
Generic Text Shortcuts Ak-based
• Ak is a chain of 1 or more carbons, linear or branched, saturated or unsaturated
Alkanoyl: Alkyl-C(O)-, H-C(O)-
Alkenyl: Ak with one or more double bonds, no triple bonds, and two or more carbons, Ak<EC (2-) C, BD (1-) D (0) T>-
An unsaturated monovalent radical chain of two or more carbons, branched or linear, containing one or more carbon-to-carbon double bonds, but no triple bonds
Formed by the removal of one hydrogen from the corresponding alkene, e.g., CH3-CH=CH-CH2-
Generic Groups
190
Alkenylene: -Ak<EC (2-) C, BD (1-) D (0) T>-
An unsaturated divalent hydrocarbon chain radical of at least two carbon atoms containing one or more double bonds, but no triple bonds
Formed by the removal of two hydrogens from the parent branched or linear alkene, e.g., -CH=CH-
Alkenylenedioxy: -O-alkenylene-O-
Alkoxy: alkyl-O-
also called alkyloxy and alkoxyl
Generic Text Shortcuts Ak-based
Generic Groups
191
Alkyl: Ak with all bonds single exact
A totally saturated monovalent radical chain, branched or linear
Formed from an alkane by removal of one hydrogen, e.g., Me-, Et-, t-Bu-
Alkylene: -alkyl-
A divalent saturated hydrocarbon radical
Formed by the removal of two hydrogens from the branched or linear parent alkane, e.g., -CH2-
Generic Text Shortcuts Ak-based
Generic Groups
192
Alkylidene: alkyl=
A divalent alkyl radical that is attached to the parent by a double bond or two (2) single bonds from the same carbon, e.g., =CH-CH2-CH3
Alkynyl: Ak with no double bonds, one (1) or more triple bonds, and two (2) or more carbons, Ak<EC (2-) C, BD (1-) T (0) D>
An unsaturated monovalent radical chain of two (2) or more carbons, branched or linear, containing one or more carbon-to-carbon triple bonds, but no double bonds
Formed by the removal of one hydrogen from the corresponding alkyne, e.g., HC≡C-, HC≡C-CH2-
Generic Text Shortcuts Ak-based
Generic Groups
193
Lower: Any of the "alk" terms may be preceded by the term "lower", which implies a total carbon count of one (1) to six (6) carbons for alkanes; 2-6 carbons for alkenes and alkynes. This term is used only when no carbon count is given in the patent
Loweralkyl: Ak with all bonds single exact and one (1) to six (6) carbons
Perhaloalkyl: Alkyl with all hydrogens replaced by halogen atoms
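The alkyl, alkenyl, and alkynyl definitions, together with the "lower" carbon range, can be expressed as a small decision rule. This is a sketch only; `ak_shortcut` is a hypothetical helper, and it describes a chain by nothing more than its carbon count and its numbers of double and triple bonds.

```python
# Hypothetical helper applying the Ak-based shortcut definitions above.
def ak_shortcut(carbons: int, double_bonds: int, triple_bonds: int):
    """Return (shortcut name, True if the chain is within the 'lower' range)."""
    if carbons < 1:
        raise ValueError("Ak needs at least one carbon")
    if double_bonds == 0 and triple_bonds == 0:
        name = "alkyl"      # all bonds single exact
    elif carbons >= 2 and double_bonds >= 1 and triple_bonds == 0:
        name = "alkenyl"    # one or more double bonds, no triple bonds
    elif carbons >= 2 and triple_bonds >= 1 and double_bonds == 0:
        name = "alkynyl"    # one or more triple bonds, no double bonds
    else:
        return ("Ak", False)  # e.g. both double and triple bonds present
    # "lower": 1-6 carbons for alkyl, 2-6 for alkenyl and alkynyl
    # (the two-carbon minimum is already enforced above).
    return (name, carbons <= 6)

print(ak_shortcut(4, 1, 0))  # ('alkenyl', True)
print(ak_shortcut(8, 0, 0))  # ('alkyl', False)
```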
Generic Text Shortcuts Ak-based
Generic Groups
194
• Cb is any monocyclic or polycyclic group containing all carbon atoms with any bond values between atoms.
• Hy is any monocyclic or polycyclic group containing one or more non-carbon atoms with any bond values between the atoms.
Generic Text Shortcuts Cb/Hy-based
Generic Groups
195
Aryl: Cb with one or more aromatic rings, six (6) or more normalized bonds, and one (1) or more six (6)-membered rings
Arylene: -aryl-
A divalent aromatic radical
Formed by the removal of two (2) hydrogens from two different carbon atoms on the aromatic molecule, e.g., phenylene
Generic Text Shortcuts Cb/Hy-based
Generic Groups
196
Cycloalkenyl: Cb with one (1) or more double bonds and no triple bonds
An unsaturated monovalent monocyclic or polycyclic radical containing one (1) or more carbon-to-carbon double bonds
Formed by the removal of one (1) hydrogen from the corresponding cycloalkene, e.g., cyclopentadienyl, cyclohexenyl
Cycloalkyl: Cb with all bonds single exact
A saturated monovalent alicyclic radical
Formed by the removal of one (1) hydrogen from the corresponding cycloalkane, e.g., cyclopropyl, decahydronaphthyl
Generic Text Shortcuts Cb/Hy-based
Generic Groups
197
Heteroaryl: Hy with one (1) or more aromatic rings with six (6) or more normalized bonds and one (1) or more six (6)-membered rings
Hy with one (1) or more aromatic rings and two (2) or more double bonds and one (1) or more five (5)-membered rings
A monovalent radical derived from an aromatic molecule that contains at least one (1) heteroatom.
Formed by removal of one (1) hydrogen from the aromatic molecule, e.g., pyridyl, benzopyranyl
Generic Text Shortcuts Cb/Hy-based
Generic Groups
198
Acyl: Carbonyl bonded to Ak (which may have R's on it), at the carbonyl carbon
Carbonyl bonded to R, at the carbonyl carbon
Formyl
Aralkyl: Alkyl bonded to one-to-three (1-3) aryls
Hydrocarbyl: Ak
Cb bonded to Ak
Generic Text Shortcuts Hybrids
Generic Groups
200
Structure Displays
• MSTR
Display label for the Markush Structure
Example: MSTR 4
Translation: This is the 4th Markush structure in the document
MARPAT Codes
201
• VAR G#
Defines the alternatives for a G#
Example: VAR G1 = O / S / 16 / NULL
Translation: G1 is O or S or node 16 or G1 is a direct bond
Node sixteen (16) appears in the structure diagram portion of the display
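To make the VAR notation concrete, the sketch below splits a simple VAR line into its G-group label and its alternatives. It is an illustration only: `parse_var` is a hypothetical helper, and it does not handle qualifiers whose parentheses contain slashes, such as (EX Cl / Br / I).

```python
# Hypothetical parser for simple VAR lines as displayed in MARPAT records.
def parse_var(line: str):
    """Split 'VAR G1 = O / S / 16 / NULL' into ('G1', ['O', 'S', '16', 'NULL'])."""
    head, sep, tail = line.partition("=")
    tokens = head.split()
    if not sep or len(tokens) != 2 or tokens[0] != "VAR":
        raise ValueError(f"not a VAR line: {line!r}")
    alternatives = [alt.strip() for alt in tail.split("/")]
    return tokens[1], alternatives

label, alternatives = parse_var("VAR G1 = O / S / 16 / NULL")
print(label, alternatives)  # G1 ['O', 'S', '16', 'NULL']
```

In the result, a purely numeric alternative such as "16" refers to a node in the structure diagram, and "NULL" stands for a direct bond.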
Structure Displays
MARPAT Codes
202
• REP G#=
Defines a REPeating group and the number of times it repeats
Example: REP G3 = (0-7) CH2
Translation: CH2 repeats (0-7) times
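A repeating group with a closed range stands for a finite list of chains, which the following sketch enumerates. The helper `expand_rep` is hypothetical and only understands the simple form shown in the example, with a numeric range and a single repeating unit.

```python
import re

# Hypothetical expansion of a simple REP line into its concrete alternatives.
def expand_rep(line: str):
    """Expand 'REP G3 = (0-7) CH2' into ('G3', ['', 'CH2', 'CH2CH2', ...])."""
    match = re.fullmatch(r"REP\s+(G\d+)\s*=\s*\((\d+)-(\d+)\)\s*(\S+)", line.strip())
    if match is None:
        raise ValueError(f"unsupported REP line: {line!r}")
    label, low, high, unit = match.groups()
    # One alternative per allowed repeat count; zero repeats is an empty chain.
    return label, [unit * n for n in range(int(low), int(high) + 1)]

label, chains = expand_rep("REP G3 = (0-7) CH2")
print(label, len(chains), chains[2])  # G3 8 CH2CH2
```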
Structure Displays
MARPAT Codes
203
• CVA
The Conditional Variable statement, currently ignored at search time
Example: VAR G1 = H / OH / X
VAR G2 = NO2 / Me / Et
CVA = If G1 = OH, THEN G2 = Me
Translation: If G1 is the OH alternative, then G2 must be Me and not NO2 or Et.
Searched as G1 = H, OH, X and G2 = NO2, Me, Et (STN Express); as VAR G1 = H / OH / X and VAR G2 = NO2 / Me / Et (STR command)
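Because the CVA statement is ignored at search time, the search covers more G1/G2 combinations than the patent actually claims. The sketch below counts both sets for the example above; the variable and function names are illustrative only.

```python
from itertools import product

# Alternatives from the example above.
g1_alternatives = ["H", "OH", "X"]
g2_alternatives = ["NO2", "Me", "Et"]

def allowed_by_cva(g1: str, g2: str) -> bool:
    """CVA from the example: if G1 = OH, then G2 must be Me."""
    return g1 != "OH" or g2 == "Me"

# The search ignores the CVA, so every combination is a potential hit.
searched = list(product(g1_alternatives, g2_alternatives))
# The patent claims only the combinations that satisfy the condition.
claimed = [pair for pair in searched if allowed_by_cva(*pair)]

print(len(searched), len(claimed))  # 9 7
```

The two extra combinations, (OH, NO2) and (OH, Et), are possible false hits that have to be recognised when reviewing the displayed CVA line.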
Structure Displays
MARPAT Codes
204
• DER
DERivative information that cannot be structured
Searched as single words in the Basic Index (default)
Example: DER: or salts or metal complexes
Structure Displays
MARPAT Codes
205
• NTE
General NoTEs
Searched as single words in the Basic Index (default)
Example: NTE: substitution restricted
Structure Displays
MARPAT Codes
206
• MPL
Location of the Markush structure in the patent
Searched as single words in the Basic Index (default)
Example: MPL: claim 1
Structure Displays
MARPAT Codes
207
• STE
STEreochemistry
Searched as single words in the Basic Index (default)
Example: STE: 41,42-cis
Structure Displays
MARPAT Codes
208
• Generic Group Attributes
• AN, AR, BD, CH, DC, EC, FA, RC, RS, TX
– Found in < > following the generic group they modify
– The marked attributes are searched as Generic Definitions (STN Express queries) or Generic Group Categories (STR command queries)
– All others are ignored while searching
MARPAT Codes
Structure Displays
209
Abbreviation                     Example               Translation
AN     Attachment Nodes          AN (3) A              Attached through 3 nodes of any kind (A)
                                 AN (2) N              Attached through 2 N nodes
AR     Aryl                      AR (1-)               1 or more aromatic rings
                                 AR (0)                is not aromatic
BD*    Bonds                     BD (0) T              no triple (T) bonds
                                 BD (1-) D (2) SE      1 or more double (D) bonds and two single exact (SE) bonds
CH     Charge                    CH (2) +              total of two (2) positive charges
                                 CH (1) +-             one (1) positive or negative charge
DC**   Degree of Connectivity    DC (0) M3             no branching
                                 DC (1-) M3            branched
EC***  Element Count             EC (1-8) C            1-8 carbon (C) atoms
                                 EC (2-3) N            2-3 nitrogen (N) atoms
MARPAT Codes
Structure Displays
210
Abbreviation                     Example                          Translation
FA**** Fusion Atoms              FA (2-4) C                       2-4 carbon (C) atoms fuse rings
                                 FA (2-) C                        2 or more carbon (C) atoms fuse rings
RC**** Ring Count                RC (2-)                          polycyclic with 2 or more rings
                                 RC (1)                           monocyclic
RS     Ring Size                 RS (2-3) E6                      2-3 6-membered rings
                                 RS (1) M5 (1) X6 and RC (1)      5- to 6-membered ring
TX     Text Qualifiers           R <TX "protecting group">        the patent said "protecting group"
                                 R <TX "residue">                 the patent said "residue"
* See workshop manual for definitions of asterisks
MARPAT Codes
Structure Displays
211
Precision Qualifiers - EX and SC
• Found in parentheses at the end of the variable definition
• Additional alternatives for the G-group
MARPAT Codes
212
EX
Alternatives are found in the EXamples (in the disclosure)
Example: VAR G1 = R<TX "leaving group"> / (EX Cl / Br / I)
Translation: Cl, Br and I are alternatives for G1 and were found in the disclosure.
VAR denotes variability
G1 is the tag on the variable group
Precision Qualifiers
MARPAT Codes
213
SC
Alternatives are Specifically Claimed
Example: VAR G3 = H / alkyl / alkoxy /
(SC Me / OMe)
Translation: Methyl and methoxy are alternatives for G3 and are specifically claimed
Precision Qualifiers
MARPAT Codes
214
Substitution Qualifiers - SO and SR
• Found in parentheses immediately after the group they modify
• In the current implementation, SO and SR are both searched as SO; i.e., as if the substituent were present on the group any number of times, including zero
MARPAT Codes
215
Substitution Qualifiers
SO
Alternative is Substituted Optionally by this group
Example: VAR G1 = Ph (SO 5X) / alkoxycarbonyl (SO CO2H)
Translation: Phenyl is substituted with zero (0) to five (5) halogens Alkoxycarbonyl is substituted with zero (0) or more carboxy groups
MARPAT Codes
216
SR
Alternative has Substitution Required by one (1) or more of this group
Currently searched as SO
Example: VAR G1 = Ph (SR 5X) / alkoxycarbonyl (SR CO2H)
Translation: Phenyl is substituted with one (1) to five (5) halogens
Alkoxycarbonyl is substituted with one (1) or more carboxy groups
Currently searched as zero (0) to five (5) halogens and zero (0) or more carboxy groups
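The difference between what SO and SR mean in the patent and how they are searched comes down to the lower bound of the substituent count. A sketch under that reading, where `substitution_range` is a hypothetical helper and `None` stands for "no stated upper limit":

```python
# Hypothetical helper contrasting the claimed and the searched substituent counts.
def substitution_range(qualifier: str, max_count=None) -> dict:
    """Return (min, max) substituent counts as claimed and as searched."""
    if qualifier not in ("SO", "SR"):
        raise ValueError(f"unknown substitution qualifier: {qualifier}")
    claimed_min = 0 if qualifier == "SO" else 1  # SR requires at least one
    return {
        "claimed": (claimed_min, max_count),
        "searched": (0, max_count),              # SO and SR are both searched as SO
    }

print(substitution_range("SR", 5))  # {'claimed': (1, 5), 'searched': (0, 5)}
print(substitution_range("SR"))     # {'claimed': (1, None), 'searched': (0, None)}
```

For Ph (SR 5X) this means an unsubstituted phenyl can be retrieved even though the patent requires at least one halogen.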
Substitution Qualifiers
MARPAT Codes
217
Occurrence Counts
Occurrence Counts are found in front of the alternative they modify
Used to limit the substitution by specifying how many times an alternative is present
The occurrence counts are not currently searchable
MARPAT Codes
218
(n)
Alternative occurs exactly “n” times
Example: VAR G1 = (1) X / Me
Translation: Exactly one (1) of the G1's is X
Searched as G1 = X, Me (STN Express); VAR G1 = X / Me (STR command)
Occurrence Counts
MARPAT Codes
219
(n-)
Alternative occurs "n" or more times
Example: VAR G3 = (2-) H / loweralkyl / X
Translation: Two (2) or more of the G3's are H
Searched as G3 = H, Ak, X (STN Express); VAR G3= H / Ak, X Me (STR command)
Occurrence Counts
MARPAT Codes
220
(-n)
Alternative occurs zero (0) to "n" times
Example: VAR G6 = (-1) H / alkyl / CH2Ph
Translation: Zero (0) or one (1) of the G6's are H
Searched as G6 = H, Ak, CH2Ph (STN Express); VAR G6 = H / Ak, CH2Ph (STR command)
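The three occurrence-count forms, (n), (n-), and (-n), can be read as a minimum and a maximum. The sketch below is a hypothetical reader for display purposes only, since occurrence counts are not searchable; `None` stands for "no upper limit".

```python
import re

# Hypothetical reader for the occurrence-count forms (n), (n-), and (-n).
def parse_occurrence(token: str):
    """Return (minimum, maximum) for an occurrence count; None = no upper limit."""
    match = re.fullmatch(r"\((\d*)(-?)(\d*)\)", token.strip())
    if match is None:
        raise ValueError(f"not an occurrence count: {token!r}")
    low, dash, high = match.groups()
    if not dash:
        if not low:
            raise ValueError(f"empty occurrence count: {token!r}")
        return int(low), int(low)                 # (n): exactly n times
    if not low and not high:
        raise ValueError(f"empty occurrence count: {token!r}")
    return (int(low) if low else 0,               # (-n): zero to n times
            int(high) if high else None)          # (n-): n or more times

print(parse_occurrence("(1)"))   # (1, 1)
print(parse_occurrence("(2-)"))  # (2, None)
print(parse_occurrence("(-1)"))  # (0, 1)
```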
Occurrence Counts
MARPAT Codes