Organization of the human α2-plasmin inhibitor gene



1612 Genetics: Correction. Proc. Natl. Acad. Sci. USA 86 (1989)

Correction. In the article "Organization of the human α2-plasmin inhibitor gene" by Shinsaku Hirosawa, Yuichi Nakamura, Osamu Miura, Yoshihiko Sumi, and Nobuo Aoki, which appeared in number 18, September 1988, of Proc. Natl. Acad. Sci. USA (85, 6836-6840), a section of the sequence figure (Fig. 2) was not fully reproduced due to a printing error. The complete figure with legend should have read as follows:

[Fig. 2, first page: nucleotide sequence of the gene from the 5'-flanking region through nt 700, including the positions of introns 1 to 6, with the deduced amino acid sequence printed beneath the exons. The sequence itself is not legible in this transcript.]

FIG. 2. (Figure continues on the opposite page.)


[Fig. 2, continued: nucleotide sequence from nt 710 through nt 2234, including the positions of introns 7 to 9, the deduced amino acid sequence, and the 3'-flanking region. The sequence itself is not legible in this transcript.]

FIG. 2. Nucleic acid sequence of the α2-plasmin inhibitor gene. Exons are underlined with solid lines. Bases in the exons and the 5'- and 3'-flanking regions are numbered relative to the translation initiation site. Amino acids are numbered from the NH2-terminal residue in the plasma protein. Regions corresponding to a potential TATA box, the GC boxes, a potential transcriptional start site (-22), and a polyadenylylation recognition site (2189-2194) are boxed. The direct repeats of CCAAT box-like sequence are indicated by dots. G+C-rich sequences are indicated by the dashed underlines. The sequence (-123 to -108), similar to the hepatitis B virus enhancer sequence, is indicated by a waved line. The sequence (-809 to -800), similar to the human immunodeficiency virus enhancer sequence or κ-immunoglobulin light-chain gene enhancer sequence, is bracketed.
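The coordinate ranges quoted in the legend can be turned into feature lengths with a small helper. The sketch below is illustrative only: the function name is hypothetical, and it assumes that the numbering has no position 0 (so that -1 is followed directly by +1), which is what the 30-base primer spanning nt -4 to +26 in Materials and Methods suggests.

```python
def span_length(first: int, last: int) -> int:
    """Count nucleotides from `first` to `last`, inclusive.

    Assumes a numbering scheme with no position 0 (..., -2, -1, 1, 2, ...).
    This is an assumption inferred from the text, not a stated convention.
    """
    if first == 0 or last == 0:
        raise ValueError("position 0 does not exist in this scheme")
    if first > last:
        raise ValueError("first must not exceed last")
    length = last - first + 1
    if first < 0 < last:
        length -= 1  # the span crosses the -1/+1 boundary; there is no 0
    return length


# Ranges taken from the figure legend and from Materials and Methods.
print(span_length(-123, -108))   # hepatitis B virus enhancer-like sequence: 16
print(span_length(-809, -800))   # HIV / kappa enhancer-like sequence: 10
print(span_length(2189, 2194))   # polyadenylylation recognition site: 6
print(span_length(-4, 26))       # primer-extension oligonucleotide: 30
```

Under that assumption the polyadenylylation signal comes out as a hexamer and the primer as a 30-mer, which agrees with the lengths stated in the text.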


Proc. Natl. Acad. Sci. USA
Vol. 85, pp. 6836-6840, September 1988
Genetics

Organization of the human α2-plasmin inhibitor gene
(fibrinolysis/serine protease inhibitors/serpin gene superfamily/human genomic clones)

SHINSAKU HIROSAWA*, YUICHI NAKAMURA*, OSAMU MIURA*, YOSHIHIKO SUMI†, AND NOBUO AOKI*‡

*The First Department of Medicine, Tokyo Medical and Dental University, Yushima, Bunkyo-Ku, Tokyo 113, Japan; and †Department of Biochemistry, University of Tokyo School of Medicine, Hongo, Bunkyo-Ku, Tokyo 113, Japan

Communicated by Earl W. Davie, May 25, 1988 (received for review October 28, 1987)

ABSTRACT We have isolated overlapping phage genomic clones covering an area of 26 kilobases that encodes the human α2-plasmin inhibitor. The α2-plasmin inhibitor gene contains 10 exons and 9 introns distributed over ≈16 kilobases of DNA. To our knowledge, the number of introns is the highest yet reported for a member of the serine protease inhibitor (serpin) superfamily. All introns are located in the 5'-half of the corresponding mRNA. The 5'-untranslated region and the leader sequence are interrupted by 3 introns totaling ≈6 kilobases. A "TATA box" sequence is located 17 nucleotides upstream from the proposed transcription initiation site. Multiple "GC box" sequences, G+C-rich sequences, "CCAAT box"-like sequence, the hepatitis B virus enhancer element-like sequence, and the human immunodeficiency virus enhancer-like sequence appear in the 5'-flanking region. The NH2-terminal region, which implements factor XIII-catalyzed cross-linking of α2-plasmin inhibitor to fibrin, is encoded by the 4th exon. The reactive site and plasminogen-binding site, both located in the COOH-terminal region, are encoded by the 10th exon. When similar amino acids of α2-plasmin inhibitor and other members of the serpin gene superfamily are aligned, the position of the 7th intron of the α2-plasmin inhibitor gene aligns precisely with that of the second intron of the genes for rat angiotensinogen and human α1-antitrypsin and is misaligned by only one nucleotide with that of the third intron of antithrombin III, suggesting that the α2-plasmin inhibitor gene originates from the common ancestor of these serine protease inhibitors.

α2-Plasmin inhibitor (α2PI; α2-antiplasmin) is a plasma glycoprotein that functions crucially in the regulation of fibrinolysis (1-3). Human α2PI is one of the major serine protease inhibitors (serpin superfamily) and is highly structurally similar to the other serpin superfamily members (4-6). However, α2PI contains an extra ≈50-residue peptide beyond the COOH-terminal ends of the other family members (4). This extra peptide contains a plasminogen-binding site (4, 7) that endows the inhibitor with high affinity for plasminogen and enables the inhibitor to compete with fibrin for binding to plasminogen (8-10). During blood coagulation, α2PI is cross-linked by activated factor XIII to the α chain of fibrin at the glutamine residue proximal to the NH2-terminal end (11-13). The cross-linked α2PI inhibits in situ plasmin generation on the fibrin surface by physiologically occurring fibrin-associated plasminogen activation (14, 15). These properties peculiar to α2PI enable it to be a much more specific and effective inhibitor of plasmin-catalyzed fibrinolysis than any other major protease inhibitors, such as α2-macroglobulin (2, 9, 16, 17). In individuals with a congenital deficiency of α2PI, hemostatic plugs are dissolved prematurely by physiologically occurring fibrinolytic processes before the restoration of injured vessels, resulting in a severe hemorrhagic tendency (18, 19). The role of α2PI in modulating fibrinolytic reactions has been reviewed recently (2, 3).

Studies from our laboratories (4) and those of others (5, 6) have led to the isolation of the cDNA coding for human α2PI. Subsequently, the chromosomal localization of the α2PI gene was demonstrated (20). In this investigation, the cDNA for human α2PI was used for the isolation of overlapping genomic clones from a λ phage library. Organization of the gene was then analyzed§ and compared with those of the genes for other serine protease inhibitors.

MATERIALS AND METHODS

cDNA for α2PI. A partial cDNA clone for α2PI, pPI 39, has been described (4). A longer cDNA, covering the regions coding for the COOH-terminal 6 amino acids of the signal peptide and the whole plasma protein plus the 3'-noncoding region up to the poly(A) sequence, was subsequently assembled from clonal members of a new human hepatoma cell cDNA library. The nucleotide sequence of the region coding for the mature plasma protein agreed completely with that of the cDNAs already reported (5, 6).

Screening of the Human Genomic DNA Library. The human genomic library was provided by H. Matsushime and M. Shibuya (Medical Institute, University of Tokyo, Japan) (21). The library was prepared from human placenta DNA by partial digestion with Alu I and Hae III and subsequent cloning in the bacteriophage vector Charon 4A with EcoRI linker. The library was screened by in situ hybridization of 1.2 × 10^6 phage plaques (22) with two α2PI cDNA fragments corresponding to amino acids 31-130 and 179-429 as probes (4, 6). A 15-mer synthetic oligonucleotide, 5'-ACTCCCCTGCCAGCC-3', that is the complementary sequence to bases -15 to -5 of the cDNA (6) plus the donor signal at 5' (AC) and EcoRI linker at 3' (CC), was used as a probe to obtain a fragment containing the 5'-untranslated region. The cDNA fragment probes were labeled by nick-translation, and the 5'-end of the oligonucleotide was labeled by T4 polynucleotide kinase. Fragments of human genomic DNA were mapped with the restriction endonucleases EcoRI, BamHI, HindIII, Dra I, and Xba I. The genomic DNA fragments were subcloned in the plasmids pUC-18 and pUC-19.
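The design of the 15-mer probe can be checked by hand or with a few lines of code. The sketch below is only an illustration: the helper name is hypothetical, and the only data taken from the text are the probe sequence and the statement that two nucleotides were added at each end. It strips those additions and reverse-complements the remaining 11 nucleotides to recover the sense-strand sequence of bases -15 to -5.

```python
# Hypothetical helper; only the probe sequence comes from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


probe = "ACTCCCCTGCCAGCC"  # 15-mer probe as given in the text
core = probe[2:-2]          # drop the 2-nt additions (AC at 5', CC at 3')

print(len(probe))                # 15
print(len(core))                 # 11, i.e. bases -15 to -5
print(reverse_complement(core))  # CTGGCAGGGGA
```

The recovered 11-mer, CTGGCAGGGGA, is the same string that Results and Discussion quotes for the 5'-untranslated region of the cDNA of Tone et al. (6).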

Southern Blotting (23). The plasmid containing the α2PI gene was isolated and subjected to restriction endonuclease digestions. The DNA fragments were then separated on agarose gels, transferred to a nitrocellulose filter, and hybridized as described (24) using cDNA and oligonucleotide probes, which correspond to several regions of the α2PI gene.

DNA Sequencing. Appropriate DNA fragments, isolated and digested with various restriction endonucleases, were sequenced by the dideoxy nucleotide chain-termination method of Sanger et al. (25) using plasmid as described by Hattori and Sakaki (26).

Primer Extension. One nanogram of a 5'-end-labeled synthetic oligonucleotide (5'-ACCAGGAGCCCCCAGAGCAGCGCCATGTTC-3'), complementary to nucleotides (nt) -4 to +26 as shown in Fig. 2, was hybridized with 50 μg of total RNA from the hepatoma cell line Hep G2 or 2 μg of poly(A)+ RNA from normal human liver. The total RNA was prepared by the guanidine thiocyanate extraction method. Poly(A)+ RNA was prepared by oligo(dT)-cellulose chromatography. The hybridization occurred in 80% (vol/vol) formamide/0.4 M NaCl/40 mM Pipes, pH 6.8/1 mM EDTA by heating to 80°C for 5 min and then incubating at 45°C for 24 hr. The hybrids were recovered, and primer-extension reactions were done at 42°C for 1 hr with 40 units of reverse transcriptase purified from Rous-associated virus 2 (Takara Shuzo, Kyoto, Japan) and analyzed on 12% sequencing gels (27).

RNA-Blot Hybridization. Poly(A)+ RNA prepared from human normal liver cells was separated by electrophoresis on a formaldehyde/agarose gel, transferred to a nitrocellulose filter, and hybridized as described (24).

Abbreviations: α2PI, α2-plasmin inhibitor; nt, nucleotide(s).
‡To whom reprint requests should be addressed.
§The sequence reported in this paper is being deposited in the EMBL/GenBank data base (IntelliGenetics, Mountain View, CA, and Eur. Mol. Biol. Lab., Heidelberg) (accession no. J03830).

The publication costs of this article were defrayed in part by page charge payment. This article must therefore be hereby marked "advertisement" in accordance with 18 U.S.C. §1734 solely to indicate this fact.
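The primer-extension oligonucleotide given under Primer Extension, and the way an extension-product length maps to a start-site coordinate, can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function names are hypothetical, the numbering is assumed to skip position 0, and the product sizes of 48 and 65 bases are those reported in Results and Discussion.

```python
# Hypothetical helpers; the primer sequence and product sizes come from the text.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def upstream_end(primer_5prime: int, product_len: int) -> int:
    """Coordinate of the 3' end of an extension product.

    The primer's 5' end anneals at `primer_5prime` (a positive coordinate)
    and the product extends upstream for `product_len` nucleotides in total.
    Assumes the numbering has no position 0.
    """
    if product_len <= primer_5prime:
        return primer_5prime - product_len + 1
    return -(product_len - primer_5prime)


primer = "ACCAGGAGCCCCCAGAGCAGCGCCATGTTC"  # antimessage primer from the text
message = reverse_complement(primer)        # sense strand, nt -4 to +26

print(len(primer))            # 30
print(message)                # GAACATGGCGCTGCTCTGGGGGCTCCTGGT
print(message.find("ATG"))    # 4, so the ATG begins at nt +1
print(upstream_end(26, 48))   # -22
print(upstream_end(26, 65))   # -39
```

With the primer's 5' end at nt +26, products of 48 and 65 bases end at nt -22 and -39, which corresponds to the two candidate initiation sites discussed in the Results.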

RESULTS AND DISCUSSION

Three bacteriophage clones (designated API1, API2, and API6) containing sequences of the α2PI gene were isolated from the genomic DNA library. By restriction endonuclease mapping, these clones were found to overlap (Fig. 1). API1 carried a 13-kilobase (kb) EcoRI fragment as its DNA insert, and API2 carried EcoRI fragments of 13, 0.5, and 1.8 kb. API6 had 11-kb and 1.8-kb EcoRI fragments. Both API2 and API6 contained 1.8-kb fragments with identical sequence, indicating that these two clones overlap. The synthetic 15-mer oligonucleotide probe, corresponding to the 5'-untranslated region of the cDNA, and the cDNA probe, corresponding to the 5' region of the mRNA for α2PI, hybridized to the 11-kb fragment contained in API6. The cDNA probe that corresponds to the 3' region of the mRNA of α2PI hybridized to the 13-kb EcoRI fragment contained in API1 and API2. These results show that these clones together contain the entire region of the α2PI gene (Fig. 1).

The gene structure of α2PI was further characterized by subcloning appropriate fragments in the plasmids pUC-18 and pUC-19. Southern blotting analysis of restriction enzyme-digested plasmid DNAs containing exons and exon-intron boundaries was used to deduce the overall gene organization. The genomic DNA sequence of selected regions of the α2PI gene was compared with the cDNA sequences, and this comparison allowed a precise definition of the exon-intron boundaries (Fig. 2). All boundaries were consistent with the "GT-AG" rule formulated by Breathnach and Chambon (28). The α2PI gene was found to be ≈16 kb in length and to consist of 10 exons and 9 introns.

The sequence of the exons agrees perfectly with the sequence of the entire coding region of the cDNA reported earlier (4-6). However, a small section of the 5'-untranslated region (nt -17 to -5 in Fig. 2) differed slightly from the cDNA reported by Tone et al. (6): CTGGCAGGGGA for the cDNA and CTGG-QCAGGGAQG for the genomic DNA. The sequence difference might have been caused either by polymorphism or by the different origins of the libraries: a hepatoma cell line for the cDNA and placenta for the genomic library.

To identify the transcription initiation site, we constructed a 5'-end-labeled, antimessage-sequence 30-base oligonucleotide primer complementary to nt -4 to +26, including the initiator methionine codon in the cDNA. This primer was hybridized to human hepatoma cell line RNA or human normal liver cell poly(A)+ RNA and extended with reverse transcriptase. The products of the reaction were sized by denaturing PAGE and migrated as 48- and 65-base-pair (bp) fragments (Fig. 3). Although the band corresponding to the 65-bp transcript was very faint with hepatoma cell line RNA, the results suggest two major transcription initiation sites, 22 nt and 39 nt upstream of the initiator methionine codon in the cDNA. Therefore, one transcription initiation site may be located at nt -22, suggesting that the sequence nt -22 to -5 is exon 1. To further define the initiation site, RNA blotting was done with a synthetic oligonucleotide probe corresponding to the proposed exon 1 (nt -22 to -5) or to its immediate upstream sequence (nt -39 to -23), and with the cDNA probe. The probe corresponding to the proposed exon 1 hybridized to a band that corresponds to the mRNA of α2PI (≈2.4 kb), which was identified by the cDNA, whereas the other probe (nt -39 to -23) failed to hybridize. These results, together with the presence of the mRNA (G)GGT sequence (consensus sequence at the intron-exon junctions), indicate that the region from nt -22 to -5 is the first exon. The nature of the other possible transcription initiation site is not known, but the longer transcript may represent cross-hybridization to another mRNA.

The 1120-nt sequence of the 5'-flanking region was determined. The result reveals the presence of a TATA box (29) and four GC box sequences (5'-GGGCGG-3' and its inverted complement sequence 5'-CCGCCC-3') (30) (Fig. 2). Three of these GC box sequences are present in the ≈350-nt region upstream of the transcription initiation site. This region also contains several G+C-rich sequences in addition to the segments containing GC boxes (Fig. 2). McKnight and Kingsbury (30) stressed the importance of the GC box and G+C-rich sequences upstream of the TATA box for maintaining transcription efficiency in eukaryotes. They further reported the

[Fig. 1 map. Exon labels, in amino acid residues, from 5' to 3': 5' UT; -39 to -19; -18 to -6; -5 to 16; 17 to 83; 84 to 131; 132 to 199; 200 to 247; 248 to 315; 316 to 452, 3' UT. Restriction sites (B, D, E, H, X) and the extents of the phage clones are drawn below the exon line; scale bar, 1 kb.]

FIG. 1. Organization of the human α2-plasmin inhibitor gene. The first line shows the positions of exons as rectangles, and the numbers above the line indicate the amino acids at which intron-exon junctions occur. Untranslated regions (UT) are shown as hatched areas. The second line indicates the positions of restriction endonuclease-recognition sites. Straight lines at bottom indicate the regions covered by the three phage clones (API1, API2, and API6). B, BamHI; D, Dra I; E, EcoRI; H, HindIII; and X, Xba I. Note from Fig. 2 that a small 5'-untranslated region exists in the second exon.


6838 Genetics: Hirosawa et al. Proc. Natl. Acad. Sci. USA 85 (1988)

[Fig. 2, first page: nucleotide sequence of the α2PI gene from the 5'-flanking region through intron 7, with the deduced amino acid sequence printed below the exons. Introns not sequenced in full are marked with their approximate sizes: intron 1, ≈6 kb; intron 5, ≈1.5 kb; intron 7, ≈1.0 kb. The sequence text is not legible in this copy.]

FIG. 2. (Figure continues on the opposite page.)


[Fig. 2, second page: nucleotide sequence from exon 8 through the 3'-flanking region, ending at nt 2234, with the deduced amino acid sequence printed below the exons. Introns not sequenced in full are marked with their approximate sizes: intron 8, ≈3.0 kb; intron 9, ≈1.0 kb. The sequence text is not legible in this copy.]

FIG. 2. Nucleic acid sequence of the α2-plasmin inhibitor gene. Exons are underlined with solid lines. Bases in the exons and the 5'- and 3'-flanking regions are numbered relative to the translation initiation site. Amino acids are numbered from the NH2-terminal residue in the plasma protein. Regions corresponding to a potential TATA box, the GC boxes, a potential transcriptional start site (-22), and a polyadenylylation recognition site (2189-2194) are boxed. The direct repeats of the CCAAT box-like sequence are indicated by dots. G+C-rich sequences are indicated by the dashed underlines. The sequence (-123 to -108), similar to the hepatitis B virus enhancer sequence, is indicated by a waved line. The sequence (-809 to -800), similar to the human immunodeficiency virus enhancer sequence or κ-immunoglobulin light-chain gene enhancer sequence, is bracketed.
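The boxed promoter elements in Fig. 2 are short fixed motifs, so locating them amounts to a substring scan, and the similarity to the viral enhancer amounts to counting mismatches. The sketch below is illustrative only: `find_motif`, `mismatches`, and the `toy` sequence are hypothetical names and data introduced here, not taken from the paper, and the two 10-bp strings assume that the gene element reads GTGACTTTCC and the viral enhancer reads GGGACTTTCC.

```python
import re

GC_BOX = "GGGCGG"           # GC box, as defined in the text
GC_BOX_INVERTED = "CCGCCC"  # its inverted complement


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start offsets of every (possibly overlapping) match."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


# Made-up test sequence, NOT part of the a2PI gene.
toy = "AAGGGCGGTTTCCGCCCAATATAAAGG"

print(find_motif(toy, GC_BOX))           # forward GC box
print(find_motif(toy, GC_BOX_INVERTED))  # inverted GC box
print(find_motif(toy, "TATA"))           # TATA-like element

# Assumed readings of the gene element and the viral enhancer motif.
print(mismatches("GTGACTTTCC", "GGGACTTTCC"))
```

Applied to the real 5'-flanking sequence, the same scan would report offsets that could then be converted to the figure's numbering relative to the translation initiation site.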

CCAAT box homology (31) downstream of the GC box (27, 31). In our study, we found two direct repeats of the CCAAT box homology sequence, 5'-GCCATCA-3', located separately downstream of two different GC boxes (Fig. 2). The TATA box may determine the position of the start of transcription, whereas the GC box may be the site interacting with a cellular transcription factor necessary for transcriptional activity (32).

The first base of the most proximal GC box sequence or of the CCAAT box homology sequences is located 88 or 74 bases upstream, respectively, from the proposed transcription initiation site (Fig. 2). The relative positions of these sites are in accord with those usually found in eukaryotes (28). The first thymine of the TATA box is located 17 bases upstream from the proposed transcription initiation site (Fig. 2). The TATA box is usually found between 20 and 30 bases upstream from the transcription initiation site in most eukaryotic protein-coding genes. Therefore, the distance between the TATA box and the transcription initiation site proposed here might be an exceptional case among eukaryotes.

FIG. 3. Primer-extension reactions by reverse transcriptase. An end-labeled oligonucleotide probe from the gene for α2PI was used. Lanes: 1-4, DNA sequencing ladder for size comparison; 5 and 6, primer-extension reactions with liver cell line Hep G2 and normal liver cell RNAs, respectively. Figures at right are the lengths of the primer-extension products, corresponding to 48 and 65 nt.
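The step from primer-extension product length to a candidate start site is simple arithmetic, sketched below. It assumes that the labeled 5' end of the primer lies at nt +26 and that the numbering skips zero; `start_site` and the constant names are hypothetical, introduced only for this illustration.

```python
PRIMER_5P_END = 26  # primer anneals to nt -4 to +26; its labeled end is at +26


def start_site(product_len: int, primer_5p_end: int = PRIMER_5P_END) -> int:
    """Map a primer-extension product length to the mRNA 5'-end position.

    The numbering has no position 0, so bases beyond +1 count as -1, -2, ...
    """
    upstream = product_len - primer_5p_end
    if upstream <= 0:
        # Product ends at or downstream of +1.
        return primer_5p_end - product_len + 1
    return -upstream


for length in (48, 65):
    print(length, start_site(length))

# Distance from the first T of the TATA box to the proposed start site,
# compared with the usual 20-30 base window cited in the text.
TATA_DISTANCE = 17
print(20 <= TATA_DISTANCE <= 30)
```

With these assumptions the 48- and 65-nt products map to nt -22 and -39, the two candidate sites discussed in the text, and the 17-base TATA spacing falls outside the usual window.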

It is interesting to note that the 16-bp sequence (nt -123 to -108 in Fig. 2) is 88% similar to the 17-bp sequence (nt 1193-1209) in the hepatitis B virus enhancer element (33), which displays tissue-specific activity (34, 35) and shows high homology with sequences in the promoter regions of several liver-specific genes: α-fetoprotein, α1-antitrypsin, and albumin (33). Also interesting is the presence of a 10-bp sequence, GTGACTTTCC, between nt -799 and -810 (Fig. 2). This sequence differs by only one base from the human immunodeficiency virus enhancer sequence, GGGACTTTCC (36), which is identical to an enhancer sequence in the κ-immunoglobulin light-chain gene (37). It will be of interest to determine whether these sequences are also functional elements that enhance the transcriptional activity of the α2PI gene.

The lengths of exons 1-10 were 17, 67, 39, 63, 202, 144, 204, 143, 205, and 1169 bp, respectively. Exon 1 is located 6 kb upstream from exon 2, which contains the initiation codon ATG. The signal peptide is encoded by exons 2, 3, and a part of exon 4 (Figs. 1 and 2). The signal peptide consists of 39 amino acids, of which 23 are hydrophobic and form hydrophobic cores (Fig. 2). This agrees with the characteristic features of signal peptides (38). One base difference was noted in the sequence coding for the signal peptide as compared with the sequence of the cDNA reported by Tone et al. (6). The nucleotide (97 in their numbering system) was thymine in their cDNA, but the nucleotide at this position (nt 97) was cytosine in our study and also in the cDNA reported by Holmes et al. (5). Consequently, the predicted amino acid at position -7 was arginine in our study and the study by Holmes et al. (5), whereas tryptophan was predicted by the cDNA sequence reported by Tone et al. (6). The difference may have been caused by the difference in the cell types from which the cDNA was derived. Tone et al. (6) used cDNA derived from a liver carcinoma cell line for sequencing the 5' region, whereas normal cells were used in our study and the study by Holmes et al. (5).

The sequence of the 3'-noncoding region, including the consensus polyadenylylation signal AATAAA (39), is identical with that of the cDNA reported by Tone et al. (6), except for one substitution (T → G) at nt 1547. When compared with the cDNA sequence of the 3'-noncoding region reported by Holmes et al. (5), however, five minor differences, including one deletion, three insertions, and one substitution, were noticed (6). The poly(A) addition site, determined from the cDNA sequences reported by Sumi et al. (4) and Tone et al. (6), is cytosine at nt 2207 in Fig. 2. Another possible polyadenylylation site, determined from the cDNA sequences reported by Holmes et al. (5), is thymine at nt 2212. These differences are probably due to the different sources used for the construction of the cDNA libraries.
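The exon lengths reported above can be cross-checked against the protein length with a few lines of arithmetic. The sketch rests on three assumptions that are read from the figures rather than stated as such in the text: the mature protein ends at residue 452 (the last exon label in Fig. 1), exon 2 carries 4 nt of 5'-untranslated sequence (the primer target begins at nt -4), and exon 1 is entirely untranslated. All variable names are illustrative.

```python
# Exon lengths in bp, exons 1-10, as listed in the text.
EXON_BP = [17, 67, 39, 63, 202, 144, 204, 143, 205, 1169]

SIGNAL_AA = 39    # signal peptide length, from the text
MATURE_AA = 452   # ASSUMPTION: last residue number in the Fig. 1 labels
UTR5_IN_EXON2 = 4  # ASSUMPTION: exon 2 starts at nt -4

total_exon_bp = sum(EXON_BP)

# Coding sequence including the stop codon.
coding_nt = 3 * (SIGNAL_AA + MATURE_AA + 1)

# Coding bases contributed by exons 2-9 (exon 1 is untranslated).
coding_in_exons_2_to_9 = sum(EXON_BP[1:9]) - UTR5_IN_EXON2

# What is left must lie in exon 10; the rest of exon 10 is 3' noncoding.
coding_in_exon_10 = coding_nt - coding_in_exons_2_to_9
utr3_in_exon_10 = EXON_BP[9] - coding_in_exon_10

print(total_exon_bp, coding_nt, coding_in_exons_2_to_9)
print(coding_in_exon_10, utr3_in_exon_10)
```

Under these assumptions exons 2-9 end at coding nt 1063, one base into the codon for residue 316, which matches the split codon at the exon 9/intron 9 boundary in Fig. 2. The summed exon length does not include the poly(A) tail.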

α2PI contains three functional domains: the reactive site, the plasminogen-binding site, and the cross-linking site for the fibrin α chain (2, 3). The plasminogen-binding site and the cross-linking site are peculiar to α2PI among serine protease inhibitors and make α2PI the most specific and effective inhibitor of plasmin-catalyzed fibrinolysis (2, 3). The cross-linking site domain is located in the NH2-terminal region (12) and is encoded by exon 4. The plasminogen-binding site domain is located in the COOH-terminal region (4, 7) and is encoded by exon 10. The reactive-site peptide bond that is cleaved by the reaction with plasmin has been postulated to be Met-362 to Ser-363 (4) or Arg-364 to Met-365 (5), and the reactive-site domain containing these peptide bonds is encoded by exon 10, like the plasminogen-binding site domain.

Homologous amino acid sequences of human α2PI and other serpin superfamily members (antithrombin III, α1-antitrypsin, and rat angiotensinogen) were aligned as previously reported (6), and the positions of the introns were compared. Only one of the nine introns of α2PI, intron 7, was located at a position equivalent to those of the other serpin members. When the positions of these introns are compared at the nucleotide level, the intron of α2PI aligns precisely with those of α1-antitrypsin and angiotensinogen (40). However, the intron of antithrombin III is misaligned by only one nucleotide, as shown by Prochownik et al. (40).

Although the serpin gene superfamily may originate from the same ancestor, explaining the discrepancies in the intron positions of its members is difficult. Cornish-Bowden (41) has suggested that random losses of most introns occur during evolution from an ancestral gene. Others (42) have suggested that introns have been introduced into a particular family after the divergence of its members from an ancestral gene. The former proposal suggests that α2PI may be evolutionarily primitive because the number of introns in the α2PI gene is the highest among the serpin gene superfamily members. The latter proposal suggests, on the contrary, that α2PI may be evolutionarily new. The former proposal agrees with the phylogenetic tree of the serpins constructed by Tone et al. (6), which suggested that α2PI was the first gene to branch from the common ancestor of the serpins.

We thank Drs. Masami Muramatsu and Masaharu Sakai, Department of Biochemistry, University of Tokyo School of Medicine, for valuable advice during the course of this work; Dr. Yataro Ichikawa, Central Research Laboratories, Teijin Ltd., for synthesizing the oligonucleotide; and Dr. Yoshiyuki Sakaki, Kyushu University School of Medicine, for critical reading of the manuscript. This research was supported, in part, by grants from the Ministry of Education, Science and Culture of Japan (62480260), Teijin Ltd., and the Mitsubishi Foundation.

1. Moroi, M. & Aoki, N. (1976) J. Biol. Chem. 251, 5956-5965.
2. Aoki, N. & Harpel, P. C. (1984) Semin. Thromb. Hemostasis 10, 24-41.
3. Aoki, N. (1986) J. Protein Chem. 5, 269-277.
4. Sumi, Y., Nakamura, Y., Aoki, N., Sakai, M. & Muramatsu, M. (1986) J. Biochem. 100, 1399-1402.
5. Holmes, W. E., Nelles, L., Lijnen, H. R. & Collen, D. (1987) J. Biol. Chem. 262, 1659-1664.
6. Tone, M., Kikuno, R., Kume-Iwaki, A. & Hashimoto-Gotoh, T. (1987) J. Biochem. 102, 1033-1041.
7. Sasaki, T., Morita, T. & Iwanaga, S. (1986) J. Biochem. 99, 1699-1705.
8. Moroi, M. & Aoki, N. (1977) Thromb. Res. 10, 581-586.
9. Aoki, N., Moroi, M. & Tachiya, K. (1978) Thromb. Haemostasis 39, 22-31.
10. Wiman, B., Lijnen, H. R. & Collen, D. (1979) Biochim. Biophys. Acta 579, 142-154.
11. Sakata, Y. & Aoki, N. (1980) J. Clin. Invest. 65, 290-297.
12. Tamaki, T. & Aoki, N. (1982) J. Biol. Chem. 257, 14767-14772.
13. Kimura, S. & Aoki, N. (1986) J. Biol. Chem. 261, 15591-15595.
14. Sakata, Y. & Aoki, N. (1982) J. Clin. Invest. 69, 536-542.
15. Aoki, N., Sakata, Y. & Ichinose, A. (1983) Blood 62, 1118-1122.
16. Aoki, N., Moroi, M., Matsuda, M. & Tachiya, K. (1977) J. Clin. Invest. 60, 361-369.
17. Aoki, N. (1979) Prog. Cardiovasc. Dis. 21, 267-286.
18. Aoki, N., Sakata, Y., Matsuda, M. & Tateno, K. (1980) Blood 55, 483-488.
19. Aoki, N. (1984) Semin. Thromb. Hemostasis 10, 42-50.
20. Kato, A., Nakamura, Y., Miura, O., Hirosawa, S., Sumi, Y. & Aoki, N. (1988) Cytogenet. Cell Genet., in press.
21. Matsushime, H., Wang, L. H. & Shibuya, M. (1986) Mol. Cell. Biol. 6, 3000-3004.
22. Benton, W. D. & Davis, R. W. (1977) Science 196, 180-182.
23. Southern, E. M. (1975) J. Mol. Biol. 98, 503-517.
24. Maniatis, T., Fritsch, E. F. & Sambrook, J. (1982) Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Lab., Cold Spring Harbor, NY), pp. 199-206.
25. Sanger, F., Nicklen, S. & Coulson, A. R. (1977) Proc. Natl. Acad. Sci. USA 74, 5463-5467.
26. Hattori, M. & Sakaki, Y. (1986) Anal. Biochem. 152, 232-238.
27. Sollner-Webb, B. & Reeder, R. H. (1979) Cell 18, 485-499.
28. Breathnach, R. & Chambon, P. (1981) Annu. Rev. Biochem. 50, 349-383.
29. Breathnach, R., Benoist, C., O'Hare, K., Gannon, F. & Chambon, P. (1978) Proc. Natl. Acad. Sci. USA 75, 4853-4857.
30. McKnight, S. L. & Kingsbury, R. (1982) Science 217, 316-324.
31. Benoist, C., O'Hare, K., Breathnach, R. & Chambon, P. (1980) Nucleic Acids Res. 8, 127-142.
32. McKnight, S. L. & Tjian, R. (1986) Cell 46, 795-805.
33. Shaul, Y. & Ben-Levy, R. (1987) EMBO J. 6, 1913-1920.
34. Jameel, S. & Siddiqui, A. (1986) Mol. Cell. Biol. 6, 710-715.
35. Tur-Kaspa, R., Burk, R. D., Shaul, Y. & Shafritz, D. A. (1986) Proc. Natl. Acad. Sci. USA 83, 1627-1631.
36. Franza, B. R., Jr., Josephs, S. F., Gilman, M. Z., Ryan, W. & Clarkson, B. (1987) Nature (London) 330, 391-395.
37. Nabel, G. & Baltimore, D. (1987) Nature (London) 326, 711-713.
38. Jackson, R. C. & Blobel, G. (1980) Ann. N. Y. Acad. Sci. 343, 391-403.
39. Proudfoot, N. J. & Brownlee, G. G. (1976) Nature (London) 263, 211-214.
40. Prochownik, E. D., Bock, S. C. & Orkin, S. H. (1985) J. Biol. Chem. 260, 9608-9612.
41. Cornish-Bowden, A. (1982) Nature (London) 297, 625-626.
42. Leicht, M., Long, G. L., Chandra, T., Kurachi, K., Kidd, V. J., Mace, M., Jr., Davie, E. W. & Woo, S. L. C. (1982) Nature (London) 297, 655-659.
