

13th International Histocompatibility Workshop Anthropology/Human Genetic Diversity Joint Report

Chapter 2. Methods used in the generation and preparation of data for analysis in the 13th International Histocompatibility Workshop

S. J. Mack, A. Sanchez-Mazas, D. Meyer, R. M. Single, Y. Tsai, H. A. Erlich

Authors’ addresses

Steven J. Mack 1,2, Alicia Sanchez-Mazas 3, Diogo Meyer 4, Richard M. Single 5, Yingssu Tsai 3, Henry A. Erlich 1,2

1 Children’s Hospital Oakland Research Institute, Oakland CA, 2 Department of Human Genetics, Roche Molecular Systems, Alameda, CA, 3 Laboratory of Anthropology, Genetics and Peopling history, Department of Anthropology and Ecology, University of Geneva, Switzerland, 4 Department of Integrative Biology, University of California, Berkeley CA, and 5 Department of Medical Biostatistics, University of Vermont, Burlington, VT

Acknowledgements

This work was supported by NIH shared resource grant U24 A149213 and by FNS (Switzerland) grant #3100-49771.96. We wish to thank the following IHWG AHGDC participants for their contribution of data to the IHWG: D. Adorno, S. Agrawal, T. Akesaka, A. Arnaiz-Villena, N. Bendukidze, C. Brautbar, T. L. Bugawan, J. Cervantes, M. Crawford, E. Donadi, S. Easteal, H. A. Erlich, M. Fernandez-Vina, X. Gao, E. Gazit, C. Gorodezky, M. Hammond, E. Ivaskova, A. Kastelan, V. I. Konenkov, M. H. S. Kraemer, D. Kumashiro, R. Lang, Z. Layrise, M. S. Leffel, M. Lin, M. L. Lokki, L. Louie, M. Luo, S. J. Mack, M. Martinetti, J. McCluskey, N. Mehra, D. Middleton, E. Naumova, Y. Paik, M. H. Park, M. L. Petzl-Erler, A. Sanchez-Mazas, G. Saruhan, M. L. Sartakova, M. Schroeder, U. Shankarkumar, S. Sonoda, J. Tang, E. Thorsby, J. M. Tiercy, K. Tokunaga, E. Trachtenberg, V. Trieu An, B. Vidan-Jeras, and Y. Zaretskaya. In addition, we thank D. Gjertson, J. Hollenbach, A. Sanchez-Mazas, S. Tonks, and E. Trachtenberg for providing access to 12th Workshop datasets.

HLA 2004: Immunobiology of the Human MHC. Proceedings of the 13th International Histocompatibility Workshop and Congress

Section 1. Description of datasets

From 1998 through 2002, 110 population datasets representing 13,481 sampled individuals were submitted to the International Histocompatibility Working Group (IHWG) Anthropology/Human Genetic Diversity Component (AHGDC) for analysis as part of the 13th International Histocompatibility Workshop (13W) by 39 participating laboratories. For the most part, these datasets represent high to allele-level genotyping data at subsets of the HLA-A, C, B, DRB1, DQA1, DQB1, DPA1 and DPB1 loci.

Analyses for 95 of these population samples (referred to afterward as 13W datasets, described in Table 1 and provided in Appendix C), representing 12,225 individuals, are presented in subsequent chapters (4–7). The remaining 15 population samples were typed at serological to medium-level resolution, or were missing information, and were excluded from analysis. Table 2 describes a set of 48 supplementary datasets (originally genotyped as part of the 12th International Workshop (12W) Anthropology Component), representing 5774 sampled individuals that were included in these analyses (although the results of analyses including these datasets are not always reported here), as well as an additional 20 12W datasets that were included in the linguistics-related analyses presented in chapter 7. These 12W datasets were chosen to supplement global regions (see below and Figure 1) for which low numbers of 13W datasets were analyzed. The ‘‘map number’’ assigned to each population in the 12W Anthropology Report (1) is provided on Table 2 for clarification (12W#). The number of individuals typed per locus for each of the 48 12W and 95 13W datasets is described in Table 3. The latter ranged in size (n) from 12 to 1000 sampled individuals, with a median value of 98, while n for the 48 12W datasets ranged from 15 to 1012, with a median value of 82.


Mack et al · C2. Methods used in the generation and preparation of data for analysis in the 13th International Histocompatibility Workshop

Table 1. Description of 13W population datasets (n=95 populations)

Region | Population dataset name | Labcode | 12th IHWS submission (1) | 12th IHWS No. samples (1) | Country/Region | Lat (2) | Long (3) | Linguistic family/Language | Complexity
01.SS-Africa | Kenyan_142 | CANLUO | | | Kenya/different areas | -1.3 | 36.8 | Mainly Niger-Congo | 1
01.SS-Africa | Mandenka | CHETIE | YES | 122 | Senegal/Kedougou | 12.6 | -12 | Niger-Congo – Mande – Mandenka | 2
01.SS-Africa | Shona | USALOU | | | Zimbabwe/Mashonaland | -18 | 25.9 | Niger-Congo – Central – Shona | 2
01.SS-Africa | Dogon | USAMFV | | | Mali/Bandiagara | 14.4 | -3.6 | Niger-Congo – Central – Dogon | 1
01.SS-Africa | Kenyan-Lowlander (Luo) | USAMFV | | | Kenya/Kanyawegi | -0.11 | 34.63 | Nilo-Saharan – Nilotic | 2
01.SS-Africa | Kenyan-Highlander (Nandi) | USAMFV | | | Kenya/Kipsamoite | 0.33 | 35.01 | Nilo-Saharan – Nilotic | 2
01.SS-Africa | Ugandan | USAMFV | | | Uganda/Kampala | 0.2 | 32.4 | Mainly Niger-Congo | 3
01.SS-Africa | Zambian | USAMFV | | | Zambia/Lusaka | -15 | 28.2 | Niger-Congo – Central – South | 3
01.SS-Africa | North American (Afr_descent) | USAMFV | | | USA (American Red Cross) | | | Indo-European – Germanic – English | 3
01.SS-Africa | Rwandan | USATNG | | | Rwanda/Kigali | -1.6 | 30.5 | Niger-Congo – Central – South – Kinyarwanda (=Rwanda) | 3
01.SS-Africa | Zulu | ZAFHAM | YES | 79 | South Africa/Durban | -30 | 31 | Niger-Congo – Central – South – Zulu | 2
02.N-Africa | Algerian_98 | CHESAN | YES | 235 | Algeria/Oran | 35.5 | -0.4 | Afro-Asiatic – Berber | 2
02.N-Africa | Moroccan_94 | CHESAN | YES | 501 | Morocco, Souss | 30.3 | 9.35 | Afro-Asiatic – Berber | 2
02.N-Africa | Moroccan_99 | ESPARN | | | Morocco/El Jadida | 33.2 | -8.5 | Afro-Asiatic – Semitic – Arabic | 2
02.N-Africa | Chaouya | ITAADO | | | Morocco/Settat | 33.04 | -7.37 | Afro-Asiatic – Semitic – Arabic | 2
02.N-Africa | Metalsa | ITAADO | | | Morocco/Nador | 35.3 | -4 | Afro-Asiatic – Berber – Tarifit | 2
03.Europe | Bulgarian_Gipsy | BGRNAU | | | Bulgaria/Sofia | 42.7 | 23.3 | Indo-European – Slavic – Bulgarian | 2
03.Europe | Czech | CZEIVS | | | Czech Republic/Praha | 50 | 14.3 | Indo-European – Slavic – Czech | 2
03.Europe | Georgian | CZEIVS | | | Georgia/Tbilisi | 41.7 | 44.9 | South Caucasian – Georgian | 2
03.Europe | Finn_89 | FINLOK | | | Finland | 60.2 | 25.1 | Uralic-Yukaghir – Uralic – Finnish | 3
03.Europe | Croatian | HRVKAS | YES | 21 | Croatia/Zagreb | 45.2 | 15.5 | Indo-European – Slavic – Serbo-Croatian | 3
03.Europe | Slovenian | SVNJER | | | Slovenia/Ljubljana | 46 | 14 | Indo-European – Balto-Slavic – Slovene | 2
03.Europe | Irish | UKIMID | | | Northern Ireland/all regions | 54.7 | -6.7 | Indo-European – Germanic – English | 3
03.Europe | North American (Eur_descent) | USAMFV | | | USA/American Red Cross | | | Indo-European – Germanic – English | 3
03.Europe | Cuban_(Eur_descent) | UKIMID | | | Cuba/Havana | 21.5 | -80 | Indo-European – Italic – Spanish | 3
04.SW-Asia | Kurdish | CZEIVS | | | Georgia/Tbilisi | 41.7 | 44.9 | Indo-European – Indo-Iranian – Kurdish | 2
04.SW-Asia | Druze | ISRBRA | | | Israel | 32 | 34.5 | Afro-Asiatic – Semitic – Arabic | 3
04.SW-Asia | Israeli_Jew | ISRGAZ | | | Israel | 32 | 34.5 | Afro-Asiatic – Semitic – Hebrew | 3
04.SW-Asia | Turk | TURSAR | YES | 89 | Turkey/Marmara | 41 | 28.6 | Altaic – Turkic – Turkish | 2
04.SW-Asia | Omani | UKIMID | | | Oman/various regions | 21 | 57 | Afro-Asiatic – Semitic – Arabic | 3
04.SW-Asia | New_Dehli | USAERL | | | India/New Delhi | 28.6 | 77.2 | Indo-European – Indo-Iranian – Indic | 3
04.SW-Asia | South_Indian | USAERL | | | India/Andhra Pradesh, Golla | 17.5 | 78.5 | Elamo-Dravidian – Dravidian – South Central – Telugu | 2
04.SW-Asia | Tamil | ZAFHAM | | | South Africa/Durban | -30 | 31 | Elamo-Dravidian – Dravidian – South – Tamil | 3
05.SE-Asia | Ami | TWNLIN | | | Taiwan/Hualien, Taitung | 25.1 | 122 | Austronesian – Paiwanic – Ami | 1
05.SE-Asia | Atayal | TWNLIN | | | Taiwan/Wulai, Chenshih, Wufen | 24.9 | 122 | Austronesian – Atayal | 1
05.SE-Asia | Bunun | TWNLIN | | | Taiwan/Hsin-I, Taitung | 23.6 | 121 | Austronesian – Paiwanic – Bunun | 1
05.SE-Asia | Hakka | TWNLIN | | | Taiwan/Hsinchu, Pintung | 24.8 | 121 | Sino-Tibetan – Chinese southwest+central – Cantonese | 3
05.SE-Asia | Minnan | TWNLIN | | | Taiwan/Taipei | 25.1 | 122 | Sino-Tibetan – Chinese southeast | 3
05.SE-Asia | Paiwan_51 | TWNLIN | | | Taiwan/Lai-I | 22.5 | 121 | Austronesian – Paiwanic – Paiwan | 1
05.SE-Asia | Pazeh | TWNLIN | | | Taiwan/Puli, Liyutan, Fengyuan | 24 | 121 | Austronesian – Paiwanic Sinicized – Pazeh | 3
05.SE-Asia | Puyuma_49 | TWNLIN | | | Taiwan/Peinan | 22.8 | 121 | Austronesian – Paiwanic – Puyuma | 1
05.SE-Asia | Rukai | TWNLIN | | | Taiwan/Wutai | 22.8 | 121 | Austronesian – Tsouic – Rukai | 1
05.SE-Asia | Saisiat | TWNLIN | | | Taiwan/Wufen, Nanchuang | 24.6 | 121 | Austronesian – Paiwanic – Saisiyat | 1
05.SE-Asia | Siraya | TWNLIN | | | Taiwan/Tanei, Tsochen | 23.1 | 120 | Austronesian – Paiwanic Sinicized – Siraya | 3


Table 1. Continued

Region | Population dataset name | Labcode | 12th IHWS submission (1) | 12th IHWS No. samples (1) | Country/Region | Lat (2) | Long (3) | Linguistic family/Language | Complexity
05.SE-Asia | Tao (Yami) | TWNLIN | | | Taiwan/Lan-Yu | 22 | 122 | Austronesian – Western Malayo-Polynesian – Yami | 1
05.SE-Asia | Thao | TWNLIN | | | Taiwan/Yuchih | 23.9 | 121 | Austronesian – Paiwanic Sinicized – Thao | 3
05.SE-Asia | Toroko | TWNLIN | | | Taiwan/Hsiulin | 23 | 120 | Austronesian – Atayal | 1
05.SE-Asia | Tsou | TWNLIN | | | Taiwan/Tapang | 23.5 | 121 | Austronesian – Tsouic – Tsou | 1
05.SE-Asia | Han_Chinese_149 | UKIMID | | | Singapore | 1.28 | 104 | Sino-Tibetan – Chinese | 3
05.SE-Asia | Han_Chinese_572 | UKIMID | | | Hong-Kong | 22.2 | 114 | Sino-Tibetan – Chinese | 3
05.SE-Asia | Malay | USAERL | | | Malay Peninsula | 3 | 103 | Austronesian – Western Malayo-Polynesian – Sundic – Malay | 2
05.SE-Asia | Singapore_(Chin_descent) | USAERL | | | Singapore | 1.28 | 104 | Sino-Tibetan – Chinese | 3
05.SE-Asia | Thai | USAERL | | | Thailand | 12 | 100 | Tai-Kadai – Tai-Sek – Southwestern Tai – Chiang Saeng | 3
05.SE-Asia | North American (Asi_descent) | USAMFV | | | USA (American Red Cross) | | | Indo-European – Germanic – English | 3
05.SE-Asia | Chinese | USATRA | | | China/South, Canton region | 23.6 | 113 | Sino-Tibetan – Chinese | 3
05.SE-Asia | Kinh | VNMVTA | | | Viet Nam/Hanoi | 21 | 106 | Austroasiatic – Mon-Khmer – Vietnamese | 3
05.SE-Asia | Muong | VNMVTA | | | Viet Nam/Hoa Binh | 20.8 | 105 | Austroasiatic – Mon-Khmer – Muong | 2
06.Oceania | Ivatan | TWNLIN | | | Philippines/Batan Island, Baso | 20.5 | 122 | Austronesian – Extra-Formosan – Proto-Filipino – Ivatan | 1
06.Oceania | East_Timorese | USAERL | | | Indonesia/East Timor/Nusa Tenggara | -9 | 125 | Austronesian – Central Malayo-Polynesian – Flores-Lembata | 2
06.Oceania | Filipino | USAERL | | | Philippines/Manilla | 14.6 | 121 | Austronesian – Western Malayo-Polynesian – Tagalog | 2
06.Oceania | Indonesian | USAERL | | | Indonesia | -3 | 120 | Austronesian – Malayo-Polynesian | 3
06.Oceania | Moluccan | USAERL | | | Indonesia/Moluccas | -2 | 127 | Austronesian – Central Malayo-Polynesian – Southwest Maluku | 2
06.Oceania | PNG_Highlander | USAERL | | | Melanesia/New Guinea, Highlands | -5 | 145 | Indo-Pacific – Trans-New Guinea – East New Guinea Highlands | 2
06.Oceania | PNG_Lowlander_48 | USAERL | | | Melanesia/New Guinea, Lowlands, many areas | -10 | 147 | Indo-Pacific | 2
06.Oceania | PNG_Lowlander_95 | USAERL | | | Melanesia/New Guinea, Lowlands, Wosera | -5 | 150 | Indo-Pacific – Sepik-Ramu – Sepik – Middle Sepik – Ndu | 2
06.Oceania | Samoa | USARWL | | | Melanesia/Samoa | 14.2 | -171 | Austronesian – Eastern Malayo-Polynesian – Oceanic – Samoan | 2
07.Australia | Australian_Cape_York | USAGAO | | | Australia/Queensland/Cape York | -13 | 143 | Australian – Pama-Nyungan | 2
07.Australia | Australian_Groote_Eylandt | USAGAO | | | Australia/Northern Territory/Groote-Eylandt | -14 | 137 | Australian – non-Pama-Nyungan – Anindhilyaguan | 1
07.Australia | Australian_Kimberley | USAGAO | | | Australia/Western Australia/Kimberley | -17 | 127 | Australian – non-Pama-Nyungan – Wororan and Nyulnyulan | 2
07.Australia | Australian_Yuendumu | USAGAO | YES | 6 | Australia/Northern Territory/Yuendumu | -24 | 132 | Australian – Pama-Nyungan – Ngargan | 2
08.NE-Asia | Okinawan | USAPAK | | | Hawai/Honolulu | 26 | 128 | Altaic – Korean-Japanese – Ryukyuan | 2
08.NE-Asia | Ryukuan | JPNTKN | | | Japan/Okinawa | 26.4 | 128 | Altaic – Korean-Japanese – Ryukyuan – Amami-Okinawan | 1
08.NE-Asia | Buriat | JPNTKN | | | Mongolia/Angarsk | 47.6 | 119 | Altaic – Mongol | 1
08.NE-Asia | Korean | KORPMH | | | Korea/Seoul | 37.6 | 127 | Altaic – Korean-Japanese – Korean | 2
08.NE-Asia | Tuva | USAERL | | | Russia/Novosibirsk-Kyzyl | 50 | 95 | Altaic – Turkic – Tuvinian | 2
09.N-America | Lacandon | MEXGOR | | | Mexico/Chiapas | 16.7 | -91 | Amerind – Maya – Lacandon | 1
09.N-America | Seri | MEXGOR | | | Mexico/Isla Tiburon | 29 | -112 | Amerind – North Amerind – Hokan – Seri | 1
09.N-America | Canoncito | USAERL | | | USA/Arizona, Grand Canyon | 36.1 | -112 | Na-Dene – Athapascan – Navajo – Canoncito | 1
09.N-America | Maya | USAERL | | | Mexico/Yucatan | 20 | -90 | Amerind – North Amerind – Penutian – Mayan | 2
09.N-America | Pima_17 | USAERL | | | USA/Arizona, Gila River | 33 | -113 | Amerind – Central Amerind – Uto-Aztecan | 1
09.N-America | Pima_99 | USAERL | | | USA/Arizona | 33 | -112 | Amerind – North Amerind | 1
09.N-America | Sioux | USAERL | | | USA/South Dakota | 43.6 | -97 | Amerind – North Amerind – Almosan-Keresiouan – Dakota | 2
09.N-America | Zuni | USAERL | | | USA/New Mexico | 35 | -107 | Amerind – North Amerind – Penutian – Zuni | 1
09.N-America | Yupik | USALEF | | | Alaska/South | 60 | -160 | Eskimo-Aleut – Eskimo – Yupik | 2
09.N-America | Amerindian | USAMFV | | | USA/American Red Cross | | | Amerind | 3


Table 1. Continued

Region | Population dataset name | Labcode | 12th IHWS submission (1) | 12th IHWS No. samples (1) | Country/Region | Lat (2) | Long (3) | Linguistic family/Language | Complexity
10.S-America | Guarani-Kaiowa | BRAPTZ | | | Brazil/Mato Grosso do Sul, Amambai, Limao Verde | -23 | -55 | Amerind – Equatorial-Tucanoan – Tupi-Guarani – Guarani | 1
10.S-America | Guarani-Nandeva | BRAPTZ | | | Brazil/Mato Grosso do Sul, Porto Lindo, Amambai | -24 | -55 | Amerind – Equatorial-Tucanoan – Tupi-Guarani – Guarani | 1
10.S-America | Ticuna | USAERL | | | Brazil/Tabatinga | -5 | -70 | Amerind – Equatorial-Tucanoan – Ticuna | 1
10.S-America | Central American | USAERL | | | Costa Rica/Panama | 3 | -65 | Amerind – Chibcan-Paezan – Chibchan | 1
10.S-America | Bari | VENLAY | | | Venezuela/Saimadodyi & Campo Rosario | 9.8 | -73 | Amerind – Chibcan-Paezan – Chibchan – Bari (Motilon) | 1
11.Other | Brazilian_(Afr-Eur_descent) | BRADON | | | Brazil/Ribeirao Preto | -23 | -48 | Indo-European – Italic – Portuguese | 3
11.Other | Mexican | MEXGOR | | | Mexico/Mexico City | 19.4 | -99 | Indo-European – Italic – Spanish | 2
11.Other | Brazilian | UKIMID | | | Brazil/Belo Horizonte | -10 | -55 | Indo-European – Italic – Portuguese | 3
11.Other | Cuban_(Afr-Eur_descent) | UKIMID | | | Cuba/Havana | 21.5 | -80 | Indo-European – Italic – Spanish | 3
11.Other | North American (His_descent) | USAMFV | | | USA (American Red Cross) | | | Indo-European – Germanic – English | 3

(1) These same samples were also reported in the 12th IHWS (see reference #1).
(2) LAT indicates latitude in degrees north or south.
(3) LONG indicates longitude in degrees east or west.

The mean number of loci genotyped in the 13W datasets was 3.9, while the mean for the 48 12W datasets was 3.6.
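As a hedged illustration of how a per-dataset locus count and a mean across datasets could be tallied from Table 3, the sketch below uses the per-locus counts printed for four 13W rows (Dogon, Kenyan_Lowlander (Luo), Rwandan and Shona). The function names, and the choice of the largest per-locus count as the dataset size n, are assumptions made for illustration and are not the workshop's documented procedure.

```python
from statistics import mean, median

# Per-locus sample counts copied from four Table 3 rows. Only the counts that
# are present are listed; which locus each count belongs to is not needed here.
typed_counts = {
    "Dogon": [138, 129, 138, 138],
    "Kenyan_Lowlander (Luo)": [265, 265, 265],
    "Rwandan": [280, 280],
    "Shona": [225, 226, 226, 229, 229, 229, 228],
}


def loci_typed(counts):
    """Number of loci with at least one typed individual."""
    return sum(1 for c in counts if c > 0)


def dataset_size(counts):
    """Assumed definition of n: the largest per-locus count in the row."""
    return max(counts)


loci_per_dataset = {name: loci_typed(c) for name, c in typed_counts.items()}
sizes = {name: dataset_size(c) for name, c in typed_counts.items()}

mean_loci = mean(loci_per_dataset.values())  # (4 + 3 + 2 + 7) / 4
median_n = median(sizes.values())            # median of 138, 265, 280, 229
```

For these four rows the mean is 4 loci per dataset; the value of 3.9 reported above is taken over all 95 13W datasets.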

Many of the population samples submitted as 13W datasets had been previously typed as part of the 11th International Workshop (11W) or 12W, primarily at class II loci. Submitting laboratories were encouraged to use IHWG methods (see section A, HLA Typing and Informatics) for new typing. In cases where participating laboratories were unable to carry out molecular-level typing, genotyping was accomplished by a second laboratory. All laboratories using IHWG reagents were required to type a subset of the IHWG Quality Control (QC) cell panel cells at 92% accuracy before new genotyping data could be accepted for analysis. Data that had been generated before the start of the IHWG, or which had been generated using non-IHWG methods, was classed as non-qualified data (or ‘‘Available Data’’) and was accepted in the form of four digit (relatively unambiguous) genotype assignments. Data generated with IHWG reagents was submitted in the form of probe-reactivity patterns, formatted using either the RLS software or the IHWG Virtual DNA Analysis (VDA) component’s SCORE software (Section Joint R, Virtual DNA Analysis Report). In these cases, alleles and genotypes were inferred by the software. Approximately 53% (50/95) of the IHWG datasets had been typed subsequent to the 12W and were accepted as non-qualified data. Of the remaining 45 datasets, 41 were typed at class I loci using IHWG RLS reagents, 3 using IHWG SSOP reagents, and 1 by sequencing based typing (SBT) (these typing methods are described in the Technology joint report, sections A.2, A.3, and A.5). In many instances, the results of these methods were verified by SBT. Class II typing in many of these datasets was carried out using 12W or local reagents. Datasets typed using local reagents were submitted in a format similar to non-qualified data (relatively unambiguous assignments).
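The 92% QC requirement described above can be pictured as a simple concordance check. The following is a minimal sketch under stated assumptions: accuracy is taken to be the fraction of QC panel cells whose reported genotype matches the reference genotype, the threshold is treated as "at least 92%", and the cell identifiers, genotypes and function names are all hypothetical. The source does not specify how the IHWG actually scored accuracy.

```python
def qc_accuracy(reported, reference):
    """Fraction of reference QC cells whose reported genotype matches.

    Genotypes are compared as unordered allele pairs; a cell missing from
    the reported results counts as a mismatch.
    """
    if not reference:
        raise ValueError("reference panel is empty")
    matches = sum(
        1
        for cell_id, ref_genotype in reference.items()
        if cell_id in reported
        and sorted(reported[cell_id]) == sorted(ref_genotype)
    )
    return matches / len(reference)


def passes_qc(reported, reference, threshold=0.92):
    """True when accuracy meets the threshold (0.92 is taken from the text)."""
    return qc_accuracy(reported, reference) >= threshold


# Hypothetical 25-cell panel; cell IDs and genotypes are made up.
reference = {f"QC{i:02d}": ("A*0101", "A*0201") for i in range(25)}

# One discordant cell: 24 of 25 cells match.
lab_one = dict(reference)
lab_one["QC00"] = ("A*0101", "A*0301")

# Three discordant cells: 22 of 25 cells match.
lab_two = dict(lab_one)
lab_two["QC01"] = ("A*0201", "A*0201")
lab_two["QC02"] = ("A*0301", "A*0301")
```

With this panel, `passes_qc(lab_one, reference)` is true and `passes_qc(lab_two, reference)` is false.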

Thanks to the efforts of the submitting laboratories, complete background information (especially geographic and linguistic information) is available for most population samples. In those cases where well-defined geographic information was not available, the latitude and longitude of a close locality or the capital city of the country was used. Linguistic assignments were based on information provided by each laboratory when available (see below), and either Ruhlen’s classification scheme for linguistic families (2) or the Ethnologue (3) was consulted when no such information was available. There were a few cases where the broad linguistic family could not be specified with certainty. For example, ‘‘Kenyans’’ (i.e., the Kenyan_142 sample) and ‘‘Ugandans’’ may include Afro-Asiatic- and Nilo-Saharan-speakers, in addition to Niger-Congo speakers. These populations were classified as ‘‘Mainly Niger-Congo’’ based on the proportion of Niger-Congo speakers in these nations when compared to Afro-Asiatic and Nilo-Saharan languages (as shown in Table 4 (3)). In other cases, only a broad linguistic characterization was possible (e.g., ‘‘Indo-Pacific’’). A summary of these data is provided in Table 5, which describes the number of 13W populations that correspond to linguistic families in each geographic region


Table 2A. Description of 12W datasets supplementing all analyses (n=48 populations)

Region | Population Name | 12W labcode | 12W# | Country/Region | Lat. | Long. | Linguistic family/Language | Complexity
01.SS-Africa | Colombian (Afr_descent) | 12trachtenberg | * | Colombia | 4.34 | -74 | Indo-European – Italic – Spanish | 2
01.SS-Africa | Amhara | ITAGDS | 155 | Ethiopia/Arssi | 7.58 | 39 | Afro-Asiatic – Semitic – Amharic | 2
01.SS-Africa | Baganda | USALOU | 226 | Uganda/Kampala | 0 | 32.5 | Niger-Congo – Bantu – Luganda | 2
01.SS-Africa | Mukongo | BELDPT | 1 | Zaire/Kinshasa | -4.2 | 15.2 | Niger-Congo – Bantu – Kikongo, Lingala, Tsheluba | 2
02.N-Africa | Egyptian | 12ferencik | * | Egypt | 30.1 | 31.1 | Afro-Asiatic – Semitic – Arabic | 3
02.N-Africa | Algerian_100 | FRAMER | 55 | Algeria/Algiers_area | 36.5 | 3 | Afro-Asiatic – Semitic – Arabic | 3
02.N-Africa | Bedouin | EGYELC | 171 | Egypt/Siwa | 31.2 | 27.2 | Afro-Asiatic – Semitic – Arabic | 2
03.Europe | Italian | 12ferrara | * | Italy | 41.5 | 12.3 | Indo-European – Italic – Italian | 3
03.Europe | North_Italian | ITAFER | 45 | Italy/Bergamo | 45.7 | 9.7 | Indo-European – Italic – Italian | 2
03.Europe | Finn_143 | FINTII | 29 | Finland/Oulu | 65 | 25.3 | Uralic-Yukaghir – Finnic – Finnish | 3
03.Europe | Hvar_Island_Croatian | CROKAS | 20 | Croatia/Hvar | 43.1 | 16.3 | Indo-European – Slavic – Croatian | 3
03.Europe | Krk_Island_Croatian | CRORUD | 19 | Croatia/Krk | 45 | 14.5 | Indo-European – Slavic – Croatian | 3
03.Europe | Polish | FRADDC | 156 | Poland | 52.3 | 16.5 | Indo-European – Slavic – Polish | 3
03.Europe | Pomaki | GRESTV | 164 | Greece/Northern_Xanthis | 41.1 | 24.6 | Indo-European – Greek | 2
03.Europe | Provincial_French | FRADDC | 229 | France/Ile_de_France | 48 | 0 | Indo-European – Italic – French | 3
03.Europe | Spanish_100 | SPAARN | 104 | Spain | 40.3 | -3.4 | Indo-European – Italic – Spanish | 3
03.Europe | Spanish_133 | SPALAR | 82 | Spain | 40.3 | -3.4 | Indo-European – Italic – Spanish | 3
03.Europe | Spanish_Basque | SPABER | 120 | Spain | 43 | -2 | Basque | 2
04.SW-Asia | Sri_Lankan | 12hashemi | * | India/Sri Lanka | 6.7 | 79.9 | Elamo-Dravidian – Dravidian | 2
04.SW-Asia | Zoroastrian | 12hashemi | * | Canada | 45.3 | -73 | Indo-European – Germanic – English | 3
04.SW-Asia | North_Indian | 12ferencik | * | India/Uttar Pradesh, Lucknow | 27.5 | 82 | Indo-European – Indo-Iranian – Indic | 2
04.SW-Asia | Ashkenazi_Jews | ISRBRA | 116 | Israel | 31.5 | 35.1 | Afro-Asiatic – Semitic – Hebrew | 3
04.SW-Asia | Hunza-Burushaski | PAKQAS | 214 | Pakistan/North:Gilgit | 36.2 | 74.4 | Burushaski | 2
04.SW-Asia | Libyan_Jews | ISRBRA | 153 | Israel | 31.5 | 35.1 | Afro-Asiatic – Semitic – Hebrew | 3
04.SW-Asia | Moroccan_Jews | ISRBRA | 117 | Israel | 31.5 | 35.1 | Afro-Asiatic – Semitic – Hebrew | 2
04.SW-Asia | Sindhi | PAKQAS | 213 | Pakistan/Sindh | 24.5 | 67 | Indo-European – Indo-Iranian – Indic – Sindhi | 2
05.SE-Asia | South_Han | 12johnlee | * | China/Fujian, Xiamen | 24.5 | 118 | Sino-Tibetan – Chinese | 3
05.SE-Asia | Taiwanese | 12johnlee | * | China/Fujian | 23 | 120 | Sino-Tibetan – Chinese | 3
05.SE-Asia | Ami_14 | JAPSEK | 243 | Taiwan/East | 22.5 | 121 | Austronesian – Paiwanic | 1
05.SE-Asia | Paiwan_64 | JAPSEK | 241 | Taiwan/South | 23.5 | 121 | Austronesian – Paiwanic | 1
05.SE-Asia | Puyuma_15 | JAPSEK | 245 | Taiwan/South_West | 22 | 121 | Austronesian – Paiwanic | 1
05.SE-Asia | Thai-Chinese | THACHI | 85 | Thailand/Centre | 15 | 100 | Sino-Tibetan – Chinese | 3
08.NE-Asia | Japanese | 12juji | * | Japan | 35.3 | 140 | Altaic – Korean-Japanese – Japanese | 3
08.NE-Asia | Japanese_Kobe | 12araki | 47 | Japan/Kobe | 34 | 135 | Altaic – Korean-Japanese – Japanese | 3
08.NE-Asia | Halkh | JAPTSU | 184 | Mongolia/Ulaanbaatar | 47.5 | 107 | Altaic – Mongolian | 2
08.NE-Asia | Han | JAPINK | 137 | China/North | 43.4 | 87.4 | Sino-Tibetan – Sinitic – Chinese | 3
08.NE-Asia | Hoton | JAPTSU | 112 | Mongolia/Uvs_Aimag | 48 | 92 | Altaic – Mongolian | 2
08.NE-Asia | Kazakh | JAPINK | 135 | Russia/Kazakhstan | 48 | 68 | Altaic – Turkic – Kazakh | 3
08.NE-Asia | Korean | JAPJUJ | 240 | China/Heilongjiang | 46 | 127 | Altaic – Korean | 3
08.NE-Asia | Mongolian | JAPJUJ | 16 | Mongolia | 47.5 | 107 | Altaic – Mongolian | 3
08.NE-Asia | Tuvinian | RUSKON | 236 | Russia/Tuvinian/several_Regions | 51 | 95 | Altaic – Turkic – Tuva | 1
08.NE-Asia | Uygur | JAPINK | 17 | China | 43.4 | 87.4 | Altaic – Turkic – Uygur | 2
09.N-America | Mixe | USAKLI | 209 | Mexico/Oaxaca | 17.1 | -97 | Amerind – North Amerind – Penutian – Mixe | 1
09.N-America | Mixteca | USAKLI | 211 | Mexico/Oaxaca | 17.1 | -97 | Amerind – Central Amerind – Oto-Manguean – Mixtec | 1
09.N-America | Zapotec | USAKLI | 210 | Mexico/Oaxaca | 17.1 | -97 | Amerind – Central Amerind – Oto-Manguean – Zapotec | 1
10.S-America | Colombian | 12trachtenberg | * | Colombia | 4.34 | -74 | Indo-European – Italic – Spanish | 1
10.S-America | Ecuadorian | 12trachtenberg | * | Ecuador | -0.2 | -78 | Indo-European – Italic – Spanish | 1
10.S-America | Yukpa | VENLAY | 185 | Venezuela/Zulia | 10 | -73 | Amerind – Ge-Pano-Carib – Yupan | 1

* No 12W map number, data provided by corresponding laboratory.


Table 2B. Description of 12W datasets supplementing linguistics-related analyses (n=20 populations)

Region | Population Name | 12W# | Country/Region | Lat. | Long. | Linguistic Family/Language
01.SS-Africa | Merina | 220 | Madagascar/Central Highlands | -18 | 47.2 | Austronesian – Malagasy
01.SS-Africa | Oromo | 154 | Ethiopia/Arssi | 7.58 | 39 | Afro-Asiatic – Cushitic – Oromo
01.SS-Africa | Zairian | 234 | Zaire/several regions | -4.2 | 15.2 | Niger-Congo – Bantu – Kikongo, Lingala, Tsheluba
02.N-Africa | Egyptian_Copts | 221 | Egypt (in USA) | 30 | 31 | Afro-Asiatic – Semitic – Arabic
02.N-Africa | Egyptian_Delta | 109 | Egypt/Delta | 31 | 31.3 | Afro-Asiatic – Semitic – Arabic
02.N-Africa | Mzab | 3 | Algeria/South Sahara | 32.2 | 3.4 | Afro-Asiatic – Berber – Mzab
03.Europe | Belgian | 158 | Belgium/Namur & Luxembourg | 50.3 | 4.52 | Indo-European – Italic – French
03.Europe | Bulgarian | 119 | Bulgaria | 42.4 | 23.2 | Indo-European – Slavic – Bulgarian
03.Europe | French_North | 61 | France/Northern | 50.4 | 3.05 | Indo-European – Italic – French
03.Europe | Greek_Attiki | 165 | Greece/Attiki | 38 | 25.3 | Indo-European – Greek
03.Europe | Italian_Pavia | 42 | Italy/Pavia | 45.1 | 9.09 | Indo-European – Italic – Italian
03.Europe | Portuguese_Coimbra | 195 | Portugal/Coimbra | 40.1 | -8.3 | Indo-European – Italic – Portuguese
03.Europe | Portuguese_South | 65 | Portugal/South | 39 | -9 | Indo-European – Italic – Portuguese
03.Europe | Sardinian | 46 | Italy/Sardinia | 40 | 9 | Indo-European – Italic – Sardinian
03.Europe | Swiss | 196 | Switzerland/Geneva | 46.2 | 6.1 | Indo-European – Italic – French
04.SW-Asia | Lebanese | 145 | Lebanon | 33.5 | 35.3 | Afro-Asiatic – Semitic – Arabic
04.SW-Asia | Punjabi | 38 | India/Punjab | 28.4 | 77.1 | Indo-European – Indo-Iranian – Indic – Punjabi
06.Oceania | Trobriand | 111 | Melanesia/Papua New Guinea/islands | -8.3 | 151 | Austronesian – Kilivila
08.NE-Asia | Manchu | 15 | China/Heilongjiang | 45.2 | 126 | Altaic – Tungus – Manchu
10.S-America | Kaingang | 9 | Brazil/South: Parana | -25 | -52 | Amerind – Ge-Pano-Carib – Kaingang

(defined in section 2.II below and shown in Figure 1). In addition, detailed descriptions of each 13W population (summarizing history, sampling and genotyping methods, and preliminary analyses) are included in the following chapter (Chapter 3, Short Population Reports). The report for each 13W population is referenced by the 13W# in the table of contents.

Figure 1. Boundaries for global regions.
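The fallback rules described in this section for geographic coordinates and for the ‘‘Mainly Niger-Congo’’ label can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the speaker proportions are invented placeholders (the Table 4 values are not reproduced here), the order of preference between a close locality and the capital city is assumed, and picking the family with the largest share is one plausible reading of "based on the proportion of Niger-Congo speakers".

```python
def mainly_label(speaker_share):
    """Label a linguistically mixed sample by its largest language family.

    speaker_share maps a family name to its share of speakers in the country.
    """
    if not speaker_share:
        raise ValueError("no linguistic information")
    family = max(speaker_share, key=speaker_share.get)
    return f"Mainly {family}"


def resolve_coordinates(reported, close_locality=None, capital=None):
    """Return the first available (lat, long) pair.

    Assumed order: coordinates reported for the sampling site, then a close
    locality, then the capital city of the country.
    """
    for candidate in (reported, close_locality, capital):
        if candidate is not None:
            return candidate
    raise ValueError("no coordinates available")


# Hypothetical national proportions, for illustration only.
kenya_share = {"Niger-Congo": 0.65, "Nilo-Saharan": 0.30, "Afro-Asiatic": 0.05}
label = mainly_label(kenya_share)

# No site coordinates reported: fall back to the capital entry. The pair used
# here is the one listed for Kenyan_142 in Table 1.
coords = resolve_coordinates(None, capital=(-1.3, 36.8))
```

A sample with reported site coordinates, such as Mandenka at (12.6, -12) in Table 1, would keep those coordinates and never reach the fallback entries.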


Table 3. Number of individuals typed at each locus in the 12W and 13W datasets (total of 163 populations)

13W# | Region | Population Name | Labcode | IHWC | 12W# | A C B DRB1 DQA1 DQB1 DPA1 DPB1 (non-empty cells, in column order)
1 | 01.SS-Africa | Dogon | USAMFV | 13th | | 138 129 138 138
2 | 01.SS-Africa | Kenyan_Lowlander (Luo) | USAMFV | 13th | | 265 265 265
3 | 01.SS-Africa | Kenyan_Highlander (Nandi) | USAMFV | 13th | | 241 240 240
4 | 01.SS-Africa | Kenyan_142 | CANLUO | 13th | | 113 143 143 119 113 129 123
5 | 01.SS-Africa | Mandenka | CHETIE | 13th 12th | 122 | 93 54 94 84 85
6 | 01.SS-Africa | Rwandan | USATNG | 13th | | 280 280
7 | 01.SS-Africa | Shona | USALOU | 13th | | 225 226 226 229 229 229 228
8 | 01.SS-Africa | Ugandan | USAMFV | 13th | | 163 163 161
9 | 01.SS-Africa | Zambian | USAMFV | 13th | | 43 45 44
159 | 01.SS-Africa | North American (Afr_descent) | USAMFV | 13th | | 255 252 251
10 | 01.SS-Africa | Zulu | ZAFHAM | 13th | | 199 98 201 88 89 87 87
11 | 01.SS-Africa | Amhara | ITAGDS | 12th | 155 | 98 98
12 | 01.SS-Africa | Baganda | USALOU | 12th | 226 | 26
13 | 01.SS-Africa | Merina | FRADAN | 12th | 220 | 160
14 | 01.SS-Africa | Mukongo | BELDPT | 12th | 1 | 30 20
15 | 01.SS-Africa | Oromo | ITAGDS | 12th | 154 | 82
16 | 01.SS-Africa | Zairian | FRAKPL | 12th | 234 | 106
17 | 02.N-Africa | Algerian_98 | CHESAN | 13th 12th | 235 | 99 99
18 | 02.N-Africa | Chaouya | ITAADO | 13th | | 67 68 99
19 | 02.N-Africa | Metalsa | ITAADO | 13th | | 72 68 99
20 | 02.N-Africa | Moroccan_94 | CHESAN | 13th 12th | 501 | 98 98 98
21 | 02.N-Africa | Moroccan_99 | ESPARN | 13th | | 95 96
22 | 02.N-Africa | Algerian_100 | FRAMER | 12th | 55 | 100
23 | 02.N-Africa | Bedouin | EGYELC | 12th | 171 | 101 98
24 | 02.N-Africa | Egyptian | 12ferencik | 12th | | 79 78 79 79
25 | 02.N-Africa | Egyptian_Copts | USAYUN | 12th | 221 | 40
26 | 02.N-Africa | Egyptian_Delta | EGYELC | 12th | 109 | 103
27 | 02.N-Africa | Libyan_Jews | ISRBRA | 12th | 153 | 40 39 40 40 40 40
28 | 02.N-Africa | Moroccan_Jews | ISRBRA | 12th | 117 | 40 39 40 40 40 40
29 | 02.N-Africa | Mzab | FRATHO | 12th | 3 | 107
30 | 03.Europe | Bulgarian_Gipsy | BRGNAU | 13th | | 12 11 11
31 | 03.Europe | Croatian | HRVKAS | 13th 12th | 21 | 150 150 139
32 | 03.Europe | Czech | CZEIVS | 13th | | 105 106 106 104 105 106 102
33 | 03.Europe | Finn_89 | FINLOK | 13th | | 90 90 90 90 35 30
34 | 03.Europe | Georgian | CZEIVS | 13th | | 105 107 108
35 | 03.Europe | Irish | UKIMID | 13th | | 1000 1000 1000 1000
36 | 03.Europe | Slovenian | SVNJER | 13th | | 100 100 100 100 100
37 | 03.Europe | Belgian | BELOSS | 12th | 158 | 38
38 | 03.Europe | Bulgarian | BULNAU | 12th | 119 | 42
39 | 03.Europe | Finn_143 | FINTII | 12th | 29 | 131 132
40 | 03.Europe | French_North | FRADAN | 12th | 61 | 234
41 | 03.Europe | Greek_Attiki | GRESTV | 12th | 165 | 96 96
42 | 03.Europe | Hvar_Island_Croatian | CROKAS | 12th | 20 | 106 106 104
43 | 03.Europe | Italian | 12ferrara | 12th | | 26 97 97 97 50
44 | 03.Europe | Italian_Pavia | ITAMRA | 12th | 42 | 92
45 | 03.Europe | Krk_Island_Croatian | CRORUD | 12th | 19 | 104 104 102
46 | 03.Europe | North_Italian | 12ferrara | 12th | | 101 101 101 101 101 101 101
47 | 03.Europe | Polish | FRADDC | 12th | 156 | 98 99 99
48 | 03.Europe | Pomaki | GRESTV | 12th | 164 | 100 100 100 100
49 | 03.Europe | Portuguese_Coimbra | PORLTC | 12th | 195 | 219 108
50 | 03.Europe | Portuguese_South | PORCHS | 12th | 65 | 110 39
51 | 03.Europe | Provincial_French | FRADDC | 12th | 229 | 224 224 244
52 | 03.Europe | Ashkenazi_Jews | ISRBRA | 12th | 116 | 40 40 40 40 40 40
53 | 03.Europe | Sardinian | ITACON | 12th | 46 | 80 80
54 | 03.Europe | Spanish_100 | SPAARN | 12th | 104 | 100 100 100
55 | 03.Europe | Spanish_133 | SPALAR | 12th | 82 | 57 125 126 68
56 | 03.Europe | Spanish_Basque | SPABER | 12th | 120 | 165 158
57 | 03.Europe | Swiss | SUIJEA | 12th | 196 | 88
134 | 03.Europe | North American (Eur_descent) | USAMFV | 13th | | 297 292 287
58 | 04.SW-Asia | Druze | ISRBRA | 13th | | 100 100 100
59 | 04.SW-Asia | Israeli_Jew | ISRGAZ | 13th | | 117 94 109
60 | 04.SW-Asia | Kurdish | CZEIVS | 13th | | 30 29 29
61 | 04.SW-Asia | New_Dehli | USAERL | 13th | | 66 56 66
62 | 04.SW-Asia | Omani | UKIMID | 13th | | 121 121
63 | 04.SW-Asia | South_Indian | USAERL | 13th | | 88 104 109
64 | 04.SW-Asia | Tamil | ZAFHAM | 13th | | 50 48 49
65 | 04.SW-Asia | Turk | TURSAR | 13th 12th | 89 | 245 245
66 | 04.SW-Asia | Hunza-Burushaski | PAKQAS | 12th | 214 | 46 46 46
67 | 04.SW-Asia | Lebanese | USABIA | 12th | 145 | 126
68 | 04.SW-Asia | North_Indian | 12ferencik | 12th | | 120 120 120 118
69 | 04.SW-Asia | Punjabi | INDRAN | 12th | 38 | 51
70 | 04.SW-Asia | Sindhi | PAKQAS | 12th | 213 | 39 39 39
71 | 04.SW-Asia | Sri_Lankan | 12hashemi | 12th | | 57
160 | 05.SE-Asia | North American (Asi_descent) | USAMFV | 13th | | 411 401 396
72 | 05.SE-Asia | Ami | TWNLIN | 13th | | 98 98 98 98


Table 3. Continued

13W. Region Population Name Labcode IHWC 12W . A C B DRB1 DQA1 DQB1 DPA1 DPB173 05.SE-Asia Atayal TWNLIN 13th 106 106 106 10674 05.SE-Asia Bunun TWNLIN 13th 101 101 101 10175 05.SE-Asia Chinese USATRA 13th 282 281 28276 05.SE-Asia Hakka TWNLIN 13th 55 55 55 5577 05.SE-Asia Han_Chinese_149 UKIMID 13th 149 14978 05.SE-Asia Han_Chinese_572 UKIMID 13th 572 57279 05.SE-Asia Kinh VNHNAN 13th 102 10080 05.SE-Asia Malay USAERL 13th 124 107 101 54 55 5381 05.SE-Asia Minnan TWNLIN 13th 102 102 102 10282 05.SE-Asia Muong VNHNAN 13th 83 8384 05.SE-Asia Paiwan_51 TWNLIN 13th 51 51 51 5185 05.SE-Asia Pazeh TWNLIN 13th 55 55 55 5586 05.SE-Asia Puyuma_49 TWNLIN 13th 50 50 50 5087 05.SE-Asia Rukai TWNLIN 13th 50 50 50 5089 05.SE-Asia Saisiat TWNLIN 13th 51 51 51 51

Singapore_90 05.SE-Asia (Chinese_descent) USAERL 13th 86 8691 05.SE-Asia Siraya TWNLIN 13th 51 51 51 5192 05.SE-Asia Thai USAERL 13th 98 92 9993 05.SE-Asia Thao TWNLIN 13th 30 30 30 3094 05.SE-Asia Toroko TWNLIN 13th 55 55 55 5595 05.SE-Asia Tsou TWNLIN 13th 51 51 51 5196 05.SE-Asia Yami (Tao) TWNLIN 13th 50 50 50 5097 05.SE-Asia Ami_14 JAPSEK 12th 243 1598 05.SE-Asia Paiwan_64 JAPSEK 12th 241 6599 05.SE-Asia Puyuma_15 JAPSEK 12th 245 16

100 05.SE-Asia South_Han 12johnlee 12th 162 162 162 162 162101 05.SE-Asia Taiwanese 12johnlee 12th 1012 199 1011 1012102 05.SE-Asia Thai-Chinese THACHI 12th 85 42 74103 06.Oceania East_Timorese USAERL 13th 57 86 86 86 86 86104 06.Oceania Filipino USAERL 13th 94 94 94 94 94 94 94105 06.Oceania Indonesian USAERL 13th 50 50 49106 06.Oceania Ivatan TWNLIN 13th 50 50 50 50107 06.Oceania Moluccan USAERL 13th 25 40 46 46 46 46108 06.Oceania PNG_Highlander USAERL 13th 92 77 90 92 91 78 88109 06.Oceania PNG_Lowlander_48 USAERL 13th 48 48 48 48110 06.Oceania PNG_Lowlander_95 USAERL 13th 79 83 93111 06.Oceania Samoa USARWL 13th 50 50 50112 06.Oceania Trobriand GERNAG 12th 111 79113 07.Australia Australian_Cape_York USAGAO 13th 103 89 100 99 99 99 96

Australian_Groote_114 07.Australia Eylandt USAGAO 13th 75 73 75115 07.Australia Australian_Kimberley USAGAO 13th 36 28 38 41 41 41 38116 07.Australia Australian_Yuendumu USAGAO 13th 12th 6 191 192 193 190 11683 08.NE-Asia Okinawan USAPAK 13th 105 105 10488 08.NE-Asia Ryukuan JPNTKN 13th 142

117 08.NE-Asia Buriat JPNTKN 13th 140118 08.NE-Asia Korean KORPMH 13th 191 200 200 199119 08.NE-Asia Tuva USAERL 13th 189 174 180 189120 08.NE-Asia Halkh JAPTSU 12th 184 40 41 41 41121 08.NE-Asia Han JAPINK 12th 137 57 57 57 57122 08.NE-Asia Hoton JAPTSU 12th 112 85 84 84 85123 08.NE-Asia Japanese 12juji 12th 608 608 608124 08.NE-Asia Japanese_Kobe 12araki 12th 47 32 32 32 30 32 30125 08.NE-Asia Kazakh JAPINK 12th 135 39 39 39 39126 08.NE-Asia Korean JAPJUJ 12th 240 73 67 199127 08.NE-Asia Manchu JAPJUJ 12th 15 160128 08.NE-Asia Mongolian JAPJUJ 12th 16 61 34 203129 08.NE-Asia Tuvinian RUSKON 12th 236 191130 08.NE-Asia Uygur JAPINK 12th 17 66 66 66 65131 09.N-America Amerindian USAMFV 13th 257 248 235132 09.N-America Canoncito USAERL 13th 40 40 40 40133 09.N-America Lacandon MEXGOR 13th 162 162 162135 09.N-America Maya USAERL 13th 15 15 15 15137 09.N-America Mixe Hollenbach 12th 52 52 52 52 52 52 52138 09.N-America Mixteco Hollenbach 12th 52 51 52 52 52 52 52139 09.N-America Pima_17 USAERL 13th 86 89 97 95140 09.N-America Pima_99 USAERL 13th141 09.N-America Seri MEXGOR 13th 33 33 33 33 33 25142 09.N-America Sioux USAERL 13th 96 96 96 83143 09.N-America Yupik USALEF 13th 252 149 252 252 58 252144 09.N-America Zapotec Hollenbach 12th 72 76 71 74 76 76 72145 09.N-America Zuni USAERL 13th 50 50 50 50146 10.S-America Bari VENLAY 13th 92 86 82147 10.S-America Brazilian UKIMID 13th 97 95148 10.S-America Guarani-Kaiowa BRAPTZ 13th 144 144 144 144 144149 10.S-America Guarani-Nandeva BRAPTZ 13th 53 53 53 53 53150 10.S-America Ticuna USAERL 13th 49 49 49 49151 10.S-America Central American USAERL 13th 55 55 55 55


Table 3. Continued

13W. Region Population Name Labcode IHWC 12W . A C B DRB1 DQA1 DQB1 DPA1 DPB112trach-

152 10.S-America Colombian tenberg 12th 217 217 217 21712trach-

153 10.S-America Ecuadorian tenberg 12th 99 99 99 99154 10.S-America Kaingang BRAPTZ 12th 9 28 87 101155 10.S-America Yukpa VENLAY 12th 185 73 73 70

Brazilian_(Afr-Eur_156 11.Other descent) BRADON 13th 99 106 69 100136 11.Other Mexican MEXGOR 13th 62 29 40 204 204 204

Cuban_157 11.Other (Afr-Eur_descent) UKIMID 13th 42 42158 11.Other Cuban_(Eur_descent) UKIMID 13th 70 70

North American161 11.Other (His_descent) USAMFV 13th 247 246 240

12trach-162 11.Other Colombian (Afr_descent) tenberg 12th 70 70 70 70163 11.Other Zoroastrian 12hashemi 12th 54 54

As shown in Table 5, Southeast Asia is the best-represented region (26 population samples) and is largely represented by Aboriginal populations from Taiwan. By contrast, very few populations have been typed in Northeast Asia (only 3 population samples).

As a consequence of this geographic distribution, the Austronesian and Amerindian linguistic families are the best represented (20 and 14 population samples, respectively), followed by Indo-European languages (16 populations). The Indo-European family is the most widespread linguistic group in the 13W dataset, with speakers in the Europe, South-West Asia, North America and Other regions. Some linguistic groups are absent, as is the case for Afroasiatic and Khoisan in sub-Saharan Africa.

In summary, the 13W dataset represents a considerable contribution to the pool of populations tested for HLA polymorphism at class I and class II loci using high-resolution methods. A large number of populations (95) have been tested, approximately 42% of which are from Southeast Asia and North/Central America. Moreover, contextual and historical background information regarding each population sample accompanies each dataset in the form of a short report. This information is crucial for the proper interpretation of the resulting genetic analyses.

At the same time, it must be noted that these 13W population samples have not all been typed for the same HLA loci, although similar numbers of samples have been tested for some class I and class II loci (as shown in Table 6). Even when the 68 12W datasets are included, analyses are limited to some 50–90 populations per locus (e.g., Chapter C.7). Future studies should encourage additional molecular-level typing of the same samples in these populations, with the goal of obtaining genotypes at all HLA loci in all sampled individuals. Finally, the population sample sizes are too low in many cases (fewer than 40 individuals in some populations) to permit multi-locus analyses. As the number of distinct alleles increases with each report of the WHO Committee for Nomenclature for Factors of the HLA System, the problems presented by low sample sizes are compounded. As shown in Table 6, the average sample size at the class I and DRB1 loci is greater than 100 individuals, but this number is considerably lower than the number of currently distinguishable alleles. Because large numbers of HLA alleles are common in human populations, the frequencies of most alleles are low (<5%), and there is a good chance that low-frequency alleles will not be detected when sample sizes are small. In addition, statistical tests performed on poorly sampled populations have low power and are not reliable. This is true at the single-locus level, and may be even more dramatic at the multi-locus level, where the estimation of haplotype frequencies and tests of linkage disequilibrium depend on accurate sampling of allelic diversity. Moreover, the method for reducing genotype ambiguity (described in section 2.I, below) is dependent on the observation of alleles in a number of different genotypes, the likelihood of which increases with the size of the population sample. In general, an effort must be made to sample at least 100 individuals per population (e.g., see Sanchez-Mazas 2002 (5)).
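The effect of sample size on the detection of low-frequency alleles can be illustrated with a short calculation. The sketch below is not part of the workshop analyses; it assumes that the 2n allele copies carried by n individuals are sampled independently, so that an allele at frequency p is missed entirely with probability (1 - p)^(2n). The function name and the example frequency of 1% are illustrative choices.

```python
def prob_allele_missed(freq, n_individuals):
    """Probability that an allele at frequency `freq` is absent from a
    sample of `n_individuals` diploid individuals (2n allele copies),
    assuming independent random sampling (an illustrative assumption)."""
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must lie between 0 and 1")
    return (1.0 - freq) ** (2 * n_individuals)


# Chance of missing a 1% allele at several sample sizes.
for n in (40, 100, 200):
    print(n, round(prob_allele_missed(0.01, n), 3))
```

Under these assumptions an allele at 1% frequency is missed in roughly 45% of samples of 40 individuals, and in roughly 13% of samples of 100 individuals.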

Section 2. Pre-analytical dataset processing

Subsequent to submission, each 13W dataset was prepared and formatted for analysis in a multi-step process. First, all ambiguities associated with a given genotype were reduced to pairs of single alleles. Second, genotype data for each population sample was merged with basic demographic and typing information in a standardized format. Third, the existence of each allele at each locus was verified by comparison to the March 2002 allele list approved by the WHO Committee for Nomenclature for Factors of the HLA System, and alleles were truncated to peptide level (4-character) designations. Fourth, “binning” rules, reassigning alleles that were identified using high resolution typing methods in a subset of populations as variants detectable with lower-resolution methods, were applied to reduce these alleles to common denominator categories. Each of these processes is described in detail in the following sections (I. Ambiguity Reduction, II. Datafile Formatting, III. Data Filtering, and IV. Binning). The overall extent of changes made to datasets as a result of these processes is summarized in section 3, below.

Table 4. Linguistic representation in Kenya and Uganda (1)

Nation    Niger-Congo     Nilo-Saharan    Afro-Asiatic
Kenya     17.9 million    7.5 million     715,000
Uganda    11.4 million    5.3 million     0

(1) These values are taken from reference (3).

Table 5. Geographic and linguistic distribution of 13W datasets

Linguistic groups (columns, in order): AA NC NS IE UY SC ED ST AU TK AN IP AB AL AM EA ND Total

Geographic regions (rows; empty cells are not shown, and the last value in each row is the row total):

01.SS-Africa   8 2 1 11
02.N-Africa    5 5
03.Europe      5 1 1 7
04.SW-Asia     3 2 2 1 8
05.SE-Asia     1 6 2 1 14 2 26
06.Oceania     6 3 9
07.Australia   4 4
08.NE-Asia     3 3
09.N-America   4 8 1 1 14
10.S-America   1 5 6
11.Other       2 2
Total          8 8 2 16 1 1 2 6 2 1 20 3 4 6 4 1 1 95

AA: Afro-Asiatic, NC: Niger-Congo, NS: Nilo-Saharan, IE: Indo-European, UY: Uralic-Yukaghir, SC: South-Caucasian, ED: Elamo-Dravidian, ST: Sino-Tibetan, AU: Austroasiatic, TK: Tai-Kadai, AN: Austronesian, IP: Indo-Pacific, AB: Australian, AL: Altaic, AM: Amerindian, EA: Eskimo-Aleut, ND: Na-Dene.

Table 6. 13W Population datasets genotyped at each locus

HLA locus                       A      C      B      DRB1   DQA1   DQB1   DPA1   DPB1
Number of populations tested    78     59     73     59     21     33     8      21
Average sample size             130.6  129.3  131.7  114.8  88.4   103.7  76.1   72.7
Standard deviation              136.3  141.8  139.5  132.1  55.1   68.1   34.3   30.4

Table 7. Extent of ambiguous alleles and ambiguous genotypes in the 13W dataset

Type of     HLA    Number of    Number of datasets  Percent of datasets  Average percent of ambiguity among
Ambiguity   Locus  population   with any ambiguity  with any ambiguity   datasets with any ambiguity
                   datasets
Allelic     A      46           35                  0.76087              0.6194
            C      39           30                  0.76923              0.49542
            B      46           36                  0.78261              0.30536
            DRB1   29           4                   0.13793              0.06012
            DQA1   15           4                   0.26667              0.31798
            DQB1   21           5                   0.2381               0.10201
            DPA1   2            1                   0.5                  0.33721
            DPB1   12           0                   0                    –
Genotypic   A      46           30                  0.65217              0.40851
            C      39           30                  0.76923              0.6293
            B      46           30                  0.65217              0.30565
            DRB1   29           1                   0.03448              0.13061
            DQA1   15           1                   0.06667              0.13061
            DQB1   21           1                   0.04762              0.13061
            DPA1   2            0                   0                    –
            DPB1   12           0                   0                    –

I. Ambiguity Reduction

Much of the 13W genotype data was not resolved to the allelic level (i.e., only two alleles per genotype) when submitted for analysis and contained ambiguous alleles and/or ambiguous genotypes (described below). The extent of allelic and genotypic ambiguity in the overall dataset is detailed in Table 7. Because most of the class I data was submitted for analysis in the form of probe reactivities, and because many of the class II datasets were submitted as Available Data, with many ambiguities reduced to individual allele calls prior to submission, both allele and genotype ambiguities are more extensive at the class I loci, with ambiguities in 60–80% of population datasets, than at the class II loci, with ambiguities in 5–20% of population datasets. It was necessary to resolve these ambiguities to the allelic level before analysis could begin.

Ambiguous alleles are those that cannot be distinguished because the typing method cannot assess all pertinent polymorphisms. For example, ‘A*020101, 0209, 0230, 0231’ is an ambiguous allele set used to represent an ambiguous allele, the actual identity of which could be any one of the four constituent alleles. An unambiguously assigned allele is characterized by a single, DNA-level designation (e.g., A*020101). Because ambiguous alleles result from the limitation of a given typing system, population samples typed using multiple systems may have similar but different ambiguous allele sets (e.g., one system results in an ‘A*020101, 0209, 0230, 0231’ allele set, while a second system results in an ‘A*020101, 020102, 020104, 0230’ allele set).

Ambiguous genotypes cannot be distinguished due to an inability to establish the phase of the assessed polymorphisms in a given probe reactivity pattern. For example, ‘A*0101/*02011 or *0101/*0236 or *0106/*02011 or *0106/*0236’ is used to denote an ambiguous genotype set, the actual identity of which could be any one of the four constituent genotypes. Two sets of ambiguously assigned alleles are shown in this case; the alleles in the A*01 serogroup constitute one ambiguously assigned allele set, and the alleles in the A*02 serogroup constitute a second such set. An unambiguously assigned genotype is characterized by a single possible genotype for a given sample (e.g., A*0101/*0236). Because ambiguous genotypes are the result of particular combinations of allele-specific probe reactivity patterns, some alleles will only appear in a population as part of an ambiguous genotype set, while other alleles will appear in both ambiguous and unambiguous genotypes. In some cases, one allele may be unambiguously assigned in an ambiguous genotype. For example, in the ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype, the A*0101 allele has been unambiguously assigned. The identity of the alleles in the A*02 allele set is obscured by the inability to set phase.

Both types of ambiguity can be observed for a given sample. For example, ‘A*0101/*02011, 0209, 0230, 0231 or *0101/*0236 or *0106/*02011, 0209, 0230, 0231 or *0106/*0236’ represents an ambiguous genotype with four possible constituent genotypes, two of which contain an ambiguous allele with four possible constituent alleles.
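The notation used in these examples can be turned into a simple data structure. The following is a minimal sketch, assuming that alternative genotypes are separated by ' or ', the two alleles of a genotype by '/', and the constituents of an ambiguous allele by commas; the function name `parse_genotype_set` is hypothetical and does not correspond to any workshop software.

```python
def parse_genotype_set(text):
    """Parse a string such as 'A*0101/*02011, 0209 or *0101/*0236' into
    (locus, genotypes). Each genotype is a pair of allele lists; a list
    with more than one entry represents an ambiguous allele."""
    locus, _, rest = text.strip().partition("*")
    genotypes = []
    for option in ("*" + rest).split(" or "):
        sides = option.split("/")
        if len(sides) != 2:
            raise ValueError(f"expected two alleles in {option!r}")
        genotypes.append(tuple(
            [a.strip().lstrip("*").strip() for a in side.split(",")]
            for side in sides
        ))
    return locus, genotypes


example = ("A*0101/*02011, 0209, 0230, 0231 or *0101/*0236 or "
           "*0106/*02011, 0209, 0230, 0231 or *0106/*0236")
locus, genotypes = parse_genotype_set(example)
print(locus, len(genotypes))
```

For the example string this yields four constituent genotypes, two of which carry a four-member ambiguous allele, in line with the description above.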

The method used to reduce ambiguities to the allele level attempts to resolve ambiguous genotypes separately from ambiguous alleles (stages 1 and 2 below), and assumes that the sampled individuals are part of a single population with relatively little admixture, and that they were typed with a single typing system. In general, it is assumed that these populations will have low numbers of alleles within a particular serogroup, and that alleles with the same pattern of polymorphic sequence motifs share the same DNA sequence when observed in the same population.

Stage 1. Reduction of Ambiguous Genotypes

This method proceeds in four steps as outlined here:

Step 1. Eliminate genotypes with alleles never seen in unambiguous genotypes.

Step 2. Reduce ambiguously assigned allele sets to common denominators.

Step 3. Rely on Hardy-Weinberg proportions to establish homozygotes.

Step 4. Consider all remaining ambiguous allele sets as ambiguous alleles.

Detailed description:

Step 1. Compile a list of all alleles (both ambiguous and unambiguous alleles) observed in unambiguous genotypes as well as the unambiguously assigned alleles in ambiguous genotypes. These are the “observed alleles”. In each ambiguous genotype set, eliminate those genotypes lacking observed alleles, reducing each set to those genotypes with observed alleles. If there are no genotypes with two observed alleles in a given set, keep all genotypes with one observed allele, and eliminate those with no observed alleles. If all of the genotypes in a set lack observed alleles, do nothing to that set.

For example, a hypothetical population with only two samples presents an unambiguous ‘A*02011/*3303’ genotype and an ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype. In this case, the A*0101, *02011, and *3303 alleles have been unambiguously assigned, and the A*0236 allele is eliminated as it is never seen in an unambiguously assigned genotype.

This step should be repeated until there is no change to the ambiguous genotype sets, but usually only requires one iteration.

This step assumes that a population will have a small number of alleles in a given serogroup. When multiple alleles from a given serogroup do exist in a population, it assumes that distinct patterns of ambiguity will result when these alleles are in a genotype with a given allele. This assumption may not be true for admixed populations.
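Step 1 can be sketched in code as follows. This is an illustrative reading of the rule, not the software used for the workshop: each sample is represented as a list of candidate genotypes (a list of length one being unambiguous), an allele counts as unambiguously assigned when it appears in every candidate genotype of a sample, and the rule is interpreted as keeping the candidates with two observed alleles when any exist and otherwise those with one. The function names are hypothetical.

```python
def observed_alleles(samples):
    """Alleles present in every candidate genotype of at least one sample."""
    observed = set()
    for candidates in samples:
        observed |= set.intersection(*(set(g) for g in candidates))
    return observed


def reduce_step1(samples):
    """samples: one non-empty list of candidate (allele, allele) tuples per
    individual; a list of length 1 is an unambiguous genotype."""
    samples = [list(c) for c in samples]
    while True:  # repeat until no ambiguous set changes
        observed = observed_alleles(samples)
        changed = False
        for i, candidates in enumerate(samples):
            if len(candidates) < 2:
                continue
            scores = [sum(a in observed for a in g) for g in candidates]
            best = max(scores)
            if best == 0:
                continue  # no observed alleles at all: leave the set alone
            kept = [g for g, s in zip(candidates, scores) if s == best]
            if len(kept) < len(candidates):
                samples[i] = kept
                changed = True
        if not changed:
            return samples


# The two-sample example from the text.
population = [
    [("02011", "3303")],
    [("0101", "02011"), ("0101", "0236")],
]
print(reduce_step1(population))
```

Applied to the two-sample example, the candidate containing A*0236 is dropped and the second sample is reduced to ‘A*0101/*02011’.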

Step 2. Compare the ambiguous allele sets that have at least one allele in common, and eliminate those genotypes (involved in this comparison) that contain alleles not found in all such ambiguous allele sets.

For example, in three ambiguous genotypes, ‘A*0101/*02011 or *0101/*0209’, ‘A*2402/*02011 or *2402/*0236’, and ‘A*3303/*02011 or *3303/*0231’, the A*02011 allele is found in all A*02 sets. The genotypes containing the other A*02 alleles are eliminated. Note that an ambiguous ‘A*0101/*02012 or *0101/*0235’ genotype would not influence this decision, as none of the A*02 alleles in this ambiguous allele set overlap with the other A*02 sets.

As in Step 1, this step assumes that the number of alleles in a given serogroup in a population will be low, and that distinct alleles will present distinct patterns of ambiguity when present in a genotype with a given allele.
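One possible way to express Step 2 in code is shown below. It is a simplified sketch under stated assumptions: each ambiguous sample has a single ambiguous position, its ambiguous allele set is taken to be the alleles that differ between its candidate genotypes, and sets are grouped whenever they overlap directly or through a chain of overlaps. Serogroups are not modelled explicitly, and the function names are hypothetical.

```python
def variable_alleles(candidates):
    """Alleles that differ between the candidate genotypes of one sample."""
    sets = [set(g) for g in candidates]
    return set.union(*sets) - set.intersection(*sets)


def reduce_step2(samples):
    out = [list(c) for c in samples]
    amb = {i: variable_alleles(c) for i, c in enumerate(out) if len(c) > 1}
    # Group ambiguous samples whose allele sets overlap.
    groups = []
    for i, alleles in amb.items():
        merged = {"alleles": set(alleles), "members": [i]}
        for g in [g for g in groups if g["alleles"] & alleles]:
            merged["alleles"] |= g["alleles"]
            merged["members"] += g["members"]
            groups.remove(g)
        groups.append(merged)
    for g in groups:
        if len(g["members"]) < 2:
            continue
        common = set.intersection(*(amb[i] for i in g["members"]))
        if not common:
            continue  # no allele shared by every set: nothing to reduce
        for i in g["members"]:
            drop = amb[i] - common
            kept = [gt for gt in out[i] if not set(gt) & drop]
            if kept:
                out[i] = kept
    return out


# The example from the text, plus the non-overlapping fourth genotype.
population = [
    [("0101", "02011"), ("0101", "0209")],
    [("2402", "02011"), ("2402", "0236")],
    [("3303", "02011"), ("3303", "0231")],
    [("0101", "02012"), ("0101", "0235")],
]
print(reduce_step2(population))
```

In this example the first three samples are reduced to the genotypes containing A*02011, while the fourth sample, whose A*02 set does not overlap the others, is left unchanged.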

Step 3. Where a genotype may be either heterozygous or homozygous, assign homozygote and heterozygote status based on Hardy-Weinberg expectations.

For example, the genotype ‘A*2402/*2402 or *2402/*2403’ could be either a homozygous ‘*2402/*2402’ genotype or a heterozygous ‘*2402/*2403’ genotype. Consider that two such ambiguous genotypes are observed in a population of 100 individuals, with the *2402 allele observed in 25 other genotypes, and the *2403 allele observed in two other genotypes. In this case the number of homozygous ‘*2402/*2402’ genotypes expected under Hardy-Weinberg equilibrium is calculated with the assumption that both ambiguous genotypes are homozygous (so that all four alleles in these genotypes are *2402 alleles), and the number of heterozygous ‘*2402/*2403’ genotypes expected under Hardy-Weinberg equilibrium is calculated with the assumption that both genotypes are heterozygous (so that two alleles are *2402 alleles and two are *2403 alleles). Under these circumstances, either 2.1 homozygous genotypes (100 × (29/200)²) or 0.54 heterozygous genotypes (100 × 2 × (27/200) × (4/200)) are expected, and the two ambiguous genotypes are re-assigned as homozygous for the *2402 allele.

Alternatively, if the number of *2402 alleles observed in other genotypes were lower than the number of *2403 alleles observed in other genotypes (e.g., 13 versus 25), primarily heterozygous genotypes might be expected (e.g., 0.72 expected homozygotes versus 2.0 expected heterozygotes) and the two genotypes would be re-assigned as heterozygotes.

This step assumes that the genotype proportions of the population in question are in Hardy-Weinberg equilibrium.
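The expected counts quoted in the Step 3 examples can be reproduced with a few lines of arithmetic. The sketch below assumes that each "other genotype" contributes one copy of the allele in question, an interpretation that matches the published figures; the function and parameter names are illustrative.

```python
def expected_counts(n_individuals, n_ambiguous, major_other, minor_other):
    """Return (expected homozygotes, expected heterozygotes) for a set of
    'major/major or major/minor' ambiguous genotypes.

    major_other / minor_other: copies of each allele seen in other
    genotypes (assumed here to be one copy per genotype)."""
    total_copies = 2 * n_individuals
    # Scenario 1: every ambiguous genotype is a major-allele homozygote.
    p_hom = (major_other + 2 * n_ambiguous) / total_copies
    exp_hom = n_individuals * p_hom ** 2
    # Scenario 2: every ambiguous genotype is a heterozygote.
    p_het = (major_other + n_ambiguous) / total_copies
    q_het = (minor_other + n_ambiguous) / total_copies
    exp_het = n_individuals * 2 * p_het * q_het
    return exp_hom, exp_het


for major, minor in ((25, 2), (13, 25)):
    hom, het = expected_counts(100, 2, major, minor)
    call = "homozygous" if hom > het else "heterozygous"
    print(major, minor, round(hom, 2), round(het, 2), call)
```

With 25 and 2 other copies the expected counts are about 2.1 and 0.54, favouring the homozygous assignment; with 13 and 25 they are about 0.72 and 2.0, favouring the heterozygous assignment.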

Step 4. For a given ambiguous genotype set, lump all alleles of a given serogroup to form an ambiguous allele. For example, the ambiguous ‘A*0101/*02011 or *0101/*0236’ genotype would be changed to a single ‘A*0101/*02011, 0236’ genotype, in which the A*02 alleles form one ambiguous allele.
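A compact sketch of Step 4 follows. It assumes that the serogroup can be read from the first two digits of the allele name (as the method itself assumes) and that the candidate genotypes involve exactly two serogroups; the function name is hypothetical.

```python
def lump_by_serogroup(candidates):
    """Collapse candidate (allele, allele) tuples into one genotype whose
    positions are tuples of alleles grouped by two-digit serogroup."""
    groups = {}
    for genotype in candidates:
        for allele in genotype:
            groups.setdefault(allele[:2], set()).add(allele)
    if len(groups) != 2:
        raise ValueError("sketch assumes alleles from exactly two serogroups")
    return tuple(tuple(sorted(groups[sg])) for sg in sorted(groups))


print(lump_by_serogroup([("0101", "02011"), ("0101", "0236")]))
```

For the example in the text, the result pairs the single A*0101 allele with an ambiguous allele made up of A*02011 and A*0236.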

It should be noted that this process has the potential to bias the dataset in favor of high-frequency alleles, resulting in an underestimation of the allele diversity and the exclusion of low-frequency alleles at that locus. Because this method assumes a correspondence between serological specificity and the first two digits of the allele name, it cannot easily be used as described for the DPB1 locus. Overall, this method has been most effective on populations assumed to be relatively free of admixture (i.e., population complexity values of 1 and 2, as described in section II, below). SBT has confirmed that the ambiguity reduction method was correct for select samples, as summarized in Table 8. More generally, this method assumes that genotype diversity in a population will result in sufficient unambiguous assignment of alleles to permit the reduction of ambiguity in the other allele assignments. Ultimately, the adequacy of this assumption rests on the size of the population sample that is genotyped. Irrespective of the genotyping method used, the larger the population sample genotyped, the greater the chance that genotype combinations will result in unambiguous allele assignment, so that the success of SSOP genotyping of populations using this ambiguity reduction approach becomes a function of sample size.

Stage 2. Reduction of Ambiguous Alleles

The IHWG Biostatistics core has compiled a database of HLA allele frequency distributions published for populations from around the world (4). The populations in this database have been divided into seven regions (Africa, Europe, Middle East, Asia, Siberia, South Pacific Islands, and the Americas) for the purpose of investigating geographic structure. This database was used for the reduction of ambiguous alleles in a given population, by identifying those constituent alleles that have been observed in the corresponding global region. This stage of the method assumes that the correlation between geography and genetic distance observed for many populations at other loci extends to MHC loci, and proceeds in three steps.

Step 1. Eliminate those constituent alleles that are not observed in the corresponding region.


Table 8. SBT confirmation of the reduction of ambiguous genotypes

Population  Sample ID  Submitted Ambiguous Genotypes                   Reduced Genotypes   SBT Genotypes
                       Allele 1                  Allele 2              Allele 1  Allele 2  Allele 1    Allele 2
Paiwan_51   PW34       3501, 3507, 3511, 3523    40011, 40012          3501      4001      3501        40011, 40012
                       3520                      4007
Thao        TH05       1525                      39011, 39013, 3905    1525      3901      1525        39011, 39013
                       1521                      39021, 39022
Siraya      SL23       3501, 3507, 3511, 3523    4002                  3501      4002      3501        4002
                       3502, 3504, 35091, 35092  4003
                       3515                      4005
Hakka       HA07       15011, 1526N, 1533        4601                  1512      4601      1512, 1519  4601
                       1512, 1519                4601
                       1532, 1535                4601
            HE08       1521                      39011, 39013, 3905    1521      3901      1521        39011, 39013
                       1521                      3910

Step 2. Of the remaining constituent alleles, keep the allele that has the highest frequency in that region, and eliminate the rest. When multiple candidate alleles have similar frequencies, keep the allele with the highest frequency in the population geographically closest to the population being analyzed.

Step 3. If none of the constituent alleles are identified in the database, reduce the ambiguous allele to the lowest numbered constituent allele.
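The three steps of Stage 2 can be sketched as a single function. This is an illustration only: `regional_freq` stands in for a lookup of allele frequencies in the corresponding region of the IHWG database (its structure here is an assumption), the frequencies in the usage example are invented, the tie-breaking rule based on the geographically closest population is omitted, and "lowest numbered" is approximated by taking the minimum of the allele names as strings.

```python
def reduce_ambiguous_allele(constituents, regional_freq):
    """Reduce an ambiguous allele (list of constituent allele names) to a
    single allele using hypothetical regional frequencies."""
    # Step 1: drop constituents not observed in the region.
    seen = {a: regional_freq[a] for a in constituents
            if regional_freq.get(a, 0) > 0}
    # Step 3: nothing identified in the database, so fall back to the
    # lowest numbered constituent (string minimum, an approximation).
    if not seen:
        return min(constituents)
    # Step 2: keep the constituent with the highest regional frequency.
    return max(seen, key=seen.get)


ambiguous = ["020101", "0209", "0230", "0231"]
made_up_freqs = {"020101": 0.20, "0230": 0.01}  # invented values
print(reduce_ambiguous_allele(ambiguous, made_up_freqs))
print(reduce_ambiguous_allele(ambiguous, {}))
```

A fuller implementation would also need the per-population frequencies and locations required by the tie-breaking rule in Step 2.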

It should be noted that the utility of this database is proportional to the number of populations typed at each locus, and that older published datasets will likely contain no data on recently identified alleles. As a result, novel and rare alleles are less likely to be detected, and a bias in favor of more widespread alleles will be introduced. However, this bias will be consistent between populations. In addition, genetic distances between populations within regions will be underestimated.

II. Datafile Format

Each population dataset consisted of a ‘header block’ that described the six-character IHWG labcode for the submitting laboratory, the typing method used, the ethnicity of the sampled population, the population’s region of origin, the site at which the population sample was collected, the latitude and longitude of the collection site, and the complexity of the population (described below), as well as a ‘data block’ that included the unique population name, sample ID, and genotype data for each sampled individual. These genotype data were organized by locus and presented in map order (HLA-A, C, B, DRA, DRB1, DQA1, DQB1, DPA1, and DPB1). The 95 13W population datasets are included in Appendix C, Table 1.


Header block fields (labcode, method, ethnic, contin, collect, latit, longit, and complex)

Typing methods (method field): The typing protocols used fell into five categories: (1) PCR-sequence-specific oligonucleotide probe (SSO, SSOP) methods (11W, 12W, IHWG and local SSOP systems), (2) reverse hybridization format PCR-SSOP methods (IHWG Reverse Line Strip (RLS) and Innolipa PCR-SSO systems), (3) Sequence Specific Primer (SSP) methods (12W ARMS, Genovision SSP, and Dynal SSP systems), (4) Sequence Based Typing (SBT) methods (IHWG SBT and local SBT systems), and (5) PCR-single strand conformation polymorphism (SSCP) methods.

Ethnicity (ethnic field): A table of 268 linguistically and culturally defined ethnic codes and 10 admixture codes was provided for data submitting labs (see Appendix C, Table 3). In instances where an ethnic or admixture code was not available on this table, a new code was assigned for the new ethnic identifier. In as many cases as possible, the ethnic identity of each population sample is distinct from the unique population name and regional identification (see below).

Regional categories (contin field): Each population sample was assigned to one of eleven regional categories (Sub-Saharan Africa, North Africa, Europe, South-West Asia, South-East Asia, Oceania, Australia, North-East Asia, North America, South America and Other), based on the geographic region of origin and the estimated degree of admixture of the sampled population. For non-indigenous populations, regional assignments were made based on the historical locale of ancestors of those populations 1000 years ago. Admixed populations were assigned to the Other category when members of these populations were estimated to be descended from parent populations from different regional categories. Using these criteria, populations of predominantly Sub-Saharan African descent living outside of Africa were assigned to the Sub-Saharan Africa region, and populations of predominantly European descent living outside of Europe were assigned to the Europe region, while populations of both Sub-Saharan African and European descent were assigned to the Other category. A map defining the boundaries of these regions is shown in Figure 1.

Latitude and Longitude (latit and longit fields): Latitudes and longitudes were recorded in a decimal format, with minutes and seconds indicated as fractions of each degree value. North latitudes and east longitudes were recorded as positive values, while south latitudes and west longitudes were recorded as negative values. For example, 35 degrees 20 minutes south latitude would be recorded as latit=-35.33, and 2 degrees 30 minutes east longitude would be recorded as longit=2.5.
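The conversion described for the latit and longit fields amounts to the following calculation. The function below is an illustrative sketch; its name, the hemisphere argument, and the rounding to two decimal places are assumptions chosen to match the examples in the text.

```python
def to_decimal_degrees(degrees, minutes=0, seconds=0, hemisphere="N"):
    """Convert degrees/minutes/seconds to signed decimal degrees.
    South latitudes and west longitudes are returned as negative values."""
    value = degrees + minutes / 60 + seconds / 3600
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return round(value, 2)


print("latit", to_decimal_degrees(35, 20, hemisphere="S"))
print("longit", to_decimal_degrees(2, 30, hemisphere="E"))
```

The two calls reproduce the worked examples: 35 degrees 20 minutes south gives -35.33, and 2 degrees 30 minutes east gives 2.5.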

Complexity (complex field): Each population sample was assigned to one of three complexity categories (ranging in value from 1 to 3), in an attempt to estimate the degree of potential sub-structure and admixture in each population sample. A population sample collected from a single settlement or group of closely related settlements was assigned a complexity of 1. A population sample collected from a group of disparate but discrete settlements, or across a large region of territory was assigned a complexity of 2. A population sample collected in a metropolitan area, across an entire nation, or from an extremely admixed population was assigned a complexity of 3. Such assignments were made conservatively, with the higher value assigned in equivocal cases. Given these designations, the ambiguity reduction process (section I, above) will function best on populations with low complexity values.

Data block fields (populat, id, and locus names):

Population name (populat field): When possible, the population
name supplied by the submitting laboratory was used.
When multiple population samples were submitted with
identical names, a unique population name was created by
appending the sample size (n) to the end of the population
name. For example, two population samples named ‘population’
with sample sizes of 140 and 200 would be noted as
‘population_140’ and ‘population_200’.
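The naming rule can be sketched as follows. This is an illustration only: the function name is hypothetical, the third sample ('other', n = 95) is invented for the example, and the sketch does not address the case of two samples that share both a name and a sample size, which the text does not describe.

```python
from collections import Counter


def unique_population_names(samples):
    """Return one unique name per (name, sample_size) pair.

    The sample size is appended only when a name was submitted more
    than once; otherwise the submitted name is kept unchanged.
    """
    counts = Counter(name for name, _ in samples)
    return [
        f"{name}_{n}" if counts[name] > 1 else name
        for name, n in samples
    ]


# 'population' appears twice, so both copies receive a size suffix.
names = unique_population_names(
    [("population", 140), ("population", 200), ("other", 95)]
)
```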

Sample Identifier (id field): Each sampled individual in a

given population sample was assigned a unique identifier.

These identifiers have been coded to protect the confidential-

ity of the individual, in accordance with the IHWG protocol

for the use of human subjects in research. All samples have
been obtained in accordance with applicable laws and regulations
at the submitter’s institution, including any requirements
regarding informed consent for prospective research use.

14 HLA 2004: Immunobiology of the Human MHC

III. Data Filtering

In the next pre-analytical step of data processing, each allele
name was inspected to ensure that it conformed to standard
nomenclature formats. This data “filtration” step took the
form of both data truncation and the reclassification of
serologically designated alleles. Each of these processes is
described below.

A. Data truncation

Because of the variety of genotyping methods used to gener-

ate the 13W datasets, alleles were reported at varying levels

of specificity (e.g., A*24020101, *240201, *2402). Because

these differences reflect synonymous nucleotide changes in

most instances, allele names were truncated to a common,

peptide level (4-character) allele name (e.g., *24020101 was

changed to *2402), and the existence of these truncated allel-

es was verified using the allele set in the IMGT/HLA database

(approved by the WHO committee for nomenclature for fac-

tors of the HLA system as of March 2002) as follows:

i. If a common 4-character substring was found between the
truncated allele and an allele (or alleles) in the IMGT/HLA
database, then this truncated allele was used in the data analysis
(e.g., if *2402 was the truncated allele in question, it would
match *24020101, *24020102L, *240202, *240203, and *240204,
with ‘2402’ as the common substring).

ii. If no substring match was found between the truncated

allele and alleles in the IMGT/HLA database, analysis was halt-

ed for data review.
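A minimal sketch of the truncation and verification steps is given below. It is not the software used for the workshop. The function names and the short `REFERENCE` list are hypothetical stand-ins for the IMGT/HLA allele set, and the "common substring" test is implemented here as a prefix match within the same locus, which is an assumption about how the comparison was made.

```python
def truncate_allele(name, digits=4):
    """Truncate an allele name to its peptide-level (4-character) form,
    e.g. 'A*24020101' -> 'A*2402'."""
    locus, sep, field = name.partition("*")
    if not sep:
        raise ValueError(f"not an allele name: {name!r}")
    return f"{locus}*{field[:digits]}"


def verify_truncated(truncated, reference_alleles):
    """Return the reference alleles that begin with the truncated name.

    Raises LookupError when nothing matches, mirroring rule ii
    (analysis halted for data review).
    """
    matches = [ref for ref in reference_alleles if ref.startswith(truncated)]
    if not matches:
        raise LookupError(f"{truncated}: no match, halt for data review")
    return matches


# Illustrative reference list only; not the full IMGT/HLA allele set.
REFERENCE = ["A*24020101", "A*24020102L", "A*240202",
             "A*240203", "A*240204", "A*2403"]

truncated = truncate_allele("A*24020101")
matches = verify_truncated(truncated, REFERENCE)
```

In this sketch a failed lookup raises an exception instead of silently dropping the allele, so that unmatched names are always reviewed by hand.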

B. Serological reclassification

IHWG datasets in which more than 10% of the alleles were typed
at serological to intermediate levels of resolution were excluded
from the 13W dataset. In cases where serological-level allele
designations were provided for fewer than 10% of the alleles
at a given locus in a dataset (e.g., DQA1*03), those serological
designations were coded as a 4-character allele name in
the format XX00, where XX represents the serological designation
for that allele (e.g., *03 is coded as *0300). These
coded alleles were then assigned the name of an allele in the
IMGT/HLA database using the following rules:

i. If other alleles with the same serological designation were

observed in the population, the name of the coded allele was


Table 9. Binning reassignment of high-resolution HLA alleles

Locus   High resolution allele   Reassigned allele name
A       2409N                    2402
C       0706                     0701
B       0706                     0705
DRB1    1443                     1405
DRB1    1506                     1501
DQA1    0104                     0101
DQA1    0302                     0301
DQA1    0303                     0301
DQA1    0502                     0501
DQB1    0202                     0201
DQB1    0309                     0301
DQB1    0609                     0605
DQB1    0611                     0602
DPB1    2301                     0401
DPB1    3901                     0401
DPB1    4801                     0201
DPB1    4901                     0402
DPB1    5101                     0402
DPB1    6201                     4001
DPB1    7601                     1401

Table 10. Fraction of HLA allele assignments that remained unchanged
during pre-analytical data processing of 47 population datasets

Locus      Number of unchanged   Total number of allele   Percent of allele
           allele assignments    assignments (2n)         assignments unchanged
A          4608                  9344                     49.3
B          5802                  9328                     62.2
C          3981                  8334                     47.8
DRB1       6353                  8728                     72.8
DQA1       1456                  1825                     79.8
DQB1       2980                  3755                     79.4
DPA1       287                   410                      70.0
DPB1       856                   876                      97.7
All Loci   26323                 42600                    61.8

This table reflects modifications made to individual population datasets
including the American_Samoa, Ami_97, Arab_Druze, Atayal, Bari,
Brazilian(Af_Eu), Bulgarian, Bunun, Canoncito, Central American, Chaouya,
Chinese, Croatian, Filipino, Finn_90, Georgian, Guarani-Kaiowa,
Guarani-Nandeva, Hakka, Irish, Israeli_Jews, Ivatan, Kinh, Korean_200, Maya,
Metalsa, Minnan, Moroccan_99, Muong, Paiwan_51, Pazeh, Pima_17, Puyuma_49,
Rukai, Rwandan, Saisiat, Siraya, Tamil, Thao, Ticuna, Toroko, Tsou, Turk,
Yami, Yupik, Zulu, and Zuni.

changed to the allele that had the highest frequency in that
population at that locus (e.g., if the coded allele was *0300,
and *0301 was observed in that population with a frequency
of 0.2 while *0302 was observed with a frequency of 0.05,
the name of the coded allele was changed to *0301).

ii. If no alleles with the same serological designation were

observed at that locus in that population, then the coded allele

was re-named to correspond to the lowest-numbered allele

in the IMGT/HLA database with the same serological designa-

tion as the coded allele (e.g., if no other 03 alleles were ob-

served then all *0300 alleles were renamed as *0301).

iii. If neither of the previous steps resulted in a name change

to a coded allele, analysis was halted for data review.
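Rules i to iii can be sketched in Python as shown below. This is an illustration, not the workshop's implementation. The function names, the frequency dictionaries, and the reference list are hypothetical; reference alleles are assumed to be given as 4-character names; "lowest-numbered" is taken to mean the first name in sorted order; and ties in frequency, which the text does not address, are broken arbitrarily.

```python
def code_serological(name):
    """Code a serological designation in the XX00 format,
    e.g. 'DQA1*03' -> 'DQA1*0300'."""
    locus, sep, field = name.partition("*")
    if not sep or len(field) != 2:
        raise ValueError(f"not a 2-character designation: {name!r}")
    return f"{locus}*{field}00"


def reassign_coded(coded, population_freqs, reference_alleles):
    """Assign an allele name to a coded (XX00) allele."""
    locus, _, field = coded.partition("*")
    group = field[:2]

    def same_group(allele):
        # Same locus and serological group, excluding the XX00 code itself.
        a_locus, _, a_field = allele.partition("*")
        return (a_locus == locus and a_field[:2] == group
                and a_field[:4] != group + "00")

    # Rule i: most frequent allele of that group observed in the population.
    observed = {a: f for a, f in population_freqs.items() if same_group(a)}
    if observed:
        return max(observed, key=observed.get)
    # Rule ii: lowest-numbered reference allele with that designation.
    candidates = sorted(a for a in reference_alleles if same_group(a))
    if candidates:
        return candidates[0]
    # Rule iii: no name change was possible, so halt for data review.
    raise LookupError(f"{coded}: no reassignment, halt for data review")


# Hypothetical frequencies and reference names, for illustration only.
freqs = {"DQA1*0301": 0.2, "DQA1*0302": 0.05, "DQA1*0300": 0.02}
reference = ["DQA1*0302", "DQA1*0301", "DQA1*0501"]

coded = code_serological("DQA1*03")
by_rule_i = reassign_coded(coded, freqs, reference)
by_rule_ii = reassign_coded(coded, {"DQA1*0501": 0.3}, reference)
```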


Overall, this data “filtering” step results in a reduction of

the number of alleles (k) when alleles that are identical at the

peptide level, but which differ at the nucleotide level, are

reported in the same population. In addition, it is possible

that k was reduced in datasets generated with multiple typing

systems. In these instances, alleles typed at different levels of

resolution (e.g., A*2402 versus A*240202) that might repre-

sent distinct nucleotide-level variants were treated as identical.

It should be noted that all analytical results and inferences are

valid only for peptide-level allele variation.

IV. Binning

In the final step of pre-analytical data processing, alleles that

were only detectable in a subset of samples (due to the use

of higher resolution genotyping methods for those samples)

were reassigned to a level of resolution equivalent to that

which could be detected using lower resolution genotyping

methods. For example, HLA-B*0706 alleles were reassigned

as B*0705 alleles. This process of reassignment is described

as “binning”. These binning reassignments were made in order
to facilitate useful comparisons across datasets that were

genotyped using different methods, and are not reflected in

the datasets available in Appendix C. Table 9 identifies the

alleles that were binned (High resolution allele), and the allel-

ic category to which they were reassigned (Reassigned allele

name).
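The binning step amounts to a lookup against the reassignments listed in Table 9. The sketch below transcribes that table as a Python dictionary; the variable and function names are hypothetical, and alleles not listed in the table are assumed to pass through unchanged.

```python
# (locus, high resolution allele) -> reassigned allele name, from Table 9.
BINNING = {
    ("A", "2409N"): "2402",
    ("C", "0706"): "0701",
    ("B", "0706"): "0705",
    ("DRB1", "1443"): "1405",
    ("DRB1", "1506"): "1501",
    ("DQA1", "0104"): "0101",
    ("DQA1", "0302"): "0301",
    ("DQA1", "0303"): "0301",
    ("DQA1", "0502"): "0501",
    ("DQB1", "0202"): "0201",
    ("DQB1", "0309"): "0301",
    ("DQB1", "0609"): "0605",
    ("DQB1", "0611"): "0602",
    ("DPB1", "2301"): "0401",
    ("DPB1", "3901"): "0401",
    ("DPB1", "4801"): "0201",
    ("DPB1", "4901"): "0402",
    ("DPB1", "5101"): "0402",
    ("DPB1", "6201"): "4001",
    ("DPB1", "7601"): "1401",
}


def bin_allele(locus, allele):
    """Return the binned allele name, or the input name if it is not binned."""
    return BINNING.get((locus, allele), allele)


# The same allele number is binned differently at different loci.
binned_b = bin_allele("B", "0706")
binned_c = bin_allele("C", "0706")
```

Keying the lookup on both locus and allele matters because the same 4-character name (0706) is reassigned differently at the B and C loci.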

Section 3. Overall modifications made to datasets

The extent of the modifications made to allele assignments
before analysis is described in Table 10, which summarizes
the fraction of allele assignments that were unchanged
through the various steps of data processing in 47 datasets
for which raw data (i.e., including allele and genotype ambiguity)
were available. The data in the remaining 48 datasets
were submitted as Available Data and required significantly less
modification. Overall, approximately 60% (61.8%; Table 10) of the
submitted allele assignments in these 47 datasets were analyzed
as they were submitted. For the purpose of this summary, reassignment includes

the reduction of ambiguous allele sets to individual alleles;

the truncation of nucleotide-level allele names to peptide-

level allele names; the reassignment of serological allele de-

signations to peptide-level allele names; the reclassification of

improperly formatted allele names; and the binning of alleles

genotyped at varying levels of resolution. Unambiguously as-

signed alleles that remained unchanged, but that were sub-

mitted in ambiguous genotype sets were counted as fractions

of alleles in proportion to the number of genotypes in that


set. As expected given the greater extent of allele and genotype
ambiguity observed in class I datasets, fewer class I allele
assignments remained unchanged in comparison to class II allele
assignments (50-60% versus 80%, respectively).

References

1. Bodmer J, Cambon-Thomsen A, Hors J, Piazza A, Sanchez-Mazas A. Report of the Anthropology Component. In: Charron D (ed.) HLA: Proceedings of the Twelfth International Histocompatibility Workshop and Conference: Genetic Diversity of HLA: Functional and Medical Implication, Volume I, EDK, 1997, 269–74.

2. Ruhlen M. A Guide to the World’s Languages, volume 1. Stanford University Press, Stanford, California, 1987.

3. Grimes BF, Grimes JE (eds.) Ethnologue: Languages of the World, 14th Edition. SIL Publications, 2002. (http://www.ethnologue.com).

4. Literature database for the 13th IHWC. http://allele5.biol.berkeley.edu/13ihwg/lit_data.html

5. Sanchez-Mazas A. HLA data analysis in anthropology: basic theory and practice. Teaching session 5: Biostatistics, 16th European Histocompatibility Conference, Strasbourg, 19–22 March 2002, p. 68–83 (available at http://anthro.unige.ch/~sanchez/pdf_files/).