SOAP-HLA is a sequencing data analysis pipeline that types all of the HLA genes in the IMGT/HLA database with high accuracy, using capture sequencing data or whole-genome sequencing (WGS) data. The pipeline takes an aligned BAM file as input and outputs the most reliable HLA type for each gene.

The algorithms and software in this package were developed by BGI (Hongzhi Cao, Tao Zhang).

System requirements

  1. Hardware:
    1. 64-bit x86-64 Intel CPU with SSE instructions
    2. About 500 MB of main memory to run on Homo sapiens data
  2. Software:
    1. 64-bit Linux system
    2. gcc compiler, version 4.2.4 or later
    3. perl, version 5.8.5 or later


    Download SOAP-HLA from the link below.

    Release 1.0, 20-02-2013

    download(MD5: 52bd108ad0033d6122ee86dda5e2dec6)
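
    The published MD5 checksum can be used to confirm that the archive downloaded intact before unpacking it. The following is a minimal sketch, assuming that md5sum (GNU coreutils) is installed and that the archive was saved as SOAP-HLA.tar.gz in the current directory. The helper name verify_md5 is hypothetical and is not part of SOAP-HLA.

```shell
#!/bin/sh
# verify_md5 FILE EXPECTED_MD5
# Returns 0 if the MD5 digest of FILE equals EXPECTED_MD5, non-zero otherwise.
# Hypothetical helper for illustration; it is not shipped with SOAP-HLA.
verify_md5() {
    file=$1
    expected=$2
    # md5sum prints "<digest>  <filename>"; keep only the digest field.
    actual=$(md5sum "$file" | awk '{print $1}')
    [ "$actual" = "$expected" ]
}

# Checksum published with Release 1.0.
SOAP_HLA_MD5="52bd108ad0033d6122ee86dda5e2dec6"

# Run the check only when the archive is present in the current directory.
if [ -f SOAP-HLA.tar.gz ]; then
    if verify_md5 SOAP-HLA.tar.gz "$SOAP_HLA_MD5"; then
        echo "SOAP-HLA.tar.gz: checksum OK"
    else
        echo "SOAP-HLA.tar.gz: checksum MISMATCH, download again" >&2
    fi
fi
```

    If the checksum does not match, the download is probably incomplete or corrupted and should be repeated before installation.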


  • Download SOAP-HLA.tar.gz to your local directory.
  • $ gzip -d SOAP-HLA.tar.gz
    $ tar -xf SOAP-HLA.tar
    $ cd SOAP-HLA
    After unpacking, the program directory has the following structure:
    	|-- Data_imgt_txt
    	|-- MHC.hg18.database
    	|-- MHC.hg19.database
    	|-- blastall
    	|-- formatdb
    	|-- samtools
  • HLA database Preparation
  • In addition to downloading the HLA typing reads from IMGT, you should prepare your own HLA database with the following steps.
    perl Data_imgt_txt Date_type_hg19 hg19
    perl Data_imgt_txt Date_type_hg18 hg18
  • How to Run
  • After preparing the HLA database, you can run SOAP-HLA with the following steps.
      perl -i *bam -od outdir -v version
    1. The input BAM file should be produced by alignment with bwa.
    2. You can choose either hg19 or hg18 for HLA typing.
  • Result file format
  •     The final results are listed in *.type.

        The description of each column:
        1. The final HLA type called.
        2. The closest type to the final type.
        3. The score of the final type.
        4. The novel mutation of the final type.
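
The four columns described above can be read back with a short script. The following is a minimal sketch, assuming that each line of a *.type file holds one result and that the columns are separated by whitespace (tabs or spaces). The exact delimiter is not stated here, so check it against your own output or the README.pdf. The function name summarize_types is hypothetical, and outdir is the output directory passed with -od.

```shell
#!/bin/sh
# summarize_types FILE
# Prints each result line of a *.type file with labelled columns.
# ASSUMPTION: columns are whitespace-separated and contain no internal spaces.
# Hypothetical helper for illustration; it is not shipped with SOAP-HLA.
summarize_types() {
    awk '
        NF >= 3 {
            # Column 4 (novel mutation) is reported as NA when it is absent.
            novel = (NF >= 4) ? $4 : "NA"
            printf "type=%s closest=%s score=%s novel=%s\n", $1, $2, $3, novel
        }
    ' "$1"
}

# Summarize every result file found in the output directory.
for f in outdir/*.type; do
    if [ -f "$f" ]; then
        echo "== $f"
        summarize_types "$f"
    fi
done
```

Lines with fewer than three fields are skipped, so blank lines in the file do not produce empty records.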


We tested the HLA typing pipeline on simulated data. Using the eight haplotype sequences constructed by the Sanger MHC Haplotype Project in 2008, for which the HLA types are known, we simulated sequence data for the MHC region of a diploid sample and performed HLA typing. The comparison results are summarized as follows:

The horizontal axis represents the sequencing depth, the vertical axis represents the accuracy at 4-digit resolution, and the lines of different colours represent different sequencing error rates.
The accuracy of our method was 96.47% at a depth of 80 with a sequencing error rate of 0.01 (the setting closest to real data).
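
To make "accuracy at 4-digit resolution" concrete, the sketch below compares called types with known types after truncating both to their first two colon-separated fields (for example, A*02:01:01:01 becomes A*02:01). This is an illustration only, not the evaluation script used for the test above. It assumes colon-delimited allele names and a hypothetical two-column input file (called type, known type), and the function name four_digit_accuracy is also hypothetical.

```shell
#!/bin/sh
# four_digit_accuracy PAIRS_FILE
# PAIRS_FILE (hypothetical format): one "called_type known_type" pair per line.
# Prints the percentage of pairs that agree at 4-digit resolution.
# ASSUMPTION: allele names use the colon-delimited form, e.g. A*02:01:01:01.
four_digit_accuracy() {
    awk '
        # Keep only the first two colon-separated fields of an allele name.
        function four(a,   p) {
            split(a, p, ":")
            return (2 in p) ? p[1] ":" p[2] : a
        }
        NF >= 2 {
            total++
            if (four($1) == four($2)) hit++
        }
        END {
            if (total) printf "%.2f\n", 100 * hit / total
        }
    ' "$1"
}
```

Calling four_digit_accuracy on a file of pairs prints a single percentage. Older allele names without colons (such as A*0201) are compared as whole strings by this sketch.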



*For more details, please refer to the 'README.pdf' in the software package.