bwa [Bioinformatics Lab]

BWA Overview

Description: Fast, accurate, memory-efficient aligner for short and long sequencing reads
Core technique: FM-index
Supported platforms: Sanger / Solexa Illumina / 454 / ABI SOLiD
Original authors: Heng Li / Richard Durbin
Maintenance status: bwa 0.5.7 - 2010-03-01
Input format: FASTQ
Output format: SAM
Supported architectures: i386 / x86_64
Operating systems: Linux / Solaris / Mac / BSD

BWA Installation

Required software:

  1. BWA
  2. SAMtools


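  • Unpack BWA
# tar -jxvf bwa-0.5.7.tar.bz2
  • Build BWA (run make from inside the extracted source directory)
# make


  • Unpack SAMtools
# tar -jxvf samtools-0.1.7_x86_64-linux.tar.bz2
  • Add SAMtools to PATH

In the shell profile, before the export PATH line, add:

PATH="${PATH}:<absolute path to the samtools directory>"

For example:

PATH="${PATH}:/home/bioinfo/samtool"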
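
The PATH step can be sketched as a small helper, assuming a POSIX shell. The function name add_to_path is hypothetical, and the directory is the example path used above. It appends the directory only when it is not already on PATH, so sourcing the profile more than once does not create duplicate entries.

```shell
# Hypothetical helper: append a directory to PATH only if it is not there yet.
add_to_path() {
    case ":${PATH}:" in
        *":$1:"*) ;;                 # already present, nothing to do
        *) PATH="${PATH}:$1" ;;      # append at the end, as in the line above
    esac
}

add_to_path /home/bioinfo/samtool    # example directory from this page
export PATH
```

After opening a new shell, running samtools without arguments should print its usage text if the directory is correct.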

BWA Workflow

  1. Index the database file in the FASTA format
  2. Find the suffix array (SA) coordinates of good hits of each individual read
  3. Convert SA coordinates to chromosomal coordinates and pair reads

Required input:

  1. Reference genome data (*.fa)
  2. NGS short-read data (*.fastq)
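
The three steps above can be chained in one shell function. This is a minimal sketch, not an official BWA script: the function name bwa_paired_pipeline is hypothetical, it assumes bwa is on PATH, that the read files end in .fastq, and that the reads are in base space (not color space).

```shell
# Sketch of the three workflow steps for one paired-end sample.
# Usage: bwa_paired_pipeline REFERENCE.fa LEFT.fastq RIGHT.fastq OUT.sam
# The ( ) body runs in a subshell, so variables and "exit" stay local.
bwa_paired_pipeline() (
    ref=$1; left=$2; right=$3; out=$4
    left_sai="${left%.fastq}.sai"
    right_sai="${right%.fastq}.sai"

    # Step 1: index the reference genome
    bwa index -a bwtsw "$ref" || exit 1
    # Step 2: find SA coordinates for each read file
    bwa aln "$ref" "$left"  > "$left_sai"  || exit 1
    bwa aln "$ref" "$right" > "$right_sai" || exit 1
    # Step 3: convert SA coordinates, pair the reads, write SAM
    bwa sampe "$ref" "$left_sai" "$right_sai" "$left" "$right" > "$out"
)
```

A call such as bwa_paired_pipeline reference.fa leftRead.fastq rightRead.fastq human.sam mirrors the commands in the sections below. Each step is explained there in more detail.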

Building the Index

  • Build the index files from the reference genome data (e.g. reference.fa)
# bwa index -a bwtsw reference.fa
  • If the reads are in color space, build a color-space index instead
# ./bwa index -a bwtsw -c reference.fa
  • For more usage and options of bwa index, run it without arguments
# ./bwa index


* -c: build a color-space index.
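
Indexing a large genome is slow, so it can help to skip the step when an index already exists. The sketch below assumes that bwa index writes a file named reference.fa.bwt next to the FASTA file and that the index is base-space. The function name ensure_bwa_index is hypothetical.

```shell
# Build the base-space index only when it is missing.
# Assumption: an existing, non-empty <reference>.bwt means the index was built.
# The ( ) body runs in a subshell, so the variable stays local.
ensure_bwa_index() (
    ref=$1
    if [ -s "${ref}.bwt" ]; then
        echo "index for ${ref} already present, skipping" >&2
    else
        bwa index -a bwtsw "$ref"
    fi
)
```

The check looks at a single file only, so an interrupted indexing run may still need to be repeated by hand.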

Finding SA Coordinates

  • Find the SA coordinates using the following data
  1. NGS short-read data (e.g. leftRead.fastq / rightRead.fastq)
  2. Reference genome data (e.g. reference.fa)
# bwa aln reference.fa leftRead.fastq > leftRead.sai
# bwa aln reference.fa rightRead.fastq > rightRead.sai
  • If the data is in color space, first convert the reads with the conversion script bundled with BWA, then generate the SA coordinates with -c
# ./bwa aln -c -f leftreads.sai reference.fa leftreads.fastq
# ./bwa aln -c -f rightreads.sai reference.fa rightreads.fastq
  • To run the command with multiple threads
# ./bwa aln -c -t 3 -f leftreads.sai reference.fa leftreads.fastq
  • For more usage and options of bwa aln, run it without arguments
# ./bwa aln


* -f file: file to write output to instead of stdout

* -c: input sequences are in the color space

* -t num: number of threads (default: 1)

* Reason for losing 2 bp when using solid2fastq in bwa:
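
Returning to the bwa aln options listed above, the per-file commands can be wrapped in a loop. This is a minimal sketch for base-space reads: the function name aln_all is hypothetical, and it assumes bwa is on PATH and that read files end in .fastq. Only the -t and -f options described above are used.

```shell
# Run bwa aln on each FASTQ file and write <name>.sai via -f.
# Usage: aln_all THREADS REFERENCE.fa READS.fastq [READS.fastq ...]
# The ( ) body runs in a subshell, so "shift" and variables stay local.
aln_all() (
    threads=$1; ref=$2
    shift 2
    for fq in "$@"; do
        bwa aln -t "$threads" -f "${fq%.fastq}.sai" "$ref" "$fq" || exit 1
    done
)
```

For example, aln_all 3 reference.fa leftRead.fastq rightRead.fastq produces leftRead.sai and rightRead.sai. For color-space data, -c would have to be added to the bwa aln call.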

Converting SA Coordinates

  • Convert the SA coordinate files into a SAM file using the following data
  1. Reference genome data (reference.fa)
  2. NGS short reads (leftRead.fastq / rightRead.fastq)
  3. SA coordinate files (leftRead.sai / rightRead.sai)
# bwa sampe reference.fa leftRead.sai rightRead.sai leftRead.fastq rightRead.fastq > human.sam
  • Generate alignments in the SAM format given single-end reads
# ./bwa samse -f leftreads.sam reference.fa leftreads.sai leftreads.fastq
# ./bwa samse -f rightreads.sam reference.fa rightreads.sai rightreads.fastq
  • To include multiple-mapped hits in the output
# ./bwa samse -n 10000 -f leftreads.sam reference.fa leftreads.sai leftreads.fastq
  • For more usage and options of bwa sampe and bwa samse, run them without arguments
# ./bwa sampe
# ./bwa samse


* -f file: output file

* -n num: Maximum number of alignments to output in the XA tag for reads paired properly (default: 3)
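
To check what the -n option produced, the XA tags in the SAM output can be counted. The sketch below assumes the XA value has the form chr,pos,CIGAR,NM; repeated once per alternative hit, so the number of semicolons equals the number of alternative hits. The function name count_xa_hits is hypothetical.

```shell
# Print each read name with the number of alternative hits in its XA tag.
# Reads without an XA tag are reported with 0. Reads SAM from files or stdin.
count_xa_hits() {
    awk -F '\t' '
        /^@/ { next }                      # skip SAM header lines
        {
            n = 0
            for (i = 12; i <= NF; i++)     # optional fields start at column 12
                if ($i ~ /^XA:Z:/)
                    n = gsub(/;/, ";", $i) # gsub returns the number of matches
            print $1 "\t" n
        }' "$@"
}
```

Running count_xa_hits leftreads.sam prints one line per alignment record. Reads with more hits than the -n limit carry no XA tag and therefore also show 0.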

Problem with nohup and stdout

I am still trying to get samse to work. I am using bwa version 0.61.
nohup ./bwa index -a bwtsw /archive/Koeln-February/ALL_DATA/Mus_musculus.NCBIM37.66.dna.toplevel.fa &
nohup ./bwa aln -t 6 -n 0.02 Mus_musculus.NCBIM37.66.dna.toplevel.fa SN7640083_2746_F1_B_2_sequence.fq > SN7640083_2746_F1_B_2.sai &
nohup ./bwa samse Mus_musculus.NCBIM37.66.dna.toplevel.fa SN7640083_2746_F1_B_2.sai SN7640083_2746_F1_B_2_sequence.fq > SN7640083_2746_F1_B_2.sam &

I only started each command once the previous one had finished. I attach below the nohup.out file, which shows that the indexing and alignment seemed to have worked fine, but the samse does not. I also attach below the samse outfile. It seems as if the alignment outfile is not a bam file.

I also attach a small test infile. The small test file works perfectly fine, but when I try and run the total dataset, I get the error shown below in the samse outfile.

Use “aln -f” to output to a file, instead of stdout by default. Please don't use nohup and try again (you do not need to index the genome again).
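
One likely reason the advice above helps, assuming GNU nohup behavior: when standard error is a terminal, nohup redirects it to standard output, so with a plain > redirection bwa's progress messages can end up inside the .sai or .sam file. Writing the results with -f avoids depending on stdout at all. The sketch below reuses the -t 6 and -n 0.02 settings from the question. The function name align_single is hypothetical.

```shell
# Single-end alignment that never relies on stdout redirection.
# Usage: align_single REFERENCE.fa READS.fq OUTPUT_PREFIX
# The ( ) body runs in a subshell, so variables stay local.
align_single() (
    ref=$1; reads=$2; prefix=$3
    # aln writes the SA coordinates straight to <prefix>.sai
    bwa aln -t 6 -n 0.02 -f "${prefix}.sai" "$ref" "$reads" || exit 1
    # samse writes the alignments straight to <prefix>.sam
    bwa samse -f "${prefix}.sam" "$ref" "${prefix}.sai" "$reads"
)
```

Because both output files are named explicitly, anything bwa prints to stdout or stderr stays out of the results.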


Multiple Mapping Problem

Indeed, BWA outputs suboptimal hits for single-end reads only. One needs to write an additional program to do the pairing. Implementing this feature in bwa in C may interfere with other parts of the code and thus require quite some work and testing. I will not do this, at least not in the near future. Sorry.


PS: Samse/pe was initially designed to output up to N random hits if there are more than N hits. Later I thought this increases the file size without adding too much benefit (for my work) and switched this feature off.

On Sep 7, 2011, at 7:16 PM, Brian Haas wrote:

Hi all,

We (at the Broad) have a slightly modified version of BWA 0.5.7 that should properly report the multiply mapped reads for single end reads, modified by Andrey Sivachenko et al. I was given permission to share the patch, which I've made temporarily available here:

To use the patch (for those that are not patch-savvy), do the following:

Download the BWA version 0.5.7 from sourceforge here:

After uncompressing the archive, cd into the base directory and put the patch file there. Then run the following to apply the patch:

patch -p1 -i bwa-0.5.7-multi.patch

Then build the updated software normally.

Usage notes from Andrey are as follows:

The only difference is in running samse step. Run as:

bwa samse -n <N> -s …. (the rest is standard as per documentation)

in order to generate up to N multiple alignment records for each read.
If read has more than N alignments, subset of exactly N will be chosen at random.


Hopefully, something along these lines can be incorporated into the latest BWA release. Modifications to allow for pairing of these multiply-mapped reads would also be fantastic. At the very least, we may have a separate script available soon that will do the pairing of such reads, but it would be best to keep such functionality within the official BWA tool suite.
bwa.txt · Last modified: 2014/06/05 16:33 (external edit)