Samtools threads
6/22/2023

I am using 'samtools calmd' to add the MD tag back to a BAM file. The original BAM is around 50 GB (whole genome sequence from PacBio HiFi reads). The issue I encountered is that 'calmd' is incredibly slow: the jobs have already run for 12 hours, and only 600 MB of BAM with MD tags has been generated.

samtools flagstat does a full pass through the BAM/SAM/CRAM input file and returns counts for each of 13 categories based on bit flags in the FLAG field. Each category is represented in the output as PASS + FAIL, followed by a short description of the category.

Problem: One of the most common tasks in processing BAM files is to count the number of reads mapped to a particular region, e.g. Sometimes BAM files are big, so counting can be slow and multithreading definitely helps. There are existing tools that do the job very well, like bedtools and HTSeqCount, but none of them are multithreaded.

Solution: Implement multithreaded BAM counting in Perl, one thread per chromosome, with help from samtools. The logic is simple: count each chromosome independently in a thread. There are two critical ingredients to make this recipe work: 1) initiating multiple threads in Perl, and 2) accessing the BAM records of one particular chromosome in Perl. For the former, I prefer threads over fork. For the latter, I prefer piping samtools to stream the BAM rather than using Bio::DB::Sam. One could constantly create and detach threads (code 1), or create a fixed number of threads and a queue for each thread (code 2).

Code 1: Constantly create and detach threads, one thread for one chromosome

my $BAMFilePath = "/path/to/your/alignment.
my $maxThread = 4; #-maximum number of threads to be issued
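As a rough illustration of the scheme described in the Solution paragraph (one chromosome per unit of work, with samtools doing the streaming), here is a minimal Perl sketch. It is not the code from the original post: it uses a fixed pool of worker threads that pull chromosome names from one shared Thread::Queue, which is a simplification of the "code 2" idea. The helper names count_chromosome and count_all, and the chromosome names in the usage comment, are hypothetical. It assumes a Perl built with thread support, samtools on the PATH, and a coordinate-sorted, indexed BAM.

```perl
use strict;
use warnings;
use threads;
use Thread::Queue;

# Count the reads on one chromosome by streaming `samtools view -c`.
# Assumption: the BAM is coordinate-sorted and indexed, so region queries work.
sub count_chromosome {
    my ($bam, $chrom) = @_;
    open(my $fh, '-|', 'samtools', 'view', '-c', $bam, $chrom)
        or die "cannot run samtools: $!";
    my $count = <$fh>;
    close($fh);
    chomp $count if defined $count;
    return $count;
}

# Run $counter->($bam, $chrom) for every chromosome, using at most
# $nThread worker threads. Returns a hash reference: chromosome => count.
sub count_all {
    my ($bam, $chroms, $nThread, $counter) = @_;

    # The queue is filled before any worker starts, so a non-blocking
    # dequeue returning undef means there is no work left.
    my $queue = Thread::Queue->new(@$chroms);

    my @workers = map {
        threads->create({ context => 'list' }, sub {
            my %local;
            while (defined(my $chrom = $queue->dequeue_nb())) {
                $local{$chrom} = $counter->($bam, $chrom);
            }
            return \%local;
        });
    } 1 .. $nThread;

    # Merge the per-thread results in the parent thread.
    my %total;
    for my $worker (@workers) {
        my ($part) = $worker->join();
        @total{ keys %$part } = values %$part;
    }
    return \%total;
}

# Hypothetical usage (path and chromosome names are placeholders):
# my $counts = count_all('alignment.bam', ['chr1', 'chr2', 'chrX'], 4, \&count_chromosome);
# print "$_\t$counts->{$_}\n" for sort keys %$counts;
```

Passing the counting routine as a code reference keeps the threading logic separate from the samtools call, so the pool can be exercised with a stand-in counter before pointing it at a real BAM file. Joining the workers, rather than detaching them as in "code 1", is what lets the parent collect the per-chromosome counts.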