使用Jellyfish 统计k-mer
-
下载并安装: http://www.genome.umd.edu/jellyfish.html#Release
-
准备数据
参照陈连福的方法,提前将paired-end数据整合。将read2数据的序列反向重复后与read1文件合并。这一步使用Trinity附带的fastool程序完成转换。
1
2
3
|
$ zcat read_1.clean.fq.gz | $Trinity_Home/trinity-plugins/fastool/fastool --illumina-trinity --to-fasta > reads_1.fasta
$ zcat read_2.clean.fq.gz | $Trinity_Home/trinity-plugins/fastool/fastool --rev --illumina-trinity --to-fasta > reads_2.fasta
$ cat reads_1.fasta reads_2.fasta > both.fasta
|
- 统计kmers
1
|
$ jellyfish count -C -m 21 -s 1000000000 -t 10 both.fasta -o reads.jf
|
- 绘制直方图
1
|
$ jellyfish histo -t 10 reads.jf > reads.histo
|
-m是kmer长度;
-s是预估哈希表的大小,即G+Gce*k。G是Genome Size;c是coverage(genome survey测序通常低于100x);e是测序错误率(illumina为1%);k是kmer大小。
-C表示考虑DNA正义与反义链,遇到反义kmer时,计入正义kmer频数中。
Running GenomeScope Online
在线作图http://genomescope.org/
Running GenomeScope on the Command Line
1
|
$ Rscript genomescope.R histogram_file k-mer_length read_length output_dir [kmer_max] [verbose]
|
参考来源
https://github.com/schatzlab/genomescope
https://gif.biotech.iastate.edu/genomescope
http://www.chenlianfu.com/?p=806
https://fengleiblog.wordpress.com/2014/12/17/%E5%9F%BA%E4%BA%8Eillumina%E6%B5%8B%E5%BA%8F%E6%95%B0%E6%8D%AE%EF%BC%8C%E4%BD%BF%E7%94%A8jellyfish%E4%BC%B0%E8%AE%A1%E5%9F%BA%E5%9B%A0%E7%BB%84%E5%A4%A7%E5%B0%8F/