HiC-Pro使用手册

软件安装

软件默认提供了Singularity的镜像，可以直接使用

环境依赖

The bowtie2 mapper
Python (>2.7) with pysam (>=0.8.3), bx-python(>=0.5.0), numpy(>=1.8.2), and scipy(>=0.15.1) libraries. Note that the current version does not support python 3
R with the RColorBrewer and ggplot2 (>2.2.1) packages
g++ compiler
samtools (>1.1)
Unix sort (which support -V option) is required ! For Mac OS user, please install the GNU core utilities !

其中，python模块pysam与bx-python不太好安装，因此，还是使用conda来配置安装环境比较方便。

1
2
3
4
5
6
7


conda create -n hicpro python=2.7
source activate hicpro
conda install -y samtools bowtie2 R
conda install -y pysam bx-python numpy scipy

R
>install.packages(c('ggplot2','RColorBrewer'))

但是我们得到一条报错

1
2
3
4
5


Warning messages:
1: In download.file(url, destfile = f, quiet = TRUE) :
  unable to load shared object '/Storage/data002/shurh/miniconda3/envs/hicpro/lib/R/modules//internet.so':
  libssl.so.1.0.0: cannot open shared object file: No such file or directory
2: packages ‘ggplot2’, ‘RColorBrewer’ are not available (for R version 3.5.1)

经过错误排查，我们需要升级curl

1

conda update curl

然后，我们就可以重新在R中安装ggplot2与RColorBrewer了

软件编译

解决了环境依赖之后，编译安装就很容易了

1
2
3
4
5


git clone https://github.com/nservant/HiC-Pro.git
cd HiC-Pro
vi config-install.txt
make configure
make install

测试使用

测试数据

1
2


wget https://zerkalo.curie.fr/partage/HiC-Pro/HiCPro_testdata.tar.gz && tar -zxvf HiCPro_testdata.tar.gz
wget ftp://ftp.ncbi.nlm.nih.gov/refseq/H_sapiens/annotation/GRCh38_latest/refseq_identifiers/GRCh38_latest_genomic.fna.gz

还可以使用其他已发表的数据来使用

1
2
3
4
5
6


prefetch SRR824846
wget ftp://ftp.ensemblgenomes.org/pub/bacteria/release-40/fasta/bacteria_20_collection/caulobacter_crescentus_na1000/dna/Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa.gz
gunzip Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa.gz
bowtie2-build Caulobacter_crescentus_na1000.ASM2200v1.dna.toplevel.fa bacteria
cat >genome.size
Chromosome 4016942

得到注释文件

HiC-Pro需要两个注释文件，分别包含实验对基因组的酶切位点，以及基因组的染色体长度。

restricttion fragments file

该文件可以由HiC-Pro提供的一支程序得到

染色体长度

1
2
3
4
5
6
7
8


###
## How to generate chromosome size files ?
###
require(Biostrings)

TargetGenome <- readDNAStringSet("genome.fasta", format = "fasta")
chrom.size <- seqlengths(TargetGenome)
write.table(chrom.size, file="genome.size", quote=FALSE, col.names=TRUE, sep="\t")