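Installing vcftools

The configure step fails with an error saying that zlib cannot be found.

In fact, zlib is already installed.

Solution: point configure at the existing zlib through the ZLIB_LIBS and ZLIB_CFLAGS environment variables.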

export ZLIB_LIBS='-L/home/fenglei/local/lib -lz'
export ZLIB_CFLAGS=-I/home/fenglei/local/include
echo $ZLIB_LIBS $ZLIB_CFLAGS
./configure --prefix=/home/fenglei/local/
make
make install

The installation then succeeds.
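Before exporting the two variables, it can help to confirm that the zlib header and library really sit under the prefix you intend to use. The following is only a minimal sketch: the helper name check_zlib is hypothetical, and it assumes the usual include/ and lib/ layout under the prefix.

```shell
# check_zlib PREFIX: succeed if PREFIX/include/zlib.h and some PREFIX/lib/libz.* exist.
# The function name and the include/ + lib/ layout are assumptions for illustration.
check_zlib() {
    zprefix=$1
    if [ ! -f "$zprefix/include/zlib.h" ]; then
        echo "missing $zprefix/include/zlib.h" >&2
        return 1
    fi
    # Accept any libz.* file (libz.so, libz.a, versioned names, ...).
    for zlib_file in "$zprefix"/lib/libz.*; do
        if [ -e "$zlib_file" ]; then
            echo "zlib found under $zprefix"
            return 0
        fi
    done
    echo "no libz.* under $zprefix/lib" >&2
    return 1
}
```

For the setup in this post it could be called as check_zlib /home/fenglei/local before running ./configure.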

Reference: https://github.com/vcftools/vcftools/issues/29

Posted in Bioinformatics

[miRNA analysis] Installing miRPlant and PatMaN

Installing miRPlant first requires PatMaN (a program for aligning small RNA sequences), but the PatMaN build failed with an error.

[fenglei@localhost PatMaN]$ make install prefix=/home/fenglei/local/
g++ -DVERSION="\"1.2\"" -Wall -O3 -funroll-loops -DNDEBUG -march=k8 -c -o prefix_tree.o prefix_tree.cpp
In file included from /home/fenglei/local/include/assert.h:5:0,
 from /home/fenglei/local/include/c++/6.3.0/cassert:44,
 from prefix_tree.cpp:12:
/home/fenglei/local/include/except.h:15:32: error: conflicting declaration ‘typedef struct Except_Frame_T* Except_Frame_T’
 typedef struct Except_Frame_T *Except_Frame_T;
 ^~~~~~~~~~~~~~
/home/fenglei/local/include/except.h:15:16: note: previous declaration as ‘struct Except_Frame_T’
 typedef struct Except_Frame_T *Except_Frame_T;
 ^~~~~~~~~~~~~~
/home/fenglei/local/include/except.h:17:18: error: field ‘prev’ has incomplete type ‘Except_Frame_T’
 Except_Frame_T prev;
 ^~~~
/home/fenglei/local/include/except.h:16:8: note: definition of ‘struct Except_Frame_T’ is not complete until the closing brace
 struct Except_Frame_T {
 ^~~~~~~~~~~~~~
make: *** [Makefile:25: prefix_tree.o] Error 1

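Following the error message, look at this file:

/home/fenglei/local/include/except.h

This file was put there when GMAP was built and installed, so I went into the GMAP build directory and ran make uninstall. Afterwards, except.h was gone.

I then returned to the PatMaN directory, ran make install prefix=/home/fenglei/local/ again, and the installation went through without problems.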
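The log above shows cassert resolving assert.h to /home/fenglei/local/include/assert.h, which in turn pulls in except.h. A quick way to spot this kind of clash is to list files in a custom include directory whose names collide with standard C headers. This is a rough sketch only: the function name find_shadowing_headers is hypothetical, and the header list is a small illustrative subset, not a complete one.

```shell
# find_shadowing_headers DIR: print files in DIR that share a name with a
# standard C header. Hypothetical helper; the header list is illustrative only.
find_shadowing_headers() {
    incdir=$1
    for h in assert.h errno.h math.h stdio.h stdlib.h string.h; do
        if [ -f "$incdir/$h" ]; then
            echo "$incdir/$h"
        fi
    done
}
```

Running find_shadowing_headers /home/fenglei/local/include before the uninstall would be expected to report the assert.h seen in the log.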


Posted in Bioinformatics

Gene Ontology analysis for DEGs of Arabidopsis

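## try http:// if https:// URLs are not supported
## (biocLite is the installer this post was written with; current Bioconductor
## releases use BiocManager::install() instead)
source("https://bioconductor.org/biocLite.R")
biocLite("clusterProfiler")
biocLite("DOSE")
biocLite("tibble")
library(clusterProfiler)
biocLite("topGO")
library(topGO)
biocLite("org.At.tair.db")
library(org.At.tair.db)

## Read the DEG table; the first column holds the gene IDs
a <- read.table("diff.gene.table", header=TRUE, sep="\t")
b <- as.character(a[,1])

## List the ID types this database supports; Arabidopsis locus IDs such as AT1G01110 use keyType="TAIR"
keytypes(org.At.tair.db)

## GO over-representation test (cellular component)
ego <- enrichGO(gene          = b,
                keyType       = "TAIR",
                OrgDb         = org.At.tair.db,
                ont           = "CC",
                pAdjustMethod = "BH",
                pvalueCutoff  = 0.05,
                qvalueCutoff  = 0.05,
                readable      = TRUE)
barplot(ego, drop=TRUE, showCategory=12)
dotplot(ego)
write.table(as.data.frame(ego@result), file="test_CC.txt")

# KEGG over-representation test
kk <- enrichKEGG(gene         = b,
                 organism     = 'ath',
                 pvalueCutoff = 0.05)
write.table(as.data.frame(kk@result), file="test_kk.txt")

# KEGG Gene Set Enrichment Analysis
# gseKEGG needs a named numeric vector (e.g. log2 fold change named by gene ID)
# sorted in decreasing order, not the bare ID vector b. The next line ASSUMES
# that column 2 of diff.gene.table holds that statistic; adjust it to your table.
geneList <- sort(setNames(a[,2], as.character(a[,1])), decreasing=TRUE)
kk2 <- gseKEGG(geneList     = geneList,
               organism     = 'ath',
               nPerm        = 1000,
               minGSSize    = 120,
               pvalueCutoff = 0.05,
               verbose      = FALSE)
head(kk2)

# KEGG module over-representation test
mkk <- enrichMKEGG(gene = b,
                   organism = 'ath')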


Posted in Bioinformatics

Ordering contigs against a reference genome

Multiple programs have been developed for reference-assisted chromosome assembly: Bambus [10], BACCardI [11], Projector2 [12], OSLay [13], ABACAS [14], MeDuSa [15], AlignGraph [16], Ragout [17], SyMap [18] and RACA [19]. Most of the listed tools were designed for bacterial or small genomes. For example, ABACAS is a convenient bacterial genome contiguation tool that may also be used for small eukaryotic genomes such as Saccharomyces cerevisiae (12.1 mega base pairs). However, ABACAS is not efficiently scaled to use with the large genomes typical of vertebrate species.

ABACAS is practical for small genomes. Note, however, that it joins adjacent contigs with a run of 99 Ns, even where the contig ends actually overlap.
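To see how many such joins an ordered sequence contains, one option is to count the runs of exactly 99 Ns. The following is a minimal sketch, not part of ABACAS: the function name count_n_runs is hypothetical, and it assumes a single-record FASTA file with uppercase N as the gap character.

```shell
# count_n_runs FASTA LEN: count maximal runs of N that are exactly LEN long.
# Hypothetical helper; assumes one sequence record and uppercase N gaps.
count_n_runs() {
    fasta=$1
    run_len=$2
    # Drop the header, join the wrapped sequence lines, emit each N run on its
    # own line, then count the runs of the requested length.
    grep -v '^>' "$fasta" | tr -d '\n' | grep -oE 'N+' |
        awk -v len="$run_len" 'length($0) == len { n++ } END { print n + 0 }'
}
```

Calling count_n_runs on the ordered pseudomolecule with a length of 99 gives the number of contig junctions padded this way; junctions with real overlaps would still need to be inspected by alignment.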

For large genomes, Chromosomer can be used instead ("Chromosomer: a reference-based genome arrangement tool for producing draft chromosome sequences").

Posted in Bioinformatics

Synteny analysis of genome sequences

1 LASTZ

Ref: http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html

Paper: https://gigascience.biomedcentral.com/articles/10.1186/s13742-016-0141-6

2 Gepard

Ref: http://cube.univie.ac.at/gepard

Paper: http://www.plantcell.org/content/23/1/27

3 MAUVE (Multiple Alignment of Conserved Genomic Sequence)

Ref: darlinglab.org/mauve/mauve.html

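4 MCScanX

[Figures: MCScanX circle plot, dual synteny plot, and dot plot]

MCScanX uses an improved version of the MCScan algorithm to detect collinear (syntenic) blocks within a genome or between genomes. It takes the blastp alignments between the proteins of two species and combines them with the positions of the corresponding genes in each genome to identify the collinear blocks shared by the two species. To analyse collinear blocks within a single genome, simply align that species' proteins against themselves.

Ref: http://chibba.pgml.uga.edu/mcscan2/examples/example7.php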

5 C-Sibelia: an easy-to-use and highly accurate tool for bacterial genome comparison


ref:

6 SyMAP (http://www.symapdb.org/)

Posted in Bioinformatics

SSPACE: Can’t locate getopts.pl

Problem: running SSPACE_Standard_v3.0.pl fails with the following error.

Can’t locate getopts.pl in @INC (@INC contains: /xxx/SSPACE-STANDARD/dotlib/ /home/fenglei-cuhk/perl5/lib/perl5 /etc/perl /usr/local/lib/perl/5.18.2 /usr/local/share/perl/5.18.2 /usr/lib/perl5 /usr/share/perl5 /usr/lib/perl/5.18 /usr/share/perl/5.18 /usr/local/lib/site_perl .) at /home/xx/SSPACE-STANDARD/SSPACE_Standard_v3.0.pl line 124.

Cause:

getopts.pl is a Perl 4 core library but no longer included in current Perl 5 distributions.

Solution:

On the Linux command line, run "perl -MCPAN -e shell" to enter the CPAN shell, then type "install Perl4::CoreLibs".


Posted in Bioinformatics, Linux

Shell scripting: extracting a file name prefix

Suppose we are processing a file named /xxx/yyy/zzz/abc2017.1.fq.gz and need to extract the sample name abc2017 from it.

i=/xxx/yyy/zzz/abc2017.1.fq.gz

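IN=$i
# Absolute path of the read 1 file: /xxx/yyy/zzz/abc2017.1.fq.gz

path=${IN%/*}
# Directory part of the read 1 path: strip the shortest suffix that starts at a "/", leaving /xxx/yyy/zzz

folder=${path##*/}
# Name of the folder that contains read 1, i.e. zzz

prefix=${IN%.1*}
# Path prefix shared by the read 1 and read 2 files: strip the shortest suffix that starts at ".1", leaving /xxx/yyy/zzz/abc2017

sample=${prefix##*/}
# Sample name: abc2017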
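The same parameter expansions can be wrapped in a small function so they are easy to reuse in a loop over many read 1 files. This is only a sketch: the function name split_read_path is hypothetical, and the <sample>.2.fq.gz name for the read 2 file is an assumption about how the mates are named.

```shell
# split_read_path PATH: print the folder name, sample name, and assumed read 2
# path (tab-separated) for a read 1 file named <sample>.1.fq.gz.
# The function name and the <sample>.2.fq.gz mate naming are assumptions.
split_read_path() {
    read1=$1
    dir=${read1%/*}              # drop the file name: /xxx/yyy/zzz
    folder=${dir##*/}            # keep the last path component: zzz
    prefix=${read1%.1.fq.gz}     # drop the literal read 1 suffix
    sample=${prefix##*/}         # drop the directory part: abc2017
    read2=${prefix}.2.fq.gz      # assumed name of the mate file
    printf '%s\t%s\t%s\n' "$folder" "$sample" "$read2"
}

split_read_path /xxx/yyy/zzz/abc2017.1.fq.gz
```

The sketch strips the literal suffix .1.fq.gz rather than the pattern .1* so that a file such as abc.10.fq.gz is left unchanged instead of being cut at ".10".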
Posted in Bioinformatics, Linux