Privatre:bioinfomatics: Difference between revisions
(7 intermediate revisions by the same user not shown) | |||
Line 120: | Line 120: | ||
* https://github.com/danielecook/Awesome-Bioinformatics | * https://github.com/danielecook/Awesome-Bioinformatics | ||
* https://mybiosoftware.com/ | * https://mybiosoftware.com/ | ||
* https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/ | * https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/ | ||
* Cluster system software modules - https://bioinformatics.uconn.edu/cbc_software/software-2/ | * Cluster system software modules - https://bioinformatics.uconn.edu/cbc_software/software-2/ | ||
== We can use Bioconda <ref>https://bioconda.github.io/index.html</ref> == | == We can use Bioconda <ref>https://bioconda.github.io/index.html</ref> == | ||
Line 129: | Line 127: | ||
== Lib and sources == | == Lib and sources == | ||
{| class="wikitable sortable | |||
=== Key online URLs === | |||
* https://vcru.wisc.edu/simonlab/bioinformatics/programs | |||
* https://mybiosoftware.com/ | |||
* https://crc.pitt.edu/applications | |||
* https://bioconda.github.io/conda-package_index.html | |||
* https://studylib.net/doc/25332332/the-biostar-handbook | |||
{| class="wikitable sortable" | |||
|+ | |+ | ||
!Libraries | !Libraries | ||
(Python 3.9) | (Python 3.9) | ||
!Mamba or manual | !Mamba or manual | ||
!Deps | |||
!Description | !Description | ||
!References | !References | ||
|- | |- | ||
| | |BWA | ||
|Mamba | |Mamba | ||
| | | | ||
|BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | |BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | ||
| | | | ||
Line 151: | Line 155: | ||
|samtools | |samtools | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
Line 156: | Line 161: | ||
|ncbi-blast+<ref>https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html</ref> | |ncbi-blast+<ref>https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html</ref> | ||
|Manual | |Manual | ||
|polyphen2 | |||
|ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | |ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | ||
|<nowiki>https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html</nowiki> | |<nowiki>https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html</nowiki> | ||
Line 161: | Line 167: | ||
<nowiki>https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/</nowiki> | <nowiki>https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/</nowiki> | ||
<nowiki>http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download</nowiki> | |||
https://vcru.wisc.edu/simonlab/bioinformatics/programs/#blastplus | |||
setup | |||
rsync -avz polyphen-2.2.2/precomputed/* polyphen-2.2.3/precomputed/ | |||
others$ wget <nowiki>ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.9.0/ncbi-blast-2.9.0+-x64-linux.tar.gz</nowiki> | |||
$ tar vxaf ncbi-blast-2.9.0+-x64-linux.tar.gz | |||
|- | |- | ||
|somatic-sniper | |somatic-sniper | ||
|Mamba | |Mamba | ||
| | |||
|Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision and present several common sources of error resulting in miscalls. | |Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision and present several common sources of error resulting in miscalls. | ||
| | | | ||
Line 169: | Line 189: | ||
|breakdancer | |breakdancer | ||
|Mamba | |Mamba | ||
| | |||
|a package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads | |a package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads | ||
| | | | ||
Line 174: | Line 195: | ||
|tigra-sv<ref>https://bioinformatics.mdanderson.org/public-software/archive/tigra/</ref> | |tigra-sv<ref>https://bioinformatics.mdanderson.org/public-software/archive/tigra/</ref> | ||
|Manual | |Manual | ||
| | |||
|a program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of bam files that contain reads mapped to a reference genome such as NCBI | |a program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of bam files that contain reads mapped to a reference genome such as NCBI | ||
|https://bioinformatics.mdanderson.org/public-software/archive/tigra/ | |https://bioinformatics.mdanderson.org/public-software/archive/tigra/ | ||
Line 179: | Line 201: | ||
|<s>TopHat</s>-> hisat2 | |<s>TopHat</s>-> hisat2 | ||
|NA | |NA | ||
| | |||
|Please note that TopHat has entered a low maintenance, low [[support]] stage as it is now largely superseded by '''HISAT2''' which provides the same core functionality<ref>https://ccb.jhu.edu/software/tophat/index.shtml</ref> | |Please note that TopHat has entered a low maintenance, low [[support]] stage as it is now largely superseded by '''HISAT2''' which provides the same core functionality<ref>https://ccb.jhu.edu/software/tophat/index.shtml</ref> | ||
|'''Does not support Python 3''' | |'''Does not support Python 3''' | ||
Line 184: | Line 207: | ||
|[https://daehwankimlab.github.io/hisat2/ HISAT2] | |[https://daehwankimlab.github.io/hisat2/ HISAT2] | ||
|Mamba | |Mamba | ||
| | |||
|'''HISAT2''' is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. | |'''HISAT2''' is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. | ||
| | | | ||
Line 189: | Line 213: | ||
|cufflinks<ref>http://cole-trapnell-lab.github.io/cufflinks/</ref> | |cufflinks<ref>http://cole-trapnell-lab.github.io/cufflinks/</ref> | ||
|Manual | |Manual | ||
| | |||
|manual install to prefix | |manual install to prefix | ||
(231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al | (231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al | ||
Line 195: | Line 220: | ||
|bedtools | |bedtools | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
Line 200: | Line 226: | ||
|T-COFFEE | |T-COFFEE | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
Line 205: | Line 232: | ||
|mafft | |mafft | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
Line 210: | Line 238: | ||
|maq | |maq | ||
|Manual | |Manual | ||
| | |||
|Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. | |Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. | ||
hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$ | hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$ | ||
Line 229: | Line 258: | ||
|muscle, | |muscle, | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
Line 234: | Line 264: | ||
|phyml | |phyml | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
|- | |- | ||
| | |primer3 | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
|- | |- | ||
|probcons | |probcons | ||
|Mamba | |Mamba | ||
| | |||
| | | | ||
| | | | ||
|- | |- | ||
|sim4 | |sim4 | ||
|Manual | |Manual | ||
| | |||
|(231216) hpcmate@223vmbase:~/biolib/compile/sim4/sim4.2012-10-10$ | |(231216) hpcmate@223vmbase:~/biolib/compile/sim4/sim4.2012-10-10$ | ||
or | or | ||
Line 255: | Line 289: | ||
|tigr-glimmer | |tigr-glimmer | ||
|mamba | |mamba | ||
| | |||
|tigr-glimmer | |tigr-glimmer | ||
| | | | ||
Line 260: | Line 295: | ||
|amap-align | |amap-align | ||
|mamba | |mamba | ||
| | |||
|AMAP is a multiple sequence alignment program based on sequence annealing | |AMAP is a multiple sequence alignment program based on sequence annealing | ||
|https://github.com/mes5k/amap-align | |https://github.com/mes5k/amap-align | ||
Line 265: | Line 301: | ||
|dialign -> dialign2 | |dialign -> dialign2 | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 270: | Line 307: | ||
|emboss | |emboss | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 275: | Line 313: | ||
|exonerate | |exonerate | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 280: | Line 319: | ||
|kalign2 & kalign3 | |kalign2 & kalign3 | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 285: | Line 325: | ||
|CNVnator | |CNVnator | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 290: | Line 331: | ||
|CREST | |CREST | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 295: | Line 337: | ||
|CAP3 | |CAP3 | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 300: | Line 343: | ||
|Cluster -> mmseqs2 | |Cluster -> mmseqs2 | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 305: | Line 349: | ||
|Cluster | |Cluster | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 310: | Line 355: | ||
|FastQC | |FastQC | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 315: | Line 361: | ||
|fastx_toolkit | |fastx_toolkit | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 320: | Line 367: | ||
|IGVTools | |IGVTools | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 325: | Line 373: | ||
|MACS -> macs2 | |MACS -> macs2 | ||
|mamba | |mamba | ||
| | |||
|Need Python < 3 | |Need Python < 3 | ||
| | | | ||
Line 330: | Line 379: | ||
|Meerkat -> django-meerkat | |Meerkat -> django-meerkat | ||
|pip | |pip | ||
| | |||
|pip install django-meerkat | |pip install django-meerkat | ||
| | | | ||
Line 335: | Line 385: | ||
|RNAcode | |RNAcode | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 340: | Line 391: | ||
|RNAz | |RNAz | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 345: | Line 397: | ||
|RepeatMasker | |RepeatMasker | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 350: | Line 403: | ||
|SNVMix2 | |SNVMix2 | ||
|manual | |manual | ||
| | |||
| | | | ||
|<nowiki>https://github.com/shahcompbio/snvmix</nowiki> | |<nowiki>https://github.com/shahcompbio/snvmix</nowiki> | ||
Line 356: | Line 410: | ||
|SOAPdenovo2-src | |SOAPdenovo2-src | ||
|mamba | |mamba | ||
| | |||
|SOAPdenovo2 | |SOAPdenovo2 | ||
|dependency -> samtool 0.1.9 | |dependency -> samtool 0.1.9 | ||
Line 361: | Line 416: | ||
|VarScan | |VarScan | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 366: | Line 422: | ||
|ViennaRNA | |ViennaRNA | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 371: | Line 428: | ||
|bismark | |bismark | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 376: | Line 434: | ||
|blat | |blat | ||
|mamba | |mamba | ||
|polyphen2 | |||
| | | | ||
|<nowiki>https://kentinformatics.com/</nowiki> | |<nowiki>https://kentinformatics.com/</nowiki>Blat tools are necessary in order to analyze variants in novel, | ||
unannotated or otherwise non-standard genes and proteins. Note that | |||
PolyPhen-2 uses UCSC hg19 database as the reference source of all gene | |||
annotations and UniProtKB for protein sequences and annotations. If | |||
you want to analyze genes/proteins from a different source | |||
(e.g., RefSeq or Ensembl) this would also require Blat tools. | |||
Instructions for downloading Blat sources and executables can be | |||
found here: | |||
<nowiki>http://genome.ucsc.edu/FAQ/FAQblat.html#blat3</nowiki> | |||
Complete set of binary executables for 64-bit Linux is available here: | |||
http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ | |||
PolyPhen-2 only needs the following three files: | |||
blat | |||
twoBitToFa | |||
bigWigToWig | |||
|- | |- | ||
|circos | |circos | ||
|mamba | |mamba | ||
| | |||
| | | | ||
|error: Is a directory: '/opt/anaconda/envs/231216/README' -> remove the README directory | |error: Is a directory: '/opt/anaconda/envs/231216/README' -> remove the README directory | ||
Line 387: | Line 475: | ||
(=clustalW2) | (=clustalW2) | ||
|mamba | |mamba | ||
| | |||
|ClustalW2 - Multiple Sequence Alignment<ref>https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal</ref> | |ClustalW2 - Multiple Sequence Alignment<ref>https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal</ref> | ||
|ClustalW, the command line version of clustalx | |ClustalW, the command line version of clustalx | ||
Line 394: | Line 483: | ||
|- | |- | ||
|clustalx | |clustalx | ||
-> NA | |||
|need X window | |need X window | ||
| | |||
|Multiple Sequence Alignment, Graphic interface | |Multiple Sequence Alignment, Graphic interface | ||
|<nowiki>https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal</nowiki> | |<nowiki>https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal</nowiki> | ||
Line 401: | Line 492: | ||
|cnD | |cnD | ||
|manual install | |manual install | ||
| | |||
|(231216) hpcmate@223vmbase:~/biolib/compile/cnD$ | |(231216) hpcmate@223vmbase:~/biolib/compile/cnD$ | ||
|<nowiki>https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html</nowiki> | |<nowiki>https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html</nowiki> | ||
Line 413: | Line 505: | ||
|cpc -> CPC2 | |cpc -> CPC2 | ||
|mamba | |mamba | ||
| | |||
| | | | ||
|https://github.com/biocoder/cpc | |https://github.com/biocoder/cpc | ||
Line 418: | Line 511: | ||
|fasta -> fasta3 | |fasta -> fasta3 | ||
|mamba | |mamba | ||
| | |||
|The FASTA package - protein and DNA sequence similarity searching and alignment programs | |The FASTA package - protein and DNA sequence similarity searching and alignment programs | ||
| | | | ||
Line 423: | Line 517: | ||
|gmap-gsnap -> gmap | |gmap-gsnap -> gmap | ||
|mamba | |mamba | ||
| | |||
|gmap & gsnap packages | |gmap & gsnap packages | ||
| | | | ||
Line 428: | Line 523: | ||
|lobstr | |lobstr | ||
|manual | |manual | ||
| | |||
|lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high throughput sequencing data. | |lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high throughput sequencing data. | ||
Line 439: | Line 535: | ||
make | make | ||
|- | |- | ||
|meme | |meme<ref>https://meme-suite.org/meme/doc/install.html?man_type=web</ref> | ||
|mamba | |mamba | ||
| | |||
| | | | ||
|https://meme-suite.org/meme/<nowiki/>For [[Linux]] 64, Open MPI is built with [[CUDA]] awareness but this support is disabled by default. | |https://meme-suite.org/meme/<nowiki/>For [[Linux]] 64, Open MPI is built with [[CUDA]] awareness but this support is disabled by default. | ||
Line 461: | Line 558: | ||
|miRDP-> '''mirdeep2''' <small>2.0.1.3</small> | |miRDP-> '''mirdeep2''' <small>2.0.1.3</small> | ||
|mamba | |mamba | ||
| | |||
|miRDeep-P (miRDP) is a tool for detecting miRNAs in plants from deeply sequenced small RNA libraries. It was developed by modifying miRDeep, which is based on a probabilistic model of miRNA biogenesis in animals, with a plant-specific scoring system and filtering criteria. | |miRDeep-P (miRDP) is a tool for detecting miRNAs in plants from deeply sequenced small RNA libraries. It was developed by modifying miRDeep, which is based on a probabilistic model of miRNA biogenesis in animals, with a plant-specific scoring system and filtering criteria. | ||
miRDP2 is adapted from miRDeep-P (miRDP) with new strategies and an overhauled algorithm. | miRDP2 is adapted from miRDeep-P (miRDP) with new strategies and an overhauled algorithm. | ||
Line 467: | Line 565: | ||
|'''mirdeep-p2''' <small>1.1.4</small> | |'''mirdeep-p2''' <small>1.1.4</small> | ||
|mamba | |mamba | ||
| | |||
|A fast and accurate tool for analyzing the miRNA transcriptome in plants | |A fast and accurate tool for analyzing the miRNA transcriptome in plants | ||
| | | | ||
Line 472: | Line 571: | ||
|mirdeep2 | |mirdeep2 | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 477: | Line 577: | ||
|picard-tools -> '''picard 3.1.1''' | |picard-tools -> '''picard 3.1.1''' | ||
|mamba | |mamba | ||
| | |||
|Java tools for working with NGS data in the BAM format | |Java tools for working with NGS data in the BAM format | ||
conda | conda | ||
Line 483: | Line 584: | ||
|polyphen | |polyphen | ||
|Manual | |Manual | ||
| | |||
|PolyPhen (Polymorphism Phenotyping) is a tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. | |PolyPhen (Polymorphism Phenotyping) is a tool which predicts possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. | ||
Need at least 70 GB | |||
|http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads<nowiki/>https://sunyaevlab.hms.harvard.edu/wiki/!web/software<nowiki/>175 GB of free disk space | |http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads<nowiki/>https://sunyaevlab.hms.harvard.edu/wiki/!web/software<nowiki/>175 GB of free disk space | ||
Perl, Java, | Perl, Java, | ||
Line 490: | Line 593: | ||
search by apt : apt search perl | grep -i XML::Simple -B 3 | search by apt : apt search perl | grep -i XML::Simple -B 3 | ||
* List::Util <<< liblist-allutils-perl | * List::Util <<< liblist-allutils-perl | ||
* XML::Simple. libxml-opml-simplegen-perl | * XML::Simple. libxml-opml-simplegen-perl | ||
* DBD::SQLite. libdbd-sqlite3-perl | * DBD::SQLite. libdbd-sqlite3-perl | ||
* CGI.pm. | * CGI.pm. | ||
sudo apt-get install libscalar-list-utils-perl libxml-simple-perl libdbd-sqlite3-perl libcgi-pm-perl build-essential default-jre | sudo apt-get install libscalar-list-utils-perl libxml-simple-perl libdbd-sqlite3-perl libcgi-pm-perl build-essential default-jre bioperl | ||
Line 501: | Line 604: | ||
Blat tools are necessary in order to analyze variants in novel, unannotated or otherwise non-standard genes and proteins | |||
Line 522: | Line 624: | ||
|rseq | |rseq | ||
|Manual | |Manual | ||
| | |||
| | | | ||
| | | | ||
Line 527: | Line 630: | ||
|seqtk-master -> seqtk | |seqtk-master -> seqtk | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 532: | Line 636: | ||
|sickle-master -> sickle | |sickle-master -> sickle | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 537: | Line 642: | ||
|snpEff | |snpEff | ||
|mamba | |mamba | ||
| | |||
| | | | ||
| | | | ||
Line 542: | Line 648: | ||
|soap | |soap | ||
|pip | |pip | ||
| | |||
| | | | ||
| | | | ||
Line 547: | Line 654: | ||
|'''rSeq: RNA-Seq Analyzer''' | |'''rSeq: RNA-Seq Analyzer''' | ||
|Manual | |Manual | ||
| | |||
|https://jhui2014.github.io/rseq/ | |https://jhui2014.github.io/rseq/ | ||
|On 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src | |On 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src | ||
Line 552: | Line 660: | ||
|SNVMix2 | |SNVMix2 | ||
|Manual | |Manual | ||
| | |||
|https://github.com/shahcompbio/snvmix | |https://github.com/shahcompbio/snvmix | ||
|imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$ | |imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$ | ||
Line 557: | Line 666: | ||
|Samtools | |Samtools | ||
|mamba | |mamba | ||
| | |||
|SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments | |SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments | ||
| | | | ||
Line 565: | Line 675: | ||
|Breakdancer | |Breakdancer | ||
|mamba | |mamba | ||
| | |||
|BreakDancer uses CMake which is a cross-platform build tool. Basically it will generate a Makefile so you can use <code>make</code>. The requirements are the zlib, development [[library]], gcc, gmake, cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process | |BreakDancer uses CMake which is a cross-platform build tool. Basically it will generate a Makefile so you can use <code>make</code>. The requirements are the zlib, development [[library]], gcc, gmake, cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process | ||
<code># --recursive option is important so that it gets the submodules too | <code># --recursive option is important so that it gets the submodules too | ||
Line 573: | Line 684: | ||
* https://github.com/shendurelab/LACHESIS/issues/30 | * https://github.com/shendurelab/LACHESIS/issues/30 | ||
|- | |- | ||
|<code>vcftools</code> | |||
|mamba | |||
| | | | ||
| | | -c bioconda | ||
| | | | ||
|} | |} |
Latest revision as of 10:48, 19 December 2023
Bioinformatics
A field that combines technologies from biology, computer science, mathematics, and statistics. [1]
Bioinformatics workflow steps
- quality control assessment
- sequence alignment
- data summarization into genes/regions
- data annotation to genomics features
- statistical comparisons
- multi-omic integration
Bioinformatics curated software list[2]
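As a rough illustration of the first workflow step, the sketch below filters FASTQ reads by mean base quality. It assumes the common Phred+33 quality encoding and a strict four-lines-per-record layout. The function names (`mean_phred`, `read_fastq`, `passing_reads`) and the threshold of 20 are hypothetical choices for this example and do not come from any tool listed on this page.

```python
import io


def mean_phred(quality, offset=33):
    """Return the mean Phred score of one FASTQ quality string.

    Assumes Phred+33 encoding unless another offset is given.
    """
    if not quality:
        raise ValueError("empty quality string")
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def read_fastq(handle):
    """Yield (name, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of file (or a blank line) ends the iteration
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        yield header[1:], sequence, quality  # drop the leading '@'


def passing_reads(handle, min_mean_quality=20.0):
    """Return the names of reads whose mean quality meets the threshold."""
    return [
        name
        for name, _sequence, quality in read_fastq(handle)
        if mean_phred(quality) >= min_mean_quality
    ]


if __name__ == "__main__":
    # Two made-up reads: 'I' encodes Phred 40, '!' encodes Phred 0.
    demo = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nACGT\n+\n!!!!\n")
    print(passing_reads(demo))
```

In practice a dedicated tool such as FastQC (listed in the table further down) would normally handle this step; the sketch only shows what a quality threshold means at the level of a single read.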
- Package suites
- Data Tools
- Downloading
- Compressing
- Data Processing
- Command Line Utilities
- Next Generation Sequencing
- Workflow Managers
- Pipelines
- Sequence Processing
- Data Analysis
- Sequence Alignment
- Pairwise
- Multiple Sequence Alignment
- Clustering
- Quantification
- Variant Calling
- Structural variant callers
- BAM File Utilities
- VCF File Utilities
- GFF BED File Utilities
- Variant Simulation
- Variant Prediction/Annotation
- Tools for Assessment of Variants
- PolyPhen-2 is a tool for predicting the effect of an amino acid substitution on protein structure and function, based on comparative genomics and experimentally determined protein structures. It is available as a web service, and can also be downloaded as a standalone application.
- SNPtrack is a simple interface for mutation mapping and identifying causal mutations from whole-genome sequencing studies. It is available as a web service.
- Tools for Mass Spectrometry and Proteomics
- MS-BLAST is a tool for searching protein sequences identified with tandem mass spectrometry against databases of protein sequences. It is available as a web service and as a standalone software.
- Tools for Statistical Genetics
- Joint Likelihood Mapping (JLIM) is a tool to test for shared genetic effect between two genetic association data, for example, a disease GWAS study and gene expression QTL (eQTL) study.
- Joint Likelihood Mapping 2 (JLIM_2.0) is a version of JLIM which supports meta-analysis across more than one cohort of matching ancestry.
- Joint Likelihood Mapping (JLIM) 2.5 is a new version of JLIM based on summary statistics.
- NPS is a tool for polygenic risk scoring based on partitioning-based non-parametric shrinkage algorithm.
- RVTT is a novel statistical test of trend that assesses the relationship of the frequency of qualifying rare variants in a pathway with dichotomous disease phenotypes leveraging the Cochran-Armitage test statistic.
- Tools for Cancer Genomics
- MutPanning is designed to detect rare cancer driver genes from aggregated whole-exome sequencing data.
- CBaSE enables cancer type and gene-specific estimation of the strength of negative and positive selection. It is available as a browser-based tool as well as for download as a standalone package.
- Tools for Population Genetics
- simDoSe is a fast and flexible Wright-Fisher simulator for arbitrary diploid selection evolving through realistic human demography.
- Python Modules
- Data
- Tools
- Assembly
- Annotation
- Long-read sequencing
- Long-read Assembly
- Visualization
- Genome Browsers / Gene Diagrams
- Circos Related
- Database Access
- Resources
- Becoming a Bioinformatician
- Bioinformatics on GitHub
- Sequencing
- RNA-Seq
- ChIP-Seq
- YouTube Channels and Playlists
- Blogs
- Miscellaneous
- Online networking groups
File formats in Bioinformatics
This section explains some of the commonly used file formats in bioinformatics[3]
File formats | File extensions |
---|---|
FASTA | .fa, .fasta, .fsa |
FASTQ | .fastq, .sanfastq, .fq |
SAM (Sequence Alignment Map) | file.sam |
BAM | file.bam |
VCF (Variant Call Format) | file.vcf |
GFF (General Feature Format or Gene Finding Format) | file.gff2, file.gff3, file.gff |
GTF (Gene Transfer Format) | file.gtf |
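The extension table above can be turned into a small lookup, sketched here in Python. The mapping mirrors the table. The function name `guess_format` and the handling of a trailing `.gz` suffix are assumptions added for illustration and are not part of the original list.

```python
from pathlib import Path

# Extension -> format name, taken from the table above.
EXTENSIONS = {
    ".fa": "FASTA", ".fasta": "FASTA", ".fsa": "FASTA",
    ".fastq": "FASTQ", ".sanfastq": "FASTQ", ".fq": "FASTQ",
    ".sam": "SAM",
    ".bam": "BAM",
    ".vcf": "VCF",
    ".gff": "GFF", ".gff2": "GFF", ".gff3": "GFF",
    ".gtf": "GTF",
}


def guess_format(filename):
    """Guess the file format from the extension, or return 'unknown'."""
    suffixes = [suffix.lower() for suffix in Path(filename).suffixes]
    # Assumption: a trailing .gz only marks compression, so look past it.
    if suffixes and suffixes[-1] == ".gz":
        suffixes = suffixes[:-1]
    if not suffixes:
        return "unknown"
    return EXTENSIONS.get(suffixes[-1], "unknown")


if __name__ == "__main__":
    for name in ["genome.fa", "sample.R1.fastq.gz", "calls.vcf", "notes.txt"]:
        print(name, "->", guess_format(name))
```

An extension is only a hint. A tool that needs certainty would inspect the file contents instead, for example the `@` header line of a FASTQ record.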
Useful Tutorial Links
- https://vcru.wisc.edu/simonlab/bioinformatics/programs/
- https://github.com/danielecook/Awesome-Bioinformatics
- https://mybiosoftware.com/
- https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/
- Cluster system software modules - https://bioinformatics.uconn.edu/cbc_software/software-2/
We can use Bioconda [4]
Bioconda only supports Python 2.7, 3.6, 3.7, 3.8, and 3.9, so DLS38 can be used
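The version constraint above can be checked before building an environment. The sketch below hard-codes the Python versions listed on this page (they may have changed since) and assembles a `mamba create` command line without running it. The environment name `bio39` and the helper names are hypothetical. The `-n` and `-c` options and the `conda-forge` and `bioconda` channels are standard mamba/conda usage.

```python
import shlex
import sys

# Python versions this page lists as supported by Bioconda (may be outdated).
SUPPORTED_PYTHONS = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}


def is_supported(version=None):
    """Return True if (major, minor) is in the list above.

    With no argument, the running interpreter's version is checked.
    """
    major, minor = (version or sys.version_info)[:2]
    return (major, minor) in SUPPORTED_PYTHONS


def mamba_create_command(env_name, python_version, packages):
    """Build (but do not run) a mamba command that creates an environment."""
    if not is_supported(python_version):
        raise ValueError(f"Python {python_version} is not in the supported list")
    major, minor = python_version
    args = [
        "mamba", "create", "-n", env_name,
        "-c", "conda-forge", "-c", "bioconda",
        f"python={major}.{minor}",
        *packages,
    ]
    return shlex.join(args)


if __name__ == "__main__":
    # 'bio39' is a made-up environment name; bwa and samtools come from the table below.
    print(mamba_create_command("bio39", (3, 9), ["bwa", "samtools"]))
```

Printing the command rather than executing it keeps the sketch safe to run on a machine without mamba installed. The output can be pasted into a shell once it looks right.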
Lib and sources
Key online URLs
- https://vcru.wisc.edu/simonlab/bioinformatics/programs
- https://mybiosoftware.com/
- https://crc.pitt.edu/applications
- https://bioconda.github.io/conda-package_index.html
- https://studylib.net/doc/25332332/the-biostar-handbook
Libraries
(Python 3.9) |
Mamba or manual | Deps | Description | References |
---|---|---|---|---|
BWA | Mamba | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | ||
samtools | Mamba | |||
ncbi-blast+[5] | Manual | polyphen2 | ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html
https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download https://vcru.wisc.edu/simonlab/bioinformatics/programs/#blastplus setup rsync -avz polyphen-2.2.2/precomputed/* polyphen-2.2.3/precomputed/ others$ wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.9.0/ncbi-blast-2.9.0+-x64-linux.tar.gz $ tar vxaf ncbi-blast-2.9.0+-x64-linux.tar.gz |
somatic-sniper | Mamba | Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision and present several common sources of error resulting in miscalls. | ||
breakdancer | Mamba | a package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads | ||
tigra-sv[6] | Manual | a program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of bam files that contain reads mapped to a reference genome such as NCBI | https://bioinformatics.mdanderson.org/public-software/archive/tigra/ | |
TopHat -> hisat2 | NA | Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2 which provides the same core functionality[7] | Does not support Python 3 | ||
HISAT2 | Mamba | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. | ||
cufflinks[8] | Manual | manual install to prefix
(231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al |
https://github.com/cole-trapnell-lab/cufflinks | |
bedtools | Mamba | |||
T-COFFEE | Mamba | |||
mafft | Mamba | |||
maq | Manual | Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences.
hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$ |
https://maq.sourceforge.net/maq-man.shtml
https://mybiosoftware.com/sim4-20030613-align-expressed-dna-sequence-genomic-sequence.html https://mybiosoftware.com/maq-0-7-1-mapping-assembly-qualities.html
MAQ is really old, and by now it has problems compiling with current compilers. You can use the
Note I took the
| |
muscle | Mamba | |||
phyml | Mamba | |||
primer3 | Mamba | |||
probcons | Mamba | |||
sim4 | Manual | (231216) hpcmate@223vmbase:~/biolib/compile/sim4/sim4.2012-10-10$
or |
https://globin.bx.psu.edu/ftp/dist/sim4/ https://globin.bx.psu.edu/html/docs/sim4.html | |
tigr-glimmer | mamba | tigr-glimmer | ||
amap-align | mamba | AMAP is a multiple sequence alignment program based on sequence annealing | https://github.com/mes5k/amap-align | |
dialign -> dialign2 | mamba | |||
emboss | mamba | |||
exonerate | mamba | |||
kalign2 & kalign3 | mamba | |||
CNVnator | mamba | |||
CREST | mamba | |||
CAP3 | mamba | |||
Cluster -> mmseqs2 | mamba | |||
Cluster | mamba | |||
FastQC | mamba | |||
fastx_toolkit | mamba | |||
IGVTools | mamba | |||
MACS -> macs2 | mamba | Need Python < 3 | ||
Meerkat -> django-meerkat | pip | pip install django-meerkat | ||
RNAcode | mamba | |||
RNAz | mamba | |||
RepeatMasker | mamba | |||
SNVMix2 | manual | https://github.com/shahcompbio/snvmix
https://github.com/shahcompbio/snvmix/test/biolibs/gitbuild/snvmix, Version 0.11.8-r4 | ||
SOAPdenovo2-src | mamba | SOAPdenovo2 | dependency -> samtool 0.1.9 | |
VarScan | mamba | |||
ViennaRNA | mamba | |||
bismark | mamba | |||
blat | mamba | polyphen2 | https://kentinformatics.com/ Blat tools are necessary in order to analyze variants in novel,
unannotated or otherwise non-standard genes and proteins. Note that PolyPhen-2 uses UCSC hg19 database as the reference source of all gene annotations and UniProtKB for protein sequences and annotations. If you want to analyze genes/proteins from a different source (e.g., RefSeq or Ensembl) this would also require Blat tools. Instructions for downloading Blat sources and executables can be found here: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3 Complete set of binary executables for 64-bit Linux is available here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ PolyPhen-2 only needs the following three files: blat twoBitToFa bigWigToWig | |
circos | mamba | error: Is a directory: '/opt/anaconda/envs/231216/README' -> remove the README directory | ||
clustalw
(=clustalW2) |
mamba | ClustalW2 - Multiple Sequence Alignment[10] | ClustalW, the command line version of clustalx
ClustalW2 is a general purpose DNA or protein multiple sequence alignment program for three or more sequences. For the alignment of two sequences please instead use our pairwise sequence alignment tools. The ClustalW2 services have been retired. To access similar services, please visit the Multiple Sequence Alignment tools page. For protein alignments we recommend Clustal Omega. For DNA alignments we recommend trying MUSCLE or MAFFT. If you have any questions/concerns please contact us via the feedback link above. | |
clustalx
-> NA |
need X window | Multiple Sequence Alignment, Graphic interface | https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal
ClustalX, the graphical interface, is available in the Bioinformatics menu | |
cnD | manual install | (231216) hpcmate@223vmbase:~/biolib/compile/cnD$ | https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html
cnD (Copy number variant detection) is a program to detect copy number variants from short read sequence data. How to install - https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/cnd.htm imp@CGX-GPU:~/test/bioinfomatics/cnD/cnD$ | |
cpc -> CPC2 | mamba | https://github.com/biocoder/cpc | ||
fasta -> fasta3 | mamba | The FASTA package - protein and DNA sequence similarity searching and alignment programs | ||
gmap-gsnap -> gmap | mamba | gmap & gsnap packages | ||
lobstr | manual | lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high-throughput sequencing data.
hpcmate@223vmbase:~/biolib/compile/lobstr-code$ |
https://github.com/gymreklab/lobstr-code/blob/master/INSTALL
sudo apt install libgsl-dev autotools-dev libboost-all-dev pkg-config zlib1g-dev zlib1g
./configure
make | |
meme[11] | mamba | https://meme-suite.org/meme/
For Linux 64, Open MPI is built with CUDA awareness, but this support is disabled by default.
To enable it, set the environment variable OMPI_MCA_opal_cuda_support=true before launching your MPI processes. Equivalently, you can set the MCA parameter on the command line: mpiexec --mca opal_cuda_support 1 ... In addition, UCX support is also built but disabled by default. To enable it, first install UCX (conda install -c conda-forge ucx). Then set the environment variables OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes. Equivalently, you can set the MCA parameters on the command line: mpiexec --mca pml ucx --mca osc ucx ... Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX. Please consult UCX's documentation for details. | ||
miRDP -> mirdeep2 2.0.1.3 | mamba | miRDeep-P (miRDP) is a tool for detecting miRNAs in plants from deeply sequenced small RNA libraries. It was developed by modifying miRDeep, which is based on a probabilistic model of miRNA biogenesis in animals, with a plant-specific scoring system and filtering criteria.
miRDP2 is adapted from miRDeep-P (miRDP) with new strategies and an overhauled algorithm. |
||
mirdeep-p2 1.1.4 | mamba | A fast and accurate tool for analyzing the miRNA transcriptome in plants | ||
mirdeep2 | mamba | |||
picard-tools -> picard 3.1.1 | mamba | Java tools for working with NGS data in the BAM format
conda |
source : https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/picard.htm | |
polyphen | Manual | PolyPhen (Polymorphism Phenotyping) is a tool that predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations.
Need at least 70 GB |
http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads
https://sunyaevlab.hms.harvard.edu/wiki/!web/software
175 GB of free disk space
Requires Perl and Java. Perl is required to run PolyPhen-2: the minimum version is 5.14.2, and 5.30.0 is the latest version successfully tested. To find the required Perl modules with apt: apt search perl | grep -i XML::Simple -B 3
sudo apt-get install libscalar-list-utils-perl libxml-simple-perl libdbd-sqlite3-perl libcgi-pm-perl build-essential default-jre bioperl
Blat tools are needed to analyze variants in novel, unannotated, or otherwise non-standard genes and proteins: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3
A complete set of binary executables for 64-bit Linux is available here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ PolyPhen-2 only needs the following three files: blat, twoBitToFa, bigWigToWig
$ cp blat twoBitToFa bigWigToWig $PPH/bin/ | |
rseq | Manual | |||
seqtk-master -> seqtk | mamba | |||
sickle-master -> sickle | mamba | |||
snpEff | mamba | |||
soap | pip | |||
rSeq: RNA-Seq Analyzer | Manual | https://jhui2014.github.io/rseq/ | On the 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src | |
SNVMix2 | Manual | https://github.com/shahcompbio/snvmix | imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$ | |
Samtools | mamba | SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments | ||
Breakdancer | mamba | BreakDancer uses CMake, a cross-platform build tool that generates a Makefile so you can build with make. The requirements are the zlib development library, gcc, gmake, and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process.
|
||
vcftools
|
mamba | -c bioconda |
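The rows marked "mamba" in the table above can be installed in one step. The following is a minimal sketch, not the procedure actually used on these machines. It assumes the Bioconda package names bwa, samtools, vcftools, picard, seqtk, clustalw, meme, mirdeep2 and snpeff, and reuses the environment name 231216 seen in the circos error path. ENV_NAME, DRY_RUN, PACKAGES and CMD are illustrative variable names, and listing conda-forge ahead of bioconda is an assumption based on Bioconda's usual channel setup, since the table only mentions "-c bioconda".

```shell
#!/bin/sh
# Sketch: install a subset of the "mamba" rows into one environment.
# All variable names here are hypothetical helpers, not part of the original notes.
set -eu

ENV_NAME="${ENV_NAME:-231216}"   # assumed: environment name from the circos error path
DRY_RUN="${DRY_RUN:-1}"          # 1 = only print the command, 0 = run it

# Assumed Bioconda package names for some of the rows marked "mamba".
PACKAGES="bwa samtools vcftools picard seqtk clustalw meme mirdeep2 snpeff"

# Channel order (conda-forge before bioconda) is an assumption, see lead-in.
CMD="mamba install -y -n $ENV_NAME -c conda-forge -c bioconda $PACKAGES"

if [ "$DRY_RUN" = "1" ]; then
  # Print the command so it can be reviewed before anything is installed.
  echo "$CMD"
else
  # Intentionally unquoted so the string splits into separate arguments.
  $CMD
fi
```

By default the script only prints the command; running it with DRY_RUN=0 performs the installation. Tools listed as "Manual" or "pip" in the table are not covered by this sketch.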
References
1. https://www.youtube.com/watch?v=ky1-mF0fHnQ
2. https://github.com/danielecook/Awesome-Bioinformatics
3. https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/
4. https://bioconda.github.io/index.html
5. https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html
6. https://bioinformatics.mdanderson.org/public-software/archive/tigra/
7. https://ccb.jhu.edu/software/tophat/index.shtml
8. http://cole-trapnell-lab.github.io/cufflinks/
9. https://www.biostars.org/p/353144/
10. https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal
11. https://meme-suite.org/meme/doc/install.html?man_type=web