Private:Bioinformatics
Revision as of 20:54, 16 December 2023
Bioinformatics
An interdisciplinary field that combines biology, computer science, mathematics and statistics.[1]
Bioinformatics workflow steps
- quality control assessment
- sequence alignment
- data summarization into genes/regions
- data annotation to genomic features
- statistical comparisons
- multi-omic integration
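As a minimal sketch of the first step (quality control), the Python below reads FASTQ records and reports the mean Phred quality of each read. It assumes plain four-line FASTQ records and Phred+33 quality encoding (the Sanger / Illumina 1.8+ convention). The function names are hypothetical. In practice a dedicated tool such as FastQC would normally do this job.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from a FASTQ stream of 4-line records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        seq = handle.readline().rstrip("\n")
        plus = handle.readline().rstrip("\n")
        qual = handle.readline().rstrip("\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # The read ID is the first whitespace-separated token after '@'.
        read_id = (header[1:].split() or [""])[0]
        yield read_id, seq, qual


def phred_scores(qual: str, offset: int = 33) -> List[int]:
    """Decode an ASCII quality string into Phred scores (Phred+33 by default)."""
    return [ord(ch) - offset for ch in qual]


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of one read; 0.0 for an empty quality string."""
    scores = phred_scores(qual, offset)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    demo = io.StringIO("@read1 sample\nACGT\n+\nIIII\n@read2\nACGTA\n+\n#####\n")
    for read_id, seq, qual in read_fastq(demo):
        print(read_id, len(seq), round(mean_quality(qual), 1))
```

A read whose mean quality falls below a chosen threshold could then be flagged or trimmed before the alignment step.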
 
Curated bioinformatics software list[2]
- Package suites
- Data Tools
  - Downloading
  - Compressing
- Data Processing
  - Command Line Utilities
- Next Generation Sequencing
  - Workflow Managers
  - Pipelines
  - Sequence Processing
  - Data Analysis
  - Sequence Alignment
    - Pairwise
    - Multiple Sequence Alignment
    - Clustering
  - Quantification
  - Variant Calling
    - Structural variant callers
  - BAM File Utilities
  - VCF File Utilities
  - GFF BED File Utilities
  - Variant Simulation
  - Variant Prediction/Annotation
  - Python Modules
    - Data
    - Tools
  - Assembly
  - Annotation
- Long-read sequencing
  - Long-read Assembly
- Visualization
  - Genome Browsers / Gene Diagrams
  - Circos Related
- Database Access
- Resources
  - Becoming a Bioinformatician
  - Bioinformatics on GitHub
  - Sequencing
  - RNA-Seq
  - ChIP-Seq
  - YouTube Channels and Playlists
  - Blogs
  - Miscellaneous
    - Online networking groups
 
File formats in bioinformatics
This section lists some commonly used file formats in bioinformatics.[3]
| File format | File extensions |
|---|---|
| FASTA | .fa, .fasta, .fsa |
| FASTQ | .fastq, .sanfastq, .fq |
| SAM (Sequence Alignment/Map) | .sam |
| BAM (Binary Alignment/Map) | .bam |
| VCF (Variant Call Format) | .vcf |
| GFF (General Feature Format or Gene-Finding Format) | .gff, .gff2, .gff3 |
| GTF (Gene Transfer Format) | .gtf |
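FASTA is the simplest format in the table. Each record is a `>` header line followed by one or more sequence lines. The following is a minimal Python reader, assuming plain-text (uncompressed) input. The function name is hypothetical. Libraries such as Biopython provide more complete parsers.

```python
import io
from typing import Iterator, List, Optional, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    Sequence lines may be wrapped over several lines; blank lines are ignored.
    """
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # emit the previous record
            header, chunks = line[1:], []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)  # emit the last record


if __name__ == "__main__":
    demo = io.StringIO(">seq1 example\nACGTACGT\nGGCC\n>seq2\nAATT\n")
    for name, seq in read_fasta(demo):
        print(name, len(seq))
```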
Useful tutorial links
- https://vcru.wisc.edu/simonlab/bioinformatics/programs/
- https://github.com/danielecook/Awesome-Bioinformatics
- https://mybiosoftware.com/
- https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/
- Cluster system software modules: https://bioinformatics.uconn.edu/cbc_software/software-2/
 
We can use Bioconda[4]
Bioconda only supports Python 2.7, 3.6, 3.7, 3.8 and 3.9, so DLS38 can be used.
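A small Python sketch of that version check follows. It rests on two assumptions. First, it assumes the list of supported versions is still accurate. The list reflects this December 2023 revision, and Bioconda's supported versions change over time, so check the Bioconda site first. Second, it assumes DLS38 is a Python 3.8 environment. The name suggests this, but the page does not state it. The constant and function names are hypothetical.

```python
import sys
from typing import Optional, Sequence

# Versions listed on this page as Bioconda-supported (assumption: may be outdated).
BIOCONDA_PYTHONS = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}


def is_bioconda_supported(version: Optional[Sequence[int]] = None) -> bool:
    """Return True if (major, minor) is in BIOCONDA_PYTHONS.

    Defaults to the running interpreter's version.
    """
    if version is None:
        version = sys.version_info[:2]
    return tuple(version[:2]) in BIOCONDA_PYTHONS


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    status = "supported" if is_bioconda_supported() else "not in the listed versions"
    print(f"Python {major}.{minor}: {status}")
```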
Libraries and sources
| Libraries | Mamba or manual | Description | References |
|---|---|---|---|
| , TopHat, cufflinks, bedtools, T-COFFEE, mafft, maq, muscle, phyml, primer3, probcons, sim4, tigr-glimmer, amap-align, dialign, emboss, exonerate, kalign, CNVnator, CREST, CAP3, Cluster, FastQC, Fastx-toolkit, IGVTools, MACS, Meerkat, RNAcode, RNAz, RepeatMasker, SNVMix2, SOAPdenovo2-src, VarScan, ViennaRNA, bismark, blat, circos, clustalw, clustalx, cnD, cpc, fasta, gmap-gsnap, lobstr, meme, miRDP, mirdeep2, picard-tools, polyphen, rseq, seqtk-master, sickle-master, snpEff, soap | | | |
| meme[5] | Mamba | | |
| BWA | Mamba | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | https://bio-bwa.sourceforge.net/ https://wikis.utexas.edu/display/bioiteam/BWA |
| samtools | Mamba | | |
| ncbi-blast+[6] | Manual | ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx and blastx) and several output formats (pairwise, tabular or XML). | https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ |
| somatic-sniper | Mamba | A tool for comparing tumor and normal pairs. The developers estimate its sensitivity and precision, and present several common sources of error that result in miscalls. | |
| breakdancer | Mamba | A package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads. | |
| tigra-sv[7] | Manual | A program that conducts targeted local assembly of structural variants (SVs) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of BAM files that contain reads mapped to a reference genome such as NCBI | https://bioinformatics.mdanderson.org/public-software/archive/tigra/ |
| TopHat (superseded) | | Please note that TopHat has entered a low-maintenance, low-support stage, as it is now largely superseded by HISAT2, which provides the same core functionality.[8] | Does not support Python 3 |
| HISAT2 | | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. | https://daehwankimlab.github.io/hisat2/ |
| rSeq: RNA-Seq Analyzer | | https://jhui2014.github.io/rseq/ | On server 61: /test/bioinfomatics/rseq/rseq-0.2.2-src |
| SNVMix2 | | https://github.com/shahcompbio/snvmix | imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$ |
| Samtools | | SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. | |
| Breakdancer | | BreakDancer uses CMake, a cross-platform build tool that generates a Makefile so you can build with make. The requirements are the zlib development library, gcc, gmake and CMake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process. | |
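The Samtools entry describes SAM as a generic alignment format. The rough Python illustration below splits one SAM alignment line into its 11 mandatory tab-separated fields and decodes the bitwise FLAG field. It is not how samtools itself works. The function names are hypothetical. The example assumes an uncompressed text SAM line, not a BAM record. For real SAM/BAM work, samtools or a library such as pysam is the usual choice.

```python
from typing import Any, Dict, List

# The 11 mandatory SAM columns, in order.
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Bit meanings of the FLAG column.
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> Dict[str, Any]:
    """Split one alignment line into the mandatory fields plus optional tags."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 11:
        raise ValueError(f"expected at least 11 tab-separated fields, got {len(cols)}")
    record: Dict[str, Any] = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(record[key])  # numeric columns
    record["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, left as strings
    return record


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


if __name__ == "__main__":
    line = "read1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII\tNM:i:0"
    rec = parse_sam_line(line)
    print(rec["RNAME"], rec["POS"], decode_flag(rec["FLAG"]))
```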
References
1. https://www.youtube.com/watch?v=ky1-mF0fHnQ
2. https://github.com/danielecook/Awesome-Bioinformatics
3. https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/
4. https://bioconda.github.io/index.html
5. https://meme-suite.org/meme/doc/install.html?man_type=web
6. https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html
7. https://bioinformatics.mdanderson.org/public-software/archive/tigra/
8. https://ccb.jhu.edu/software/tophat/index.shtml