Privatre:bioinfomatics
Revision as of 21:25, 16 December 2023
Bioinformatics
Bioinformatics is an interdisciplinary field that combines biology, computer science, mathematics and statistics.[1]
Bioinformatics workflow steps
- quality control assessment
- sequence alignment
- data summarization into genes/regions
- data annotation with genomic features
- statistical comparisons
- multi-omic integration
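As a toy illustration of the sequence alignment step, here is a Needleman-Wunsch global alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the function name `needleman_wunsch` are illustrative assumptions; production read mappers such as BWA or HISAT2 use indexed heuristics rather than this O(n*m) dynamic program.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Toy global alignment (Needleman-Wunsch) with a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    """

    def sub(x: str, y: str) -> int:
        # Substitution score: reward identical bases, penalise mismatches
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # match or mismatch
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (1, 'ACGT', 'A-GT')
```

When several alignments tie for the best score, the traceback prefers a diagonal step, then a gap in `b`, so only one of the optimal alignments is returned.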
Curated bioinformatics software list[2]
- Package suites
- Data Tools
- Downloading
- Compressing
- Data Processing
- Command Line Utilities
- Next Generation Sequencing
- Workflow Managers
- Pipelines
- Sequence Processing
- Data Analysis
- Sequence Alignment
- Pairwise
- Multiple Sequence Alignment
- Clustering
- Quantification
- Variant Calling
- Structural variant callers
- BAM File Utilities
- VCF File Utilities
- GFF BED File Utilities
- Variant Simulation
- Variant Prediction/Annotation
- Python Modules
- Data
- Tools
- Assembly
- Annotation
- Long-read sequencing
- Long-read Assembly
- Visualization
- Genome Browsers / Gene Diagrams
- Circos Related
- Database Access
- Resources
- Becoming a Bioinformatician
- Bioinformatics on GitHub
- Sequencing
- RNA-Seq
- ChIP-Seq
- YouTube Channels and Playlists
- Blogs
- Miscellaneous
- Online networking groups
File formats in bioinformatics
This section lists some commonly used file formats in bioinformatics.[3]
File formats | File extensions |
---|---|
FASTA | .fa, .fasta, .fsa |
FASTQ | .fastq, .sanfastq, .fq |
SAM (Sequence Alignment/Map) | .sam |
BAM (Binary Alignment/Map) | .bam |
VCF (Variant Call Format) | .vcf |
GFF (General Feature Format or Gene Finding Format) | .gff2, .gff3, .gff |
GTF (Gene Transfer Format) | .gtf |
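FASTA and FASTQ, the two most common text formats in this table, can be read with a few lines of standard-library Python. This is a minimal sketch that assumes unwrapped, four-line FASTQ records and Phred+33 (Sanger / Illumina 1.8+) quality encoding; older Illumina pipelines used Phred+64. The function names are hypothetical; for real work a tested parser such as Biopython's `Bio.SeqIO` is the safer choice.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs; sequence lines may be wrapped."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) from four-line FASTQ records (assumed unwrapped)."""
    while True:
        header = handle.readline().rstrip("\r\n")
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline().rstrip("\r\n")
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, qual


def phred33(qual: str) -> List[int]:
    """Convert Phred+33 quality characters to integer quality scores."""
    return [ord(c) - 33 for c in qual]


def mean_quality(qual: str) -> float:
    """Average Phred score of a read; a crude per-read QC metric."""
    scores = phred33(qual)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    fasta = io.StringIO(">seq1 demo\nACGT\nAC\n>seq2\nGGG\n")
    print(list(read_fasta(fasta)))  # [('seq1 demo', 'ACGTAC'), ('seq2', 'GGG')]
    fastq = io.StringIO("@r1\nACGT\n+\nII#5\n")
    for read_id, seq, qual in read_fastq(fastq):
        print(read_id, seq, phred33(qual), mean_quality(qual))  # r1 ACGT [40, 40, 2, 20] 25.5
```

A per-read mean quality like this is the kind of signal that trimming and QC tools (for example sickle or FastQC) build on, although they use more elaborate windowed statistics.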
Useful tutorial links
- https://vcru.wisc.edu/simonlab/bioinformatics/programs/
- https://github.com/danielecook/Awesome-Bioinformatics
- https://mybiosoftware.com/
- https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/
- Cluster system software modules - https://bioinformatics.uconn.edu/cbc_software/software-2/
Bioconda can be used to install many of the tools listed below.[4]
Bioconda only supports Python 2.7, 3.6, 3.7, 3.8 and 3.9, so the DLS38 environment can be used.
Libraries and sources
Libraries | Mamba or manual | Description | References |
---|---|---|---|
meme[5] | Mamba | ||
BWA | Mamba | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | |
samtools | Mamba | ||
ncbi-blast+[6] | Manual | ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ |
somatic-sniper | Mamba | Software for comparing tumor/normal pairs. The developers estimate its sensitivity and precision, and describe several common sources of error that result in miscalls. | |
breakdancer | Mamba | A package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads. | |
tigra-sv[7] | Manual | A program that performs targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of BAM files that contain reads mapped to a reference genome such as NCBI | https://bioinformatics.mdanderson.org/public-software/archive/tigra/ |
Please note that TopHat has entered a low-maintenance, low-support stage, as it is now largely superseded by HISAT2, which provides the same core functionality.[8] | Does not support Python 3 | | |
HISAT2 | Mamba | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. | |
cufflinks[9] | Manual | Manual install to a prefix: (231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al | https://github.com/cole-trapnell-lab/cufflinks |
bedtools | Mamba | ||
T-COFFEE | Mamba | ||
mafft | Mamba | mafft, maq, muscle, phyml, primer3, probcons, sim4, tigr-glimmer, amap-align, dialign, emboss, exonerate, kalign, CNVnator, CREST, CAP3, Cluster, FastQC, Fastx-toolkit, IGVTools, MACS, Meerkat, RNAcode, RNAz, RepeatMasker, SNVMix2, SOAPdenovo2-src, VarScan, ViennaRNA, bismark, blat, circos, clustalw, clustalx, cnD, cpc, fasta, gmap-gsnap, lobstr, meme, miRDP, mirdeep2, picard-tools, polyphen, rseq, seqtk-master, sickle-master, snpEff, soap | |
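maq | Manual | Maq stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to reference sequences. hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$ | https://maq.sourceforge.net/maq-man.shtml https://mybiosoftware.com/sim4-20030613-align-expressed-dna-sequence-genomic-sequence.html https://mybiosoftware.com/maq-0-7-1-mapping-assembly-qualities.html MAQ is very old, and it now has problems compiling with current compilers. You can use the -fpermissive flag to get it to compile:[10] make CFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive" CXXFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive" Note: I took the CFLAGS and CXXFLAGS from the Makefile and appended -fpermissive to them. Your CFLAGS and CXXFLAGS may be different, so check them before running make. Three executables, maq, maq.pl and farm-run.pl, are copied to /usr/local/bin by default. |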
rSeq: RNA-Seq Analyzer | https://jhui2014.github.io/rseq/ | On the 61 server: /test/bioinfomatics/rseq/rseq-0.2.2-src | |
SNVMix2 | https://github.com/shahcompbio/snvmix | imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$ | |
Samtools | SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments | ||
Breakdancer | BreakDancer uses CMake, a cross-platform build tool that generates a Makefile so the code can be built with make. The requirements are the zlib development library, gcc, gmake and CMake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process. | | |
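Aligners such as BWA and HISAT2 write SAM, which samtools converts to and from BAM. Each SAM alignment line has 11 mandatory tab-separated fields, a bitwise FLAG, and a Phred-scaled mapping quality (MAPQ = -10 log10 P(mapping position is wrong), with 255 meaning "not available"). A minimal standard-library sketch for decoding these fields follows; the helper names are hypothetical, and for real SAM/BAM work `samtools view` (with its `-f`/`-F` flag filters) or pysam is the usual route.

```python
from typing import Dict, List, Optional

# The 11 mandatory SAM columns, in order
SAM_FIELDS = ["QNAME", "FLAG", "RNAME", "POS", "MAPQ", "CIGAR",
              "RNEXT", "PNEXT", "TLEN", "SEQ", "QUAL"]

# Bitwise FLAG meanings; dict order keeps the bits in ascending order
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def parse_sam_line(line: str) -> Dict[str, object]:
    """Split one SAM alignment line (not an @ header line) into named fields."""
    cols = line.rstrip("\r\n").split("\t")
    if len(cols) < 11:
        raise ValueError("a SAM alignment line needs at least 11 tab-separated fields")
    record: Dict[str, object] = dict(zip(SAM_FIELDS, cols[:11]))
    for key in ("FLAG", "POS", "MAPQ", "PNEXT", "TLEN"):
        record[key] = int(cols[SAM_FIELDS.index(key)])
    record["TAGS"] = cols[11:]  # optional TAG:TYPE:VALUE fields, left as strings
    return record


def decode_flag(flag: int) -> List[str]:
    """List the names of all bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def mapq_error_probability(mapq: int) -> Optional[float]:
    """Probability that the reported position is wrong; None when MAPQ is 255."""
    if mapq == 255:
        return None
    return 10 ** (-mapq / 10)


if __name__ == "__main__":
    line = "r1\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII\tNM:i:0"
    rec = parse_sam_line(line)
    print(rec["RNAME"], rec["POS"], decode_flag(rec["FLAG"]))
    # chr1 100 ['paired', 'proper_pair', 'mate_reverse_strand', 'first_in_pair']
    print(mapq_error_probability(rec["MAPQ"]))
```

FLAG 99 is a typical value for the first read of a properly paired read whose mate maps to the reverse strand.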
References
- [1] https://www.youtube.com/watch?v=ky1-mF0fHnQ
- [2] https://github.com/danielecook/Awesome-Bioinformatics
- [3] https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/
- [4] https://bioconda.github.io/index.html
- [5] https://meme-suite.org/meme/doc/install.html?man_type=web
- [6] https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html
- [7] https://bioinformatics.mdanderson.org/public-software/archive/tigra/
- [8] https://ccb.jhu.edu/software/tophat/index.shtml
- [9] http://cole-trapnell-lab.github.io/cufflinks/
- [10] https://www.biostars.org/p/353144/