Privatre:bioinfomatics: Difference between revisions
Revision as of 12:55, 17 December 2023
Bioinformatics
A field that combines biology, computer science, mathematics, and statistics. [1]
Bioinformatics workflow steps
- quality control assessment
- sequence alignment
- data summarization into genes/regions
- data annotation to genomics features
- statistical comparisons
- multiomic integration
Bioinformatics curated software list[2]
- Package suites
- Data Tools
- Downloading
- Compressing
- Data Processing
- Command Line Utilities
- Next Generation Sequencing
- Workflow Managers
- Pipelines
- Sequence Processing
- Data Analysis
- Sequence Alignment
- Pairwise
- Multiple Sequence Alignment
- Clustering
- Quantification
- Variant Calling
- Structural variant callers
- BAM File Utilities
- VCF File Utilities
- GFF BED File Utilities
- Variant Simulation
- Variant Prediction/Annotation
- Python Modules
- Data
- Tools
- Assembly
- Annotation
- Long-read sequencing
- Long-read Assembly
- Visualization
- Genome Browsers / Gene Diagrams
- Circos Related
- Database Access
- Resources
- Becoming a Bioinformatician
- Bioinformatics on GitHub
- Sequencing
- RNA-Seq
- ChIP-Seq
- YouTube Channels and Playlists
- Blogs
- Miscellaneous
- Online networking groups
File formats in bioinformatics
This section lists some of the commonly used file formats in bioinformatics and their file extensions.[3]
File formats | File extensions |
---|---|
FASTA | .fa, .fasta, .fsa |
FASTQ | .fastq, .sanfastq, .fq |
SAM (Sequence Alignment Map) | file.sam |
BAM | file.bam |
VCF (Variant Call Format) | file.vcf |
GFF (General Feature Format or Gene Finding Format) | file.gff2, file.gff3, file.gff |
GTF (Gene Transfer Format) | file.gtf |
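The extension table can be turned into a small lookup. The sketch below is only an illustration: the function name guess_format is hypothetical, and it relies on the file name alone without inspecting file contents.

```shell
#!/bin/sh
# Hypothetical helper: guess a file format from its extension.
# The extension list mirrors the table above and is not exhaustive.
guess_format() {
    # Lowercase the name so that "Sample.BAM" and "sample.bam" match alike.
    guess_format_name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$guess_format_name" in
        *.fa|*.fasta|*.fsa)      echo "FASTA" ;;
        *.fastq|*.sanfastq|*.fq) echo "FASTQ" ;;
        *.sam)                   echo "SAM" ;;
        *.bam)                   echo "BAM" ;;
        *.vcf)                   echo "VCF" ;;
        *.gff|*.gff2|*.gff3)     echo "GFF" ;;
        *.gtf)                   echo "GTF" ;;
        *)                       echo "unknown" ;;
    esac
}

guess_format reads.fq      # prints FASTQ
guess_format Sample.BAM    # prints BAM
```

Compressed files such as reads.fq.gz fall through to "unknown" here; handling them would need an extra step that strips the compression suffix first.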
Useful tutorial links
- https://vcru.wisc.edu/simonlab/bioinformatics/programs/
- https://github.com/danielecook/Awesome-Bioinformatics
- https://mybiosoftware.com/
- https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/
- Cluster system software modules - https://bioinformatics.uconn.edu/cbc_software/software-2/
Most packages can be installed with Bioconda [4]
Bioconda only supports Python 2.7, 3.6, 3.7, 3.8 and 3.9, so DLS38 can be used
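A minimal sketch of creating a Bioconda environment with mamba follows. The environment name bioinfo38 and the DRY_RUN switch are assumptions made for illustration; bwa and samtools are taken from the library table on this page, and Python 3.8 is picked from the supported versions listed above. By default the script only prints the command, so it can be reviewed before anything is installed.

```shell
#!/bin/sh
# Sketch: build a Python 3.8 environment from the conda-forge and bioconda channels.
# ENV_NAME and DRY_RUN are hypothetical names chosen for this example.
ENV_NAME="bioinfo38"
CHANNELS="-c conda-forge -c bioconda"
PACKAGES="python=3.8 bwa samtools"
DRY_RUN="${DRY_RUN:-1}"

CREATE_CMD="mamba create --yes --name $ENV_NAME $CHANNELS $PACKAGES"

if [ "$DRY_RUN" = "1" ]; then
    # Only show the command; nothing is installed.
    echo "would run: $CREATE_CMD"
else
    # Word splitting of $CREATE_CMD is intentional here.
    $CREATE_CMD
fi
```

Running it with DRY_RUN=0 executes the command; afterwards the environment can be entered with mamba activate followed by the environment name.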
Libraries and sources
Libraries | Mamba or manual | Description | References |
---|---|---|---|
meme[5] | Mamba | ||
BWA | Mamba | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. | |
samtools | Mamba | ||
ncbi-blast+[6] | Manual | ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ |
somatic-sniper | Mamba | Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision, and present several common sources of error resulting in miscalls. | |
breakdancer | Mamba | a package that provides genome-wide detection of structural variants from next generation paired-end sequencing reads | |
tigra-sv[7] | Manual | a program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of bam files that contain reads mapped to a reference genome such as NCBI | https://bioinformatics.mdanderson.org/public-software/archive/tigra/ |
TopHat | Does not support Python 3 | Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2, which provides the same core functionality[8] | |
HISAT2 | Mamba | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. | |
cufflinks[9] | Manual | Manual install to prefix: (231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al | https://github.com/cole-trapnell-lab/cufflinks |
bedtools | Mamba | ||
T-COFFEE | Mamba | ||
mafft | Mamba | ||
maq | Manual | Maq stands for Mapping and Assembly with Quality. It builds assemblies by mapping short reads to reference sequences.
hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$ |
https://maq.sourceforge.net/maq-man.shtml
https://mybiosoftware.com/sim4-20030613-align-expressed-dna-sequence-genomic-sequence.html https://mybiosoftware.com/maq-0-7-1-mapping-assembly-qualities.html
MAQ is really old, and by now it has problems compiling with current compilers. You can use the
Note I took the
|
muscle | Mamba | ||
phyml | Mamba | ||
primer3 | Mamba | ||
probcons | Mamba | ||
sim4 | Manual | (231216) hpcmate@223vmbase:~/biolib/compile/sim4/sim4.2012-10-10$
or |
https://globin.bx.psu.edu/ftp/dist/sim4/ https://globin.bx.psu.edu/html/docs/sim4.html |
tigr-glimmer | mamba | tigr-glimmer | |
amap-align | ??? | ||
dialign -> dialign2 | mamba | ||
emboss | mamba | ||
exonerate | mamba | ||
kalign2 & kalign3 | mamba | ||
CNVnator | mamba | ||
CREST | mamba | ||
CAP3 | mamba | ||
Cluster -> mmseqs2 | mamba | ||
Cluster | mamba | ||
FastQC | mamba | ||
fastx_toolkit | mamba | ||
IGVTools | mamba | ||
MACS -> macs2 | mamba | Needs Python < 3 | |
Meerkat -> django-meerkat | pip | pip install django-meerkat | |
RNAcode | mamba | ||
RNAz | mamba | ||
RepeatMasker | mamba | ||
SNVMix2 | manual | https://github.com/shahcompbio/snvmix
https://github.com/shahcompbio/snvmix /test/biolibs/gitbuild/snvmix, Version 0.11.8-r4 | |
SOAPdenovo2-src | mamba | SOAPdenovo2 | dependency -> samtool 0.1.9 |
VarScan | mamba | ||
ViennaRNA | mamba | ||
bismark | mamba | ||
blat | mamba | https://kentinformatics.com/ | |
circos | mamba | Error: Is a directory: '/opt/anaconda/envs/231216/README' -> remove the README directory | |
clustalw (=clustalW2) | mamba | ClustalW2 - Multiple Sequence Alignment[11] | ClustalW, the command line version of clustalx
ClustalW2 is a general purpose DNA or protein multiple sequence alignment program for three or more sequences. For the alignment of two sequences please instead use our pairwise sequence alignment tools. The ClustalW2 services have been retired. To access similar services, please visit the Multiple Sequence Alignment tools page. For protein alignments we recommend Clustal Omega. For DNA alignments we recommend trying MUSCLE or MAFFT. If you have any questions/concerns please contact us via the feedback link above. |
clustalx | need X window | Multiple Sequence Alignment, Graphic interface | https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal
ClustalX, the graphical interface, is available in the Bioinformatics menu |
cnD | manual install | (231216) hpcmate@223vmbase:~/biolib/compile/cnD$ | https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html
cnD (Copy number variant detection) is a program to detect copy number variants from short read sequence data. How to install - https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/cnd.htm imp@CGX-GPU:~/test/bioinfomatics/cnD/cnD$ |
cpc -> CPC2 | mamba | https://github.com/biocoder/cpc | |
fasta -> fasta3 | mamba | The FASTA package - protein and DNA sequence similarity searching and alignment programs | |
gmap-gsnap -> gmap | mamba | gmap & gsnap packages | |
lobstr | manual | lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high throughput sequencing data.
hpcmate@223vmbase:~/biolib/compile/lobstr-code$ |
https://github.com/gymreklab/lobstr-code/blob/master/INSTALL
sudo apt install libgsl-dev autotools-dev libboost-all-dev pkg-config zlib1g-dev zlib1g
./configure
make |
meme | mamba | https://meme-suite.org/meme/ For Linux 64, Open MPI is built with CUDA awareness but this support is disabled by default.
To enable it, please set the environment variable OMPI_MCA_opal_cuda_support=true before launching your MPI processes. Equivalently, you can set the MCA parameter in the command line: mpiexec --mca opal_cuda_support 1 ... In addition, the UCX support is also built but disabled by default. To enable it, first install UCX (conda install -c conda-forge ucx). Then, set the environment variables OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx" before launching your MPI processes. Equivalently, you can set the MCA parameters in the command line: mpiexec --mca pml ucx --mca osc ucx ... Note that you might also need to set UCX_MEMTYPE_CACHE=n for CUDA awareness via UCX. Please consult UCX's documentation for detail. | |
miRDP -> mirdeep2 2.0.1.3 | mamba | miRDeep-P (miRDP) is a tool for detecting miRNAs in plants from deeply sequenced small RNA libraries. It was developed by modifying miRDeep, which is based on a probabilistic model of miRNA biogenesis in animals, with a plant-specific scoring system and filtering criteria.
miRDP2 is adapted from miRDeep-P (miRDP) with new strategies and an overhauled algorithm. | |
mirdeep-p2 1.1.4 | mamba | A fast and accurate tool for analyzing the miRNA transcriptome in plants | |
mirdeep2 | mamba | ||
picard-tools -> picard 3.1.1 | mamba | Java tools for working with NGS data in the BAM format
conda |
source : https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/picard.htm |
polyphen | |||
rseq | |||
seqtk-master | |||
sickle-master | |||
snpEff | |||
soap | |||
rSeq: RNA-Seq Analyzer | https://jhui2014.github.io/rseq/ | On 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src | |
SNVMix2 | https://github.com/shahcompbio/snvmix | imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$ | |
Samtools | SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments | ||
Breakdancer | BreakDancer uses CMake, which is a cross-platform build tool. Basically it will generate a Makefile so you can use make. The requirements are the zlib development library, gcc, gmake, and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process
|
||
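The Open MPI note in the meme row names several environment variables. The sketch below collects them in one place, using only the variable names and values quoted in that note; the mpiexec line is left as a comment because the program name my_mpi_program and the process count are hypothetical.

```shell
#!/bin/sh
# Enable CUDA awareness in the conda build of Open MPI (disabled by default).
export OMPI_MCA_opal_cuda_support=true

# Enable UCX support; per the note, UCX must be installed first:
#   conda install -c conda-forge ucx
export OMPI_MCA_pml="ucx"
export OMPI_MCA_osc="ucx"

# May also be needed for CUDA awareness via UCX.
export UCX_MEMTYPE_CACHE=n

# Hypothetical launch line; replace the program and process count with your own.
# mpiexec -n 4 ./my_mpi_program

# Show what will be passed on to the MPI processes.
env | grep -E '^(OMPI_MCA_|UCX_)'
```

The same MCA parameters can instead be given on the command line, as the note states: mpiexec --mca opal_cuda_support 1 and mpiexec --mca pml ucx --mca osc ucx.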
References
- [1] https://www.youtube.com/watch?v=ky1-mF0fHnQ
- [2] https://github.com/danielecook/Awesome-Bioinformatics
- [3] https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/
- [4] https://bioconda.github.io/index.html
- [5] https://meme-suite.org/meme/doc/install.html?man_type=web
- [6] https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html
- [7] https://bioinformatics.mdanderson.org/public-software/archive/tigra/
- [8] https://ccb.jhu.edu/software/tophat/index.shtml
- [9] http://cole-trapnell-lab.github.io/cufflinks/
- [10] https://www.biostars.org/p/353144/
- [11] https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal