Privatre:bioinfomatics: Difference between revisions

Latest revision as of 10:48, 19 December 2023

Bioinformatics

A field that combines biology, computer science, mathematics, and statistics.^[1]

Bioinformatics workflow steps

quality control assessment
sequence alignment
data summarization into genes/regions
data annotation to genomic features
statistical comparisons
multi-omic integration
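
The steps above run in a fixed order, each consuming the output of the previous one. A minimal sketch of that ordering, assuming each step can be modelled as a function from a state dictionary to a new state dictionary. The stage functions here are hypothetical placeholders that only log their own name; a real pipeline would call the actual tools.

```python
from typing import Callable, Dict, List, Tuple

State = Dict[str, object]
Stage = Tuple[str, Callable[[State], State]]


def make_stage(name: str) -> Stage:
    """Build a placeholder stage (hypothetical) that appends its name to the run log."""
    def run(state: State) -> State:
        log = list(state.get("log", []))
        log.append(name)
        return {**state, "log": log}
    return (name, run)


# Stage names follow the workflow list; the order of this list is the execution order.
WORKFLOW: List[Stage] = [make_stage(n) for n in (
    "quality control assessment",
    "sequence alignment",
    "data summarization into genes/regions",
    "data annotation to genomic features",
    "statistical comparisons",
    "multi-omic integration",
)]


def run_workflow(stages: List[Stage], state: State) -> State:
    """Run each stage in order, passing the state from one stage to the next."""
    for _name, func in stages:
        state = func(state)
    return state


if __name__ == "__main__":
    final = run_workflow(WORKFLOW, {"sample": "example"})
    for step in final["log"]:
        print(step)
```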

Bioinformatics curated software list^[2]

Package suites
Data Tools
- Downloading
- Compressing
Data Processing
- Command Line Utilities
Next Generation Sequencing
- Workflow Managers
- Pipelines
- Sequence Processing
- Data Analysis
- Sequence Alignment
  - Pairwise
  - Multiple Sequence Alignment
  - Clustering
- Quantification
- Variant Calling
  - Structural variant callers
- BAM File Utilities
- VCF File Utilities
- GFF BED File Utilities
- Variant Simulation
- Variant Prediction/Annotation
Tools for Assessment of Variants
- PolyPhen-2 is a tool for predicting the effect of an amino acid substitution on protein structure and function, based on comparative genomics and experimentally determined protein structures. It is available as a web service, and can also be downloaded as a standalone application.
- SNPtrack is a simple interface for mutation mapping and identifying causal mutations from whole-genome sequencing studies. It is available as a web service.
Tools for Mass Spectrometry and Proteomics
- MS-BLAST is a tool for searching protein sequences identified with tandem mass spectrometry against databases of protein sequences. It is available as a web service and as standalone software.

Tools for Statistical Genetics
- Joint Likelihood Mapping (JLIM) is a tool to test for a shared genetic effect between two genetic association datasets, for example, a disease GWAS and a gene expression QTL (eQTL) study.
- Joint Likelihood Mapping 2 (JLIM_2.0) is a version of JLIM which supports meta-analysis across more than one cohort of matching ancestry.
- Joint Likelihood Mapping (JLIM) 2.5 is a new version of JLIM based on summary statistics.
- NPS is a tool for polygenic risk scoring based on a partitioning-based non-parametric shrinkage algorithm.
- RVTT is a novel statistical test of trend that assesses the relationship of the frequency of qualifying rare variants in a pathway with dichotomous disease phenotypes leveraging the Cochran-Armitage test statistic.
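
RVTT is described above as building on the Cochran-Armitage test statistic. The following is a sketch of the plain Cochran-Armitage trend test only, not of RVTT itself. It assumes individuals are grouped into ordered categories (for instance by number of qualifying variants, which is an illustrative choice), and the counts in the usage example are invented.

```python
import math
from typing import Sequence, Tuple


def cochran_armitage(cases: Sequence[int], totals: Sequence[int],
                     scores: Sequence[float]) -> Tuple[float, float]:
    """Return (z, two-sided p) for a linear trend in case proportions.

    cases[i]  : number of cases in ordered category i
    totals[i] : number of individuals (cases + controls) in category i
    scores[i] : numeric score assigned to category i
    """
    if not (len(cases) == len(totals) == len(scores)):
        raise ValueError("cases, totals and scores must have equal length")
    n = sum(totals)
    if n == 0:
        raise ValueError("no observations")
    p_bar = sum(cases) / n  # overall case proportion
    # Score-weighted deviation of observed from expected case counts.
    t = sum(s * (c - m * p_bar) for s, c, m in zip(scores, cases, totals))
    s1 = sum(m * s for s, m in zip(scores, totals))
    s2 = sum(m * s * s for s, m in zip(scores, totals))
    var = p_bar * (1.0 - p_bar) * (s2 - s1 * s1 / n)
    if var <= 0:
        raise ValueError("variance is zero; the test is undefined")
    z = t / math.sqrt(var)
    # Two-sided p-value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


if __name__ == "__main__":
    # Invented counts: 50 individuals per category, scores 0, 1, 2.
    z, p = cochran_armitage([10, 20, 30], [50, 50, 50], [0, 1, 2])
    print(f"z = {z:.3f}, p = {p:.2e}")
```

A positive z indicates that the case proportion rises with the category score; the normal approximation is only reasonable when the category counts are not too small.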

Tools for Cancer Genomics
- MutPanning is designed to detect rare cancer driver genes from aggregated whole-exome sequencing data.
- CBaSE enables cancer type and gene-specific estimation of the strength of negative and positive selection. It is available as a browser-based tool as well as for download as a standalone package.

Tools for Population Genetics
- simDoSe is a fast and flexible Wright-Fisher simulator for arbitrary diploid selection evolving through realistic human demography.
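
To make the Wright-Fisher model with diploid selection concrete, here is a minimal sketch. It is not simDoSe: it assumes a constant population size (no demography), a single biallelic locus, and genotype fitnesses 1 + s, 1 + h*s, and 1. The function and parameter names are illustrative.

```python
import random
from typing import List


def next_frequency(p: float, s: float, h: float) -> float:
    """Expected frequency of allele A after one round of diploid selection."""
    if s <= -1.0:
        raise ValueError("s must be greater than -1 so fitness stays positive")
    q = 1.0 - p
    w_hom, w_het, w_ref = 1.0 + s, 1.0 + h * s, 1.0
    mean_w = p * p * w_hom + 2.0 * p * q * w_het + q * q * w_ref
    return (p * p * w_hom + p * q * w_het) / mean_w


def simulate(p0: float, n_diploid: int, s: float, h: float,
             generations: int, seed: int = 0) -> List[float]:
    """Simulate the allele-frequency trajectory under selection and drift."""
    rng = random.Random(seed)
    p = p0
    trajectory = [p]
    for _ in range(generations):
        p_sel = next_frequency(p, s, h)
        # Drift: draw 2N allele copies for the next generation (binomial sampling).
        copies = sum(1 for _ in range(2 * n_diploid) if rng.random() < p_sel)
        p = copies / (2 * n_diploid)
        trajectory.append(p)
        if p == 0.0 or p == 1.0:  # allele lost or fixed
            break
    return trajectory


if __name__ == "__main__":
    traj = simulate(p0=0.1, n_diploid=100, s=0.05, h=0.5, generations=200)
    print(f"generations simulated: {len(traj) - 1}, final frequency: {traj[-1]:.3f}")
```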

Python Modules
- Data
- Tools
Assembly
Annotation
- Roulette is a mutation rate model identifying the mutagenic effect of Polymerase III transcription at the basepair resolution.
- s_het are gene-based estimates of selection strength.

Long-read sequencing
- Long-read Assembly
Visualization
- Genome Browsers / Gene Diagrams
- Circos Related
Database Access
Resources
- Becoming a Bioinformatician
- Bioinformatics on GitHub
- Sequencing
- RNA-Seq
- ChIP-Seq
- YouTube Channels and Playlists
- Blogs
- Miscellaneous

Online networking groups

File formats in bioinformatics

This section lists some of the commonly used file formats in bioinformatics.^[3]


File format	File extensions
FASTA	.fa, .fasta, .fsa
FASTQ	.fastq, .sanfastq, .fq
SAM (Sequence Alignment Map)	.sam
BAM	.bam
VCF (Variant Call Format)	.vcf
GFF (General Feature Format or Gene Finding Format)	.gff2, .gff3, .gff
GTF (Gene Transfer Format)	.gtf
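
As an illustration of how the table can be used, the sketch below maps a file name to a format by its extension and parses FASTA text. The extension map is taken from the table; the function names are hypothetical, and compressed files (for example `.gz`) are not handled.

```python
from pathlib import Path
from typing import Dict, Iterable, Iterator, List, Optional, Tuple

# Extension-to-format map, copied from the table of file formats.
EXTENSION_TO_FORMAT: Dict[str, str] = {
    ".fa": "FASTA", ".fasta": "FASTA", ".fsa": "FASTA",
    ".fastq": "FASTQ", ".sanfastq": "FASTQ", ".fq": "FASTQ",
    ".sam": "SAM", ".bam": "BAM", ".vcf": "VCF",
    ".gff": "GFF", ".gff2": "GFF", ".gff3": "GFF",
    ".gtf": "GTF",
}


def guess_format(filename: str) -> str:
    """Guess the file format from the final extension (case-insensitive)."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix, "unknown")


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


if __name__ == "__main__":
    print(guess_format("sample.fq"))
    for name, seq in read_fasta([">seq1 example", "ACGT", "TTGA"]):
        print(name, len(seq))
```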

Useful tutorial links

We can use Bioconda^[4]

Bioconda only supports Python 2.7, 3.6, 3.7, 3.8, and 3.9, so DLS38 can be used.
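
A small sketch for checking an interpreter against the version list above. The list is taken from this page at face value and may not match what Bioconda supports today; the function name is illustrative.

```python
import sys
from typing import Tuple

# (major, minor) versions listed on this page; an assumption, not queried from Bioconda.
SUPPORTED_VERSIONS = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}


def is_supported(version: Tuple[int, ...]) -> bool:
    """Return True if the (major, minor) part of version is in the list."""
    return tuple(version[:2]) in SUPPORTED_VERSIONS


if __name__ == "__main__":
    current = tuple(sys.version_info[:2])
    status = "listed" if is_supported(current) else "not listed"
    print(f"Python {current[0]}.{current[1]} is {status}")
```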

Libraries and sources

Key online URLs


Libraries (Python 3.9)	Mamba or manual	Deps	Description	References
BWA	Mamba		BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.	https://bio-bwa.sourceforge.net/ https://wikis.utexas.edu/display/bioiteam/BWA
samtools	Mamba
ncbi-blast+^[5]	Manual	polyphen2	NCBI BLAST+ provides the command-line BLAST programs. The cited page describes ScalaBLAST, a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all five primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML).	https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/ http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=Download https://vcru.wisc.edu/simonlab/bioinformatics/programs/#blastplus Setup: `rsync -avz polyphen-2.2.2/precomputed/* polyphen-2.2.3/precomputed/` Others: `wget ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/2.9.0/ncbi-blast-2.9.0+-x64-linux.tar.gz` then `tar vxaf ncbi-blast-2.9.0+-x64-linux.tar.gz`
somatic-sniper	Mamba		Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision and present several common sources of error that result in miscalls.
breakdancer	Mamba		A package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads.
tigra-sv^[6]	Manual		A program that conducts targeted local assembly of structural variants (SVs) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of BAM files that contain reads mapped to a reference genome such as NCBI	https://bioinformatics.mdanderson.org/public-software/archive/tigra/
~~TopHat~~ -> hisat2	NA		Please note that TopHat has entered a low-maintenance, low-support stage, as it is now largely superseded by HISAT2, which provides the same core functionality.^[7]	Does not support Python 3
HISAT2	Mamba		HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
cufflinks^[8]	Manual		Manual install to prefix: `(231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al`	https://github.com/cole-trapnell-lab/cufflinks
bedtools	Mamba
T-COFFEE	Mamba
mafft	Mamba
maq	Manual		Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. `hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$`	https://maq.sourceforge.net/maq-man.shtml https://mybiosoftware.com/sim4-20030613-align-expressed-dna-sequence-genomic-sequence.html https://mybiosoftware.com/maq-0-7-1-mapping-assembly-qualities.html Forum advice: "Why do you need MAQ? Its latest version is more than 10 years old. I think you would be better off using some newer program. MAQ is really old, and by now it has problems compiling with current compilers. You can use the `-fpermissive` flag to get it to compile:^[9] `make CFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive" CXXFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive"` Note I took the `CFLAGS` and `CXXFLAGS` from the Makefile, and appended `-fpermissive` to them. Your `CFLAGS` and `CXXFLAGS` may be different; check them before issuing make." Three executables, `maq`, `maq.pl`, and `farm-run.pl`, will be copied to /usr/local/bin by default.
muscle	Mamba
phyml	Mamba
primer3	Mamba
probcons	Mamba
sim4	Manual		(231216) hpcmate@223vmbase:~/biolib/compile/sim4/sim4.2012-10-10$ or	https://globin.bx.psu.edu/ftp/dist/sim4/ https://globin.bx.psu.edu/html/docs/sim4.html
tigr-glimmer	mamba		tigr-glimmer
amap-align	mamba		AMAP is a multiple sequence alignment program based on sequence annealing	https://github.com/mes5k/amap-align
dialign -> dialign2	mamba
emboss	mamba
exonerate	mamba
kalign2 & kalign3	mamba
CNVnator	mamba
CREST	mamba
CAP3	mamba
Cluster -> mmseqs2	mamba
Cluster	mamba
FastQC	mamba
fastx_toolkit	mamba
IGVTools	mamba
MACS -> macs2	mamba		Needs Python < 3
Meerkat -> django-meerkat	pip		pip install django-meerkat
RNAcode	mamba
RNAz	mamba
RepeatMasker	mamba
SNVMix2	manual			https://github.com/shahcompbio/snvmix https://github.com/shahcompbio/snvmix/test/biolibs/gitbuild/snvmix, Version 0.11.8-r4
SOAPdenovo2-src	mamba		SOAPdenovo2	Dependency: samtools 0.1.9
VarScan	mamba
ViennaRNA	mamba
bismark	mamba
blat	mamba	polyphen2		https://kentinformatics.com/ Blat tools are necessary in order to analyze variants in novel, unannotated, or otherwise non-standard genes and proteins. Note that PolyPhen-2 uses the UCSC hg19 database as the reference source of all gene annotations and UniProtKB for protein sequences and annotations. If you want to analyze genes/proteins from a different source (e.g., RefSeq or Ensembl), this would also require Blat tools. Instructions for downloading Blat sources and executables can be found here: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3 A complete set of binary executables for 64-bit Linux is available here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ PolyPhen-2 only needs the following three files: blat, twoBitToFa, and bigWigToWig
circos	mamba			Error: `Is a directory: '/opt/anaconda/envs/231216/README'`. Fix: remove the README directory.
clustalw (=clustalW2)	mamba		ClustalW2 - Multiple Sequence Alignment^[10]	ClustalW is the command-line version of ClustalX. ClustalW2 is a general-purpose DNA or protein multiple sequence alignment program for three or more sequences. For the alignment of two sequences, use pairwise sequence alignment tools instead. The ClustalW2 web services have been retired. For protein alignments, Clustal Omega is recommended; for DNA alignments, try MUSCLE or MAFFT.
clustalx -> NA	needs X Window		Multiple sequence alignment, graphical interface	https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal ClustalX, the graphical interface, is available in the Bioinformatics menu
cnD	manual install		(231216) hpcmate@223vmbase:~/biolib/compile/cnD$	https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html cnD (Copy number variant detection) is a program to detect copy number variants from short read sequence data. How to install - https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/cnd.htm imp@CGX-GPU:~/test/bioinfomatics/cnD/cnD$
cpc -> CPC2	mamba			https://github.com/biocoder/cpc
fasta -> fasta3	mamba		The FASTA package - protein and DNA sequence similarity searching and alignment programs
gmap-gsnap -> gmap	mamba		gmap & gsnap packages
lobstr	manual		lobSTR is a tool for profiling Short Tandem Repeats (STRs) from high-throughput sequencing data. Version 4.0.4. `hpcmate@223vmbase:~/biolib/compile/lobstr-code$`	https://github.com/gymreklab/lobstr-code/blob/master/INSTALL `sudo apt install libgsl-dev autotools-dev libboost-all-dev pkg-config zlib1g-dev zlib1g`, then `./configure` and `make`
meme^[11]	mamba			https://meme-suite.org/meme/ For Linux 64, Open MPI is built with CUDA awareness, but this support is disabled by default. To enable it, set the environment variable `OMPI_MCA_opal_cuda_support=true` before launching your MPI processes. Equivalently, you can set the MCA parameter on the command line: `mpiexec --mca opal_cuda_support 1 ...` In addition, UCX support is also built but disabled by default. To enable it, first install UCX (`conda install -c conda-forge ucx`). Then set the environment variables `OMPI_MCA_pml="ucx" OMPI_MCA_osc="ucx"` before launching your MPI processes. Equivalently, you can set the MCA parameters on the command line: `mpiexec --mca pml ucx --mca osc ucx ...` Note that you might also need to set `UCX_MEMTYPE_CACHE=n` for CUDA awareness via UCX. Please consult UCX's documentation for details.
miRDP -> mirdeep2 2.0.1.3	mamba		miRDeep-P (miRDP) is a tool for detecting miRNAs in plants from deeply sequenced small RNA libraries. It was developed by modifying miRDeep, which is based on a probabilistic model of miRNA biogenesis in animals, with a plant-specific scoring system and filtering criteria. miRDP2 is adapted from miRDeep-P (miRDP) with new strategies and an overhauled algorithm.
mirdeep-p2 1.1.4	mamba		A fast and accurate tool for analyzing the miRNA transcriptome in plants
mirdeep2	mamba
picard-tools -> picard 3.1.1	mamba		Java tools for working with NGS data in the BAM format (conda)	Source: https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/picard.htm
polyphen	Manual		PolyPhen (Polymorphism Phenotyping) is a tool which predicts the possible impact of an amino acid substitution on the structure and function of a human protein using straightforward physical and comparative considerations. Needs at least 70 GB	http://genetics.bwh.harvard.edu/pph2/dokuwiki/downloads https://sunyaevlab.hms.harvard.edu/wiki/!web/software Requirements: 175 GB of free disk space, Perl, and Java. Perl is required to run PolyPhen-2; the minimal version is 5.14.2, and version 5.30.0 was the latest one successfully tested. Required Perl modules, with apt search hits noted (search with `apt search perl | grep -i XML::Simple -B 3`): List::Util (liblist-allutils-perl), XML::Simple (libxml-opml-simplegen-perl), DBD::SQLite (libdbd-sqlite3-perl), and CGI.pm. Install: `sudo apt-get install libscalar-list-utils-perl libxml-simple-perl libdbd-sqlite3-perl libcgi-pm-perl build-essential default-jre bioperl` NCBI BLAST+ 2.9.0+ to 2.10.0+: precompiled NCBI BLAST+ binary executables for several platforms can be downloaded here: ftp://ftp.ncbi.nih.gov/blast/executables/LATEST The BLAST+ binaries need to be installed into $PPH/blast/bin/ Blat tools are necessary in order to analyze variants in novel, unannotated, or otherwise non-standard genes and proteins. Instructions for downloading Blat sources and executables can be found here: http://genome.ucsc.edu/FAQ/FAQblat.html#blat3 A complete set of binary executables for 64-bit Linux is available here: http://hgdownload.cse.ucsc.edu/admin/exe/linux.x86_64/ PolyPhen-2 only needs the following three files: blat, twoBitToFa, and bigWigToWig: `cp blat twoBitToFa bigWigToWig $PPH/bin/`
rseq	Manual
seqtk-master -> seqtk	mamba
sickle-master -> sickle	mamba
snpEff	mamba
soap	pip
rSeq: RNA-Seq Analyzer	Manual		https://jhui2014.github.io/rseq/	On the 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src
SNVMix2	Manual		https://github.com/shahcompbio/snvmix	imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$
Samtools	mamba		SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments	https://samtools.sourceforge.net/ https://sourceforge.net/projects/samtools/files/samtools/ https://github.com/samtools/samtools/blob/develop/INSTALL
Breakdancer	mamba		BreakDancer uses CMake, which is a cross-platform build tool. It generates a Makefile so you can use `make`. The requirements are the zlib development library, gcc, gmake, and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process. `# --recursive option is important so that it gets the submodules too $ git clone --recursive https://github.com/genome/breakdancer.git`	https://github.com/genome/breakdancer/tree/master https://breakdancer.sourceforge.net/ https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/breakdancer.htm https://github.com/shendurelab/LACHESIS/issues/30
`vcftools`	mamba		-c bioconda
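
After installing from the table, it can help to confirm which command-line tools are actually on the PATH. The sketch below uses only the Python standard library. The executable names in the example are assumptions (a package name does not always match its executable), so adjust the list to the environment.

```python
import shutil
from typing import Dict, Iterable, List, Optional


def find_tools(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_tools(names: Iterable[str]) -> List[str]:
    """Return the names that could not be found on PATH, in input order."""
    return [name for name, path in find_tools(names).items() if path is None]


if __name__ == "__main__":
    # Assumed executable names for a few packages from the table.
    tools = ["bwa", "samtools", "hisat2", "bedtools", "mafft", "vcftools"]
    for name, path in find_tools(tools).items():
        print(f"{name}: {path if path else 'not found'}")
```

Running this inside the activated environment (for example the `231216` environment that appears in the prompts above) shows at a glance which installs still need attention.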

References

[1] https://www.youtube.com/watch?v=ky1-mF0fHnQ

[2] https://github.com/danielecook/Awesome-Bioinformatics

[3] https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/

[4] https://bioconda.github.io/index.html

[5] https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html

[6] https://bioinformatics.mdanderson.org/public-software/archive/tigra/

[7] https://ccb.jhu.edu/software/tophat/index.shtml

[8] http://cole-trapnell-lab.github.io/cufflinks/

[9] https://www.biostars.org/p/353144/

[10] https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal

[11] https://meme-suite.org/meme/doc/install.html?man_type=web
