Privatre:bioinfomatics: Difference between revisions

Revision as of 10:11, 17 December 2023

Bioinformatics

Bioinformatics is an interdisciplinary field that combines biology, computer science, mathematics and statistics. ^[1]

Bioinformatics workflow steps

quality control assessment
sequence alignment
data summarization into genes/regions
data annotation to genomic features
statistical comparisons
multi-omic integration

Bioinformatics curated software list^[2]

Package suites
Data Tools
- Downloading
- Compressing
Data Processing
- Command Line Utilities
Next Generation Sequencing
- Workflow Managers
- Pipelines
- Sequence Processing
- Data Analysis
- Sequence Alignment
  - Pairwise
  - Multiple Sequence Alignment
  - Clustering
- Quantification
- Variant Calling
  - Structural variant callers
- BAM File Utilities
- VCF File Utilities
- GFF BED File Utilities
- Variant Simulation
- Variant Prediction/Annotation
- Python Modules
  - Data
  - Tools
- Assembly
- Annotation
Long-read sequencing
- Long-read Assembly
Visualization
- Genome Browsers / Gene Diagrams
- Circos Related
Database Access
Resources
- Becoming a Bioinformatician
- Bioinformatics on GitHub
- Sequencing
- RNA-Seq
- ChIP-Seq
- YouTube Channels and Playlists
- Blogs
- Miscellaneous

Online networking groups

File formats in Bioinformatics

This section lists some of the commonly used file formats in bioinformatics and their usual file extensions. ^[3]


File format	File extensions
FASTA	.fa, .fasta, .fsa
FASTQ	.fastq, .sanfastq, .fq
SAM (Sequence Alignment/Map)	.sam
BAM (Binary Alignment/Map)	.bam
VCF (Variant Call Format)	.vcf
GFF (General Feature Format or Gene Finding Format)	.gff2, .gff3, .gff
GTF (Gene Transfer Format)	.gtf
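
The table above can be turned into a small lookup helper. The following is a minimal sketch, not part of any listed tool: the function name `guess_format` and the handling of `.gz`/`.bz2` suffixes are assumptions made for illustration, and the extension map only contains the entries from the table.

```python
from pathlib import Path

# Extension-to-format mapping copied from the table above.
EXTENSION_TO_FORMAT = {
    ".fa": "FASTA", ".fasta": "FASTA", ".fsa": "FASTA",
    ".fastq": "FASTQ", ".sanfastq": "FASTQ", ".fq": "FASTQ",
    ".sam": "SAM",
    ".bam": "BAM",
    ".vcf": "VCF",
    ".gff": "GFF", ".gff2": "GFF", ".gff3": "GFF",
    ".gtf": "GTF",
}

# Assumption (not from the table): compressed files carry one of these extra suffixes.
COMPRESSION_SUFFIXES = {".gz", ".bz2"}


def guess_format(filename):
    """Return the format name for a file name, or None if the extension is unknown."""
    suffixes = [s.lower() for s in Path(filename).suffixes]
    # Drop trailing compression suffixes such as ".gz" before looking up the format.
    while suffixes and suffixes[-1] in COMPRESSION_SUFFIXES:
        suffixes.pop()
    if not suffixes:
        return None
    return EXTENSION_TO_FORMAT.get(suffixes[-1])


if __name__ == "__main__":
    for name in ["reads.fastq.gz", "sample.bam", "genes.gff3", "notes.txt"]:
        print(name, "->", guess_format(name))
```

The lookup only inspects the file name; it does not read the file, so a misnamed file will be misclassified.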

Useful Tutorial Links

We can use Bioconda ^[4]

Bioconda only supports Python 2.7, 3.6, 3.7, 3.8 and 3.9, so DLS38 can be used.
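
Before creating an environment, the interpreter version can be checked against the versions listed above. This is a minimal sketch under the assumption that the list on this page (as of this revision) is still accurate; `is_supported` is a hypothetical helper name, not a Bioconda API.

```python
import sys

# Python versions listed on this page as supported by Bioconda (as of this revision).
SUPPORTED_PYTHON_VERSIONS = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}


def is_supported(version_info=None):
    """Return True if the (major, minor) version is in the list from this page."""
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) in SUPPORTED_PYTHON_VERSIONS


if __name__ == "__main__":
    current = "{}.{}".format(sys.version_info[0], sys.version_info[1])
    status = "listed" if is_supported() else "not listed"
    print("Python", current, "is", status, "as supported on this page")
```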

Libraries and sources


Libraries	Mamba or manual	Description	References

meme^[5]	Mamba
BWA	Mamba	BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.	https://bio-bwa.sourceforge.net/ https://wikis.utexas.edu/display/bioiteam/BWA
samtools	Mamba
ncbi-blast+^[6]	Manual	ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML).	https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/
somatic-sniper	Mamba	Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision and present several common sources of error that result in miscalls.
breakdancer	Mamba	A package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads.
tigra-sv^[7]	Manual	a program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of bam files that contain reads mapped to a reference genome such as NCBI	https://bioinformatics.mdanderson.org/public-software/archive/tigra/
~~TopHat~~		Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2, which provides the same core functionality.^[8]	Does not support Python 3
HISAT2	Mamba	HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome.
cufflinks^[9]	Manual	manual install to prefix (231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al	https://github.com/cole-trapnell-lab/cufflinks
bedtools	Mamba
T-COFFEE	Mamba
mafft	Mamba
maq	Manual	Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$	https://maq.sourceforge.net/maq-man.shtml https://mybiosoftware.com/sim4-20030613-align-expressed-dna-sequence-genomic-sequence.html https://mybiosoftware.com/maq-0-7-1-mapping-assembly-qualities.html Note from Biostars:^[10] MAQ's latest version is more than 10 years old, so a newer program is probably a better choice, and it now has problems compiling with current compilers. The `-fpermissive` flag gets it to compile: `make CFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive" CXXFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive"` The `CFLAGS` and `CXXFLAGS` values are taken from the Makefile with `-fpermissive` appended; yours may be different, so check them before running make. Three executables, `maq`, `maq.pl` and `farm-run.pl`, are copied to /usr/local/bin by default.
muscle	Mamba
phyml	Mamba
primer3	Mamba
probcons	Mamba
sim4	Manual	(231216) hpcmate@223vmbase:~/biolib/compile/sim4/sim4.2012-10-10$	https://globin.bx.psu.edu/ftp/dist/sim4/ https://globin.bx.psu.edu/html/docs/sim4.html
tigr-glimmer	mamba	tigr-glimmer
amap-align	???
dialign -> dialign2	mamba
emboss	mamba
exonerate	mamba
kalign2 & kalign3	mamba
CNVnator	mamba
CREST	mamba
CAP3	mamba
Cluster -> mmseqs2	mamba
Cluster	mamba
FastQC	mamba
fastx_toolkit	mamba
IGVTools	mamba
MACS -> macs2	mamba	Needs Python < 3
Meerkat -> django-meerkat	pip	pip install django-meerkat
RNAcode	mamba
RNAz	mamba
RepeatMasker	mamba
SNVMix2	manual		https://github.com/shahcompbio/snvmix https://github.com/shahcompbio/snvmix/test/biolibs/gitbuild/snvmix, Version 0.11.8-r4
SOAPdenovo2-src	mamba	SOAPdenovo2	dependency -> samtools 0.1.9
VarScan	mamba
ViennaRNA	mamba
bismark	mamba
blat	mamba		https://kentinformatics.com/
circos	mamba		Error "Is a directory: '/opt/anaconda/envs/231216/README'"; resolved by removing the README directory.
clustalw	mamba		ClustalW is the command-line version of ClustalX. ClustalW2 is a general-purpose DNA or protein multiple sequence alignment program for three or more sequences; for the alignment of two sequences, use a pairwise sequence alignment tool instead. The ClustalW2 services have been retired. For protein alignments Clustal Omega is recommended; for DNA alignments, MUSCLE or MAFFT.
clustalx	Needs X Window		https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal ClustalX, the graphical interface, is available in the Bioinformatics menu.
cnD	manual install		https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html cnD (Copy number variant detection) is a program to detect copy number variants from short read sequence data. How to install: https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/cnd.htm imp@CGX-GPU:~/test/bioinfomatics/cnD/cnD$
cpc -> CPC2	mamba		https://github.com/biocoder/cpc
fasta
gmap-gsnap
lobstr
meme
miRDP
mirdeep2
picard-tools
polyphen
rseq
seqtk-master
sickle-master
snpEff
soap
rSeq: RNA-Seq Analyzer		https://jhui2014.github.io/rseq/	On the 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src
SNVMix2		https://github.com/shahcompbio/snvmix	imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$
Samtools		SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments	https://samtools.sourceforge.net/ https://sourceforge.net/projects/samtools/files/samtools/ https://github.com/samtools/samtools/blob/develop/INSTALL
Breakdancer		BreakDancer uses CMake, a cross-platform build tool that generates a Makefile so you can use `make`. The requirements are the zlib development library, gcc, gmake and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process. Clone with `git clone --recursive https://github.com/genome/breakdancer.git`; the `--recursive` option is important so that the submodules are fetched too.	https://github.com/genome/breakdancer/tree/master https://breakdancer.sourceforge.net/ https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/breakdancer.htm https://github.com/shendurelab/LACHESIS/issues/30
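
For the rows marked Mamba, the install command can be assembled programmatically. The sketch below only builds the argument list and does not run anything; `build_install_command` is a hypothetical helper, and it assumes each tool's Bioconda package name matches the name passed in (for example `bwa`, `samtools`, `hisat2`), which should be checked on the Bioconda site before use.

```python
import shlex

# Channels in the order commonly recommended for Bioconda setups (assumption:
# conda-forge first, then bioconda).
CHANNELS = ["conda-forge", "bioconda"]


def build_install_command(packages, channels=CHANNELS):
    """Return a `mamba install` argument list for the given package names."""
    if not packages:
        raise ValueError("at least one package name is required")
    command = ["mamba", "install", "--yes"]
    for channel in channels:
        command += ["-c", channel]
    return command + list(packages)


if __name__ == "__main__":
    # Package names here are assumed to match their Bioconda recipes.
    args = build_install_command(["bwa", "samtools", "hisat2"])
    print(shlex.join(args))
```

Returning a list rather than a single string keeps the command safe to pass to `subprocess.run` without shell quoting issues.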

References

[1] https://www.youtube.com/watch?v=ky1-mF0fHnQ

[2] https://github.com/danielecook/Awesome-Bioinformatics

[3] https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/

[4] https://bioconda.github.io/index.html

[5] https://meme-suite.org/meme/doc/install.html?man_type=web

[6] https://mybiosoftware.com/scalablast-multiprocessor-implementation-ncbi-blast-library.html

[7] https://bioinformatics.mdanderson.org/public-software/archive/tigra/

[8] https://ccb.jhu.edu/software/tophat/index.shtml

[9] http://cole-trapnell-lab.github.io/cufflinks/

[10] https://www.biostars.org/p/353144/


@@ Line 315: / Line 315: @@
 |-
 |RNAcode
-|
+|mamba
 |
 |
 |-
 |RNAz
+|mamba
 |
+|
+|-
+|RepeatMasker
+|mamba
 |
 |
 |-
-|RepeatMasker
+|SNVMix2
+|manual
 |
+|<nowiki>https://github.com/shahcompbio/snvmix</nowiki>
+<nowiki>https://github.com/shahcompbio/snvmix/test/biolibs/gitbuild/snvmix</nowiki>, Version 0.11.8-r4
+|-
+|SOAPdenovo2-src
+|mamba
+|SOAPdenovo2
+|dependency -> samtool 0.1.9
+|-
+|VarScan
+|mamba
 |
 |
 |-
-|SNVMix2
+|ViennaRNA
+|mamba
+|
 |
+|-
+|bismark
+|mamba
 |
 |
 |-
-|SOAPdenovo2-src
+|blat
+|mamba
 |
+|<nowiki>https://kentinformatics.com/</nowiki>
+|-
+|circos
+|mamba
 |
+|error : Is a directory: '/opt/anaconda/envs/231216/README' -> remov README directory
+|-
+|clustalw
+|mamba
 |
+|ClustalW, the command line version of clustalx
+ClustalW2 is a general purpose DNA or protein multiple sequence alignment program for three or more sequences. For the alignment of two sequences please instead use our pairwise sequence alignment tools.
+The ClustalW2 services have been retired. To access similar services, please visit the Multiple Sequence Alignment tools page. For protein alignments we recommend Clustal Omega. For DNA alignments we recommend trying MUSCLE or MAFFT. If you have any questions/concerns please [[contact]] us via the [[feedback]] link above.
 |-
-|VarScan
+|clustalx
+|need X window
 |
+|<nowiki>https://vcru.wisc.edu/simonlab/bioinformatics/programs/#clustal</nowiki>
+ClustalX, the graphical interface, is available in the Bioinformatics menu
+|-
+|cnD
+|manual install
 |
+|<nowiki>https://mybiosoftware.com/cnd-1-2-copy-number-variant-caller-inbred-strains.html</nowiki>
+cnD (Copy number variant detection) is a program to detect copy number variants from short read sequence data.
+How to install
+- <nowiki>https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/cnd.htm</nowiki>
+imp@CGX-GPU:~/test/bioinfomatics/cnD/cnD$
+|-
+|cpc -> CPC2
+|mamba
 |
+|https://github.com/biocoder/cpc
 |-
-|ViennaRNA
+|fasta
 |
 |
 |
 |-
-|bismark
+|gmap-gsnap
 |
 |
 |
 |-
-|blat
+|lobstr
 |
 |
 |
 |-
-|circos
+|meme
 |
 |
 |
 |-
-|clustalw
+|miRDP
 |
 |
 |
 |-
-|clustalx
+|mirdeep2
 |
 |
 |
 |-
-|cnD
+|picard-tools
 |
 |
 |
 |-
-|cpc
+|polyphen
 |
 |
 |
 |-
-|fasta
+|rseq
 |
 |
 |
 |-
-|gmap-gsnap
+|seqtk-master
 |
 |
 |
 |-
-|lobstr
+|sickle-master
 |
 |
 |
 |-
-|meme
+|snpEff
 |
 |
 |
 |-
-|, , ,
+|soap
-, , , , , ,, ,
-, , , , , , , , , miRDP, mirdeep2,
-picard-tools, polyphen, rseq, seqtk-master, sickle-master, snpEff, soap
 |
 |
