Private:Bioinformatics

From HPCWIKI
Revision as of 21:25, 16 December 2023 by Admin (talk | contribs) (→‎References)

Bioinformatics

A field that combines biology, computer science, mathematics, and statistics. [1]

Bioinformatics workflow steps

  1. quality control assessment
  2. sequence alignment
  3. data summarization into genes/regions
  4. data annotation to genomics features
  5. statistical comparisons
  6. multi-omic integration
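
The first three steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: it uses FastQC, BWA, and samtools (all named elsewhere on this page), the file names ref.fa, reads.fq, and results are hypothetical, and `samtools idxstats` is used as a very coarse stand-in for step 3 (it counts mapped reads per reference sequence, not per gene). By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of workflow steps 1-3 for single-end reads.
# ref.fa, reads.fq, and results/ are hypothetical names.
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

pipeline() {
    ref="$1"; reads="$2"; out="$3"
    run mkdir -p "$out/qc"
    # Step 1: quality control report for the raw reads
    run fastqc --outdir "$out/qc" "$reads"
    # Step 2: index the reference, then align the reads
    run bwa index "$ref"
    run bwa mem -o "$out/aln.sam" "$ref" "$reads"
    # Sort and index the alignments so they can be queried by region
    run samtools sort -o "$out/aln.sorted.bam" "$out/aln.sam"
    run samtools index "$out/aln.sorted.bam"
    # Step 3 (coarse): mapped-read counts per reference sequence
    run samtools idxstats "$out/aln.sorted.bam"
}

pipeline ref.fa reads.fq results
```

Running it with `DRY_RUN=0` requires the three tools on the PATH. Steps 4 to 6 depend heavily on the experiment and are not sketched here.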

Bioinformatics curated software list[2]

  • Package suites
  • Data Tools
    • Downloading
    • Compressing
  • Data Processing
    • Command Line Utilities
  • Next Generation Sequencing
    • Workflow Managers
    • Pipelines
    • Sequence Processing
    • Data Analysis
    • Sequence Alignment
      • Pairwise
      • Multiple Sequence Alignment
      • Clustering
    • Quantification
    • Variant Calling
      • Structural variant callers
    • BAM File Utilities
    • VCF File Utilities
    • GFF BED File Utilities
    • Variant Simulation
    • Variant Prediction/Annotation
    • Python Modules
      • Data
      • Tools
    • Assembly
    • Annotation
  • Long-read sequencing
    • Long-read Assembly
  • Visualization
    • Genome Browsers / Gene Diagrams
    • Circos Related
  • Database Access
  • Resources
    • Becoming a Bioinformatician
    • Bioinformatics on GitHub
    • Sequencing
    • RNA-Seq
    • ChIP-Seq
    • YouTube Channels and Playlists
    • Blogs
    • Miscellaneous
  • Online networking groups

File formats in bioinformatics

This section lists some of the commonly used file formats in bioinformatics.[3]

File format | File extensions
FASTA | .fa, .fasta, .fsa
FASTQ | .fastq, .sanfastq, .fq
SAM (Sequence Alignment/Map) | .sam
BAM | .bam
VCF (Variant Call Format) | .vcf
GFF (General Feature Format or Gene Finding Format) | .gff2, .gff3, .gff
GTF (Gene Transfer Format) | .gtf
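
As a quick illustration of the first two formats, the sketch below counts records in a FASTA and a FASTQ file with standard tools. The example file contents are made up, and the FASTQ count assumes the common layout of exactly four lines per record (no wrapped sequence lines).

```shell
# Create tiny example files (contents are made up for illustration)
tmpdir="$(mktemp -d)"
printf '>seq1\nACGT\nACGT\n>seq2\nGGCC\n' > "$tmpdir/example.fa"
printf '@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\nIIII\n' > "$tmpdir/example.fq"

# FASTA: every record starts with a header line beginning with ">"
count_fasta() { grep -c '^>' "$1"; }

# FASTQ: assumes four lines per record (header, sequence, "+", qualities)
count_fastq() { awk 'END { print NR / 4 }' "$1"; }

count_fasta "$tmpdir/example.fa"   # prints 2
count_fastq "$tmpdir/example.fq"   # prints 2
```

Counting "@" lines is not a reliable way to count FASTQ records, because "@" is also a valid quality character, which is why the sketch divides the line count instead.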

Useful tutorial link

We can use Bioconda [4]

Bioconda only supports Python 2.7, 3.6, 3.7, 3.8, and 3.9, so DLS38 can be used.
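
A minimal sketch of how such an environment could be created with Mamba, pinned to Python 3.8. The environment name bio38 and the package selection are assumptions for illustration; the script only assembles and prints the command so it can be reviewed before running.

```shell
# Hypothetical environment name and package selection
ENV_NAME="bio38"
PY_VERSION="3.8"
PACKAGES="bwa samtools hisat2 bedtools"

# conda-forge is listed before bioconda, which gives it the higher priority
create_cmd="mamba create --yes --name $ENV_NAME --channel conda-forge --channel bioconda python=$PY_VERSION $PACKAGES"

echo "$create_cmd"
```

Paste the printed command into a shell where Mamba is installed, then activate the result with `mamba activate bio38`.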

Libraries and sources

Libraries | Mamba or manual | Description | References
meme[5] | Mamba | |
BWA | Mamba | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. |
samtools | Mamba | |
ncbi-blast+[6] | Manual | ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/
somatic-sniper | Mamba | Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision, and present several common sources of error that result in miscalls. |
breakdancer | Mamba | A package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads. |
tigra-sv[7] | Manual | A program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of BAM files that contain reads mapped to a reference genome such as NCBI | https://bioinformatics.mdanderson.org/public-software/archive/tigra/
TopHat | | Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2, which provides the same core functionality.[8] | Does not support Python 3
HISAT2 | Mamba | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. |
cufflinks[9] | Manual | Manual install to prefix. (231216) hpcmate@223vmbase:~/biolib/download/cufflinks-2.2.1.Linux_x86_64$ ls -al | https://github.com/cole-trapnell-lab/cufflinks
bedtools | Mamba | |
T-COFFEE | Mamba | |
mafft | Mamba | mafft, maq, muscle, phyml, primer3, probcons, sim4, tigr-glimmer, amap-align, dialign, emboss, exonerate, kalign, CNVnator, CREST, CAP3, Cluster, FastQC, Fastx-toolkit, IGVTools, MACS, Meerkat, RNAcode, RNAz, RepeatMasker, SNVMix2, SOAPdenovo2-src, VarScan, ViennaRNA, bismark, blat, circos, clustalw, clustalx, cnD, cpc, fasta, gmap-gsnap, lobstr, meme, miRDP, mirdeep2, picard-tools, polyphen, rseq, seqtk-master, sickle-master, snpEff, soap |
maq | Manual | Maq stands for Mapping and Assembly with Quality. It builds an assembly by mapping short reads to reference sequences. hpcmate@223vmbase:~/biolib/compile/maq/maq-0.7.1$ | https://maq.sourceforge.net/maq-man.shtml https://mybiosoftware.com/sim4-20030613-align-expressed-dna-sequence-genomic-sequence.html https://mybiosoftware.com/maq-0-7-1-mapping-assembly-qualities.html


Why do you need MAQ? Its latest version is more than 10 years old; I think you would be better off using some newer program.

MAQ is really old, and by now it has problems compiling with current compilers. You can use the -fpermissive flag to get it to compile:[10]

make CFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive" CXXFLAGS="-Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2 -fpermissive"

Note: I took the CFLAGS and CXXFLAGS from the Makefile and appended -fpermissive to them. Your CFLAGS and CXXFLAGS may be different; check them before issuing make.
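
The check described in the note can be scripted. The sketch below reads the current CFLAGS and CXXFLAGS from a Makefile and prints the make command with -fpermissive appended. The helper name makefile_var and the stand-in Makefile are hypothetical; the helper only handles simple `VAR = value` lines (not `+=` or `:=`) and takes the first match.

```shell
# Print the value of a simple "VAR = value" assignment in a Makefile.
# $1 = variable name, $2 = path to the Makefile (hypothetical helper)
makefile_var() {
    sed -n "s/^$1[[:space:]]*=[[:space:]]*//p" "$2" | head -n 1
}

# Stand-in Makefile for demonstration; point this at the real Makefile instead
demo="$(mktemp)"
printf 'CFLAGS = -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2\n' > "$demo"
printf 'CXXFLAGS = -Wall -m64 -D_FASTMAP -DMAQ_LONGREADS -g -O2\n' >> "$demo"

# Append -fpermissive to whatever flags the Makefile already defines
cflags="$(makefile_var CFLAGS "$demo") -fpermissive"
cxxflags="$(makefile_var CXXFLAGS "$demo") -fpermissive"

echo "make CFLAGS=\"$cflags\" CXXFLAGS=\"$cxxflags\""
```

With the stand-in Makefile this prints the same make command as shown above; with a different Makefile the printed flags follow whatever that file defines.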


Three executables, `maq', `maq.pl' and `farm-run.pl', will be copied to /usr/local/bin by default.

rSeq: RNA-Seq Analyzer, https://jhui2014.github.io/rseq/. On the 61 server: /test/bioinfomatics/rseq/rseq-0.2.2-src
SNVMix2: https://github.com/shahcompbio/snvmix. imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$
Samtools: SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.
Breakdancer: BreakDancer uses CMake, a cross-platform build tool that generates a Makefile so you can use make. The requirements are the zlib development library, gcc, gmake, and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process.

# --recursive option is important so that it gets the submodules too
$ git clone --recursive https://github.com/genome/breakdancer.git
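
A possible end-to-end build after the clone, sketched under assumptions: the install prefix is hypothetical, the build type and out-of-source layout are ordinary CMake conventions rather than anything confirmed by this page, and the `-S`/`-B` options need CMake 3.13 or newer (with older CMake, change into the build directory and run `cmake ..` instead). By default the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default) only prints each command; set DRY_RUN=0 to execute.
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical install prefix
PREFIX="${PREFIX:-$HOME/biolib/breakdancer}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_breakdancer() {
    # Clone with submodules, as noted above
    run git clone --recursive https://github.com/genome/breakdancer.git
    # Configure in a separate build directory; CMake generates the Makefile
    run mkdir -p breakdancer/build
    run cmake -S breakdancer -B breakdancer/build -DCMAKE_BUILD_TYPE=Release "-DCMAKE_INSTALL_PREFIX=$PREFIX"
    # Build and install using the generated Makefile
    run make -C breakdancer/build
    run make -C breakdancer/build install
}

build_breakdancer
```

Installing to a prefix under the home directory avoids needing root, which matches how other manual installs on this page are placed under ~/biolib.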

References