Privatre:bioinfomatics

From HPCWIKI

Bioinformatics

Bioinformatics combines techniques from biology, computer science, mathematics and statistics.[1]

Bioinformatics workflow steps

  1. quality control assessment steps
  2. sequence alignment
  3. data summarization into genes/regions
  4. data annotation to genomics features
  5. statistical comparisons
  6. multi-omic integration
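
As an illustration of step 1 in the list above, quality control often starts from the per-base quality scores stored in FASTQ files. The following is a minimal sketch, assuming Phred+33 encoded quality strings. The function names and the mean-quality threshold of 20 are illustrative assumptions, not part of any tool listed on this page.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding (the offset is an assumption; older
    Illumina data may use a different offset).
    """
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality, offset=33):
    """Return the mean Phred score of a read, or 0.0 for an empty read."""
    scores = phred_scores(quality, offset)
    return sum(scores) / len(scores) if scores else 0.0


def passes_qc(quality, min_mean=20.0):
    """Keep a read only if its mean Phred score reaches min_mean.

    The default threshold is a hypothetical example value.
    """
    return mean_quality(quality) >= min_mean
```

For real data, a dedicated tool such as FastQC (listed further down this page) is the usual choice; the sketch only shows what such a check computes.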

Bioinformatics curated software list[2]

  • Package suites
  • Data Tools
    • Downloading
    • Compressing
  • Data Processing
    • Command Line Utilities
  • Next Generation Sequencing
    • Workflow Managers
    • Pipelines
    • Sequence Processing
    • Data Analysis
    • Sequence Alignment
      • Pairwise
      • Multiple Sequence Alignment
      • Clustering
    • Quantification
    • Variant Calling
      • Structural variant callers
    • BAM File Utilities
    • VCF File Utilities
    • GFF BED File Utilities
    • Variant Simulation
    • Variant Prediction/Annotation
    • Python Modules
      • Data
      • Tools
    • Assembly
    • Annotation
  • Long-read sequencing
    • Long-read Assembly
  • Visualization
    • Genome Browsers / Gene Diagrams
    • Circos Related
  • Database Access
  • Resources
    • Becoming a Bioinformatician
    • Bioinformatics on GitHub
    • Sequencing
    • RNA-Seq
    • ChIP-Seq
    • YouTube Channels and Playlists
    • Blogs
    • Miscellaneous
  • Online networking groups

File formats in Bioinformatics

This section lists some of the commonly used file formats in bioinformatics.[3]

File format | File extensions
FASTA | .fa, .fasta, .fsa
FASTQ | .fastq, .sanfastq, .fq
SAM (Sequence Alignment Map) | file.sam
BAM | file.bam
VCF (Variant Call Format) | file.vcf
GFF (General Feature Format or Gene Finding Format) | file.gff2, file.gff3, file.gff
GTF (Gene Transfer Format) | file.gtf
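
The extension table above can be turned into a small lookup helper. This is a minimal sketch: the mapping is copied from the table, while the function name `guess_format` and the handling of a trailing `.gz` are assumptions added for illustration.

```python
import os

# Extension-to-format mapping taken from the table above.
FORMAT_BY_EXTENSION = {
    ".fa": "FASTA", ".fasta": "FASTA", ".fsa": "FASTA",
    ".fastq": "FASTQ", ".sanfastq": "FASTQ", ".fq": "FASTQ",
    ".sam": "SAM",
    ".bam": "BAM",
    ".vcf": "VCF",
    ".gff2": "GFF", ".gff3": "GFF", ".gff": "GFF",
    ".gtf": "GTF",
}


def guess_format(filename):
    """Guess the file format from the extension, or return None.

    A trailing ".gz" is stripped first (an assumption: compressed
    files are common, but the table does not mention them).
    """
    name = filename.lower()
    if name.endswith(".gz"):
        name = name[:-3]
    extension = os.path.splitext(name)[1]
    return FORMAT_BY_EXTENSION.get(extension)
```

An extension is only a hint; the file content is what actually decides the format.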

Useful Tutorial Link

We can use Bioconda.[4]

Bioconda only supports Python 2.7, 3.6, 3.7, 3.8 and 3.9, so DLS38 can be used.
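
Before creating an environment, it can help to check whether the interpreter matches one of the versions listed above. The following is a minimal sketch: the version set is copied from this page and may not match the current Bioconda release, and the function name `is_listed_version` is hypothetical.

```python
import sys

# Python versions listed on this page as supported by Bioconda.
# Assumption: this list reflects the page, not necessarily current Bioconda.
LISTED_VERSIONS = {(2, 7), (3, 6), (3, 7), (3, 8), (3, 9)}


def is_listed_version(version_info=None):
    """Return True if (major, minor) is one of the versions listed above.

    With no argument, the running interpreter's version is checked.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    return (major, minor) in LISTED_VERSIONS
```

Calling `is_listed_version()` with no argument checks the running interpreter; passing a tuple such as `(3, 8, 10)` checks a specific version.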

Libraries and sources

Libraries | Mamba or manual | Description | References
TopHat, cufflinks, bedtools, T-COFFEE, mafft, maq, muscle, phyml, primer3, probcons, sim4, tigr-glimmer, amap-align, dialign, emboss, exonerate, kalign, CNVnator, CREST, CAP3, Cluster, FastQC, Fastx-toolkit, IGVTools, MACS, Meerkat, RNAcode, RNAz, RepeatMasker, SNVMix2, SOAPdenovo2-src, VarScan, ViennaRNA, bismark, blat, circos, clustalw, clustalx, cnD, cpc, fasta, gmap-gsnap, lobstr, meme, miRDP, mirdeep2, picard-tools, polyphen, rseq, seqtk-master, sickle-master, snpEff, soap | | |
meme[5] | Mamba | |
BWA | Mamba | BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM. |
samtools | Mamba | |
ncbi-blast+[6] | Manual | ScalaBLAST is a high-performance multiprocessor implementation of the NCBI BLAST library. ScalaBLAST supports all 5 primary program types (blastn, blastp, tblastn, tblastx, and blastx) and several output formats (pairwise, tabular, or XML). | https://blast.ncbi.nlm.nih.gov/doc/blast-help/downloadblastdata.html ; https://vcru.wisc.edu/simonlab/bioinformatics/programs/install/blastplus.htm ; https://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/
somatic-sniper | Mamba | Software for comparing tumor and normal pairs. The developers estimate its sensitivity and precision, and present several common sources of error resulting in miscalls. |
breakdancer | Mamba | A package that provides genome-wide detection of structural variants from next-generation paired-end sequencing reads. |
tigra-sv[7] | Manual | A program that conducts targeted local assembly of structural variants (SV) using the iterative graph routing assembly (TIGRA) algorithm (L. Chen, unpublished). It takes as input a list of putative SV calls and a set of BAM files that contain reads mapped to a reference genome such as NCBI | https://bioinformatics.mdanderson.org/public-software/archive/tigra/
TopHat | | Please note that TopHat has entered a low maintenance, low support stage as it is now largely superseded by HISAT2, which provides the same core functionality.[8] | Does not support Python 3
HISAT2 | | HISAT2 is a fast and sensitive alignment program for mapping next-generation sequencing reads (both DNA and RNA) to a population of human genomes as well as to a single reference genome. |
rSeq: RNA-Seq Analyzer | | https://jhui2014.github.io/rseq/ | On the 61 server: /test/bioinfomatics/rseq/rseq-0.2.2-src
SNVMix2 | | https://github.com/shahcompbio/snvmix | imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)
Samtools | | SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments. |
Breakdancer | | BreakDancer uses CMake, which is a cross-platform build tool. Basically it will generate a Makefile so you can use make. The requirements are the zlib development library, gcc, gmake and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process. |

# The --recursive option is important so that it gets the submodules too
$ git clone --recursive https://github.com/genome/breakdancer.git
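
The Samtools entry in the table above describes SAM as a generic format for storing alignments. To make that concrete, here is a minimal sketch of reading one alignment line, based on the SAM layout of 11 mandatory tab-separated fields followed by optional tags. The names `SamRecord`, `parse_sam_line` and `is_unmapped` are hypothetical helpers, not part of samtools. For real work a maintained library such as pysam is the safer choice.

```python
from collections import namedtuple

# The 11 mandatory SAM fields, plus a list for the optional tags.
SamRecord = namedtuple(
    "SamRecord",
    "qname flag rname pos mapq cigar rnext pnext tlen seq qual tags",
)


def parse_sam_line(line):
    """Parse one SAM line; return None for header lines starting with '@'."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    return SamRecord(
        qname=fields[0],
        flag=int(fields[1]),
        rname=fields[2],
        pos=int(fields[3]),   # 1-based leftmost mapping position
        mapq=int(fields[4]),
        cigar=fields[5],
        rnext=fields[6],
        pnext=int(fields[7]),
        tlen=int(fields[8]),
        seq=fields[9],
        qual=fields[10],
        tags=fields[11:],     # optional TAG:TYPE:VALUE fields, left unparsed
    )


def is_unmapped(record):
    """Return True if FLAG bit 0x4 (segment unmapped) is set."""
    return bool(record.flag & 0x4)
```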

References