Privatre:bioinfomatics: Difference between revisions

From HPCWIKI
Line 10: Line 10:
# statistical comparisons
# multi-omic integration
== Curated bioinformatics software list<ref>https://github.com/danielecook/Awesome-Bioinformatics</ref> ==
* Package suites
* Data Tools
** Downloading
** Compressing
* Data Processing
** Command Line Utilities
* Next Generation Sequencing
** Workflow Managers
** Pipelines
** Sequence Processing
** Data Analysis
** Sequence Alignment
*** Pairwise
*** Multiple Sequence Alignment
*** Clustering
** Quantification
** Variant Calling
*** Structural variant callers
** BAM File Utilities
** VCF File Utilities
** GFF BED File Utilities
** Variant Simulation
** Variant Prediction/Annotation
** Python Modules
*** Data
*** Tools
** Assembly
** Annotation
* Long-read sequencing
** Long-read Assembly
* Visualization
** Genome Browsers / Gene Diagrams
** Circos Related
* Database Access
* Resources
** Becoming a Bioinformatician
** Bioinformatics on GitHub
** Sequencing
** RNA-Seq
** ChIP-Seq
** YouTube Channels and Playlists
** Blogs
** Miscellaneous
* Online networking groups
== File formats in bioinformatics ==
This section lists some commonly used file formats in bioinformatics and their usual file extensions.<ref>https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/file-formats-tutorial/</ref>
{| class="wikitable"
|+
!File formats
!File extensions
|-
|FASTA
|.fa, .fasta, .fsa
|-
|FASTQ
|.fastq, .sanfastq, .fq
|-
|SAM
(Sequence Alignment/Map)
|file.sam
|-
|BAM
|file.bam
|-
|VCF
(Variant Call Format)
|file.vcf
|-
|GFF
(General Feature Format or Gene Finding Format)
|file.gff2, file.gff3, file.gff
|-
|GTF
(Gene Transfer Format)
|file.gtf
|}
== Useful tutorial links ==
* https://vcru.wisc.edu/simonlab/bioinformatics/programs/
* https://github.com/danielecook/Awesome-Bioinformatics
* https://mybiosoftware.com/
* https://bioinformatics.uconn.edu/resources-and-events/tutorials-2/
* Cluster system software modules - https://bioinformatics.uconn.edu/cbc_software/software-2/


== We can use Bioconda <ref>https://bioconda.github.io/index.html</ref> ==
Bioconda supports only Python 2.7, 3.6, 3.7, 3.8 and 3.9, so DLS38 can be used.


== Source compile Method ==
== Lib and sources ==
{| class="wikitable"
|+
Line 20: Line 111:
!Description
!References
|-
|'''rSeq: RNA-Seq Analyzer'''
|https://jhui2014.github.io/rseq/
|On the 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src
|-
|SNVMix2
|https://github.com/shahcompbio/snvmix
|imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$
|-
|Samtools

Revision as of 17:30, 16 December 2023

Bioinformatics

Bioinformatics combines biology, computer science, mathematics and statistics. [1]

Bioinformatics workflow steps

  1. quality control assessment steps
  2. sequence alignment
  3. data summarization into genes/regions
  4. data annotation to genomics features
  5. statistical comparisons
  6. multi-omic integration
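
As a rough illustration of what the first step (quality control) can look like, the sketch below reads FASTQ records and reports the mean base quality per read. It is a minimal example, not a replacement for a dedicated QC tool. It assumes four-line FASTQ records and Phred+33 (Sanger) quality encoding; the names `read_fastq` and `mean_quality` are hypothetical helpers introduced here.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) tuples from four-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.rstrip("\r\n")
        if not header:
            continue  # skip blank lines between records
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "").rstrip("\r\n")
        qual = next(it, "").rstrip("\r\n")
        if (not header.startswith("@") or not plus.startswith("+")
                or len(seq) != len(qual)):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        yield header[1:], seq, qual


def mean_quality(qual, offset=33):
    """Mean Phred score of a quality string (assumes Phred+33 by default)."""
    scores = [ord(c) - offset for c in qual]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    example = ["@read1\n", "ACGT\n", "+\n", "IIII\n"]
    for name, seq, qual in read_fastq(example):
        print(name, len(seq), mean_quality(qual))
```

Reads whose mean quality falls below a chosen threshold could then be flagged or filtered before alignment; the threshold itself depends on the data set and is not specified here.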

Curated bioinformatics software list[2]

  • Package suites
  • Data Tools
    • Downloading
    • Compressing
  • Data Processing
    • Command Line Utilities
  • Next Generation Sequencing
    • Workflow Managers
    • Pipelines
    • Sequence Processing
    • Data Analysis
    • Sequence Alignment
      • Pairwise
      • Multiple Sequence Alignment
      • Clustering
    • Quantification
    • Variant Calling
      • Structural variant callers
    • BAM File Utilities
    • VCF File Utilities
    • GFF BED File Utilities
    • Variant Simulation
    • Variant Prediction/Annotation
    • Python Modules
      • Data
      • Tools
    • Assembly
    • Annotation
  • Long-read sequencing
    • Long-read Assembly
  • Visualization
    • Genome Browsers / Gene Diagrams
    • Circos Related
  • Database Access
  • Resources
    • Becoming a Bioinformatician
    • Bioinformatics on GitHub
    • Sequencing
    • RNA-Seq
    • ChIP-Seq
    • YouTube Channels and Playlists
    • Blogs
    • Miscellaneous
  • Online networking groups

File formats in bioinformatics

This section lists some commonly used file formats in bioinformatics and their usual file extensions.[3]

{| class="wikitable"
!File formats
!File extensions
|-
|FASTA
|.fa, .fasta, .fsa
|-
|FASTQ
|.fastq, .sanfastq, .fq
|-
|SAM (Sequence Alignment/Map)
|file.sam
|-
|BAM
|file.bam
|-
|VCF (Variant Call Format)
|file.vcf
|-
|GFF (General Feature Format or Gene Finding Format)
|file.gff2, file.gff3, file.gff
|-
|GTF (Gene Transfer Format)
|file.gtf
|}
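
The extension table above can be turned into a small lookup, for example to sort incoming files by type. This is a minimal sketch: the mapping is copied from the table, while `EXTENSION_TO_FORMAT` and `guess_format` are hypothetical names introduced here. It looks only at the final extension, so compressed files such as `reads.fastq.gz` are reported as unknown.

```python
from pathlib import Path

# Extension-to-format map taken from the table above.
EXTENSION_TO_FORMAT = {
    ".fa": "FASTA", ".fasta": "FASTA", ".fsa": "FASTA",
    ".fastq": "FASTQ", ".sanfastq": "FASTQ", ".fq": "FASTQ",
    ".sam": "SAM",
    ".bam": "BAM",
    ".vcf": "VCF",
    ".gff": "GFF", ".gff2": "GFF", ".gff3": "GFF",
    ".gtf": "GTF",
}


def guess_format(filename):
    """Return the format name for a file name, or None if the extension is unknown."""
    suffix = Path(filename).suffix.lower()
    return EXTENSION_TO_FORMAT.get(suffix)


if __name__ == "__main__":
    for name in ("reads.fq", "aln.bam", "genes.gff3", "notes.txt"):
        print(name, guess_format(name))
```

An extension is only a hint; a stricter check would inspect the file content (for instance, a FASTA file starts with `>` and a FASTQ record with `@`).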

Useful tutorial links

We can use Bioconda [4]

Bioconda supports only Python 2.7, 3.6, 3.7, 3.8 and 3.9, so DLS38 can be used.

Libraries and sources

{| class="wikitable"
!Libraries
!Description
!References
|-
|rSeq: RNA-Seq Analyzer
|https://jhui2014.github.io/rseq/
|On the 61 server, /test/bioinfomatics/rseq/rseq-0.2.2-src
|-
|SNVMix2
|https://github.com/shahcompbio/snvmix
|imp@CGX-GPU:~/test/bioinfomatics/snvmix (master)$
|-
|Samtools
|SAM (Sequence Alignment/Map) format is a generic format for storing large nucleotide sequence alignments.
|
|-
|Breakdancer
|BreakDancer uses CMake, which is a cross-platform build tool. It generates a Makefile, so you can then use make. The requirements are the zlib development library, gcc, gmake and cmake 2.8+. Beginning with version 1.4.4, BreakDancer includes samtools as part of the build process.
|
 # --recursive option is important so that it gets the submodules too
 $ git clone --recursive https://github.com/genome/breakdancer.git
|-
|BWA
|BWA is a software package for mapping low-divergent sequences against a large reference genome, such as the human genome. It consists of three algorithms: BWA-backtrack, BWA-SW and BWA-MEM.
|
|}
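
To make the SAM format mentioned in the Samtools entry more concrete, the sketch below splits one alignment line into the eleven mandatory tab-separated fields defined by the SAM specification. It is an illustrative, simplified parser and not how Samtools itself is implemented: optional tag fields are ignored, and the names `SamRecord`, `parse_sam_line` and `is_unmapped` are hypothetical helpers introduced here.

```python
from typing import NamedTuple, Optional


class SamRecord(NamedTuple):
    """The eleven mandatory fields of a SAM alignment line."""
    qname: str   # query (read) name
    flag: int    # bitwise flag
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # CIGAR string
    rnext: str   # reference name of the mate/next read
    pnext: int   # position of the mate/next read
    tlen: int    # observed template length
    seq: str     # read sequence
    qual: str    # base qualities (ASCII, Phred+33)


def parse_sam_line(line: str) -> Optional[SamRecord]:
    """Parse one SAM line; return None for header lines starting with '@'."""
    if line.startswith("@"):
        return None
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 11:
        raise ValueError(f"expected at least 11 fields, got {len(fields)}")
    return SamRecord(
        fields[0], int(fields[1]), fields[2], int(fields[3]), int(fields[4]),
        fields[5], fields[6], int(fields[7]), int(fields[8]),
        fields[9], fields[10],
    )


def is_unmapped(record: SamRecord) -> bool:
    """True if flag bit 0x4 (segment unmapped) is set."""
    return bool(record.flag & 0x4)


if __name__ == "__main__":
    line = "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n"
    print(parse_sam_line(line))
```

For real data, a maintained library or the `samtools` command line is the safer choice, since BAM files are binary and cannot be read line by line like this.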

References