Docsity
Docsity

Prepare for your exams
Prepare for your exams

Study with the several resources on Docsity


Earn points to download
Earn points to download

Earn points by helping other students or get them with a premium plan


Guidelines and tips
Guidelines and tips

Bioinformatics Cheat Sheet: Main Definitions, Cheat Sheet of Bioinformatics

This one page sheet provides some of the main Bioinformatics key terms with definitions

Typology: Cheat Sheet

2019/2020

Uploaded on 11/27/2020

tiuw
tiuw 🇺🇸

4.7

(18)

288 documents

1 / 1

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
BIOINFORMATICS CHEAT SHEET
DEFINITIONS
Bait files: depict sequence capture regions that are typically downloaded from the sequence capture kit
manufacturer.
Binary Alignment/Map (BAM) file: a file format that is a binary version of SAM, making the file size more
compact.
Browser Extensible Data (BED) file: a standardized format for a tab-delimited file containing, at a
minimum, chromosomal coordinates for depicting regions of the genome.
CDCV (complex disease common variant hypothesis ): the rationale behind GWAS (genome-wide association
study) designs.
CDRV (complex disease rare variant hypothesis) assuming that rare variants make a greater contribution to
complex disease than do common variants; requires deep sequencing.
Depth of coverage: number of sequencing reads aligned to a specific genomic location. Higher depth of
coverage generally increases confidence in variant calls.
Edge prioritization: instead of prioritizing genes in isolation generate hypotheses about potential
interactions among the top candidates and ‘seed’ genes.
Cluster: a group of linked computers, working together thus in many respects forming a single computer.
Exome sequencing: sequencing every exon of every gene in the genome.
FASTQ: a file format for storing short read massively parallel sequencing data.
Galaxy: A web-based platform that makes command-line tools available to biologists, is flexible, sharable,
and can be run from (almost) any computer.
Massively parallel sequencing: a next-generation DNA sequencing technology that allows millions or
billions of base-pairs to be sequenced simultaneously.
Multiplexing: the application of nucleotide barcodes followed by subsequent pooling of multiple DNA
samples. Allows pooling of samples to increase throughput and take advantage of sequencer output.
Sequence Alignment/Map (SAM) file: a standardized and widely accepted format for storing large
nucleotide sequence alignments that is designed to be: flexible (accommodates alignment information from
various sequencers and alignment programs, compact in size, and easily convertable.
Sequence Capture: also known as targeted sequence capture or targeted genomic enrichment. The
massively parallel replacement for PCR. The process of simultaneously isolating thousands or millions of
regions of the genome prior to massively parallel sequencing.
Target Interval File: a file containing any areas of the genome that you think are biologically relevant (can
be a bait file, can be a subset of a bait file, or can be user-generated).
Variant Call Format (VCF) file: a standardized and widely accepted file type for storing genotype data
generated from variant calling algorithms.

Partial preview of the text

Download Bioinformatics Cheat Sheet: Main Definitions and more Cheat Sheet Bioinformatics in PDF only on Docsity!

BIOINFORMATICS CHEAT SHEET

DEFINITIONS

Bait files: depict sequence capture regions that are typically downloaded from the sequence capture kit manufacturer. Binary Alignment/Map (BAM) file: a file format that is a binary version of SAM, making the file size more compact. Browser Extensible Data (BED) file: a standardized format for a tab-delimited file containing, at a minimum, chromosomal coordinates for depicting regions of the genome. CDCV (complex disease common variant hypothesis ): the rationale behind GWAS (genome-wide association study) designs. CDRV (complex disease rare variant hypothesis) assuming that rare variants make a greater contribution to complex disease than do common variants; requires deep sequencing. Depth of coverage: number of sequencing reads aligned to a specific genomic location. Higher depth of coverage generally increases confidence in variant calls. Edge prioritization: instead of prioritizing genes in isolation generate hypotheses about potential interactions among the top candidates and ‘seed’ genes. Cluster: a group of linked computers, working together thus in many respects forming a single computer. Exome sequencing: sequencing every exon of every gene in the genome. FASTQ: a file format for storing short read massively parallel sequencing data. Galaxy: A web-based platform that makes command-line tools available to biologists, is flexible, sharable, and can be run from (almost) any computer. Massively parallel sequencing: a next-generation DNA sequencing technology that allows millions or billions of base-pairs to be sequenced simultaneously. Multiplexing: the application of nucleotide barcodes followed by subsequent pooling of multiple DNA samples. Allows pooling of samples to increase throughput and take advantage of sequencer output. Sequence Alignment/Map (SAM) file: a standardized and widely accepted format for storing large nucleotide sequence alignments that is designed to be: flexible (accommodates alignment information from various sequencers and alignment programs, compact in size, and easily convertable. Sequence Capture: also known as targeted sequence capture or targeted genomic enrichment. The massively parallel replacement for PCR. The process of simultaneously isolating thousands or millions of regions of the genome prior to massively parallel sequencing. Target Interval File: a file containing any areas of the genome that you think are biologically relevant (can be a bait file, can be a subset of a bait file, or can be user-generated). Variant Call Format (VCF) file: a standardized and widely accepted file type for storing genotype data generated from variant calling algorithms.