bcftools Primer

bcftools provides utilities for working with data in Variant Call Format (.vcf). The manual fully documents the arguments and features, and the developers have written their own “HowTo” page. The goal of this post is to walk through some scenarios with a reproducible dataset to showcase the bcftools functionality I use regularly.

Note that this will not be an exhaustive demonstration of all bcftools features, nor will it include other .vcf parsing/manipulation tools or Linux utilities (e.g. awk, sed) that can be handy for working with variant calling data.

The examples should be reproducible given the setup described below. However, the output at the command line will look slightly different from the inline output in this post. For legibility, I’ve run each of the commands, excluded the header, and read the results back in as a text file. The inline output in this post will show a max of 6 rows with a final placeholder row (. . . . . . . . . . .) if necessary.


Setup

To get started we need to find some data to work with and do a bit of pre-processing:

  1. Download all of the files for the 20130502 release of the 1000 Genomes Project (these are in compressed .vcf.gz format, each with .tbi index)
  2. Download a .vcf.gz (and .tbi) for sites annotated by ClinVar [1]
  3. Create .vcf.gz files for each chromosome (1-22) filtered to only include the ClinVar sites
  4. Create tabix index for each of the newly created .vcf.gz files

The code that follows will perform all of the steps described above. Keep in mind that each step (especially downloading and filtering the 1000 Genomes data) may take quite a while, as these files are large (~20GB total). You’ll need a system with sufficient storage and with wget, parallel, bcftools, and tabix installed.

## download 1000 genomes vcf files
wget ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/*.vcf.gz*

## download clinvar vcf
wget ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz*

## use parallel to restrict each chromosome (chr1 to chr22) to clinvar sites
find . -type f -name "ALL.chr[1-9]*vcf.gz" | parallel "bcftools view {} -R clinvar.vcf.gz --output-type z --output {}.clinvar.vcf.gz"

## make sure all vcf.gz files are tabix indexed
find . -type f -name "ALL.chr[1-9]*.clinvar.vcf.gz" | parallel "tabix {}"
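
Before moving on, it can be worth confirming that every filtered file actually received an index. The following is a minimal sketch and not part of the original workflow: missing_tbi is a hypothetical helper name, and it assumes each index sits next to its file with a .tbi suffix (the tabix default).

```shell
# List every *.clinvar.vcf.gz file under a directory that has no .tbi index beside it.
# missing_tbi is a hypothetical helper name, not a bcftools or tabix command.
missing_tbi() {
  dir="${1:-.}"
  find "$dir" -type f -name "*.clinvar.vcf.gz" | sort | while IFS= read -r f; do
    # Print the file only when its index is absent.
    [ -f "$f.tbi" ] || printf '%s\n' "$f"
  done
}

# Check the current directory; no output means every file is indexed.
missing_tbi .
```

Any file names printed here can be passed to tabix again.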

With the data processed we can move on to the scenarios.

All subsequent code will use bcftools version 1.10.

bcftools --version
bcftools 1.10.2-27-g9d66868
Using htslib 1.10.2-33-g1bbcd02
Copyright (C) 2019 Genome Research Ltd.
License Expat: The MIT/Expat license
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

Scenarios

Concatenate multiple files together

If we want to concatenate (i.e. “stack”) multiple .vcf files together we can use bcftools concat, so long as the input files contain the same samples in the same column order. In this example, we’ll combine all of the chromosomes (1-22) into a single file.

The --output-type z argument specifies that the output will be compressed, and the --output flag allows us to explicitly name the resulting file:

bcftools concat ALL.chr*.clinvar.vcf.gz --output-type z --output all.clinvar.vcf.gz

NOTE: bcftools concat is not equivalent to bcftools merge. For an example of the latter see below.
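
One detail worth knowing: the shell typically expands ALL.chr*.clinvar.vcf.gz in lexicographic order (chr1, chr10, chr11, ..., chr2, ...), and bcftools concat writes records in the order the files are given. If numeric chromosome order matters downstream, one option is to build the file list explicitly and pass it with --file-list. This is a minimal sketch: chrom_ordered_list and concat.list are hypothetical names, and the glob assumes the file naming produced in the setup step.

```shell
# Print the per-chromosome ClinVar-filtered files in numeric order (1..22).
# chrom_ordered_list is a hypothetical helper; the glob assumes names of the form
# ALL.chr<N>.<anything>clinvar.vcf.gz, as created in the setup step.
chrom_ordered_list() {
  dir="${1:-.}"
  for chr in $(seq 1 22); do
    for f in "$dir"/ALL.chr"$chr".*clinvar.vcf.gz; do
      # Skip the unexpanded pattern when no file matches this chromosome.
      [ -e "$f" ] && printf '%s\n' "$f"
    done
  done
  return 0
}

# Typical use (requires bcftools and the files from the setup step):
#   chrom_ordered_list . > concat.list
#   bcftools concat --file-list concat.list --output-type z --output all.clinvar.vcf.gz
```

The dot after the chromosome number in the glob keeps chr1 from also matching chr10 through chr19.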

Select individual samples by name

bcftools view -s allows for subsetting by sample ID.

The combined all.clinvar.vcf.gz file above contains multiple samples. Here we’ll create individual compressed .vcf files for the NA20536 and HG03718 samples, along with a tabix index for each file (using bcftools index -t):

bcftools view -s NA20536 all.clinvar.vcf.gz --output-type z --output NA20536.clinvar.vcf.gz
bcftools view -s HG03718 all.clinvar.vcf.gz --output-type z --output HG03718.clinvar.vcf.gz

## note: bcftools index -t is equivalent to tabix here
bcftools index -t NA20536.clinvar.vcf.gz
bcftools index -t HG03718.clinvar.vcf.gz

Filter to only include INDELs

bcftools view -v will restrict the file to specified variant types: “snps”, “indels”, “mnps”, or “other”.

We can use the command to filter the .vcf to only include INDELs:

bcftools view -v indels NA20536.clinvar.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20536
1 978603 rs35881187 CCT C 100 PASS AC=2;AF=0.479233;AN=2;NS=2504;DP=14705;EAS_AF=0.8036;AMR_AF=0.6412;AFR_AF=0.0348;EUR_AF=0.5487;SAS_AF=0.5593;VT=INDEL GT 1|1
1 984171 rs140904842 CAG C 100 PASS AC=2;AF=0.920527;AN=2;NS=2504;DP=7127;EAS_AF=0.9891;AMR_AF=0.9769;AFR_AF=0.7602;EUR_AF=0.9742;SAS_AF=0.9714;VT=INDEL GT 1|1
1 1168239 rs533071750 C CG 100 PASS AC=0;AF=0.000599042;AN=2;NS=2504;DP=9648;EAS_AF=0;AMR_AF=0.0029;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;AA=?|GGGGGGG|GGGGGGGG|unsure;VT=INDEL;EX_TARGET GT 0|0
1 2343991 rs570192538 CCA C 100 PASS AC=0;AF=0.00459265;AN=2;NS=2504;DP=9045;EAS_AF=0;AMR_AF=0;AFR_AF=0.0174;EUR_AF=0;SAS_AF=0;VT=INDEL GT 0|0
1 2435830 rs555614613 TTCC T 100 PASS AC=0;AF=0.00579073;AN=2;NS=2504;DP=15005;EAS_AF=0;AMR_AF=0.0029;AFR_AF=0.0204;EUR_AF=0;SAS_AF=0;VT=INDEL;EX_TARGET GT 0|0
1 2492946 rs149579135 AG A 100 PASS AC=0;AF=0.00359425;AN=2;NS=2504;DP=17775;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0129;EUR_AF=0;SAS_AF=0;AA=G|G|-|deletion;VT=INDEL GT 0|0
. . . . . . . . . .

Filter by rsid

With bcftools you can filter a .vcf file for certain sites by passing in a file that contains the IDs to be retained.

Assuming we have the following RSIDs in a file called snps.list [2]:

rs145413551
rs34610323
rs79548709
rs371163239
rs148716910
rs374704178
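
For reproducibility, the file can be written directly from the shell. A small sketch using printf, which repeats its format for each argument and so emits one ID per line:

```shell
# Write the six rsIDs listed above to snps.list, one per line.
printf '%s\n' \
  rs145413551 \
  rs34610323 \
  rs79548709 \
  rs371163239 \
  rs148716910 \
  rs374704178 > snps.list
```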

We can use snps.list to filter with bcftools view:

bcftools view --include ID==@snps.list NA20536.clinvar.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20536
17 648546 rs34610323 C T 100 PASS AC=0;AF=0.0159744;AN=2;NS=2504;DP=21874;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0575;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET GT 0|0
2 31620566 rs145413551 G T 100 PASS AC=0;AF=0.000199681;AN=2;NS=2504;DP=19652;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
21 45707000 rs374704178 G A 100 PASS AC=0;AF=0.000399361;AN=2;NS=2504;DP=11479;EAS_AF=0;AMR_AF=0;AFR_AF=0.0015;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
5 151721 rs148716910 G A 100 PASS AC=0;AF=0.00279553;AN=2;NS=2504;DP=18789;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
8 1841816 rs79548709 C T 100 PASS AC=0;AF=0.00519169;AN=2;NS=2504;DP=16683;EAS_AF=0;AMR_AF=0;AFR_AF=0.0197;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET GT 0|0
8 3889458 rs371163239 T A 100 PASS AC=0;AF=0.000199681;AN=2;NS=2504;DP=15669;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP;EX_TARGET GT 0|0

Filter by chromosome and/or position

The --regions flag takes input chromosome and/or position coordinates to filter the .vcf.

If we wanted to restrict to chromosome 5:

bcftools view --regions 5 NA20536.clinvar.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20536
5 40417 esv3603720;esv3603721 G , 100 PASS AC=0,0;AF=0.000199681,0.000798722;AN=2;CS=DUP_uwash;END=176437;NS=2504;SVTYPE=CNV;DP=16231;EAS_AF=0,0;AMR_AF=0,0;AFR_AF=0,0;EUR_AF=0,0.003;SAS_AF=0.001,0.001;VT=SV;EX_TARGET GT 0|0
5 124186 esv3603731 T 100 PASS AC=0;AF=0.000199681;AN=2;CS=DUP_gs;END=163795;NS=2504;SVTYPE=DUP;DP=19153;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0.001;SAS_AF=0;VT=SV;EX_TARGET GT 0|0
5 143490 rs142208662 C T 100 PASS AC=0;AF=0.00279553;AN=2;NS=2504;DP=19664;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=c|||;VT=SNP;EX_TARGET GT 0|0
5 151721 rs148716910 G A 100 PASS AC=0;AF=0.00279553;AN=2;NS=2504;DP=18789;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
5 156288 rs193920840 C T 100 PASS AC=0;AF=0.000199681;AN=2;NS=2504;DP=17617;EAS_AF=0;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0.001;AA=C|||;VT=SNP;EX_TARGET GT 0|0
5 162045 rs568109142 G A 100 PASS AC=0;AF=0.000199681;AN=2;NS=2504;DP=15391;EAS_AF=0.001;AMR_AF=0;AFR_AF=0;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
. . . . . . . . . .

And if we were interested in a specific region (let’s say chromosome 10, anywhere between positions 800000 and 900000):

bcftools view --regions 10:800000-900000 NA20536.clinvar.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20536
10 859076 rs144565605 T C 100 PASS AC=0;AF=0.000199681;AN=2;NS=2504;DP=15608;EAS_AF=0;AMR_AF=0;AFR_AF=0.0008;EUR_AF=0;SAS_AF=0;AA=T|||;VT=SNP;EX_TARGET GT 0|0
10 860990 rs144883024 G A 100 PASS AC=0;AF=0.00259585;AN=2;NS=2504;DP=18990;EAS_AF=0;AMR_AF=0.0014;AFR_AF=0.0091;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
10 871816 rs79707128 T A 100 PASS AC=0;AF=0.0211661;AN=2;NS=2504;DP=21039;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0703;EUR_AF=0;SAS_AF=0.0092;AA=T|||;VT=SNP;EX_TARGET GT 0|0
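
Several regions can be requested at once, either as a comma-separated list passed to --regions or through a tab-delimited file passed to --regions-file. Both options rely on the input being indexed. The following is a minimal sketch of the file-based approach: regions.txt is a hypothetical file name, and the two intervals are illustrative choices loosely based on the examples above.

```shell
# Write two intervals (CHROM, BEG, END; tab-delimited, 1-based, inclusive)
# to a hypothetical regions.txt file. printf reuses the format for each triple.
printf '%s\t%s\t%s\n' 5 1 200000 10 800000 900000 > regions.txt

# Typical use (requires bcftools and the indexed file created earlier):
#   bcftools view --regions-file regions.txt NA20536.clinvar.vcf.gz
# The equivalent comma-separated form:
#   bcftools view --regions 5:1-200000,10:800000-900000 NA20536.clinvar.vcf.gz
```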

Format translated genotype output

bcftools query will output contents of the .vcf in text format. The contents can be specified in a string that includes fields to extract, separators, and line endings.

In this scenario, we’ll pull out the ID (RSID), chromosome, position, a translated genotype, and the “type” (SNP, INDEL, etc.) in tab-separated format:

bcftools query -f "%ID\t%CHROM\t%POS[\t%TGT]\t%TYPE\n" NA20536.clinvar.vcf.gz
ID CHROM POS GT TYPE
rs41285790 1 865628 G|G SNP
rs113383096 1 879481 G|G SNP
rs112433394 1 880944 G|G SNP
rs113226136 1 887409 G|G SNP
rs112966263 1 887989 A|A SNP
rs58931985 1 889450 C|C SNP
. . . . .
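
By default bcftools query writes only the data rows, so a header has to be added if the table is saved for later use. Here is a minimal sketch that prepends a header line with plain shell. The names with_header and NA20536.genotypes.tsv are hypothetical, and the column labels are my own choice rather than anything bcftools emits.

```shell
# Print a header line (first argument), then pass standard input through unchanged.
# with_header is a hypothetical helper, not a bcftools command.
with_header() {
  printf '%s\n' "$1"
  cat
}

# Typical use (requires bcftools and the file created earlier):
#   bcftools query -f "%ID\t%CHROM\t%POS[\t%TGT]\t%TYPE\n" NA20536.clinvar.vcf.gz |
#     with_header "$(printf 'ID\tCHROM\tPOS\tGT\tTYPE')" > NA20536.genotypes.tsv
```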

Merge vcf files together

bcftools merge will combine data from multiple files.

To merge individual sample .vcf files into one:

bcftools merge NA20536.clinvar.vcf.gz HG03718.clinvar.vcf.gz --output-type z --output NA20536.HG03718.clinvar.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20536 HG03718
1 865628 rs41285790 G A 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=33950;AF=0.00279553;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AN=4;AC=0 GT 0|0 0|0
1 879481 rs113383096 G C 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=27530;AF=0.0197684;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 880944 rs112433394 G A 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=41446;AF=0.00259585;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 887409 rs113226136 G C 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=39832;AF=0.00119808;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 887989 rs112966263 A G 100 PASS NS=2504;AA=G|||;VT=SNP;EX_TARGET;DP=36768;AF=0.00579073;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 889450 rs58931985 C A 100 PASS NS=2504;AA=C|||;VT=SNP;EX_TARGET;DP=32298;AF=0.00159744;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
. . . . . . . . . . .

Parse genotypes for multiple samples

Given a multi-sample .vcf, you can parse genotypes for each individual:

bcftools query -f '[%CHROM\t%POS\t%SAMPLE\t%TGT\n]' NA20536.HG03718.clinvar.vcf.gz
CHROM POS SAMPLE GT
1 865628 NA20536 G|G
1 865628 HG03718 G|G
1 879481 NA20536 G|G
1 879481 HG03718 G|G
1 880944 NA20536 G|G
1 880944 HG03718 G|G
. . . .
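
Because this layout has one row per site and sample, it is easy to summarise with standard tools. As an illustration (a sketch, not part of the original workflow), the helper below counts, for each sample, the rows whose genotype carries at least one non-reference allele. It assumes the query was run with %GT in place of %TGT, so genotypes appear as allele indexes such as 0|0 or 0|1. The name count_nonref is hypothetical.

```shell
# Read tab-separated rows of CHROM, POS, SAMPLE, GT and print "sample<TAB>count",
# where count is the number of rows with at least one non-reference allele.
# Missing genotypes (e.g. .|.) are not counted. count_nonref is a hypothetical helper.
count_nonref() {
  awk -F'\t' '
    { seen[$3] = 1 }              # remember every sample, even those with zero hits
    $4 ~ /[1-9]/ { n[$3]++ }      # any non-zero allele index means an ALT allele
    END { for (s in seen) printf "%s\t%d\n", s, n[s] }
  ' | sort
}

# Typical use (requires bcftools and the merged file created earlier):
#   bcftools query -f '[%CHROM\t%POS\t%SAMPLE\t%GT\n]' NA20536.HG03718.clinvar.vcf.gz | count_nonref
```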

Edit chromosome names

You can rename chromosomes with bcftools annotate --rename-chrs. The command requires that you supply a tab-separated file (named chromosomes.txt in this example) with the desired naming convention, one pair per line, organized as “old\tnew”:

1\tchr1
2\tchr2
3\tchr3
4\tchr4
5\tchr5
6\tchr6
7\tchr7
8\tchr8
9\tchr9
10\tchr10
11\tchr11
12\tchr12
13\tchr13
14\tchr14
15\tchr15
16\tchr16
17\tchr17
18\tchr18
19\tchr19
20\tchr20
21\tchr21
22\tchr22

bcftools annotate --rename-chrs chromosomes.txt NA20536.clinvar.vcf.gz
CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA20536
chr1 865628 rs41285790 G A 100 PASS AC=0;AF=0.00279553;AN=2;NS=2504;DP=16975;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AA=g|||;VT=SNP;EX_TARGET GT 0|0
chr1 879481 rs113383096 G C 100 PASS AC=0;AF=0.0197684;AN=2;NS=2504;DP=13765;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET GT 0|0
chr1 880944 rs112433394 G A 100 PASS AC=0;AF=0.00259585;AN=2;NS=2504;DP=20723;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET GT 0|0
chr1 887409 rs113226136 G C 100 PASS AC=0;AF=0.00119808;AN=2;NS=2504;DP=19916;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AA=g|||;VT=SNP;EX_TARGET GT 0|0
chr1 887989 rs112966263 A G 100 PASS AC=0;AF=0.00579073;AN=2;NS=2504;DP=18384;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AA=G|||;VT=SNP;EX_TARGET GT 0|0
chr1 889450 rs58931985 C A 100 PASS AC=0;AF=0.00159744;AN=2;NS=2504;DP=16149;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AA=C|||;VT=SNP;EX_TARGET GT 0|0
. . . . . . . . . .
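
The mapping file does not need to be typed by hand. A small sketch that generates chromosomes.txt for chromosomes 1-22 with a loop, writing a real tab character between the old and new names:

```shell
# Write the "old<TAB>new" mapping for chromosomes 1-22 to chromosomes.txt.
for i in $(seq 1 22); do
  printf '%s\tchr%s\n' "$i" "$i"
done > chromosomes.txt
```

Swapping the two printf arguments around (chr%s first) would produce the reverse mapping, for stripping the "chr" prefix instead.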

View without header

To view only the records, with the header removed, use the -H flag:

bcftools view -H NA20536.HG03718.clinvar.vcf.gz
X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11
1 865628 rs41285790 G A 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=33950;AF=0.00279553;EAS_AF=0;AMR_AF=0.0072;AFR_AF=0;EUR_AF=0.005;SAS_AF=0.0041;AN=4;AC=0 GT 0|0 0|0
1 879481 rs113383096 G C 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=27530;AF=0.0197684;EAS_AF=0;AMR_AF=0.0058;AFR_AF=0.0719;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 880944 rs112433394 G A 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=41446;AF=0.00259585;EAS_AF=0;AMR_AF=0;AFR_AF=0.0098;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 887409 rs113226136 G C 100 PASS NS=2504;AA=g|||;VT=SNP;EX_TARGET;DP=39832;AF=0.00119808;EAS_AF=0;AMR_AF=0;AFR_AF=0.0045;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 887989 rs112966263 A G 100 PASS NS=2504;AA=G|||;VT=SNP;EX_TARGET;DP=36768;AF=0.00579073;EAS_AF=0;AMR_AF=0;AFR_AF=0.0219;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
1 889450 rs58931985 C A 100 PASS NS=2504;AA=C|||;VT=SNP;EX_TARGET;DP=32298;AF=0.00159744;EAS_AF=0;AMR_AF=0;AFR_AF=0.0061;EUR_AF=0;SAS_AF=0;AN=4;AC=0 GT 0|0 0|0
. . . . . . . . . . .

View only header

To view only the header (i.e. extract header) use the -h flag:

bcftools view -h clinvar.vcf.gz
## ##fileformat=VCFv4.1
## ##FILTER=<ID=PASS,Description="All filters passed">
## ##fileDate=2020-02-17
## ##source=ClinVar
## ##reference=GRCh37
## ##ID=<Description="ClinVar Variation ID">
## ##INFO=<ID=AF_ESP,Number=1,Type=Float,Description="allele frequencies from GO-ESP">
## ##INFO=<ID=AF_EXAC,Number=1,Type=Float,Description="allele frequencies from ExAC">
## ##INFO=<ID=AF_TGP,Number=1,Type=Float,Description="allele frequencies from TGP">
## ##INFO=<ID=ALLELEID,Number=1,Type=Integer,Description="the ClinVar Allele ID">
## ##INFO=<ID=CLNDN,Number=.,Type=String,Description="ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
## ##INFO=<ID=CLNDNINCL,Number=.,Type=String,Description="For included Variant : ClinVar's preferred disease name for the concept specified by disease identifiers in CLNDISDB">
## ##INFO=<ID=CLNDISDB,Number=.,Type=String,Description="Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
## ##INFO=<ID=CLNDISDBINCL,Number=.,Type=String,Description="For included Variant: Tag-value pairs of disease database name and identifier, e.g. OMIM:NNNNNN">
## ##INFO=<ID=CLNHGVS,Number=.,Type=String,Description="Top-level (primary assembly, alt, or patch) HGVS expression.">
## ##INFO=<ID=CLNREVSTAT,Number=.,Type=String,Description="ClinVar review status for the Variation ID">
## ##INFO=<ID=CLNSIG,Number=.,Type=String,Description="Clinical significance for this single variant">
## ##INFO=<ID=CLNSIGCONF,Number=.,Type=String,Description="Conflicting clinical significance for this single variant">
## ##INFO=<ID=CLNSIGINCL,Number=.,Type=String,Description="Clinical significance for a haplotype or genotype that includes this variant. Reported as pairs of VariationID:clinical significance.">
## ##INFO=<ID=CLNVC,Number=1,Type=String,Description="Variant type">
## ##INFO=<ID=CLNVCSO,Number=1,Type=String,Description="Sequence Ontology id for variant type">
## ##INFO=<ID=CLNVI,Number=.,Type=String,Description="the variant's clinical sources reported as tag-value pairs of database and variant identifier">
## ##INFO=<ID=DBVARID,Number=.,Type=String,Description="nsv accessions from dbVar for the variant">
## ##INFO=<ID=GENEINFO,Number=1,Type=String,Description="Gene(s) for the variant reported as gene symbol:gene id. The gene symbol and id are delimited by a colon (:) and each pair is delimited by a vertical bar (|)">
## ##INFO=<ID=MC,Number=.,Type=String,Description="comma separated list of molecular consequence in the form of Sequence Ontology ID|molecular_consequence">
## ##INFO=<ID=ORIGIN,Number=.,Type=String,Description="Allele origin. One or more of the following values may be added: 0 - unknown; 1 - germline; 2 - somatic; 4 - inherited; 8 - paternal; 16 - maternal; 32 - de-novo; 64 - biparental; 128 - uniparental; 256 - not-tested; 512 - tested-inconclusive; 1073741824 - other">
## ##INFO=<ID=RS,Number=.,Type=String,Description="dbSNP ID (i.e. rs number)">
## ##INFO=<ID=SSR,Number=1,Type=Integer,Description="Variant Suspect Reason Codes. One or more of the following values may be added: 0 - unspecified, 1 - Paralog, 2 - byEST, 4 - oldAlign, 8 - Para_EST, 16 - 1kg_failed, 1024 - other">
## ##contig=<ID=1>
## ##contig=<ID=2>
## ##contig=<ID=3>
## ##contig=<ID=4>
## ##contig=<ID=5>
## ##contig=<ID=6>
## ##contig=<ID=7>
## ##contig=<ID=8>
## ##contig=<ID=9>
## ##contig=<ID=10>
## ##contig=<ID=11>
## ##contig=<ID=12>
## ##contig=<ID=13>
## ##contig=<ID=14>
## ##contig=<ID=15>
## ##contig=<ID=16>
## ##contig=<ID=17>
## ##contig=<ID=18>
## ##contig=<ID=19>
## ##contig=<ID=20>
## ##contig=<ID=21>
## ##contig=<ID=22>
## ##contig=<ID=X>
## ##contig=<ID=Y>
## ##contig=<ID=MT>
## ##bcftools_viewVersion=1.10.2-27-g9d66868+htslib-1.10.2-33-g1bbcd02
## ##bcftools_viewCommand=view -h clinvar.vcf.gz; Date=Fri Feb 28 19:06:40 2020
## #CHROM   POS ID  REF ALT QUAL    FILTER  INFO
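
The header is also the quickest place to look up which INFO tags a file defines, which helps when writing --include expressions or query format strings. Here is a minimal sketch with sed: info_ids is a hypothetical helper, and it assumes each INFO definition starts with ##INFO=<ID= as in a standard VCF header.

```shell
# Read a VCF header on standard input and print the ID of every INFO definition.
# info_ids is a hypothetical helper, not a bcftools command.
info_ids() {
  sed -n 's/^##INFO=<ID=\([^,>]*\).*/\1/p'
}

# Typical use (requires bcftools and the ClinVar file from the setup step):
#   bcftools view -h clinvar.vcf.gz | info_ids
```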

  1. From ClinVar vcf documentation: This file contains variations submitted through clinical channels. The variations contained in this file are therefore a mixture of variations asserted to be pathogenic as well as those known to be non-pathogenic. The user should note that any variant may have different assertions regarding clinical significance and that this file will contain only those that are the most “pathogenic”.

  2. This solution is based on a Biostars post: https://www.biostars.org/p/373852/
