• Create Time: 2022-09-12
  • Update Time: 2022-09-13

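I have long wanted to properly organize what I know about strandedness in RNA-seq experiments, but it has probably been more than a year since I first wrote this task down. There are already good resources online that explain it clearly (see References), but those are someone else's notes, and I think it is still worth writing my own summary to make things clearer for myself.

First, many downstream analyses (downstream relative to the wet-lab experiment) need this information to produce more accurate results, including but not limited to alignment and transcriptome assembly. Experienced people can often tell which parameters to use just from knowing which kit was used. Still, when the data come from a public database, or were produced by a company that made small modifications to the standard protocol, the assumed strand information may be wrong.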

NOTE It is very possible that a sequencing provider/core may make modifications to these kits. For example, in one case we obtained RNAseq data processed with NEBNext Ultra II Directional kit (dUTP method). However instead of using the NEB hairpin adapters, IDT xGen UDI-UMI adapters were substituted, and this results in the insert strandedness being flipped  (from RF/fr-firststrand to FR/fr-secondstrand). Because this level of detail is not always provided it is highly recommended to confirm your data’s strandedness empirically.

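So it is best to check the data yourself, and to know how to choose the downstream analysis parameters from the result of that check, and why those are the right choices.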
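As a rough illustration of what such an empirical check boils down to, the sketch below classifies a library from two counts: how many read1 alignments lie on the same strand as the annotated transcript, and how many lie on the opposite strand. The function name, the way the counts are obtained, and the cutoffs (0.9 and ±0.1) are all assumptions made for illustration; they are not taken from any particular tool.

```python
def infer_strandedness(read1_same, read1_opposite,
                       stranded_cutoff=0.9, unstranded_band=0.1):
    """Guess the library type from read1 strand counts.

    read1_same:     read1 alignments on the same strand as the transcript
    read1_opposite: read1 alignments on the opposite strand
    The cutoffs are illustrative assumptions, not established thresholds.
    """
    total = read1_same + read1_opposite
    if total == 0:
        raise ValueError("no reads overlap annotated transcripts")
    frac_same = read1_same / total
    if frac_same >= stranded_cutoff:
        return "FR"          # read1 forward: FR / fr-secondstrand
    if frac_same <= 1 - stranded_cutoff:
        return "RF"          # read1 reverse: RF / fr-firststrand
    if abs(frac_same - 0.5) <= unstranded_band:
        return "unstranded"  # read1 orientation is roughly random
    return "ambiguous"       # neither clearly stranded nor unstranded


print(infer_strandedness(30, 970))
```

An "ambiguous" result is a signal to look at the data more closely rather than to pick a parameter.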

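Classification

For NGS the classification is simple; there are only three types:

  • read1 has the same orientation as the transcript [forward], and read2 has the opposite orientation [reverse] (type 1)

  • read1 has the opposite orientation to the transcript [reverse], and read2 has the same orientation [forward] (type 2)

  • read1 is sometimes in the same orientation as the transcript and sometimes in the opposite one (type 3)

The bracketed words above are the ones most often used as parameter values, for example: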

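  • RF = read1 reverse + read2 forward (type 2)

  • FR = read1 forward + read2 reverse (type 1)

  • F2R1 = read2 forward + read1 reverse (type 2)

  • F1R2 = read1 forward + read2 reverse (type 1)

That is, when no number is given, the first letter refers to read1 and the second to read2; when numbers are given, the letter in front of each number gives the orientation of that read.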
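The reading rule above can be sketched as a small parser. This is only an illustration of the convention as described in this post; the function names are hypothetical, and "F" is taken to mean "same orientation as the transcript".

```python
import re


def parse_orientation(code):
    """Return the orientation (F or R) of read1 and read2 for a code
    such as 'RF', 'FR', 'F2R1' or 'F1R2'."""
    code = code.upper()
    # No digits: first letter is read1, second letter is read2.
    if re.fullmatch(r"[FR]{2}", code):
        return {"read1": code[0], "read2": code[1]}
    # With digits: each letter describes the read whose number follows it.
    m = re.fullmatch(r"([FR])([12])([FR])([12])", code)
    if m and m.group(2) != m.group(4):
        return {"read" + m.group(2): m.group(1),
                "read" + m.group(4): m.group(3)}
    raise ValueError(f"unrecognized orientation code: {code!r}")


def library_type(code):
    """Map a code to type 1 (read1 forward) or type 2 (read1 reverse)."""
    o = parse_orientation(code)
    if (o["read1"], o["read2"]) == ("F", "R"):
        return 1
    if (o["read1"], o["read2"]) == ("R", "F"):
        return 2
    raise ValueError(f"{code!r} is neither type 1 nor type 2")


for code in ("RF", "FR", "F2R1", "F1R2"):
    print(code, "-> type", library_type(code))
```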

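The TopHat parameters deserve special attention: fr-firststrand (type 2) and fr-secondstrand (type 1). I found these names hard to understand, so I looked up the explanation in the official TopHat documentation, which reads as follows: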

| Parameter | Kit | Notes |
| --- | --- | --- |
| fr-unstranded | Standard Illumina | Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand. |
| fr-firststrand | dUTP, NSR, NNSR | Same as above except we enforce the rule that the **right-most end of the fragment** (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced. |
| fr-secondstrand | Ligation, Standard SOLiD | Same as above except we enforce the rule that the **left-most end of the fragment** (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced. |

Here I copy a table that someone else has already put together:

| Tool | RF/fr-firststrand stranded (dUTP) | FR/fr-secondstrand stranded (Ligation) | Unstranded |
| --- | --- | --- | --- |
| check_strandedness (output) | RF/fr-firststrand | FR/fr-secondstrand | unstranded |
| IGV (5p to 3p read orientation code) | F2R1 | F1R2 | F2R1 or F1R2 |
| TopHat (`--library-type` parameter) | fr-firststrand | fr-secondstrand | fr-unstranded |
| HISAT2 (`--rna-strandness` parameter) | R/RF | F/FR | NONE |
| HTSeq (`--stranded`/`-s` parameter) | reverse | yes | no |
| STAR | n/a (STAR doesn't use library strandedness info for mapping) | NONE | NONE |
| Picard CollectRnaSeqMetrics (STRAND_SPECIFICITY parameter) | SECOND_READ_TRANSCRIPTION_STRAND | FIRST_READ_TRANSCRIPTION_STRAND | NONE |
| Kallisto quant (parameter) | `--rf-stranded` | `--fr-stranded` | NONE |
| StringTie (parameter) | `--rf` | `--fr` | NONE |
| FeatureCounts (`-s` parameter) | 2 | 1 | 0 |
| RSEM (`--forward-prob` parameter) | 0 | 1 | 0.5 |
| Salmon (`--libType` parameter) | ISR (assuming paired-end with inward read orientation) | ISF (assuming paired-end with inward read orientation) | IU (assuming paired-end with inward read orientation) |
| Trinity (`--SS_lib_type` parameter) | RF | FR | NONE |
| MGI CWL YAML (strand parameter) | first | second | NONE |
| RegTools (strand parameter) | `-s 1` | `-s 2` | `-s 0` |
| Example methods/kits/data | dUTP, NSR, NNSR, Illumina TruSeq Strand Specific Total RNA, NEBNext Ultra II Directional | Ligation, Standard SOLiD, NuGEN Encore, 10X 5' scRNA data | Standard Illumina, NuGEN OvationV2, SMARTer universal low input RNA kit (TaKara), GDC normalized TCGA data |
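For scripting, part of the table above can be kept as a lookup so that one strandedness call drives every tool in a pipeline. The sketch below covers only a few of the tools. The dictionary and function names are hypothetical, the option strings are transcribed from the table, and an empty string stands for the table's NONE entries (no flag is passed). It is worth checking the option spellings against each tool's own documentation for the version you run.

```python
# Option strings transcribed from the table; "" means no flag is passed.
STRAND_OPTIONS = {
    "tophat": {
        "RF": "--library-type fr-firststrand",
        "FR": "--library-type fr-secondstrand",
        "unstranded": "--library-type fr-unstranded",
    },
    "hisat2": {"RF": "--rna-strandness RF", "FR": "--rna-strandness FR",
               "unstranded": ""},
    "htseq-count": {"RF": "-s reverse", "FR": "-s yes", "unstranded": "-s no"},
    "featureCounts": {"RF": "-s 2", "FR": "-s 1", "unstranded": "-s 0"},
    "stringtie": {"RF": "--rf", "FR": "--fr", "unstranded": ""},
    "kallisto": {"RF": "--rf-stranded", "FR": "--fr-stranded",
                 "unstranded": ""},
    "salmon": {"RF": "--libType ISR", "FR": "--libType ISF",
               "unstranded": "--libType IU"},
}


def strand_option(tool, library):
    """Return the strandedness option for a tool and a library type
    ('RF', 'FR' or 'unstranded')."""
    try:
        return STRAND_OPTIONS[tool][library]
    except KeyError:
        raise ValueError(f"no entry for tool={tool!r}, library={library!r}")


for tool in ("hisat2", "featureCounts", "salmon"):
    print(tool, strand_option(tool, "RF"))
```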

I keep it here for my own future reference.

Kit and Strandedness

[To be continued]

References

  1. https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/