RNA Strandedness

Create Time: 2022-09-12
Update Time: 2022-09-13

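I have long wanted to properly organize the strandedness information involved in RNA-seq experiments, but more than a year has probably passed since I first wrote this task down. Good resources already exist online and explain it clearly (see the References section), but they are someone else's notes, so I still think it is worth writing my own summary to make things clearer for myself.

First, many downstream analyses (downstream relative to the wet-lab experiment) need this information to produce more accurate results, including but not limited to alignment and transcriptome assembly. Experienced people can often tell which parameter to use just from knowing which kit was used in the experiment. Even so, when the data come from a public database, or were generated by a company that made small modifications to the standard protocol, the assumed strand information may be wrong.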

NOTE It is very possible that a sequencing provider/core may make modifications to these kits. For example, in one case we obtained RNAseq data processed with NEBNext Ultra II Directional kit (dUTP method). However instead of using the NEB hairpin adapters, IDT xGen UDI-UMI adapters were substituted, and this results in the insert strandedness being flipped (from RF/fr-firststrand to FR/fr-secondstrand). Because this level of detail is not always provided it is highly recommended to confirm your data’s strandedness empirically.

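So it is best to check the data ourselves, to know how to choose the downstream analysis parameters from the result of that check, and to understand why that choice is the right one.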
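The idea behind such an empirical check can be sketched in a few lines of Python. This is only a minimal sketch, not how any particular tool implements it: I assume each aligned read has already been reduced to a tuple of (read number, strand the read mapped to, strand of the annotated gene it overlaps). The function name `infer_strandedness`, the tuple layout, and the 0.8 cutoff are all illustrative assumptions of mine.

```python
from collections import Counter


def infer_strandedness(observations, threshold=0.8):
    """Guess the library type from (read_number, read_strand, gene_strand) tuples.

    read_number is 1 or 2; strands are '+' or '-'.
    Returns (label, fraction of reads that support FR).
    The 0.8 threshold is an arbitrary illustrative cutoff.
    """
    counts = Counter()
    for read_number, read_strand, gene_strand in observations:
        same = read_strand == gene_strand
        # read 1 on the transcript strand, or read 2 on the opposite
        # strand, both support FR; the other two cases support RF.
        if (read_number == 1) == same:
            counts["FR"] += 1
        else:
            counts["RF"] += 1

    total = counts["FR"] + counts["RF"]
    if total == 0:
        raise ValueError("no observations given")

    fr_fraction = counts["FR"] / total
    rf_fraction = counts["RF"] / total
    if fr_fraction >= threshold:
        return "FR", fr_fraction
    if rf_fraction >= threshold:
        return "RF", fr_fraction
    return "unstranded", fr_fraction


if __name__ == "__main__":
    # Toy data: every pair has read 1 on the gene strand and read 2 opposite.
    toy = [(1, "+", "+"), (2, "-", "+"), (1, "-", "-"), (2, "+", "-")]
    print(infer_strandedness(toy))
```

With real data, the tuples would have to come from an alignment plus a gene annotation, and reads overlapping genes on both strands should be skipped; that part is left out here.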

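Classification

For NGS the classification is very simple; there are only three types:

read1 has the same orientation as the transcript [forward], and read2 has the opposite orientation [reverse] (type 1)
read1 has the opposite orientation to the transcript [reverse], and read2 has the same orientation [forward] (type 2)
read1 sometimes has the same orientation as the transcript and sometimes the opposite (type 3)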

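The bracketed words above are also the words most often used as parameter values, for example:

RF = read1 reverse + read2 forward (type 2)
FR = read1 forward + read2 reverse (type 1)
F2R1 = read2 forward + read1 reverse (type 2)
F1R2 = read1 forward + read2 reverse (type 1)

In other words, when no digits are given, the first letter refers to read1 and the second to read2; when digits are given, the letter before each digit gives the direction of that read.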
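To make the naming rule concrete, here is a small Python sketch that decodes both spellings into the same read-to-direction mapping. The function name `parse_orientation` and the returned dictionary layout are my own illustrative choices, not part of any tool.

```python
import re

DIRECTIONS = {"F": "forward", "R": "reverse"}


def parse_orientation(code):
    """Return {1: direction of read1, 2: direction of read2} for codes
    such as 'RF', 'FR', 'F2R1' or 'F1R2'."""
    code = code.upper()
    result = None

    if re.fullmatch(r"[FR]{2}", code):
        # No digits: the first letter is read1, the second is read2.
        result = {1: DIRECTIONS[code[0]], 2: DIRECTIONS[code[1]]}
    else:
        match = re.fullmatch(r"([FR])([12])([FR])([12])", code)
        if match:
            # With digits: each letter describes the read numbered after it.
            d_a, n_a, d_b, n_b = match.groups()
            result = {int(n_a): DIRECTIONS[d_a], int(n_b): DIRECTIONS[d_b]}

    # A valid code names both reads, one forward and one reverse.
    if (
        result is None
        or set(result) != {1, 2}
        or set(result.values()) != {"forward", "reverse"}
    ):
        raise ValueError(f"unrecognized orientation code: {code!r}")
    return result


if __name__ == "__main__":
    for code in ("RF", "F2R1", "FR", "F1R2"):
        print(code, parse_orientation(code))
```

RF and F2R1 decode to the same mapping (type 2), as do FR and F1R2 (type 1).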

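The TopHat parameters need particular attention: fr-firststrand (type 2) and fr-secondstrand (type 1). I found these hard to understand, so I looked up the explanation in the official TopHat documentation, which reads: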

parameter	Kit	Notes
fr-unstranded	Standard Illumina	Reads from the left-most end of the fragment (in transcript coordinates) map to the transcript strand, and the right-most end maps to the opposite strand.
fr-firststrand	dUTP, NSR, NNSR	Same as above except we enforce the rule that the right-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during first strand synthesis is sequenced.
fr-secondstrand	Ligation, Standard SOLiD	Same as above except we enforce the rule that the left-most end of the fragment (in transcript coordinates) is the first sequenced (or only sequenced for single-end reads). Equivalently, it is assumed that only the strand generated during second strand synthesis is sequenced.

Here I copy a table that someone else has already compiled:

Tool	RF/fr-firststrand stranded (dUTP)	FR/fr-secondstrand stranded (Ligation)	Unstranded
check_strandedness (output)	RF/fr-firststrand	FR/fr-secondstrand	unstranded
IGV (5p to 3p read orientation code)	F2R1	F1R2	F2R1 or F1R2
TopHat (--library-type parameter)	fr-firststrand	fr-secondstrand	fr-unstranded
HISAT2 (--rna-strandness parameter)	R/RF	F/FR	NONE
HTSeq (--stranded/-s parameter)	reverse	yes	no
STAR	n/a (STAR doesn’t use library strandedness info for mapping)	NONE	NONE
Picard CollectRnaSeqMetrics (STRAND_SPECIFICITY parameter)	SECOND_READ_TRANSCRIPTION_STRAND	FIRST_READ_TRANSCRIPTION_STRAND	NONE
Kallisto quant (parameter)	--rf-stranded	--fr-stranded	NONE
StringTie (parameter)	--rf	--fr	NONE
FeatureCounts (-s parameter)	2	1	0
RSEM (--forward-prob parameter)	0	1	0.5
Salmon (--libType parameter)	ISR (assuming paired-end with inward read orientation)	ISF (assuming paired-end with inward read orientation)	IU (assuming paired-end with inward read orientation)
Trinity (--SS_lib_type parameter)	RF	FR	NONE
MGI CWL YAML (strand parameter)	first	second	NONE
RegTools (strand parameter)	-s 1	-s 2	-s 0
	Example methods/kits: dUTP, NSR, NNSR, Illumina TruSeq Strand Specific Total RNA, NEBNext Ultra II Directional	Example methods/kits: Ligation, Standard SOLiD, NuGEN Encore, 10X 5’ scRNA data	Example kits/data: Standard Illumina, NuGEN OvationV2, SMARTer universal low input RNA kit (Takara), GDC normalized TCGA data

It is kept here so that I can refer back to it later.
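Part of the table can also be kept as a small lookup in a script, so that one strandedness label yields the matching flag for each tool. The sketch below covers only a few rows, and all values are copied from the table. The names `STRAND_FLAGS` and `strand_flags` are my own, and reading a "NONE" cell as "pass no flag at all" is my assumption.

```python
# Flags per tool, keyed by the strandedness labels used in this post.
# Values are copied from the table above; an empty list means "omit the flag"
# (my reading of the NONE cells).
STRAND_FLAGS = {
    "hisat2": {
        "RF": ["--rna-strandness", "RF"],
        "FR": ["--rna-strandness", "FR"],
        "unstranded": [],
    },
    "htseq-count": {
        "RF": ["--stranded", "reverse"],
        "FR": ["--stranded", "yes"],
        "unstranded": ["--stranded", "no"],
    },
    "featureCounts": {
        "RF": ["-s", "2"],
        "FR": ["-s", "1"],
        "unstranded": ["-s", "0"],
    },
    "stringtie": {"RF": ["--rf"], "FR": ["--fr"], "unstranded": []},
    "kallisto": {
        "RF": ["--rf-stranded"],
        "FR": ["--fr-stranded"],
        "unstranded": [],
    },
    # Salmon values assume paired-end reads with inward orientation.
    "salmon": {
        "RF": ["--libType", "ISR"],
        "FR": ["--libType", "ISF"],
        "unstranded": ["--libType", "IU"],
    },
}


def strand_flags(tool, strandedness):
    """Return a copy of the command-line fragment for one tool and library type."""
    try:
        return list(STRAND_FLAGS[tool][strandedness])
    except KeyError:
        raise ValueError(f"no entry for {tool!r} / {strandedness!r}") from None


if __name__ == "__main__":
    for tool in STRAND_FLAGS:
        print(tool, " ".join(strand_flags(tool, "RF")))
```

Keeping the mapping in one place means a pipeline only has to decide the label (RF, FR, or unstranded) once, ideally from an empirical check, and every tool then receives a consistent setting.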

Kit and Strandedness

[To be continued]

References

https://rnabio.org/module-09-appendix/0009/12/01/StrandSettings/

陈玥茏
