Biological Sequence Conservation
Conservation
Source: Zhihu
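In biology, conserved sequences are molecular sequences that are highly similar or identical. They can be nucleic acid sequences (such as DNA or RNA), protein sequences, protein structures, or polysaccharide sequences. Although highly similar, these sequences come from different species or from different molecules produced by the same organism.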
Conserved regions
Regions in which the sequence remains essentially unchanged.
![Conserved regions](https://img8.php1.cn/3cdc5/1850a/a6e/9c416418cb1719d3.png)
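One simple way to make "essentially unchanged" concrete is to mark the alignment columns where every sequence carries the same residue and then merge neighbouring columns into regions. This Python sketch assumes the sequences are already aligned to equal length, treats `-` as a gap, and uses a strict all-identical criterion. The function names are hypothetical, and real analyses usually tolerate some variation rather than requiring exact identity.

```python
def conserved_columns(alignment: list[str]) -> list[int]:
    """Return 0-based column indices where every aligned sequence has the same residue."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # A column is fully conserved if it contains exactly one distinct symbol
    # and that symbol is not the gap character "-".
    return [
        j for j in range(length)
        if len({seq[j] for seq in alignment}) == 1 and alignment[0][j] != "-"
    ]


def conserved_regions(columns: list[int]) -> list[tuple[int, int]]:
    """Merge consecutive conserved columns into (start, end) runs, end inclusive."""
    regions: list[tuple[int, int]] = []
    for j in columns:
        if regions and regions[-1][1] == j - 1:
            regions[-1] = (regions[-1][0], j)  # extend the current run
        else:
            regions.append((j, j))             # start a new run
    return regions


if __name__ == "__main__":
    # Hypothetical toy alignment, not data from the source.
    aln = ["ATGCCGTA",
           "ATGACGTT",
           "ATGTCGTC"]
    cols = conserved_columns(aln)
    print(cols)                     # [0, 1, 2, 4, 5, 6]
    print(conserved_regions(cols))  # [(0, 2), (4, 6)]
```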
Definition of sequence conservation
2020-11_Theory in Biosciences_Eukaryotic and prokaryotic promoter prediction using hybrid approach
https://link.springer.com/article/10.1007/s12064-010-0114-8
Original text:
For investigating the signal properties of promoter sequences, the conservation of oligonucleotide with length k-mer at the ith site can be calculated from following formula (Li and Lin 2006):
$$M_k(i)=\sum_x \frac{\left[p_i(x)-p_e\right]^2}{p_e}$$
where $p_i(x)$ and $p_e$ denote the observed probability and expected probability of k-mer oligonucleotide $x$ at the ith site, respectively. Two approaches can be used to calculate expected probability $p_e$: one is equal distribution of the k-mer oligonucleotide; another is the real k-mer oligonucleotide counts for each species. In this study, the first approach was used to calculate the $p_e$. For example, if k = 1, the expected probabilities of four bases is 0.25; and the observed probabilities of bases A, C, G, and T at the ith site denote as $p_i(A)$, $p_i(C)$, $p_i(G)$, and $p_i(T)$, respectively. The $M_1(i)$ denotes the conservation of bases at the ith site. It can be proved that the larger the $M_k(i)$ value, the more conserved the ith site. $M_1(i)$ equals to zero for random sequence.
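A short derivation, not taken from the paper, shows why a larger value means stronger conservation. It assumes the uniform expectation $p_e = 1/4^k$ used in the study, and that the observed probabilities at site $i$ sum to 1 over all $4^k$ possible k-mers (so $\sum_x p_e = 1$ as well):

$$
\begin{aligned}
M_k(i) &= \sum_x \frac{p_i(x)^2 - 2\,p_i(x)\,p_e + p_e^2}{p_e} \\
       &= \frac{1}{p_e}\sum_x p_i(x)^2 \;-\; 2\sum_x p_i(x) \;+\; \sum_x p_e \\
       &= 4^k \sum_x p_i(x)^2 - 2 + 1 \\
       &= 4^k \sum_x p_i(x)^2 - 1 .
\end{aligned}
$$

Because $\sum_x p_i(x)^2$ lies between $1/4^k$ (all k-mers equally frequent) and $1$ (the same k-mer in every sequence), the score satisfies $0 \le M_k(i) \le 4^k - 1$; for $k=1$ the maximum is 3.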
Interpretation:
At site $i$, the conservation of k-mer oligonucleotides is computed as $M_k(i)=\sum_x [p_i(x)-p_e]^2/p_e$, where:
- $x$: a k-mer oligonucleotide. For $k=1$ there are 4 possible k-mers (A, C, G, T); in general there are $4^k$.
- $p_i(x)$: the observed probability (frequency) of $x$ at site $i$.
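- $p_e$: the expected probability, which can be computed in two ways:
  - assume an equal (uniform) distribution of k-mers, so $p_e = 1/4^k$; this is the approach used here;
  - use the real k-mer counts for each species.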
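- $M_k(i)$: the k-mer conservation score at site $i$. The larger the value, the more conserved site $i$ is; for a random sequence (every k-mer equally frequent) it is 0.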
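The definitions above can be turned into a small function. This is a minimal Python sketch, not the paper's code. It assumes the input is a list of aligned, equal-length DNA sequences, that $p_i(x)$ is the fraction of sequences whose k-mer starting at site $i$ is $x$, and that the second approach can be modelled by replacing $p_e$ with a per-k-mer background probability supplied as a dict. The function name `conservation_score` and its parameters are hypothetical.

```python
from collections import Counter
from itertools import product


def conservation_score(seqs, i, k=1, expected=None):
    """Sketch of M_k(i) = sum_x [p_i(x) - p_e]^2 / p_e.

    seqs: aligned, equal-length DNA sequences over A/C/G/T.
    i: 0-based site at which each k-mer starts.
    expected: hypothetical optional dict {k-mer: probability} standing in for a
        per-species background (approach 2); None means the uniform
        p_e = 1/4**k (approach 1, the one used in the paper).
    """
    alphabet = "ACGT"
    # All 4**k possible k-mers x that the sum runs over.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # k-mer starting at site i in each sequence; drop truncated k-mers and
    # any that contain gaps or ambiguous bases such as N.
    observed = [s[i:i + k] for s in seqs]
    observed = [w for w in observed if len(w) == k and set(w) <= set(alphabet)]
    if not observed:
        raise ValueError("no valid k-mers start at this site")
    counts = Counter(observed)
    n = len(observed)
    score = 0.0
    for x in kmers:
        p_obs = counts[x] / n                                    # p_i(x)
        p_exp = 1 / 4 ** k if expected is None else expected[x]  # p_e
        score += (p_obs - p_exp) ** 2 / p_exp
    return score


if __name__ == "__main__":
    # Hypothetical aligned sequences, not data from the paper.
    seqs = ["TATAAT", "TATAAT", "TACAAT", "GATACT"]
    print(conservation_score(seqs, 1))       # column A, A, A, A -> 3.0
    print(conservation_score(seqs, 0))       # column T, T, T, G -> 1.5
    print(conservation_score(seqs, 0, k=2))  # TA, TA, TA, GA   -> 9.0
```

The uniform $p_e$ is the default because it matches the approach chosen in the paper. If you pass `expected`, it must contain every k-mer with a nonzero probability, otherwise the lookup or the division fails.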
For example, if $k=1$, the expected probability $p_e$ of each of the four bases is 0.25, and the observed probabilities are $p_i(A)$, $p_i(C)$, $p_i(G)$, and $p_i(T)$.
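A worked instance of the $k=1$ case, using hypothetical probabilities rather than data from the paper: suppose that at site $i$ the observed base probabilities are $p_i(A)=0.7$ and $p_i(C)=p_i(G)=p_i(T)=0.1$. Then

$$
M_1(i)=\frac{(0.7-0.25)^2}{0.25}+3\times\frac{(0.1-0.25)^2}{0.25}=\frac{0.2025}{0.25}+3\times\frac{0.0225}{0.25}=0.81+0.27=1.08
$$

For comparison, a site where every sequence has A gives $M_1(i)=2.25+3\times 0.25=3$, and a site where the four bases are equally frequent gives $M_1(i)=0$.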