MATLAB中的单峰或双峰分布-UnimodalorbimodaldistributioninMATLAB

作者：sueann88314_254 | 来源：互联网 | 2023-10-11 17:42

IsthereawayinMATLABtocheckwhetherthehistogramdistributionisunimodalorbimodal?在MATLAB

Is there a way in MATLAB to check whether the histogram distribution is unimodal or bimodal?

在MATLAB中是否有方法检查直方图分布是单峰还是双峰?

EDIT

编辑

Do you think Hartigan's Dip Statistic would work? I tried passing an image to it, and get the value 0. What does that mean?

你认为哈蒂根的Dip统计数据有用吗?我尝试传递一个图像给它，然后得到值0。这是什么意思?

And, when passing an image, does it test the distribution of the histogram of the image on the gray levels?

当传递一个图像时，它是否测试灰度上图像的直方图分布?

Thanks.

谢谢。

2 个解决方案

#1

Here is a script using Nic Price's implementation of Hartigan's Dip Test to identify unimodal distributions. The tricky point was to calculate xpdf, which is not probability density function, but rather a sorted sample.

这里有一个脚本，使用Nic Price实现的Hartigan Dip测试来识别单峰分布。难点在于计算xpdf，它不是概率密度函数，而是一个排序样本。

p_value is the probability of obtaining a test statistic at least as extreme as the one that was actually observed, assuming that the null hypothesis is true. In this case null hypothesis is that distribution is unimodal.

p_value是假设零假设为真，获得测试统计量的概率，至少与实际观察到的统计量一样极端。在这种情况下零假设是分布是单峰的。

close all; clear all;

function [x2, n, b] = compute_xpdf(x)
  x2 = reshape(x, 1, prod(size(x)));
  [n, b] = hist(x2, 40);
  % This is definitely not probability density function
  x2 = sort(x2);
  % downsampling to speed up computations
  x2 = interp1 (1:length(x2), x2, 1:1000:length(x2));
end

nboot = 500;
sample_size = [256 256];

% Unimodal
sample2d = normrnd(0.0, 10.0, sample_size);

[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot); 

figure;
subplot(1,2,1);
bar(n, b)
title(sprintf('Probability of unimodal %.2f', p_value))

% Bimodal
sample2d = sign(sample2d) .* (abs(sample2d) .^ 0.5);

[xpdf, n, b] = compute_xpdf(sample2d);
[dip, p_value, xlow, xup] = HartigansDipSignifTest(xpdf, nboot); 

subplot(1,2,2);
bar(n, b)
title(sprintf('Probability of unimodal %.2f', p_value))

print -dpng modality.png

Result of script execution

#2

There are many different ways to do what you are asking. In the most literal sense, "bimodal" means there are two peaks. Usually though, you want the "two peaks" to be separated by some reasonable distance, and you want them to each contain a reasonable proportion of the total counts. Only you know what is "reasonable" for your situation, but the following approach might help.

有很多不同的方法去做你要做的事。在最字面的意义上，“双峰”意味着有两个峰。但是，通常情况下，您希望“两个峰值”之间相隔一定的距离，并且希望每个峰值包含一定比例的总数。只有你知道什么对你的情况是“合理的”，但是下面的方法可能会有所帮助。

Create a histogram of the intensities
创建强度的直方图
Form the cumulative distribution with cumsum
形成累计分布与累计
For different values of the "cut" between distributions (25%, 30%, 50%, …), compute the mean and standard deviation of the two distributions (above and below the cut).
对于分布之间的“cut”的不同值(25%、30%、50%、…)，计算这两个分布的均值和标准差(在cut上面和下面)。
Compute the distance between the means divided by the sum of the standard deviations of the two distributions
计算均值之间的距离除以两个分布标准差之和
That quantity will be a maximum at the "best cut"
这个量在“最佳切割”时是最大的

You have to decide what size of that quantity represents "bimodal" for you. Here is some code that demonstrates what I am talking about. It generates bimodal distributions of different degrees of severity - two Gaussians, with increasing delta between them (steps = size of standard deviation). I compute the quantity described above, and plot it for a range of different values of delta. I then fit a parabola through this curve over a range corresponding to +- 1 sigma of the entire distribution. As you can see, when the distribution becomes more bimodal, two things happen:

你必须决定那个量的大小对你来说代表“双峰”。这里有一些代码演示了我正在谈论的内容。它产生不同严重程度的双模态分布-两个高斯分布，它们之间的增量增加(步骤=标准偏差的大小)。我计算了上面描述的量，并把它绘制成一系列不同的值。然后我把抛物线穿过这条曲线除以整个分布的+- 1。正如你所看到的，当分布变得更加双峰时，会发生两件事:

The curvature of this curve flips (it goes from a valley to a peak)
这条曲线的曲率会翻转(从一个山谷到一个山峰)
The maximum increases (it is about 1.33 for a Gaussian).
最大值增加了(高斯值大约是1.33)。

You can look at these quantities for some of your own distributions, and decide where you want to put the cutoff.

你可以看看你自己分布的这些量，然后决定你要把截止点放在哪里。

% test for bimodal distribution
close all
for delta = 0:10:50
    a1 = randn(100,100) * 10 + 25;
    a2 = randn(100,100) * 10 + 25 + delta;
    a3 = [a1(:); a2(:)];
    [h hb] = hist(a3, 0:100);
    cs = cumsum(h);
    llimi = find(cs <0.2 * max(cs(:)));
    ulimi = find(cs > 0.8 * max(cs(:)));
    llim = hb(llimi(end));
    ulim = hb(ulimi(1));
    cuts = linspace(llim, ulim, 20);
    dmean = mean(a3);
    dstd = std(a3);
    for ci = 1:numel(cuts)
        d1 = a3(a3=cuts(ci));
        m(ci,1) = mean(d1);
        m(ci, 2) = mean(d2);
        s(ci, 1) = std(d1);
        s(ci, 2) = std(d2);
    end
    q = (m(:, 2) - m(:, 1)) ./ sum(s, 2);
    figure; 
    plot(cuts, q);
    title(sprintf('delta = %d', delta))
    % compute curvature of plot around mean:
    xlims = dmean + [-1 1] * dstd;
    indx = find(cuts  xlims(1));
    pf = polyfit(cuts(indx), q(indx), 2);
    m = polyval(pf, dmean);
    fprintf(1, 'coefficients: a = %.2e, peak = %.2f\n', pf(1), m);
end

Output values:

输出值:

coefficients: a = 1.37e-03, peak = 1.32
coefficients: a = 1.01e-03, peak = 1.34
coefficients: a = 2.85e-04, peak = 1.45
coefficients: a = -5.78e-04, peak = 1.70
coefficients: a = -1.29e-03, peak = 2.08
coefficients: a = -1.58e-03, peak = 2.48

Sample plots:

示例图:

delta = 0