Bayes

From ECLR
Jump to: navigation, search

Problem Setup

This section is yet to be written, but you could check out this YouTube clip I produced to illustrate how a Bayesian Estimation of a proportion would work.

The problem is to estimate the proportion of years in which the average global temperature increases. This is a slightly odd way to check whether there is a detectable global temperature trend, although there surely may be better ways to do this. But it is a nice example to illustrate how Bayesian estimation works.

In fact estimating a proportion is about the simplest problem you can tackle in the Bayesian estimation framework, but it makes the workings very obvious.

The data used in the example can be downloaded from here: HadCRUT4-gl.xlsx.

At this stage you will have to use the comments in the code for explanations.

% Example: using HadCRUT4-gl.xlsx global temperature anomaly data

% Estimate the following parameter: p, probability of an upward trend

% For Bayesian Reasoning we need to start with a prior probability
% distribution: define p as the probability of having an up-tick
% We now need to specify a distribution for p, say f(p) = 1/101
% when we treat p as a discrete distribution that can take values
% of p = 0.00,0.01,0.02,...,1.00

clc;
clear all;

pval = (0:0.01:1)'; % values of p at which to evaluate probabilities


%% load global temperature data
data = xlsread('HadCRUT4-gl.xlsx');
years = data(1:164,1);              % extract the years for which we have data
temp = data(1:164,2);               % extract the temperature data
diff = temp(2:end)-temp(1:end-1);   % calculate the change in temperature
ups = (diff>0);                     % dummy variable 1 if diff > 0, 0 otherwise
% ups = ups(124:end);                 % Select data from 1973 onwards only
n = size(ups,1);                    % number of observations

plot(years,temp);
title('Global Temperature Anamoly');
%% Frequentist approach

pbar = mean(ups);
sd_pbar = sqrt(pbar*(1-pbar)/n);

%% Bayesian Updating

% Prior 1: Normal (m1,sd1^2)
m1 = 0.5;
sd1 = 0.05;
fp_prior1 = normcdf(pval+0.005,m1,sd1);   % continuous
fp_prior1 = [0;fp_prior1(2:end)-fp_prior1(1:end-1)];   % discretised

% Prior 2: Normal (m2,sd2^2)
m2 = 0.6;
sd2 = 0.05;
fp_prior2 = normcdf(pval+0.005,m2,sd2);   % continuous
fp_prior2 = [0;fp_prior2(2:end)-fp_prior2(1:end-1)];   % discretised

% Prior 3: Uniform
fp_prior3 = ones(size(pval)).*(1/size(pval,1));

subplot(2,1,1)
plot(pval,[fp_prior1 fp_prior2 fp_prior3]);
title('Prior Distributions');
axis([0 1 0 0.15]);
subplot(2,1,2)
% Let's say we have a year with temp increase, then the probability at each pval, pval(i),
% is updated as follows
% P(pval(i)|up) = P(pval(i) and up)/P(up) = fp_prior(i)*pval(i)/P(up)
% where p(up) = sum over all j(fp_prior(j)*pval(j))

for sim = 1:n

    % Prior 1
    P_joint1 = fp_prior1.*pval.*ups(sim) + fp_prior1.*(1-pval).*(1-ups(sim));
    P_tick1  = sum(P_joint1);
    fp_post1 = P_joint1./P_tick1;   % this scales all probs such that they sum to 1
    bar(pval,fp_post1,'blue');
    axis([0 1 0 0.15]);
    hold on
    fp_prior1 = fp_post1;

    % Prior 2
    P_joint2 = fp_prior2.*pval.*ups(sim) + fp_prior2.*(1-pval).*(1-ups(sim));
    P_tick2  = sum(P_joint2);
    fp_post2 = P_joint2./P_tick2;   % this scales all probs such that they sum to 1
    bar(pval,fp_post2,'green');
    axis([0 1 0 0.15]);
    hold on
    fp_prior2 = fp_post2;

    % Prior 3
    P_joint3 = fp_prior3.*pval.*ups(sim) + fp_prior3.*(1-pval).*(1-ups(sim));
    P_tick3  = sum(P_joint3);
    fp_post3 = P_joint3./P_tick3;   % this scales all probs such that they sum to 1
    bar(pval,fp_post3,'red');
    axis([0 1 0 0.15]);
    fp_prior3 = fp_post3;

    title('Posterior Distributions');
    hold off;

    if sim < 20;

        pause(0.5);
    else
        pause(0.0003);
    end
end


disp('Bayes 1');
disp('P(p>0.5)');
disp(sum(fp_post1(51:end)));
disp('');
disp('Bayes 2');
disp('P(p>0.5)');
disp(sum(fp_post2(51:end)));
disp('');
disp('Bayes 3');
disp('P(p>0.5)');
disp(sum(fp_post3(51:end)));