Activating project at `~/Documents/GitHub/STAT638_Applied-Bayes-Methods/hw/hw9`
Random.TaskLocalRNG()
Description
Course: STAT638, 2022 Fall
Deadline: 2022/11/17, 12:00 pm
Read Chapter 9 in the Hoff book. Then do Problems 9.1 and 9.2 in Hoff.
For both regression models, please include an intercept term (\(\beta_0\)).
In 9.1(b), please replace “max” by “min”. (This is not listed in the official book errata, but appears to be a typo.)
For 9.2, the azdiabetes.dat data are described in Exercise 6 of Chapter 7 (see errata).
Note: This PDF file is generated by Quarto and LualaTeX. There is unsolved problem to display the special character in the source code. Thus, I leave the html version here for reference that displays the complete source code:
Extrapolation: The file swim.dat contains data on the amount of time in seconds, it takes each of four high school swimmers to swim \(50\) yards. Each swimmer has \(6\) times, taken on a biweekly basis.
(a)
Perform the following data analysis for each swimmer separately:
Fit a linear regression model of swimming time as the response and week as the explanatory variable. To formulate your prior, use the information that competitive times for this age group generally range from \(22\) to \(24\) seconds.
For each swimmer \(j\), obtain a posterior predictive distribution for \(Y^{*}_j\), their time if they were to swim \(2\) weeks from the last recorded time.
"""Problem 9.1 (a)"""@protostruct SwimmingModel S =1000# Number of sampling# Data y n =length(y) # number of records# Model X =hcat( ones(n), collect(0:2:10) ) p =size(X)[2]# Prior β₀ =MvNormal([23., 0.], [0.10; 00.1]) ν₀ =1. σ₀² =0.2endfunctionSSR(β, y, X) ssrV = (y - X*β)' * (y - X*β)returnsum(ssrV)endfunctionβ_FCD(σ², m::SwimmingModel) Σₙ =( m.β₀.Σ^-1+ m.X'* m.X / σ²)^-1 μₙ =Σₙ*(m.β₀.Σ^-1* m.β₀.μ + m.X'* m.y / σ²)returnMvNormal(vec(μₙ), Hermitian(Σₙ))endfunctionσ²_FCD(β, m::SwimmingModel) α = (m.ν₀ + m.n)/2 θ = (m.ν₀*m.σ₀²) +SSR(β, m.y, m.X)returnInverseGamma(α, θ)endfunctionpred(X, m::SwimmingModel)# Sampling vector βsmp =zeros(m.S, length(m.β₀.μ)) σ²smp =zeros(m.S) y =zeros(m.S)# Init βsmp[1,:] =rand(m.β₀) σ²smp[1] = m.σ₀² y[1] = m.y[1]for i in2:m.S βsmp[i,:] =rand(β_FCD(σ²smp[i-1], m)) σ²smp[i] =rand(σ²_FCD(βsmp[i-1,:], m))# Predict y[i] = βsmp[i,:]' * X + rand(Normal(0., σ²smp[i]))endreturn (y=y, β=βsmp, σ²=σ²smp)endj_swim =1ms = [ SwimmingModel(y =hcat(ys[i,:]), S=10000 ) for i in1:size(ys)[1] ]ys_pred =zeros(size(ys)[1], ms[1].S) X_pred = [1,12]for i ineachindex(ms) ys_pred[i,:] =pred([1,12], ms[i]).yend## Plottingp = [histogram(ys_pred[i,:], label="Swimmer $i", color="black", xlabel="Week", ylabel="Seconds" ) for i in1:size(ys)[1]]plot(p...)
(b)
The coach of the team has to decide which of the four swimmers will compete in a swimming meet in \(2\) weeks. Using your predictive distributions, compute \(Pr(Y^{*}_{j} = \max\{Y^{*}_1,\dots, Y^{*}_4\}|Y)\) for each swimmer \(j\), and based on this make a recommendation to the coach.
am =argmax(ys_pred, dims=1)y_count =zeros(1, size(ys)[1])for a in am y_count[a[1]] +=1endpmax =vec(y_count ./length(am))## Recommendationds =DataFrame( Dict("Swimmer"=>collect(1:size(ys)[1]),"Pr(Y_i is max)"=> pmax ))
4×2 DataFrame
Row
Pr(Y_i is max)
Swimmer
Float64
Int64
1
0.023
1
2
0.4773
2
3
0.0424
3
4
0.4573
4
Swimmer 2 is the most probable winner.
Problem 9.2
Model selection: As described in Example 6 of Chapter 7, the file azdiabetes.dat contains data on health-related variables of a population of \(532\) women. In this exercise we will be modeling the conditional distribution of glucose level (glu) as a linear combination of the other variables, excluding the variable diabetes.
(a)
Fit a regression model using the \(g\)-prior with \(g=n\), \(\nu_0 =2\) and \(\sigma^{2}_{0} = 1\). Obtain posterior confidence intervals for all of the parameters.
dt = data[1:end , 1:end-1]y =float.(dt[2:end, 2])X =float.(dt[2:end, 1:end.!=2])ns = data[1,1:end-1]ns = ns[1:end.!=2]@protostruct DiabetesModel S =500# Number of sampling# Data y X n =length(y) # number of records p =size(X)[2]# Model # Prior g = n # g prior ν₀ =2. σ₀² =1.endfunctionβ_FCD(σ², m::DiabetesModel)returnβ_FCD(σ², m.g, m.X, m.y)endfunctionβ_FCD(σ², g, X, y) Σₙ = g/(g+1) * σ² * (X'X)^-1 μₙ = g/(g+1) *β̂(σ², y, X) βₙ =MvNormal(μₙ, Hermitian(Σₙ))return βₙ endfunctionβ̂(σ², y, X)return σ² * (X'X)^-1* (X'y / σ²)endfunctionσ²_FCD(m::DiabetesModel)returnσ²_FCD(m.ν₀, m.σ₀², m.n, m.X, m.y, m.g)endfunctionσ²_FCD(ν₀, σ₀², n, X, y, g) α = ν₀ + n /2. θ = (ν₀ * σ₀² +SSR(X, y, g))/2. σ² =InverseGamma(α, θ)return σ²endfunctionSSR(m::DiabetesModel)returnSSR(m.X, m.y, m.g)endfunctionSSR(X, y, g)return y'*(I - g/(g+1)*X*(X'X)^-1*X')*yendm =DiabetesModel(y=y, X=X )σ²smp =zeros(m.S, 1)βsmp =zeros(m.S, size(m.X)[2])for i in1: m.S σ²smp[i] =rand(σ²_FCD(m)) βsmp[i,:] =rand(β_FCD(σ²smp[i], m))endps = [histogram(βsmp[:,i], xlabel="Quantity", ylabel="Probability",normalize=true, label="$(ns[i])", color="black") for i in1:m.p]plot(ps...)
(b)
Perform the model selection and averaging procedure described in Section 9.3. Obtain \(Pr(\beta_j \neq 0 |y)\), as well as posterior confidence intervals for all of the parameters. Compare to the results in part (a).
\[\{y|X_{z(k)},\beta_{z(k)}, \sigma^2\} \sim \text{ multivariate normal }(X_{z(k)}\beta_{z(k)}, \sigma^2 I)\]
inds = prB .>=0.5b_select = prB[inds]ps = [histogram(βsmp[:, i], xlabel="Quantity", ylabel="Probability",normalize=true, label="$(ns[i])", color="black") for i infindall(inds .==1)]plot(ps...)
DataFrame(Dict("Parameters"=> ns[inds], "Confidence interval"=> [quantile(βsmp[:,i], [0.25, 0.975]) for i infindall(inds .==1)]))
1×2 DataFrame
Row
Confidence interval
Parameters
Array…
Any
1
[0.0, 1.1847]
age
Conclusion
There might be some bugs in FCD formulation. The current results show that the age is the only factor remained after model selection. The distribution is far different from what it is in part (a).