Note
This tutorial was generated from a Jupyter notebook that can be downloaded here.
Pixel sampling¶
One of the fundamental issues with expressing planetary and stellar surface intensities or albedos as a spherical harmonic expansion is that it can be difficult to enforce positivity of the surface map everywhere. In particular, there’s no way to exactly express the positivity constraint as a prior probability distribution on the spherical harmonic coefficients. This is a problem not only because the posterior will have support for unphysical solutions, but (more importantly) because a non-negativity prior is very informative: it can greatly reduce the number of degeneracies in the problem, so not being able to impose it means giving up a lot of constraining power.
The approach we recommend here is to perform the sampling in pixel space, but evaluate the model in spherical harmonic space. This combines the best of both worlds: pixel space makes it easy to impose physical constraints on the intensity, while in spherical harmonic space the flux evaluation is analytic, fast, and differentiable. Plus, it’s easy to construct linear operators that take us back and forth between these two spaces (at some finite resolution).
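Concretely, if \(\mathbf{y}\) is the vector of spherical harmonic coefficients and \(\mathbf{p}\) is the vector of intensities on some pixel grid, the two representations are related by a pair of linear operators (called Y2P and P2Y below),

\[
\mathbf{p} = \mathtt{Y2P} \, \mathbf{y},
\qquad
\mathbf{y} \approx \mathtt{P2Y} \, \mathbf{p},
\]

where, roughly speaking, Y2P evaluates the spherical harmonic basis at the pixel centers and P2Y is its least-squares inverse; the transformation is only approximate because both the grid and the expansion have finite resolution.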
[3]:
import starry
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import colors
import pymc3 as pm
import pymc3_ext as pmx
import theano
import theano.tensor as tt
starry.config.lazy = True
starry.config.quiet = True
cmap = plt.get_cmap("plasma")
cmap.set_under("#666666")
cmap.set_over("w")
cnorm = lambda: colors.Normalize(vmin=0.0)
if starry.compat.USE_AESARA:
theano_config = dict(aesara_config=dict(compute_test_value="ignore"))
else:
theano_config = dict(theano_config=dict(compute_test_value="ignore"))
Mock data¶
Let’s first generate a mock dataset to experiment with.
Mock surface map¶
We’ll create a stellar (or planetary) surface map with several large, distinct spots. We’ll do this at high (\(l=20\)) resolution so we can get crisp features without significant ringing. Note that we set the inclination to \(85^\circ\) to break the north-south degeneracy that arises when a map is viewed exactly edge-on.
[4]:
# Params
ydeg_tru = 20
inc = 85
# Instantiate
map_tru = starry.Map(ydeg_tru, inc=inc)
map_tru.add_spot(amp=-0.03, relative=False, sigma=0.05, lat=30, lon=0)
map_tru.add_spot(amp=-0.06, relative=False, sigma=0.1, lat=-20, lon=60)
map_tru.add_spot(amp=-0.03, relative=False, sigma=0.05, lat=10, lon=150)
map_tru.add_spot(amp=-0.03, relative=False, sigma=0.05, lat=60, lon=-90)
map_tru.add_spot(amp=-0.025, relative=False, sigma=0.04, lat=-30, lon=-90)
map_tru.add_spot(amp=-0.025, relative=False, sigma=0.04, lat=0, lon=-150)
map_tru.amp = 1.0
y_tru = np.array(map_tru.y.eval())
amp_tru = 1.0
map_tru.show(projection="moll", colorbar=True, norm=cnorm(), cmap=cmap)

The spot amplitudes were tuned so the intensity remains non-negative everywhere. The background intensity (about 0.4) doesn’t matter for our purposes, since we’re going to normalize the flux when doing inference.
Mock light curve¶
Now let’s generate a mock light curve. As our object rotates, we’ll occult it with several occultors of different sizes at different impact parameters. This isn’t necessarily realistic, but the point here is to ensure our light curve encodes information about the map at a variety of positions and a variety of scales.
We begin by constructing the flux design matrix, which dots into the spherical harmonic coefficient vector to give us the flux:
[5]:
# Params
npts = 1000
prot = 1.0 / 7.0
bo = [-0.5, 0.25, 0.75, 0.5]
ro = [0.5, 0.75, 0.3, 0.2]
time = np.linspace(0, 1, npts)
theta = (360.0 * time / prot).reshape(len(ro), -1)
X_tru = np.vstack(
[
map_tru.design_matrix(
theta=theta[i],
xo=np.linspace(-1 - ro[i], 1 + ro[i], len(theta[i])),
yo=bo[i],
ro=ro[i],
).eval()
for i in range(4)
]
)
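Written out, the light-curve model defined by this design matrix is simply

\[
\mathbf{f} = \alpha \, \mathbf{X} \, \mathbf{y},
\]

where \(\alpha\) is the map amplitude and \(\mathbf{y}\) is the spherical harmonic coefficient vector; this is exactly the product we evaluate in the next cell.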
Now we generate the light curve, add a tiny bit of noise, and median-normalize it.
[6]:
# Generate the light curve
flux_tru = amp_tru * X_tru.dot(y_tru)
# Add noise
np.random.seed(0)
ferr_tru = 1e-2
flux = flux_tru + ferr_tru * np.random.randn(len(flux_tru))
# Normalize
norm = np.nanmedian(flux)
flux = flux / norm
ferr = ferr_tru / norm
# Plot
plt.plot(time, flux, "k.", alpha=0.3, label="observed")
plt.plot(time, flux_tru / norm, "C0", label="true")
plt.legend()
plt.xlabel("time")
plt.ylabel("flux");

Inference¶
It’s time to do inference. For simplicity, we’ll assume we know all the information about the occultors and the object’s rotation rate exactly, so the only unknown is the surface map.
We’ll begin by pre-computing some of the linear operators we’ll use throughout this notebook:
[7]:
# Params
ydeg_inf = 15
# Instantiate
map_inf = starry.Map(ydeg_inf, inc=inc)
X_inf = X_tru[:, : (ydeg_inf + 1) ** 2]
lat, lon, Y2P, P2Y, Dx, Dy = map_inf.get_pixel_transforms(oversample=2)
npix = lat.shape[0]
We’ll discuss these in more detail below, but X_inf is our design matrix at the resolution we’re doing inference at, Y2P and P2Y are the linear operators that transform back and forth between spherical harmonic coefficients and pixel intensities, and Dx and Dy are the numerical derivative operators, which dot into a vector of pixel intensities to produce the vector of \(x\)- and \(y\)-derivatives at those pixels.
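As a quick sanity check, here’s a minimal sketch (reusing the variables defined in the cell above, and assuming P2Y acts as an approximate left inverse of Y2P) that inspects the shapes of these operators and verifies the round trip from spherical harmonic coefficients to pixels and back:

# Shapes: Y2P maps (ydeg_inf + 1)**2 coefficients to npix pixels;
# P2Y does the reverse; Dx and Dy act on pixel vectors
print(Y2P.shape, P2Y.shape, Dx.shape, Dy.shape)

# Round trip: expand the true map (truncated to ydeg_inf) into pixel space...
y0 = amp_tru * y_tru[: (ydeg_inf + 1) ** 2]
p0 = Y2P.dot(y0)

# ...and project back to spherical harmonic coefficients;
# the result should agree closely with the original vector
y1 = P2Y.dot(p0)
print(np.max(np.abs(y1 - y0)))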
We’ll consider various inference methods below. Let’s store all our solutions in these variables:
[8]:
nmeth = 7
labels = ["L2", "U", "Beta", "U+Mix", "U+TV", "Aizawa", "U+TV+Mix"]
y = np.zeros((nmeth, (ydeg_inf + 1) ** 2))
amp = np.zeros(nmeth)
flux_model = np.zeros((nmeth, npts))
L2 inference¶
Our first method is the usual one: L2 regularization on the spherical harmonic coefficients. The posterior is analytic, so we don’t need to do any sampling. The only thing we need to decide on is the prior on the coefficients, which for simplicity we’ll assume is a zero-mean Gaussian with covariance equal to a scalar multiple of the identity. We just need to choose that scalar, which we’ll call \(\lambda\). If we make \(\lambda\) small enough, we can in fact ensure the map is positive everywhere, since we’re severely restricting the variability of the features about the mean (which is non-zero). The problem is that this usually ends up being too restrictive a prior, since we’re forcing the pixel intensities to cluster about the mean. That makes it difficult to capture surface maps like the one in our example, where the intensities are either high or low, but rarely in between.
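For reference, with Gaussian measurement uncertainty (data covariance \(\mathbf{C}\)) and a Gaussian prior \(\mathbf{y} \sim \mathcal{N}(\boldsymbol{\mu}, \lambda \mathbf{I})\), the posterior over the coefficients is the standard linear least-squares result,

\[
\boldsymbol{\Sigma} = \left(\mathbf{X}^\top \mathbf{C}^{-1} \mathbf{X} + \lambda^{-1} \mathbf{I}\right)^{-1},
\qquad
\hat{\mathbf{y}} = \boldsymbol{\Sigma} \left(\mathbf{X}^\top \mathbf{C}^{-1} \mathbf{f} + \lambda^{-1} \boldsymbol{\mu}\right),
\]

which is, schematically, what the solve call in the next cell evaluates.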
[10]:
map_inf.set_data(flux, C=ferr ** 2)
map_inf.set_prior(L=1e-4)
map_inf.solve(design_matrix=X_inf)
y[0] = np.array(map_inf.y.eval())
amp[0] = map_inf.amp.eval()
flux_model[0] = amp[0] * X_inf.dot(y[0])

Pixel inference w/ uniform prior¶
Now let’s instead sample in pixel space. We place a uniform prior on the pixels, then transform to spherical harmonics via the P2Y
matrix, and use those to compute the flux. Note that the posterior we care about is still over the spherical harmonic coefficients, since that is the space in which we are computing the light curve model. The pixel sampling is merely a trick so we can enforce our prior.
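Schematically, the model in the next cell is

\[
\mathbf{f} = \mathbf{X}_\mathrm{inf} \, \mathbf{x},
\qquad
\mathbf{x} = n \, \mathtt{P2Y} \, \mathbf{p},
\qquad
p_j \sim \mathcal{U}(0, 1),
\quad
n \sim \mathcal{N}(0.5, 0.25^2),
\]

where \(\mathbf{x}\) is the amplitude-weighted spherical harmonic coefficient vector: the map amplitude is \(x_0\) and the coefficient vector we store is \(\mathbf{x}/x_0\).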
[12]:
with pm.Model(**theano_config) as model:
# Uniform prior on the *pixels*
p = pm.Uniform("p", lower=0.0, upper=1.0, shape=(npix,))
norm = pm.Normal("norm", mu=0.5, sd=0.25)
x = norm * tt.dot(P2Y, p)
# Compute the flux
lc_model = tt.dot(X_inf, x)
pm.Deterministic("lc_model", lc_model)
lc_model_guess = pmx.eval_in_model(lc_model)
# Store the Ylm coeffs. Note that `x` is the
# *amplitude-weighted* vector of spherical harmonic
# coefficients.
pm.Deterministic("amp", x[0])
pm.Deterministic("y", x / x[0])
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=lc_model, sd=ferr, observed=flux)
[13]:
with model:
soln = pmx.optimize(options=dict(maxiter=9999))
y[1] = np.array(soln["y"])
amp[1] = soln["amp"]
flux_model[1] = soln["lc_model"]
optimizing logp for variables: [norm, p]
message: Desired error not necessarily achieved due to precision loss.
logp: -282471.47846820205 -> 2265.282525824095

Pixel inference w/ Beta prior¶
Looking at the histogram of true pixel intensities above, it looks like we may be able to capture this behavior with a \(Beta(0.5, 0.5)\) prior, which places most of its weight near 0 and 1 (see the quick sketch below). Let’s try that and see what happens:
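As a quick illustration (a minimal sketch assuming scipy is available; it isn’t imported at the top of this notebook), the \(Beta(0.5, 0.5)\) density diverges at both endpoints, so most of the prior mass sits near 0 and 1:

from scipy.stats import beta

# Evaluate the Beta(0.5, 0.5) density on a grid, avoiding the
# endpoints where the density diverges
p_grid = np.linspace(0.001, 0.999, 1000)
plt.plot(p_grid, beta.pdf(p_grid, 0.5, 0.5), "C0")
plt.xlabel("pixel intensity")
plt.ylabel("Beta(0.5, 0.5) density");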
[15]:
with pm.Model(**theano_config) as model:
# Beta prior on the *pixels*
p = pm.Beta("p", alpha=0.5, beta=0.5, shape=(npix,))
norm = pm.Normal("norm", mu=0.5, sd=0.25)
x = norm * tt.dot(P2Y, p)
# Compute the flux
lc_model = tt.dot(X_inf, x)
pm.Deterministic("lc_model", lc_model)
lc_model_guess = pmx.eval_in_model(lc_model)
# Store the Ylm coeffs
pm.Deterministic("amp", x[0])
pm.Deterministic("y", x / x[0])
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=lc_model, sd=ferr, observed=flux)
[16]:
with model:
soln = pmx.optimize(options=dict(maxiter=9999))
y[2] = np.array(soln["y"])
amp[2] = soln["amp"]
flux_model[2] = soln["lc_model"]
optimizing logp for variables: [norm, p]
message: Desired error not necessarily achieved due to precision loss.
logp: -282731.59010644874 -> 2043.3174075886718

Pixel inference w/ Gaussian mixture prior¶
We could also try a Gaussian mixture model to place weight at the two extrema:
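Rather than a true two-component mixture, the cell below uses a cheap approximation to its log density: for each pixel we keep only the closer of two Gaussian components centered at 0 and 1,

\[
\log p(\mathbf{p}) \propto -\frac{1}{2\sigma^2} \sum_j \min\left[(1 - p_j)^2, \; p_j^2\right],
\qquad \sigma = 0.35,
\]

which follows from \(\log(e^{-a} + e^{-b}) \approx -\min(a, b)\) when the two terms are well separated.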
[18]:
with pm.Model(**theano_config) as model:
# Uniform prior on the *pixels*
p = pm.Uniform(
"p", lower=0.0, upper=1.0, shape=(npix,), testval=0.99 * np.ones(npix)
)
norm = pm.Normal("norm", mu=0.5, sd=0.25)
x = norm * tt.dot(P2Y, p)
# Mixture of Gaussians
sig = 0.35
pm.Potential("mix", -0.5 * tt.sum(tt.minimum((1 - p) ** 2, (p ** 2))) / sig ** 2)
# Compute the flux
lc_model = tt.dot(X_inf, x)
pm.Deterministic("lc_model", lc_model)
lc_model_guess = pmx.eval_in_model(lc_model)
# Store the Ylm coeffs
pm.Deterministic("amp", x[0])
pm.Deterministic("y", x / x[0])
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=lc_model, sd=ferr, observed=flux)
[19]:
with model:
soln = pmx.optimize(options=dict(maxiter=9999))
y[3] = np.array(soln["y"])
amp[3] = soln["amp"]
flux_model[3] = soln["lc_model"]
optimizing logp for variables: [norm, p]
message: Desired error not necessarily achieved due to precision loss.
logp: -747794.200977716 -> 1961.7108745855085

Pixel inference w/ TV prior¶
Next, let’s place a total variation (TV) prior on the pixels: an L1 penalty on the magnitude of the spatial gradient.
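Explicitly, the penalty added in the next cell is

\[
\mathrm{TV}(\mathbf{p}) = \sum_j \sqrt{\left(\mathbf{D}_x \mathbf{p}\right)_j^2 + \left(\mathbf{D}_y \mathbf{p}\right)_j^2},
\qquad
\log p(\mathbf{p}) \propto -\frac{\mathrm{TV}(\mathbf{p})}{\theta},
\quad \theta = 0.1,
\]

an L1 norm on the gradient magnitude that favors piecewise-constant maps with sharp edges.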
[21]:
with pm.Model(**theano_config) as model:
# Uniform prior on the *pixels*
p = pm.Uniform("p", lower=0.0, upper=1.0, shape=(npix,))
norm = pm.Normal("norm", mu=0.5, sd=0.25)
x = norm * tt.dot(P2Y, p)
# Apply the TV penalty
theta = 0.1
TV = tt.sum(tt.sqrt(tt.dot(Dx, p) ** 2 + tt.dot(Dy, p) ** 2))
pm.Potential("TV", -TV / theta)
# Compute the flux
lc_model = tt.dot(X_inf, x)
pm.Deterministic("lc_model", lc_model)
lc_model_guess = pmx.eval_in_model(lc_model)
# Store the Ylm coeffs
pm.Deterministic("amp", x[0])
pm.Deterministic("y", x / x[0])
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=lc_model, sd=ferr, observed=flux)
[22]:
with model:
soln = pmx.optimize(options=dict(maxiter=9999))
y[4] = np.array(soln["y"])
amp[4] = soln["amp"]
flux_model[4] = soln["lc_model"]
optimizing logp for variables: [norm, p]
message: Desired error not necessarily achieved due to precision loss.
logp: -282471.4792703229 -> 1262.3888741583378

Pixel inference w/ Aizawa+20 prior¶
Following Aizawa et al. (2020), we combine an L1 sparsity penalty on the difference between the pixels and the (unit) background with a total squared variation (TSV) penalty, an L2 norm on the pixel gradients.
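Written out, the combined log prior (up to normalization) is

\[
\log p(\mathbf{p}) \propto
-\lambda \sum_j \left|1 - p_j\right|
\;-\;
\frac{1}{2\sigma^2} \sum_j \left[\left(\mathbf{D}_x \mathbf{p}\right)_j^2 + \left(\mathbf{D}_y \mathbf{p}\right)_j^2\right],
\qquad \lambda = 1, \quad \sigma = 0.5.
\]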
[24]:
with pm.Model(**theano_config) as model:
# Uniform prior on the *pixels*
p = pm.Uniform("p", lower=0.0, upper=1.0, shape=(npix,))
norm = pm.Normal("norm", mu=0.5, sd=0.25)
x = norm * tt.dot(P2Y, p)
# Apply the L1 penalty on the
# *difference between the pixels and unity*
# since we want sparseness relative to the
# background flux (which is unity)
lam = 1.0
pm.Potential("L1", -lam * tt.sum(tt.abs_((1 - p))))
# Apply the TSV penalty as an L2 norm on the gradient
sig = 0.5
pm.Potential(
"TSV", -0.5 * tt.sum((tt.dot(Dx, p)) ** 2 + (tt.dot(Dy, p)) ** 2) / sig ** 2
)
# Compute the flux
lc_model = tt.dot(X_inf, x)
pm.Deterministic("lc_model", lc_model)
lc_model_guess = pmx.eval_in_model(lc_model)
# Store the Ylm coeffs
pm.Deterministic("amp", x[0])
pm.Deterministic("y", x / x[0])
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=lc_model, sd=ferr, observed=flux)
[25]:
with model:
soln = pmx.optimize(options=dict(maxiter=9999))
y[5] = np.array(soln["y"])
amp[5] = soln["amp"]
flux_model[5] = soln["lc_model"]
optimizing logp for variables: [norm, p]
message: Desired error not necessarily achieved due to precision loss.
logp: -282759.4784682021 -> 1720.3110018401856

Pixel inference w/ TV + Gaussian mixture prior¶
Finally, let’s combine the TV prior with the Gaussian mixture prior, initializing the pixels at the solution from the mixture model run above:
[27]:
with pm.Model(**theano_config) as model:
# Uniform prior on the *pixels*
# Initialize at the solution to the mixture model run
guess = amp[3] * np.maximum(0, np.dot(Y2P, y[3]))
scale = np.max(guess)
pad = 1e-2
guess = pad + guess / scale * (1 - 2 * pad)
p = pm.Uniform("p", lower=0.0, upper=1.0, shape=(npix,), testval=guess)
norm = pm.Normal("norm", mu=0.5, sd=0.25, testval=scale)
x = norm * tt.dot(P2Y, p)
# Mixture of Gaussians
sig = 0.35
pm.Potential("mix", -0.5 * tt.sum(tt.minimum((1 - p) ** 2, (p ** 2))) / sig ** 2)
# Apply the TV penalty
theta = 3.0
TV = tt.sum(tt.sqrt(tt.dot(Dx, p) ** 2 + tt.dot(Dy, p) ** 2))
pm.Potential("TV", -TV / theta)
# Compute the flux
lc_model = tt.dot(X_inf, x)
pm.Deterministic("lc_model", lc_model)
lc_model_guess = pmx.eval_in_model(lc_model)
# Store the Ylm coeffs
pm.Deterministic("amp", x[0])
pm.Deterministic("y", x / x[0])
# The likelihood function assuming known Gaussian uncertainty
pm.Normal("obs", mu=lc_model, sd=ferr, observed=flux)
[28]:
with model:
soln = pmx.optimize(options=dict(maxiter=9999))
y[6] = np.array(soln["y"])
amp[6] = soln["amp"]
flux_model[6] = soln["lc_model"]
optimizing logp for variables: [norm, p]
message: Desired error not necessarily achieved due to precision loss.
logp: 1643.5924431484536 -> 1867.2429073191272

Compare all the methods¶
[30]:
fig, ax = plt.subplots(3, 3, figsize=(15, 8))
ax = ax.flatten()
for axis in ax:
axis.axis("off")
# True map
map_tru.show(
ax=ax[0],
projection="moll",
colorbar=True,
norm=cnorm(),
cmap=cmap,
)
ax[0].annotate(
"true",
xy=(0, np.sqrt(2)),
xycoords="data",
xytext=(0, 5),
textcoords="offset points",
ha="center",
va="bottom",
fontsize=10,
fontweight="bold",
)
# True map at inference resolution
map_inf.amp = amp_tru
map_inf[1:, :] = y_tru[1 : (ydeg_inf + 1) ** 2]
map_inf.show(
ax=ax[1],
projection="moll",
colorbar=True,
norm=cnorm(),
cmap=cmap,
)
ax[1].annotate(
"true (l={})".format(ydeg_inf),
xy=(0, np.sqrt(2)),
xycoords="data",
xytext=(0, 5),
textcoords="offset points",
ha="center",
va="bottom",
fontsize=10,
fontweight="bold",
clip_on=False,
)
# Inferred maps
for i in range(nmeth):
map_inf.amp = amp[i]
map_inf[1:, :] = y[i][1:]
map_inf.show(
ax=ax[i + 2],
projection="moll",
colorbar=True,
norm=cnorm(),
cmap=cmap,
)
ax[i + 2].annotate(
labels[i],
xy=(0, np.sqrt(2)),
xycoords="data",
xytext=(0, 5),
textcoords="offset points",
ha="center",
va="bottom",
fontsize=10,
fontweight="bold",
clip_on=False,
)
