In a previous article, I wrote about the theory (and some applications!) of density estimation, and how it’s a powerful tool for a variety of methods in statistical analysis. By overwhelmingly popular demand, I thought it would be interesting to use density estimation to derive some insight from interesting data: in this case, data related to legal theory.
Though it’s great to dive deep into the mathematical details behind statistical methods to build a solid understanding of the algorithms, at the end of the day we want to use these tools to derive cool insights from data!
In this article, we’ll use density estimation to analyze data on the impact of a two-verdict vs. a three-verdict system on jurors’ perceived confidence in their final verdicts.
Background & Dataset
Our legal system in the US uses a two-option verdict system (guilty/not guilty) in criminal trials. However, some other countries, notably Scotland, have used a three-verdict system (guilty/not guilty/not proven) to determine the fate of a defendant. In this three-verdict system, jurors have the additional option of choosing a verdict of “not proven”, which means that the prosecution has delivered insufficient evidence to determine whether the defendant is guilty or innocent.
Legally, the “not proven” and “not guilty” verdicts are equivalent, as the defendant is acquitted under either outcome. However, the two verdicts carry different semantic meanings: “not proven” is intended to be chosen by jurors when they are convinced neither that the defendant is culpable for the crime at hand nor that they are innocent of it.
Scotland has recently abolished this third verdict due to its confusing nature. Indeed, when reading about this myself, I came across conflicting definitions of this verdict. Some sources defined it as the option to select when the juror believes that the defendant is culpable but the prosecution has failed to deliver sufficient evidence to convict them. This may give a defendant acquitted with a “not proven” verdict a stigma in the eyes of the public similar to that of a defendant who was found guilty. In contrast, other sources defined the verdict as the middle ground between guilt and innocence (confusing!).
In this article, we’ll analyze data on the perceived confidence of mock jurors in their verdicts under the two-option and three-option verdict systems. The data also records whether conflicting evidence was present in the testimony. These features will allow us to analyze whether jurors’ perceived confidence in their final verdicts differs depending on the verdict system and/or the presence of conflicting evidence.
For more information about the data, check out the documentation.
Density Estimation for Exploratory Analysis
Without further ado, let’s dive into the data!
library(ggplot2)  # ggplot(), geom_bar(), geom_density()
library(sm)       # sm.density.compare()
mock <- read.csv("data/MockJurors.csv")
summary(mock)

Our data consists of 104 observations and three variables of interest. Each observation corresponds to a mock juror’s verdict. The three variables we’re interested in are described below:
- verdict: whether the juror’s decision was made under the two-option or three-option verdict system.
- conflict: whether conflicting testimonial evidence was present in the trial.
- confidence: the juror’s degree of confidence in their verdict on a scale from 0 to 1, where 0/1 corresponds to low/high confidence, respectively.
Let’s take a brief look at each of these individual features.
# barplot of verdict
ggplot(mock, aes(x = verdict, fill = verdict)) +
  geom_bar() +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  labs(title = "Count of Verdicts") +
  theme(plot.title = element_text(hjust = 0.5))
# barplot of conflict
ggplot(mock, aes(x = conflict, fill = conflict)) +
  geom_bar() +
  geom_text(stat = "count", aes(label = after_stat(count)), vjust = -0.5) +
  labs(title = "Count of Conflict Levels") +
  theme(plot.title = element_text(hjust = 0.5))
# crosstab: verdict & conflict
# i.e., distribution of conflicting evidence across verdict levels
ggplot(mock, aes(x = verdict, fill = conflict)) +
  geom_bar(position = "dodge") +
  geom_text(
    stat = "count",
    aes(label = after_stat(count)),
    position = position_dodge(width = 0.9),
    vjust = -0.5
  ) +
  labs(title = "Verdict and Conflict") +
  theme(plot.title = element_text(hjust = 0.5))



The observations are evenly split between the verdict levels (52/52) and nearly evenly split across the conflict factor (53 no, 51 yes). Additionally, conflict appears to be evenly split within both levels of verdict, i.e., roughly equal numbers of verdicts were recorded with and without conflicting evidence for both verdict systems. Thus, we can proceed to compare the distribution of confidence levels across these groups without worrying about imbalanced data affecting the quality of our distribution estimates.
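The same balance check can also be done numerically. The sketch below uses a small, randomly generated toy data frame (hypothetical, not the MockJurors data) with the same verdict and conflict columns, and base R’s table(), addmargins(), and prop.table(). Running the same calls with mock in place of toy should reproduce the counts shown in the bar plots.

```r
# Hypothetical toy data with the same columns as the mock data frame;
# values are random, not the real MockJurors counts
set.seed(1)
toy <- data.frame(
  verdict  = sample(c("two-option", "three-option"), 40, replace = TRUE),
  conflict = sample(c("no", "yes"), 40, replace = TRUE)
)

# Counts for every verdict/conflict combination, with row and column totals
counts <- addmargins(table(toy$verdict, toy$conflict))
print(counts)

# Share of conflict levels within each verdict system (each row sums to 1)
shares <- prop.table(table(toy$verdict, toy$conflict), margin = 1)
print(round(shares, 2))
```

Rows of shares close to 0.5/0.5 correspond to the even split described above.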
Let’s look at the distribution of juror confidence levels.
We can visualize the distribution of confidence levels using density estimates. Density estimates can provide a clear, intuitive display of a variable’s distribution, especially when working with large amounts of data. However, the estimate may vary considerably with respect to a few parameters. For instance, let’s look at the density estimates produced by various bandwidth selection methods.
bws <- list("SJ", "ucv", "nrd", "nrd0")
# Set up a 2x2 grid for plotting
par(mfrow = c(2, 2)) # 2 rows, 2 columns
for (bw in bws) {
  # density() uses the Gaussian kernel by default; only the bandwidth rule changes
  pdf_est <- density(mock$confidence, bw = bw, from = 0, to = 1)
  # Plot PDF
  plot(pdf_est,
       main = paste("Density Estimate: Confidence (", bw, ")"),
       xlab = "Confidence",
       ylab = "Density",
       col = "blue",
       lwd = 2)
  rug(mock$confidence)
  # polygon(pdf_est, col = rgb(0, 0, 1, 0.2), border = NA)
  grid()
}
# Reset plotting layout back to default (optional)
par(mfrow = c(1, 1))

The density estimates produced by the Sheather-Jones, unbiased cross-validation, and normal reference distribution methods (nrd and nrd0) are pictured above.
Clearly, the choice of bandwidth can give us a very different picture of the confidence level distribution.
- Using unbiased cross-validation gives the impression that the distribution of confidence is very sparse (a spiky, undersmoothed curve), which isn’t surprising considering how small our dataset is (104 observations).
- The density estimates produced by the other bandwidths are fairly similar. The estimates produced by the normal reference distribution methods appear slightly smoother than the one produced by Sheather-Jones. All four estimates use the same Gaussian kernel (density()’s default); the difference is that the normal reference rules pick the bandwidth that would suit normally distributed data, which tends to oversmooth distributions that are far from normal. Overall, confidence levels appear to be highly concentrated around values of 0.6 or greater, and the distribution appears to have a heavy left tail.
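To make the normal reference idea concrete, here is a minimal sketch that recomputes R’s two normal reference bandwidths by hand on simulated, left-skewed scores (a hypothetical stand-in drawn from a Beta(4, 1.5) distribution, not the real data) and prints them next to the data-driven selectors. The constants 0.9 and 1.06 are the ones documented for bw.nrd0() and bw.nrd().

```r
# Hypothetical left-skewed scores on [0, 1]; a stand-in for mock$confidence
set.seed(42)
x <- rbeta(104, shape1 = 4, shape2 = 1.5)
n <- length(x)

# Robust spread estimate shared by both normal reference rules
spread <- min(sd(x), IQR(x) / 1.34)

# Bandwidths derived by assuming x is Gaussian
h_nrd0 <- 0.9 * spread * n^(-1/5)   # Silverman's rule of thumb, as in bw.nrd0()
h_nrd  <- 1.06 * spread * n^(-1/5)  # Scott's variant, as in bw.nrd()

# Data-driven selectors for comparison (bw.ucv() may warn on small samples)
bws <- c(SJ   = bw.SJ(x),
         ucv  = suppressWarnings(bw.ucv(x)),
         nrd  = bw.nrd(x),
         nrd0 = bw.nrd0(x))
print(round(bws, 4))

# density() uses the Gaussian kernel unless told otherwise; passing a rule
# name such as "SJ" only changes how the bandwidth is chosen
d_sj <- density(x, bw = "SJ", from = 0, to = 1)
print(d_sj$bw)
```

Because every rule feeds the same Gaussian kernel, comparing the printed bandwidths is a direct way to see why the curves differ in smoothness.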
Now, let’s get into the interesting part and examine how juror confidence levels may change depending on the presence of conflicting evidence and the verdict system.
# plot distribution of Confidence by Conflict
# use Sheather-Jones bandwidth for density estimate
ggplot(mock, aes(x = confidence, fill = conflict)) +
  geom_density(alpha = 0.5, bw = bw.SJ(mock$confidence)) +
  labs(title = "Density: Confidence by Conflict") +
  xlab("Confidence") +
  ylab("Density") +
  theme(plot.title = element_text(hjust = 0.5))

It appears that juror confidence levels don’t differ much with the presence of conflicting evidence, as shown by the large overlap in the confidence density estimates above. Perhaps jurors are slightly more confident in their verdicts when there is no conflicting evidence, as the confidence density estimate under no conflict appears to show a higher concentration of confidence values greater than 0.8 than the estimate under conflicting evidence. However, the distributions appear nearly the same.
Let’s examine whether juror confidence levels differ across two-option vs. three-option verdict systems.
# plot distribution of Confidence by Verdict
# use Sheather-Jones bandwidth for density estimate
ggplot(mock, aes(x = confidence, fill = verdict)) +
  geom_density(alpha = 0.5, bw = bw.SJ(mock$confidence)) +
  labs(title = "Density: Confidence by Verdict") +
  xlab("Confidence") +
  ylab("Density") +
  theme(plot.title = element_text(hjust = 0.5))

This visual provides more compelling evidence that confidence levels are not identically distributed across the two verdict systems. It appears that jurors may be slightly less confident in their verdicts under the two-option verdict system than under the three-option system: the distributions of confidence under the two-option and three-option verdict systems appear to peak around 0.625 and 0.875, respectively. However, there is still substantial overlap between the confidence distributions for both verdict systems, so we would need to formally test our claim to conclude whether confidence levels differ significantly across these verdict systems.
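Peak locations like these can be read off numerically instead of by eye. This sketch assumes a hypothetical helper, density_mode(), that evaluates a Gaussian kernel density estimate on [0, 1] and returns the grid point with the highest density. The toy data are simulated, so the printed modes say nothing about the real 0.625 and 0.875 peaks.

```r
# Hypothetical stand-in for the real data: two verdict groups with
# different peaks (values are simulated, not MockJurors)
set.seed(7)
toy <- data.frame(
  confidence = c(rbeta(52, 5, 3), rbeta(52, 7, 1.5)),
  verdict    = rep(c("two-option", "three-option"), each = 52)
)

# Hypothetical helper: x-location of the highest point of a Gaussian
# kernel density estimate evaluated on [0, 1]
density_mode <- function(x, bw = "SJ") {
  d <- density(x, bw = bw, from = 0, to = 1)
  d$x[which.max(d$y)]
}

# One shared Sheather-Jones bandwidth, as in the plots above,
# then one mode per verdict system
h <- bw.SJ(toy$confidence)
modes <- sapply(split(toy$confidence, toy$verdict), density_mode, bw = h)
print(round(modes, 3))
```

On the real data, the analogous call would be sapply(split(mock$confidence, mock$verdict), density_mode, bw = bw.SJ(mock$confidence)). The result is only as precise as the evaluation grid (512 points by default).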
Let’s examine whether the distribution of confidence differs across joint levels of verdict and conflict.
# plot distribution of Confidence by Conflict & Verdict
# use Sheather-Jones bandwidth for density estimate
ggplot(mock, aes(x = confidence, fill = conflict)) +
  geom_density(alpha = 0.5, bw = bw.SJ(mock$confidence)) +
  facet_wrap(~ verdict) +
  labs(title = "Density: Confidence by Conflict & Verdict") +
  xlab("Confidence") +
  ylab("Density") +
  theme(plot.title = element_text(hjust = 0.5))

Examining the distribution of confidence stratified by conflict and verdict gives us some interesting insights.
- Under the two-verdict system, the confidence levels of verdicts made with and without conflicting evidence appear very similar. That is, jurors operating under the traditional guilty/not guilty paradigm seem to be equally confident in their verdicts regardless of whether they faced conflicting evidence.
- In contrast, under the three-verdict system, jurors seem to be more confident in their verdicts when there is no conflicting evidence than when conflicting evidence is present. The corresponding density plots show that verdicts with no conflicting evidence have a much higher concentration at high confidence levels (confidence > 0.75) than verdicts made with conflicting evidence. Additionally, there are almost no verdicts made without conflicting evidence where jurors reported confidence levels below 0.2. In contrast, in the presence of conflicting evidence, there is a much larger concentration of verdicts with low confidence levels (confidence < 0.25).
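Statements such as “much higher concentration above 0.75” can be quantified by integrating each density estimate over the region of interest. The sketch below uses a hypothetical helper, density_mass(), that applies the trapezoid rule to a density() result, together with simulated scores for the two conflict levels under a single verdict system (not the real data). The raw-data share mean(x > 0.75) is printed alongside as a cross-check.

```r
# Hypothetical helper: approximate the probability mass of a density()
# estimate between lower and upper with the trapezoid rule
density_mass <- function(d, lower = -Inf, upper = Inf) {
  keep <- d$x >= lower & d$x <= upper
  xs <- d$x[keep]
  ys <- d$y[keep]
  if (length(xs) < 2) return(0)
  sum(diff(xs) * (ys[-1] + ys[-length(ys)]) / 2)
}

# Hypothetical confidence scores for the two conflict levels within one
# verdict system (simulated, roughly the subgroup sizes in the text)
set.seed(11)
groups <- list(no = rbeta(26, 6, 1.5), yes = rbeta(26, 3, 2))
h <- bw.SJ(unlist(groups))  # one shared bandwidth for both groups

# The estimate is truncated to [0, 1] and not renormalized, so kernel mass
# spilling past the boundaries is simply not counted
tails <- sapply(groups, function(x) {
  d <- density(x, bw = h, from = 0, to = 1)
  c(high_mass = density_mass(d, lower = 0.75),  # estimated share above 0.75
    low_mass  = density_mass(d, upper = 0.25),  # estimated share below 0.25
    high_emp  = mean(x > 0.75))                 # raw share above 0.75
})
print(round(tails, 3))
```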
Formally Testing Distributional Differences
Our exploratory data analysis suggested that juror confidence levels may differ depending on the verdict system and whether there was conflicting evidence. Let’s formally test this by comparing the confidence densities stratified by these factors.
We’ll carry out tests to compare the distribution of confidence in the following settings (as we did above in a qualitative manner):
- Distribution of confidence across levels of conflict.
- Distribution of confidence across levels of verdict.
- Distribution of confidence across joint levels of conflict and verdict.
First, let’s compare the distribution of confidence with and without conflicting evidence. We can compare the confidence distributions across conflict levels using the sm.density.compare() function provided by the sm package. To carry out this test, we can specify the following key parameters:
- x: vector of data whose density we want to model. For our purposes, this will be confidence.
- group: the factor over which to compare the density of x. For this example, this will be conflict.
- model: setting this to "equal" will conduct a test of the null hypothesis that the distribution of confidence is the same across levels of conflict.
Additionally, we’ll establish a common bandwidth for the density estimates of confidence across the levels of conflict. We’ll do this by computing the Sheather-Jones bandwidth of the confidence levels for each conflict subgroup, taking the harmonic mean of these bandwidths, and then using that value as the bandwidth for our density comparison.
For all of our hypothesis tests below, we’ll use the standard α = 0.05 threshold for statistical significance.
set.seed(123)
# define subsets for conflict
no_conflict <- subset(mock, conflict == "no")
yes_conflict <- subset(mock, conflict == "yes")
# compute Sheather-Jones bandwidth for subsets
bw_n <- bw.SJ(no_conflict$confidence)
bw_y <- bw.SJ(yes_conflict$confidence)
bw_h <- 2/((1/bw_n) + (1/bw_y)) # harmonic mean
# compare densities (sm.density.compare() takes the bandwidth as h)
sm.density.compare(x = mock$confidence,
                   group = mock$conflict,
                   model = "equal",
                   h = bw_h,
                   nboot = 10000)

The output of our call to sm.density.compare() reports the p-value of the hypothesis test described above, as well as a graphical display overlaying the density curves of confidence for both levels of conflict. The large p-value (p = 0.691) indicates that we have insufficient evidence to reject the null hypothesis that the densities of confidence with and without conflicting evidence are equal. In other words, our data show no detectable difference in jurors’ confidence in their verdicts based on whether there was conflicting evidence in the testimony.
Now, we’ll conduct a similar analysis to formally compare juror confidence levels across both verdict systems.
set.seed(123)
# define subsets for verdict
two_verdict <- subset(mock, verdict == "two-option")
three_verdict <- subset(mock, verdict == "three-option")
# compute Sheather-Jones bandwidth for subsets
bw_2 <- bw.SJ(two_verdict$confidence)
bw_3 <- bw.SJ(three_verdict$confidence)
bw_h <- 2/((1/bw_2) + (1/bw_3)) # harmonic mean
# compare densities (bandwidth passed as h)
sm.density.compare(mock$confidence, group = mock$verdict, model = "equal",
                   h = bw_h, nboot = 10000)

We see that the p-value for the comparison of confidence across the two-verdict vs. three-verdict systems is much smaller (p = 0.069). Although we still fail to reject the null hypothesis at α = 0.05, a p-value of 0.069 means that if the true distribution of confidence levels were identical under the two-verdict and three-verdict systems, there would be roughly a 7% chance of observing data in which the confidence distributions of the two systems differ at least as much as they do here. In other words, our data would be fairly unlikely to occur if jurors were equally confident in their verdicts under both verdict systems.
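To build intuition for what this p-value measures, here is a generic two-sample permutation test for equality of densities. It is a sketch under our own assumptions, not a reimplementation of sm.density.compare(): the statistic (the integrated squared difference between the two group density estimates with a shared bandwidth), the helper names ise_stat() and perm_test(), and the simulated data are all hypothetical. Shuffling the group labels mimics a world where the null hypothesis holds, and the p-value is the share of shuffled datasets whose densities differ at least as much as the observed ones.

```r
# Hypothetical statistic: integrated squared difference between the two
# groups' Gaussian KDEs on [0, 1], both using the shared bandwidth h
ise_stat <- function(x, g, h, n_grid = 201) {
  lv <- unique(g)
  dens <- function(v) density(v, bw = h, from = 0, to = 1, n = n_grid)$y
  sq_diff <- (dens(x[g == lv[1]]) - dens(x[g == lv[2]]))^2
  sum(sq_diff) / (n_grid - 1)   # simple Riemann-type sum over [0, 1]
}

# Permutation p-value: under H0 (equal densities) the group labels are
# exchangeable, so shuffling them shows how large the statistic gets by chance
perm_test <- function(x, g, h, B = 999) {
  observed <- ise_stat(x, g, h)
  permuted <- replicate(B, ise_stat(x, sample(g), h))
  (1 + sum(permuted >= observed)) / (B + 1)
}

# Simulated scores for two verdict systems (not the MockJurors data)
set.seed(123)
x <- c(rbeta(52, 4, 2), rbeta(52, 6, 1.5))
g <- rep(c("two-option", "three-option"), each = 52)
p <- perm_test(x, g, h = bw.SJ(x), B = 499)
print(p)
```

A p-value near 0.07 from a test like this would mean that about 7% of label shuffles produced a difference at least as large as the observed one.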
This aligns with what we observed in our qualitative analysis above, where the confidence levels for verdicts under the two-verdict and three-verdict systems appeared to differ; specifically, verdicts under the three-verdict system appeared to be made with higher confidence than verdicts under the two-verdict system.
For future investigation, it would be great to extend the data to include the final verdict decision (i.e., guilty/not guilty/not proven). This additional data could help clarify how jurors actually perceive the “not proven” verdict.
- If we see higher confidence levels in “guilty”/“not guilty” verdicts under the three-verdict system than under the two-verdict system, this may suggest that the “not proven” verdict is effectively capturing jurors’ uncertainty in their decision-making, and that having it as a third verdict provides desirable flexibility that the two-option system lacks.
- If confidence levels in “guilty”/“not guilty” verdicts are roughly equal across both verdict systems, and the confidence levels of all three verdicts are roughly equal within the three-verdict system, then this may suggest that the “not proven” verdict serves as a genuine third option, independent of the typical binary verdicts. That is, jurors opt for “not proven” primarily for reasons other than their uncertainty in classifying the defendant as guilty or not guilty. Perhaps jurors view “not proven” as the verdict to choose when the prosecution has failed to deliver convincing evidence, even when the juror has an inkling of the defendant’s true culpability.
Finally, let’s test whether there are any differences in the distribution of confidence across the joint levels of conflict and verdict.
To test for differences in the distribution of confidence across these subgroups, we can run a Kruskal-Wallis test. The Kruskal-Wallis test is a non-parametric method for testing whether the distribution of a variable of interest differs across groups. It’s appropriate when you want to avoid making assumptions about the variable’s distribution (i.e., non-parametric), the variable is at least ordinal, and the subgroups under comparison are independent of each other. Essentially, you may think of it as the non-parametric counterpart of a one-way ANOVA, or as the extension of the Wilcoxon rank-sum (Mann-Whitney) test to more than two groups.
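Under the hood, the Kruskal-Wallis statistic depends only on ranks. The minimal sketch below pools all N observations, ranks them, and computes H = [12/(N(N+1)) Σ R_i²/n_i - 3(N+1)] / [1 - Σ(t³ - t)/(N³ - N)], where R_i and n_i are each group’s rank sum and size and t runs over the sizes of tied runs; H is then referred to a chi-squared distribution with k - 1 degrees of freedom for k groups. The helper kw_statistic() and the simulated, rounded scores are hypothetical.

```r
# Hypothetical helper: Kruskal-Wallis H from pooled ranks, with tie correction
kw_statistic <- function(x, g) {
  g <- factor(g)
  n <- length(x)
  r <- rank(x)                          # ties receive average ranks
  rank_sums <- tapply(r, g, sum)
  sizes <- tapply(r, g, length)
  h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / sizes) - 3 * (n + 1)
  ties <- table(x)                      # multiplicity of each distinct value
  h / (1 - sum(ties^3 - ties) / (n^3 - n))
}

# Simulated, rounded scores for four verdict/conflict subgroups (hypothetical)
set.seed(2024)
x <- round(c(rbeta(26, 4, 2), rbeta(26, 5, 2), rbeta(26, 3, 2), rbeta(26, 6, 1.5)), 2)
g <- rep(c("two-option.no", "two-option.yes", "three-option.no", "three-option.yes"),
         each = 26)

H <- kw_statistic(x, g)
p_value <- pchisq(H, df = nlevels(factor(g)) - 1, lower.tail = FALSE)
print(c(H = H, p = p_value))
```

Base R’s kruskal.test() applies the same tie correction, so the two should agree on the same input.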
R makes this easy for us via the kruskal.test() function. We can specify the following parameters to carry out our test:
- x: vector of data whose distribution we want to compare across groups. For our purposes, this will be confidence.
- g: factor identifying the groups over which we want to compare the distribution of x. We’ll set this to group_combo, which encodes the subgroups formed by verdict and conflict.
# group_combo: one level per verdict/conflict combination (4 subgroups)
mock$group_combo <- interaction(mock$verdict, mock$conflict)
kruskal.test(x = mock$confidence,
             g = mock$group_combo)

The output of the Kruskal-Wallis test (p = 0.189) suggests that we lack sufficient evidence to claim that juror confidence levels differ across the joint levels of verdict and conflict.
This is somewhat unexpected, as our qualitative analysis seemed to suggest that splitting each verdict group by conflict segmented the confidence values in a meaningful way. It’s worth noting that there was only a small amount of data in each of these subgroups (25-27 observations), so collecting more data could be a next step in investigating this further.
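One way to gauge how much more data might help is a quick Monte Carlo power calculation. The sketch below assumes a purely hypothetical scenario (four groups of Beta-distributed scores, one of which is shifted upward) and estimates how often kruskal.test() rejects at α = 0.05 for several per-group sample sizes. The effect size is an illustration, not an estimate from the MockJurors data.

```r
# Monte Carlo power of kruskal.test() under a hypothetical scenario:
# four groups of Beta-distributed scores, the last one shifted upward
kw_power <- function(n_per_group, n_sims = 200, alpha = 0.05) {
  shapes <- list(c(4, 2), c(4, 2), c(4, 2), c(6, 2))  # assumed, illustrative
  rejections <- replicate(n_sims, {
    x <- unlist(lapply(shapes, function(s) rbeta(n_per_group, s[1], s[2])))
    g <- factor(rep(seq_along(shapes), each = n_per_group))
    kruskal.test(x, g)$p.value < alpha
  })
  mean(rejections)  # share of simulated studies that reach significance
}

set.seed(99)
power_by_n <- sapply(c(26, 50, 100), kw_power)
names(power_by_n) <- c("n=26", "n=50", "n=100")
print(power_by_n)
```

If the estimated power at around 26 observations per group is low for effect sizes you consider plausible, a non-significant result like p = 0.189 says little about whether the subgroups truly differ.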
Future Investigation & Wrap-up
Let’s briefly recap the results of our analysis:
- Our exploratory data analysis seemed to indicate that juror confidence levels differed across verdict systems. Additionally, the presence of conflicting evidence appeared to affect juror confidence levels in the three-verdict system, but to have little effect in the two-verdict system. However, none of our statistical tests provided significant evidence to support these conclusions.
- Although our statistical tests were not supportive, we shouldn’t be so quick to dismiss our qualitative analysis. Next steps for this investigation could include collecting more data, as we were working with only 104 observations. Additionally, extending our data to include the jurors’ verdict decisions (guilty/not guilty/not proven) could enable further investigation into when jurors opt for the “not proven” verdict.
Thanks for reading! If you have any additional thoughts about how you would’ve performed this analysis, I’d love to hear them in the comments. I’m certainly no domain expert in legal theory, so applying statistical methods to legal data was a great learning experience for me, and I’d love to hear about other interesting problems at the intersection of the two fields. If you’re interested in reading further, I highly recommend checking out the sources below!
All images in this article were created by the author.
Sources
Data:
Legal theory:
Statistics: