A developmental trajectory describes the course of a behavior over age or time.
Daniel Nagin pioneered a method called Group-Based Trajectory Modeling (GBTM), which clusters individual trajectories into groups that follow similar courses over time. The method is popular in the medical and social sciences. In this post I will take a look at his 1999 paper, "Analyzing Developmental Trajectories: A Semiparametric, Group-Based Approach", and provide some code in R to work through its datasets.
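For reference, the model behind GBTM is a finite mixture. The equations below summarize how it is commonly written; they are not a transcription of the paper. Each individual i belongs to one of J latent groups with probability π_j, and within a group the yearly observations are assumed to be independent. For counts such as offenses, a Poisson (or zero-inflated Poisson) model with a polynomial in age is the usual choice. The quadratic in age shown here is an assumption for illustration.

```latex
% Finite mixture over J latent trajectory groups for individual i
P(Y_i) = \sum_{j=1}^{J} \pi_j \, P^{j}(Y_i),
\qquad
P^{j}(Y_i) = \prod_{t=1}^{T} p^{j}(y_{it})

% Poisson count version with a quadratic trajectory in age
p^{j}(y_{it}) = \frac{e^{-\lambda_{it}^{j}} \left(\lambda_{it}^{j}\right)^{y_{it}}}{y_{it}!},
\qquad
\ln \lambda_{it}^{j} = \beta_0^{j} + \beta_1^{j}\,\mathrm{Age}_{it} + \beta_2^{j}\,\mathrm{Age}_{it}^{2}

% Posterior probability that individual i belongs to group j
P(j \mid Y_i) = \frac{\pi_j \, P^{j}(Y_i)}{\sum_{k=1}^{J} \pi_k \, P^{k}(Y_i)}
```

The posterior probabilities are what place each person into a group after the model is fitted: a person is typically assigned to the group with the highest P(j | Y_i).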
Datasets
There are two interesting datasets associated with this paper. The first is from the Cambridge Study in Delinquent Development, which tracked 411 British males from a working-class area of London. Data collection began in the early 1960s, when the boys were 8 years old, and continued until they were around 32. The study recorded criminal convictions and measured variables related to a number of factors, including psychological makeup, family circumstances, parenting behavior and performance at school and work.
The second dataset comes from a study of 1,037 White males of French ancestry, which measured variables similar to those in the Cambridge study.
We shall mainly focus on the Cambridge study.
Cambridge Study
This dataset has many columns, and we can make some educated guesses about what they contain:
x01-x23 : Offense Counts (Number of offenses in a year)
x24-x46 : Unknown
t1-t23 : Age
tt1-tt23 : Scaled Age
p1-p23 : Prevalence (Whether an offense was committed that year or not)
y10 : ID
other y’s : Unknown (Probably covariates)
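The guess about the p columns can be checked directly against the counts. Below is a minimal base-R sketch. It assumes p_k is meant to be 1 exactly when x_k is positive. The function name `prevalence_matches_counts` and the toy values are hypothetical, not taken from the paper's files.

```r
# Hypothetical toy rows in the same wide layout as the Cambridge file
# (values invented for illustration): x01, x02 = counts, p1, p2 = prevalence.
toy <- data.frame(
  x01 = c(0, 1, 0), x02 = c(2, 0, 0),
  p1  = c(0, 1, 0), p2  = c(1, 0, 0)
)

# TRUE if, for every year k, p_k is 1 exactly when the count x_k is positive.
# Any NA in a compared column makes that year's check (and the result) FALSE.
prevalence_matches_counts <- function(wide, n_years) {
  matches <- vapply(seq_len(n_years), function(k) {
    counts <- wide[[sprintf("x%02d", k)]]   # "x01", "x02", ...
    prev   <- wide[[sprintf("p%d", k)]]     # "p1", "p2", ...
    # Fail loudly on a missing column instead of silently comparing NULLs.
    stopifnot(!is.null(counts), !is.null(prev))
    isTRUE(all(prev == as.integer(counts > 0)))
  }, logical(1))
  all(matches)
}

prevalence_matches_counts(toy, 2)   # TRUE for these toy rows
```

On the real file, loaded under a name of your choosing (say a hypothetical `cambridge`), `prevalence_matches_counts(cambridge, 23)` returning FALSE would suggest that either the guess is wrong or the columns are not aligned year by year.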
We will mainly be working with the offense counts, but first let’s convert this dataset from wide to long format, with one row per boy per year. The tidyverse makes this easy: tidyr handles the reshaping and dplyr the summarizing.
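Here is a minimal base-R sketch of the wide-to-long step. It avoids package dependencies, but in the tidyverse the same reshape is usually done with `tidyr::pivot_longer()`. The sketch relies on the column-name guesses above: `y10` is the ID, and `x01`..`x23` and `t1`..`t23` line up year by year. `to_long` and the toy data are hypothetical.

```r
# Reshape the wide file (one row per boy) to long format (one row per boy per year).
# Assumes x01..xNN (counts) and t1..tN (ages) refer to the same years, in order.
to_long <- function(wide, n_years) {
  count_cols <- sprintf("x%02d", seq_len(n_years))  # "x01", "x02", ...
  age_cols   <- sprintf("t%d",   seq_len(n_years))  # "t1", "t2", ...
  stopifnot(all(c("y10", count_cols, age_cols) %in% names(wide)))
  long <- data.frame(
    id    = rep(wide$y10, times = n_years),            # IDs repeat once per year block
    year  = rep(seq_len(n_years), each = nrow(wide)),  # year index within the study
    age   = unlist(wide[age_cols],   use.names = FALSE),  # columns stacked year by year
    count = unlist(wide[count_cols], use.names = FALSE)
  )
  long <- long[order(long$id, long$year), ]
  rownames(long) <- NULL
  long
}

# Hypothetical toy data (invented values) with three boys and three years.
toy <- data.frame(
  y10 = c(101, 102, 103),
  x01 = c(0, 1, 0), x02 = c(2, 0, 0), x03 = c(1, 0, 3),
  t1  = c(10, 10, 10), t2 = c(11, 11, 11), t3 = c(12, 12, 12)
)
long <- to_long(toy, 3)
head(long)
```

For the real file the call would be `to_long(cambridge, 23)`, with `cambridge` again a hypothetical name for the loaded data frame. Restricting to `x01`..`x23` by name keeps the unknown `x24`..`x46` columns out of the reshape.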
Next, we look at the average number of offenses at each age and put confidence intervals around the mean.
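Below is a sketch of the per-age summary, computed on long-format data. The post does not say which interval was used, so a t-based 95% interval (mean ± t·s/√n) is assumed here. `mean_ci_by_age` and the toy data are hypothetical.

```r
# Mean offense count at each age with a t-based confidence interval.
# Expects a long data frame with numeric columns `age` and `count`.
mean_ci_by_age <- function(long, level = 0.95) {
  by_age <- split(long$count, long$age)  # one vector of counts per age, ages sorted
  rows <- lapply(by_age, function(y) {
    y <- y[!is.na(y)]
    n <- length(y)
    m <- mean(y)
    # Half-width of the interval; not defined (NA/NaN) when fewer than 2 values.
    half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(y) / sqrt(n)
    c(n = n, mean = m, lower = m - half, upper = m + half)
  })
  out <- as.data.frame(do.call(rbind, rows))
  out$age <- as.numeric(names(by_age))
  rownames(out) <- NULL
  out[, c("age", "n", "mean", "lower", "upper")]
}

# Hypothetical toy long data (invented values).
toy_long <- data.frame(age = c(10, 10, 10, 11, 11, 11), count = c(0, 1, 2, 3, 3, 3))
res <- mean_ci_by_age(toy_long)
res
```

Offense counts are skewed and mostly zero, so the t interval is only an approximation. A bootstrap interval is a reasonable alternative.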
It looks like offending peaks in the mid-to-late teens: the boys commit more offenses at those ages than in other years.