April 17, 2026

Patterns of maternal and child health services utilization and associated socioeconomic disparities in sub-Saharan Africa

Data source

In this multi-national study, we analyzed publicly available data from the DHS program36. The DHS are nationally representative cross-sectional surveys, mostly conducted in low- and middle-income countries. The surveys contain information on household characteristics, HIV, reproductive health, women’s and children’s health, nutrition, and mortality37. DHS are conducted by national central statistics agencies or research institutes. The procedures and questionnaires are reviewed and approved by ICF and country-specific Institutional Review Boards. The institutions that approved, funded, or implemented the surveys were responsible for ethical clearance, which guaranteed consent (for children, consent was given by their caregiver) and confidentiality of the respondents’ information. The surveys use similar multi-stage cluster sampling methods to select women of reproductive age (15–49 years) and children younger than five years for inclusion38. We used the most recent survey (conducted between 2014 and May 2024) from each of 31 SSA countries (see details in Supplementary Note 1). We utilized the birth recode file, which contains the full birth history of the women interviewed, including information on pregnancy, postnatal care, and health for children born within the last five years. We focused exclusively on births that ended in U5M, and on the last birth for each woman, as the majority of available RMNCH utilization indicators pertain to the last birth. The final dataset contained 9307 births meeting these criteria. No statistical method was used to predetermine the current study’s sample size. In this pooled dataset, we rescaled the sampling weights (provided by DHS) such that each country’s total weight is proportional to the country’s population size during the year of the survey. We obtained population estimates from the United Nations World Population Prospects39.
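The weight rescaling described above can be illustrated with a minimal sketch. It is not the code used in the study: the function name `rescale_weights`, the record layout, and the choice to make the rescaled weights sum to the pooled sample size are assumptions made for illustration only.

```python
from collections import defaultdict


def rescale_weights(records, population):
    """Rescale survey weights so each country's total weight is
    proportional to its population size.

    records    : list of (country, weight) pairs, one per birth (hypothetical layout)
    population : dict mapping country -> population in the survey year
    """
    # Sum of the original DHS weights within each country.
    totals = defaultdict(float)
    for country, weight in records:
        totals[country] += weight

    total_pop = sum(population[c] for c in totals)
    n = len(records)  # assumption: rescaled weights sum to the pooled sample size

    rescaled = []
    for country, weight in records:
        share = population[country] / total_pop
        # Within-country relative weight, scaled by the population share.
        rescaled.append((country, weight / totals[country] * share * n))
    return rescaled
```

Within each country the relative weights of individual births are unchanged; only the country totals are shifted towards the population shares.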

Utilization indicators

We selected 16 internationally recognized core health indicators, comprising ten RMNCH intervention indicators and six protective environmental and cultural factors4. The RMNCH indicators describe essential services along the continuum of care for women, neonates, and children, including family planning demand satisfaction; antenatal care; antenatal iron supplementation; neonatal tetanus protection; institutional childbirth; births attended by skilled health personnel; postpartum care; postnatal care; birth spacing; and breastfeeding. The three environmental factors are primary reliance on clean fuels, improved sanitation facilities, and protected drinking water sources. The three cultural factors are tobacco use, marriage at a mature age, and decision-making participation by mothers. We included these 16 variables based on their relevance in the scientific literature pertaining to U5M4,40. We did not include crucial indicators on immunizations and case management for common illnesses (diarrhea and pneumonia) among children because these data were not collected for births that ended in U5M. Details about the variables are provided in the Supplementary Material (Supplementary Note 2).

Predictors of utilization subgroups

Child’s sex, mother’s age, marital status, and parity were considered as demographic predictors of the utilization subgroups. The family’s wealth quintile, mother’s employment status, education level, and place of residence (i.e., rural or urban) were included as SES indicators.

Data preprocessing

The dataset contained missing values for 14 variables, with relatively low missingness ranging from 0.16% to 7.84%. These were mother’s employment status (0.16%), health facility delivery (0.17%), skilled delivery provider (0.20%), breastfeeding (0.59%), postnatal check (0.73%), iron pills during pregnancy (0.89%), family planning demand satisfaction (1.34%), neonatal tetanus protection (1.62%), antenatal care visits (4+) (1.71%), protected drinking water source (2.25%), clean cooking fuel (2.25%), improved sanitation facility (2.29%), postpartum check (5.78%), and marriage at mature age (7.84%). Mode imputation was performed for each variable. We conducted a sensitivity analysis to assess the impact of using the imputed dataset compared with a complete-case analysis. We found no significant differences in the resulting latent classes (see Supplementary Note 3).
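Mode imputation replaces each missing entry of a variable with that variable's most frequent observed value. The following is a minimal sketch using only the Python standard library; the function name `impute_mode` and the use of `None` to mark missing values are assumptions, not the study's preprocessing code.

```python
from collections import Counter


def impute_mode(values):
    """Replace missing entries (None) with the most frequent observed value."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a variable with no observed values")
    # most_common(1) returns [(value, count)] for the modal category.
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]


# Hypothetical binary indicator with two missing responses.
facility_delivery = [1, 0, None, 1, None]
imputed = impute_mode(facility_delivery)
```

A complete-case analysis, by contrast, would drop every birth with at least one `None` across the indicators instead of filling the values in.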

MLCA

The overall analytical strategy of this study involved two stages. First, we defined latent variables representing patterns of maternal and child health service utilization across multiple countries. Second, we examined the association between utilization patterns and SES, and subsequently quantified inequality gaps in service utilization across population subgroups.

To identify data-driven patterns of maternal and child health service utilization, we used latent class analysis (LCA). LCA is a “person-centered” approach that groups individuals into discrete classes based on their responses to a set of observed variables41. The LCA model assumes that observations are independent of each other. However, this assumption is often violated when the data have a multilevel structure, namely when lower-level units are nested in higher-level ones42, as in our case (individuals are nested within primary sampling units, which are nested within countries). For this reason, we employed the MLCA method42 to account for the hierarchical data structure. To maintain analytical tractability, we modeled only country-level clustering in our multilevel framework. We fitted a multilevel latent class model with a two-step estimator using the multiLCA43 function in R. The model was specified with individual births as the lower-level units and countries as the higher-level units. The model was initialized using the k-means algorithm to establish low-level latent classes, followed by the expectation maximization (EM) algorithm, which fits the model by maximizing the log-likelihood, formally defined as:

$$L\left({{{\mathbf{\Phi }}}},{{{\mathbf{\Pi }}}},{{{\mathbf{\omega }}}}\right)={\sum }_{j=1}^{J}\log P\left({{{{\bf{Y}}}}}_{j}\right)$$

(1)

and parameterized by \({{{\mathbf{\Phi }}}}\), the class-specific item-response probabilities for the low-level latent classes; \({{{\mathbf{\Pi }}}},\) the conditional low-level class-membership probabilities for individuals given their country’s high-level class; and \({{{\mathbf{\omega }}}}\), the distribution of countries in high-level latent classes. \(P({{{{\bf{Y}}}}}_{j})\) represents the probability of observing a specific combination of responses within each country, and is formally defined as:

$$P\left({{{{\bf{Y}}}}}_{j}\right)={\sum }_{m=1}^{M}{{{{\rm{\omega }}}}}_{m}{\prod }_{i=1}^{{n}_{j}}{\sum }_{t=1}^{T}{\pi }_{t{{|}}m}{\prod }_{h=1}^{H}P\left({Y}_{{ijh}}\bigg|{X}_{{ij}}=t\right)$$

(2)

In this context, \({{{{\bf{Y}}}}}_{{ij}}={({Y}_{{ij}1},\ldots,{Y}_{{ijH}})}^{{\prime} }\) is the vector of observed responses, and \({Y}_{{ijh}}\) is the response of individual \(i=(1,\ldots,{n}_{j})\) in country \(j=\left(1,\ldots,J\right)\) on the \(h\)-th utilization indicator variable, with \(h=1,\ldots,H\). A multilevel latent class model specifies the probability \(P({{{{\bf{Y}}}}}_{j})\) of observing a particular response configuration for each country \(j.\) The term \({{{{\rm{\omega }}}}}_{m}\) represents the probability that country \(j\) belongs to high-level class \(m\), with \(m=1,\ldots,M\); \({\pi }_{t|m}\) denotes the conditional probability that an individual belongs to low-level latent class \(t\), with \(t=1,\ldots,T\), given that their country is in latent class \(m\); and \(P({Y}_{{ijh}}|{X}_{{ij}}=t)\) denotes the probability of the response \({Y}_{{ijh}}\) given the latent class \(t\). Details of this model have been published elsewhere42.
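To make Eqs. (1) and (2) concrete, the sketch below evaluates the log-likelihood for given parameter values. It is an illustration only, not the multilevLCA implementation: it assumes binary indicators, probabilities strictly between 0 and 1 (except where noted), and hypothetical containers `omega[m]`, `pi[m][t]` (for \({\pi }_{t|m}\)) and `phi[t][h]` (for \(P({Y}_{{ijh}}=1|{X}_{{ij}}=t)\)). Sums of probabilities are computed on the log scale to avoid numerical underflow when a country contains many individuals.

```python
import math


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    peak = max(values)
    return peak + math.log(sum(math.exp(v - peak) for v in values))


def log_prob_individual(y, phi_t):
    """log prod_h P(Y_ijh | X_ij = t) for binary responses y."""
    return sum(math.log(p if y_h == 1 else 1.0 - p) for y_h, p in zip(y, phi_t))


def log_prob_country(Y_j, omega, pi, phi):
    """log P(Y_j) from Eq. (2); Y_j is a list of response vectors."""
    terms = []
    for m, omega_m in enumerate(omega):
        log_term = math.log(omega_m)
        for y in Y_j:
            # Inner sum over low-level classes t for one individual.
            log_term += log_sum_exp(
                [math.log(pi[m][t]) + log_prob_individual(y, phi[t])
                 for t in range(len(phi))]
            )
        terms.append(log_term)
    # Outer sum over high-level classes m.
    return log_sum_exp(terms)


def log_likelihood(data, omega, pi, phi):
    """Eq. (1): sum of log P(Y_j) over countries; data maps country -> Y_j."""
    return sum(log_prob_country(Y_j, omega, pi, phi) for Y_j in data.values())
```

An EM routine would alternate between computing posterior class probabilities from these terms and updating `omega`, `pi`, and `phi`; only the likelihood evaluation is shown here.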

To determine the optimal number of high-level and low-level latent classes, we estimated latent class models with one to nine classes and compared their fit. We first examined low-level latent classes using the 16 indicators of utilization of maternal and child health services for each individual. Model selection was based on the AIC44, BIC45, and ICL-BIC46 of each model. AIC is defined as:

$${{{\rm{AIC}}}}=-\!\!2\,{{{\mathrm{ln}}}}\,L+2k$$

(3)

where \(L\) is the maximized likelihood of the model given the data and \(k\) is the number of parameters in the model. BIC additionally depends on \(n\), the number of data points, and is defined as:

$${{{\rm{BIC}}}}=-\!\!2\,{{{\mathrm{ln}}}}\,L+k\,{{{\mathrm{ln}}}}\,n$$

(4)

The ICL-BIC is an extension of BIC for latent class and mixture models, by adding an entropy term. It is defined as:

$${{{\rm{ICL}}}}\text{-}{{{\rm{BIC}}}}={{{\rm{BIC}}}}+{{{\rm{Entropy}}}}$$

(5)

Entropy measures how well the model separates data into distinct latent classes. Therefore, ICL-BIC accounts for both model fit and the quality of the latent class solution, preferring models that produce clear, well-separated groups. This approach enabled us to assess the trade-off between model complexity and fit, guiding our selection of the most parsimonious and theoretically sound model. Upon establishing the optimal low-level latent class model, we extended our analysis to include high-level latent classes, focusing on the nested nature of the data with countries as the high-level indicators. We similarly evaluated models ranging from one to nine high-level latent classes based on AIC, BIC, and ICL-BIC, aiming to identify the configuration that best balanced detailed representation of the data with model simplicity.
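The three criteria in Eqs. (3) to (5) can be computed from a fitted model's log-likelihood, parameter count, sample size, and posterior class probabilities. The sketch below is illustrative only. In particular, the entropy term is assumed here to be the classification entropy \(-{\sum }_{i}{\sum }_{t}{p}_{{it}}\log {p}_{{it}}\) of the posterior probabilities \({p}_{{it}}\); the exact scaling used by a given software package may differ.

```python
import math


def aic(log_lik, k):
    """Eq. (3): AIC = -2 ln L + 2k."""
    return -2.0 * log_lik + 2.0 * k


def bic(log_lik, k, n):
    """Eq. (4): BIC = -2 ln L + k ln n."""
    return -2.0 * log_lik + k * math.log(n)


def classification_entropy(posteriors):
    """Assumed entropy term: -sum_i sum_t p_it * log(p_it).

    posteriors[i][t] is the posterior probability that unit i is in class t.
    Zero probabilities contribute nothing (the limit of p*log p as p -> 0).
    """
    return -sum(p * math.log(p) for row in posteriors for p in row if p > 0)


def icl_bic(log_lik, k, n, posteriors):
    """Eq. (5): BIC plus the entropy penalty."""
    return bic(log_lik, k, n) + classification_entropy(posteriors)
```

For all three criteria, lower values indicate a better trade-off. When every unit is assigned to a class with posterior probability 1, the entropy term is zero and ICL-BIC equals BIC; fuzzier classifications are penalized.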

Countries were assigned to high-level classes based on the maximum posterior probabilities derived from the MLCA. Each country was categorized into one of three groups by identifying the class with the highest posterior probability in the model output. Bar plots were used to display the high-level class assigned to each country. Similarly, each individual was assigned to the low-level class with the highest posterior probability, after marginalization over the high-level classes. We used radar plots to visually describe the resulting low-level latent classes. The plots were generated by normalizing the mean of each categorical indicator within a class to the mean of that indicator in the overall cohort.
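The modal assignment rule can be sketched as follows. The input names are hypothetical: `post_high[m]` stands for the posterior probability that a country belongs to high-level class \(m\), and `post_low_given_high[m][t]` for the posterior probability that an individual belongs to low-level class \(t\) given that the country is in class \(m\). Classes are zero-indexed in this sketch.

```python
def assign_country(post_high):
    """Return the index of the high-level class with the largest posterior."""
    return max(range(len(post_high)), key=post_high.__getitem__)


def assign_individual(post_high, post_low_given_high):
    """Marginalize the low-level posterior over high-level classes,
    then return (modal class index, marginal posterior)."""
    n_low = len(post_low_given_high[0])
    marginal = [
        sum(post_high[m] * post_low_given_high[m][t]
            for m in range(len(post_high)))
        for t in range(n_low)
    ]
    best = max(range(n_low), key=marginal.__getitem__)
    return best, marginal


# Illustrative values only, not estimates from the study.
country_class = assign_country([0.2, 0.8])
individual_class, marginal = assign_individual(
    [0.2, 0.8], [[0.9, 0.1], [0.3, 0.7]]
)
```

In this toy case the marginal posterior is 0.2 × 0.9 + 0.8 × 0.3 = 0.42 for the first low-level class and 0.58 for the second, so the individual is assigned to the second class.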

Multinomial analysis

Following MLCA, a multivariable multinomial regression model was fitted, with low-level class membership as the dependent variable, and demographic and SES indicators as the independent variables. The multinomial logistic regression model for the three latent classes was formally defined as:

$$\log \left(\frac{P\left({{{{\bf{Y}}}}}_{i}=c\right)}{P\left({{{{\bf{Y}}}}}_{i}=3\right)}\right)={\beta }_{0c}+{\beta }_{1c}{X}_{i1}+\cdots+{\beta }_{{pc}}{X}_{{ip}},\,c=1,\,2$$

(6)

where \({{{{\bf{Y}}}}}_{i}\) is the low-level latent class for individual \(i\), \({X}_{{ip}}\) are the predictor variables, and the \(\beta\)’s are regression coefficients. To properly account for the complex survey design of the DHS datasets, we implemented the analysis using the svydesign framework, incorporating sampling weights, cluster identifiers (primary sampling units), and stratification variables as specified in the DHS sampling methodology. This approach ensures that our estimates reflect the hierarchical sampling structure and that the standard errors account for design effects. The highest utilization subgroup was used as the reference subgroup for class membership. Odds ratios (OR) with 95% CI of belonging to a given class were reported, and variables with p-value < 0.05 in the multivariable analysis were considered significant predictors of utilization subgroups. All statistical tests were two-tailed to account for both positive and negative associations.
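Equation (6) can be read in two directions: coefficients map to class probabilities through the softmax with the reference class fixed at zero, and each exponentiated coefficient is an odds ratio relative to the reference class. The sketch below shows both steps for given coefficients. It is an illustration only; it does not fit the model and does not handle the survey design, and the function names are hypothetical.

```python
import math


def class_probabilities(x, betas):
    """Class probabilities implied by Eq. (6).

    x     : list of predictor values [X_i1, ..., X_ip]
    betas : one coefficient list [beta_0c, beta_1c, ..., beta_pc] per
            non-reference class c; the reference class is listed last
            in the output and has linear predictor 0.
    """
    etas = [b[0] + sum(bk * xk for bk, xk in zip(b[1:], x)) for b in betas]
    denom = 1.0 + sum(math.exp(e) for e in etas)
    probs = [math.exp(e) / denom for e in etas]
    probs.append(1.0 / denom)  # reference class
    return probs


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor."""
    return math.exp(beta)


# Illustrative coefficients only: one binary predictor, two non-reference classes.
probs = class_probabilities([1.0], [[0.0, math.log(2)], [0.0, 0.0]])
```

With these toy coefficients the log odds of class 1 versus the reference equal \(\log 2\) when the predictor is 1, so the odds ratio for that predictor is 2 and the probabilities are 0.5, 0.25, and 0.25.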

Measures of inequality

To quantify utilization inequalities across wealth, education, place of residence, and employment dimensions, we employed two inequality measures: the slope index of inequality (SII) and the relative concentration index (RCI). SII is an absolute measure of inequality that shows the difference in estimated indicator values between the most-advantaged and most-disadvantaged subgroups, while accounting for all other subgroups through appropriate regression modeling47. RCI, on the other hand, is a relative measure showing the gradient across population subgroups on a relative scale47. It indicates the extent to which an indicator is concentrated among disadvantaged or advantaged subgroups. Subgroups are weighted by their population share in both measures. For both measures, a value of zero indicates no inequality, positive values indicate a concentration of the indicator among the advantaged, and negative values indicate a concentration of the indicator among the disadvantaged. Since the highest utilization subgroup represents the most desirable health-related characteristics, we applied these inequality measures to evaluate its distribution across SES groups. We implemented this analysis using the healthequal library in R. To properly account for the complex survey design of the DHS datasets, we specified the sampling weights, cluster identifier (primary sampling units), and stratification variable as specified in the DHS sampling methodology.
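The logic of the two measures can be illustrated on grouped data with a simplified sketch. This is not the healthequal implementation: it assumes subgroups ordered from most disadvantaged to most advantaged, ranks them by the midpoint of their cumulative population share, uses a population-weighted linear regression slope for the SII, and ignores the survey design. Function names are hypothetical.

```python
def ridit_scores(shares):
    """Midpoint of each subgroup's range in the cumulative population share."""
    scores, cumulative = [], 0.0
    for p in shares:
        scores.append(cumulative + p / 2.0)
        cumulative += p
    return scores


def _weighted_moments(values, pop):
    """Population-weighted mean of values, covariance with rank, rank variance."""
    total = float(sum(pop))
    shares = [p / total for p in pop]
    ranks = ridit_scores(shares)
    mean_y = sum(s * y for s, y in zip(shares, values))
    mean_r = sum(s * r for s, r in zip(shares, ranks))
    cov = sum(s * (y - mean_y) * (r - mean_r)
              for s, y, r in zip(shares, values, ranks))
    var_r = sum(s * (r - mean_r) ** 2 for s, r in zip(shares, ranks))
    return mean_y, cov, var_r


def sii_linear(values, pop):
    """Slope of the weighted regression of the indicator on the rank (0 to 1)."""
    _, cov, var_r = _weighted_moments(values, pop)
    return cov / var_r


def rci(values, pop):
    """Relative concentration index: 2 * cov(indicator, rank) / mean."""
    mean_y, cov, _ = _weighted_moments(values, pop)
    return 2.0 * cov / mean_y


# Illustrative values only: two equally sized subgroups, poorer group first.
example_sii = sii_linear([0.2, 0.6], [50, 50])
example_rci = rci([0.2, 0.6], [50, 50])
```

Both values are positive in this toy case because the indicator is concentrated in the advantaged subgroup, consistent with the sign convention stated above; equal values across subgroups give zero for both measures.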

All modeling and statistical analyses were performed using R (version 4.3.1) with packages multilevLCA (2.0.1), survey (4.4.2), svrepmisc (0.2.2), and healthequal (1.0.1). Data preprocessing and visualizations were generated using Python (version 3.11.5) with packages pandas (2.2.3) and matplotlib (3.9.2).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
