A new protocol for evaluating putative causes for multiple variables in a spatial setting, illustrated by its application to European cancer rates.
We introduce a statistical protocol for analyzing spatially varying data, including putative explanatory variables. The procedures comprise preliminary spatial autocorrelation analysis (from an earlier study), path analysis, clustering of the resulting set of path diagrams, ordination of these diagrams, and confirmatory tests against extrinsic information. To illustrate the application of these methods, we present incidence and mortality rates of 31 organ- and sex-specific cancers in Europe; these rates vary markedly with geography and type of cancer. We also investigate three factors (ethnohistory, genetics, and geography) putatively affecting these rates. The five variables (incidence, mortality, and the three putative causes) were correlated separately for each of the 31 cancers over European reporting stations. We analyzed the correlations by path analysis, k-means clustering, and nonmetric multidimensional scaling; the coefficients of the 31 path diagrams modeling the correlations vary substantially. To simplify interpretation, we grouped the diagrams into five clusters, for which we describe the differential effects of the three putative causes on incidence and mortality. When scaled, the path coefficients intergrade without marked gaps between clusters. Ethnic differences make for differences in cancer rates, even when the populations tested are ancient and complex mixtures. Path analysis usefully decomposes a structural model involving effects and putative causes, and estimates the magnitude of the model's components. Smooth intergradation of the path coefficients suggests the putative causes are the results of multiple forces. Despite this continuity of the path diagrams of the 31 cancers, clustering offers a useful segmentation of the continuum. Etiological and other extrinsic information on the cancers maps significantly into the five clusters, demonstrating their epidemiological relevance.
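As a rough illustration of how path analysis estimates the magnitude of a model's components, the sketch below computes direct path coefficients for a single effect as standardized partial regression coefficients, obtained from a correlation matrix. It is a simplified single-equation example, not the structural model used in the study. The correlation values are hypothetical numbers chosen only for illustration, and the names `solve`, `path_coefficients`, `residual_path`, `R_CAUSES`, and `R_EFFECT` are assumptions introduced here.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def path_coefficients(r_causes, r_effect):
    """Direct paths from each cause to the effect: solve r_causes * p = r_effect."""
    return solve(r_causes, r_effect)


def residual_path(r_effect, paths):
    """Path from the unexplained residual: sqrt(1 - R^2), with R^2 = sum(p_i * r_iy)."""
    r_squared = sum(p * r for p, r in zip(paths, r_effect))
    return (1.0 - r_squared) ** 0.5


# Hypothetical correlations among three putative causes
# (ethnohistory, genetics, geography); illustrative values only.
R_CAUSES = [
    [1.0, 0.5, 0.4],
    [0.5, 1.0, 0.3],
    [0.4, 0.3, 1.0],
]
# Hypothetical correlations of each cause with one effect (e.g. incidence).
R_EFFECT = [0.3, 0.2, 0.25]

paths = path_coefficients(R_CAUSES, R_EFFECT)
print("direct paths:", [round(p, 3) for p in paths])
print("residual path:", round(residual_path(R_EFFECT, paths), 3))
```

Each observed correlation between a cause and the effect is then reproduced as the sum of that cause's direct path and its indirect contributions through the correlated causes, which is the decomposition the abstract refers to.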