The Health Poverty Index (HPI) Visualisation Tool

Indicator SH2: Health capital

Definition Individuals' potential for health across the life course
Dimension Situation of health
Sector Health status (individual)
Components
  • SH2_1 Obesity
  • SH2_2 Blood pressure
  • SH2_3 Cholesterol
  • SH2_4 Low birth weight (for infants)
Source Various – see component details

Component SH2_1: Obesity

Definition Modelled estimate of the proportion of people with a Body Mass Index (BMI) greater than 30. BMI is calculated from height and weight data, i.e. weight (kg) divided by the square of height (m), giving units of kg/m².
Source

2001, 2001 Ethnic: Health Survey for England, 1998 to 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health
(See: Health Survey for England)

2003: Health Survey for England, 2001 to 2003, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health
(See: Health Survey for England)
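
As an illustration of the obesity definition above, BMI and the threshold of 30 can be computed as follows. This is a minimal sketch only: the function names and the example person are hypothetical and are not part of the HPI tool or the Health Survey for England.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def is_obese(weight_kg: float, height_m: float) -> bool:
    """Apply the threshold used in the definition: BMI greater than 30."""
    return body_mass_index(weight_kg, height_m) > 30


# Hypothetical person: 95 kg and 1.75 m tall.
print(round(body_mass_index(95.0, 1.75), 1))  # 31.0
print(is_obese(95.0, 1.75))                   # True
```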

Additional details

In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of obesity. However, there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates, including small or non-existent samples in some areas, which lead to large variances and unstable estimates, and biases introduced by particular sampling strategies.

A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.
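
The trade-off described above follows from the standard decomposition of the mean square error of an estimator. As a reminder, and using notation introduced here purely for illustration (an estimate written as \hat{\theta} and the true value as \theta):

```latex
\operatorname{MSE}(\hat{\theta})
  = \mathbb{E}\big[(\hat{\theta} - \theta)^2\big]
  = \operatorname{Var}(\hat{\theta}) + \big(\operatorname{Bias}(\hat{\theta})\big)^2
```

A direct estimate has no bias term but may have a large variance term, whereas a modelled estimate reduces the variance term at the cost of a non-zero bias term.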

The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 – 2003) for estimates for both the whole population and for the ethnic groups.

In 1999, the HSE focused on the health of minority ethnic groups, with the aim of increasing understanding by monitoring trends that enable predictions to be made. For this purpose, a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese, and Irish. For the purposes of this estimate, Irish is included under White, and the Black African group has been added (although this sample was not boosted, hence the low numbers). The table below shows the number of survey respondents in each ethnic group, for each year, used in modelling the estimates for ethnic groups:

   
                    Year
Ethnic Group        1998    1999    2000    2001    Total
White              18019   10437    8851   17322    54629
Black Caribbean      183    2029     143     296     2651
Black African        143      73      98     172      486
Indian               321    1909     203     287     2720
Pakistani            198    2148      91     225     2662
Bangladeshi           73    1905      64      83     2125
Chinese               39     961      17      37     1054
Total              18796   19462    9467   18422    66327

Only the main adult sample, and not the oversampled ‘special populations’, was included in the modelling process for the whole population. For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost sample were used.

Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts, logistic model was run, with level one being the individual i, level two the primary sampling unit j and level three the LAD k. Covariates from within the survey, shown in lower case, and LAD level data, shown in upper case, were used to predict the individual level behaviour.

$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$

where P is a vector of probabilities associated with individual i in Primary Sampling Unit (PSU) j within LAD k, B is a vector of regression coefficients, X is a matrix of covariates associated with the individual, measured within the survey, U is a random vector of area effects associated with the PSU, V is a random vector of area effects associated with the LAD, and E is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. The covariates and their estimated coefficients are given in the tables below:

2001 Total Population - Obesity
    Covariate   Coefficient
  Constant -1.980
Individual effects 20-24 years 0.547
  25-29 years 0.927
  30-34 years 1.150
  35-39 years 1.306
  40-44 years 1.271
  45-49 years 1.467
  50-54 years 1.610
  55-59 years 1.608
  60-64 years 1.714
  65-69 years 1.660
  70-74 years 1.599
  75+years 1.213
  Male -0.160
  Social class I, II and IIIA -0.248
  Income Support recipient 0.240
PSU area effects Proportion Black 0.582
  Proportion Asian -0.308
  Proportion higher social class -0.318
  Proportion living alone -0.631
LAD area effects Proportion higher social class -1.123
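
To show how the coefficients in the table above combine with the Step 1 model, the sketch below computes a predicted probability of obesity for one individual. It rests on several assumptions that are not stated in the source: the dictionary keys are hypothetical names, the person and area values are invented for illustration, area "proportion" covariates are taken to be on a 0 to 1 scale, omitted categories (such as the youngest age band) are treated as the reference with a coefficient of zero, and the random effects U and V are set to zero.

```python
import math

# Fixed-effect coefficients copied from the
# "2001 Total Population - Obesity" table (subset of age bands only).
COEF = {
    "constant": -1.980,
    "age_40_44": 1.271,
    "age_60_64": 1.714,
    "male": -0.160,
    "social_class_I_II_IIIA": -0.248,
    "income_support": 0.240,
    "psu_prop_black": 0.582,
    "psu_prop_asian": -0.308,
    "psu_prop_higher_class": -0.318,
    "psu_prop_living_alone": -0.631,
    "lad_prop_higher_class": -1.123,
}


def anti_logit(eta: float) -> float:
    """Inverse of the logit link: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def linear_predictor(covariates: dict, u_psu: float = 0.0, v_lad: float = 0.0) -> float:
    """Return X B + U + V for one individual (constant always included)."""
    eta = COEF["constant"] + u_psu + v_lad
    for name, value in covariates.items():
        eta += COEF[name] * value
    return eta


# Hypothetical individual: a man aged 40-44 living in an invented area.
person = {
    "age_40_44": 1,
    "male": 1,
    "psu_prop_higher_class": 0.4,   # assumed 0-1 scale
    "psu_prop_living_alone": 0.1,   # assumed 0-1 scale
    "lad_prop_higher_class": 0.35,  # assumed 0-1 scale
}
eta = linear_predictor(person)
print(f"log-odds = {eta:.3f}, probability = {anti_logit(eta):.3f}")
```

Because the random effects are set to zero, the result reflects only the fixed part of the model and should be read as illustrative rather than as an HPI estimate.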

 

2001 Ethnic Groups - Obesity
    Covariate   Coefficient
  Constant -2.56
Individual effects Bangladeshi -1.09
Black African 0.058
Black Caribbean 0.292
Chinese -1.49
Indian -0.236
Pakistani 0.051
20-24 years 0.531
25-29 years 0.862
30-34 years 1.103
35-39 years 1.275
40-44 years 1.236
45-49 years 1.435
50-54 years 1.558
55-59 years 1.571
60-64 years 1.665
65-69 years 1.612
70-74 years 1.553
75+ years 1.177
Male -0.203
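
Coefficients in a logistic model are on the log-odds scale, so exponentiating them gives odds ratios. The sketch below does this for the ethnic group coefficients in the table above. The reference group is not stated in the source; the sketch assumes it is the omitted category (presumably White), and the function name is hypothetical.

```python
import math

# Ethnic group coefficients (log-odds) copied from the
# "2001 Ethnic Groups - Obesity" table.
ETHNIC_COEF = {
    "Bangladeshi": -1.09,
    "Black African": 0.058,
    "Black Caribbean": 0.292,
    "Chinese": -1.49,
    "Indian": -0.236,
    "Pakistani": 0.051,
}


def odds_ratios(coefficients: dict) -> dict:
    """Convert log-odds coefficients to odds ratios against the reference group."""
    return {group: math.exp(beta) for group, beta in coefficients.items()}


ratios = odds_ratios(ETHNIC_COEF)
for group in sorted(ratios):
    print(f"{group}: {ratios[group]:.2f}")
```

An odds ratio below 1 indicates lower modelled odds of obesity than the assumed reference group, holding age and sex constant, and a ratio above 1 indicates higher odds.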

 

2003 Total Population - Obesity
    Covariate   Coefficient
  Constant -1.934
Individual effects 20-24 years 0.736
  25-29 years 0.803
  30-34 years 1.223
  35-39 years 1.294
  40-44 years 1.344
  45-49 years 1.452
  50-54 years 1.600
  55-59 years 1.520
  60-64 years 1.677
  65-69 years 1.641
  70-74 years 1.532
  75+years 1.213
  Male -0.074
  Higher social class -0.213
  Income Support recipient 0.192
PSU Area Effects Proportion Asian -0.233
  Proportion higher social class -0.482
  Proportion living alone -0.265
LAD area effects Proportion higher social class -0.296

Step 2
The fixed effects part of the model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The probabilities were then summed and averaged over the LAD to produce a vector of synthetic LAD level estimates:

$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{antilogit}\left(X_{ijk} B + V_k\right)$$
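
The Step 2 calculation can be sketched as below, taking N_k to be the number of individuals in LAD k. The function and variable names are hypothetical, the two-person "population" is invented, and only two coefficients (the constant and Male from the 2001 total population obesity table) are used to keep the example short. It is an illustration of the formula, not the HPI implementation.

```python
import math


def anti_logit(eta: float) -> float:
    """Inverse of the logit link: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def synthetic_lad_estimate(individuals: list, coef: dict, v_lad: float = 0.0) -> float:
    """Average the predicted probabilities of all individuals in one LAD.

    individuals: one dict of covariate values per person in the LAD.
    coef: fixed-effect coefficients B, keyed by covariate name.
    v_lad: random LAD effect V_k (0.0 where none is available).
    """
    if not individuals:
        raise ValueError("LAD has no individuals")
    total = 0.0
    for covariates in individuals:
        eta = v_lad + sum(coef[name] * value for name, value in covariates.items())
        total += anti_logit(eta)
    return total / len(individuals)


coef = {"constant": -1.980, "male": -0.160}
lad_population = [
    {"constant": 1, "male": 1},  # hypothetical man
    {"constant": 1, "male": 0},  # hypothetical woman
]
estimate = synthetic_lad_estimate(lad_population, coef)
print(round(estimate, 3))
```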

This method does not use weighting to remove bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates. In this case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not predictors of Y, the sample design would not bias the standard estimators of the parameters.
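
The ignorability condition described above can be written compactly. In the statement below, S = 1 denotes inclusion in the sample and X denotes the covariates in the model; this formal notation is added here for illustration and does not appear in the original documentation.

```latex
\Pr(S = 1 \mid Y, X) = \Pr(S = 1 \mid X)
```

In words: once the covariates are known, knowing the outcome Y tells us nothing further about whether an individual was sampled.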

Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.


Component SH2_2: Blood Pressure

Definition Modelled estimate of the proportion with high blood pressure, defined as systolic blood pressure (SBP) >= 160 mmHg or diastolic blood pressure (DBP) >= 95 mmHg.
Source

2001, 2001 Ethnic: Health Survey for England, 1998 to 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health (See: Health Survey for England)

2003: Health Survey for England, 2001 to 2003, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health (See: Health Survey for England)
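
The high blood pressure definition above can be expressed as a simple rule. The sketch below is illustrative only; the function name and example readings are hypothetical.

```python
def has_high_blood_pressure(sbp_mmhg: float, dbp_mmhg: float) -> bool:
    """Apply the definition: SBP >= 160 mmHg or DBP >= 95 mmHg."""
    return sbp_mmhg >= 160 or dbp_mmhg >= 95


# Hypothetical readings as (systolic, diastolic) pairs in mmHg.
readings = [(160, 80), (150, 95), (159, 94)]
for sbp, dbp in readings:
    print(sbp, dbp, has_high_blood_pressure(sbp, dbp))
```

Either threshold alone is sufficient, so the first two readings are classified as high and the third is not.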

Additional details

In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of high blood pressure. However, there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates, including small or non-existent samples in some areas, which lead to large variances and unstable estimates, and biases introduced by particular sampling strategies.

A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.

The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 – 2003) for estimates for both the whole population and for the ethnic groups.

In 1999, the HSE focused on the health of minority ethnic groups, with the aim of increasing understanding by monitoring trends that enable predictions to be made. For this purpose, a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese, and Irish. For the purposes of this estimate, Irish is included under White, and the Black African group has been added (although this sample was not boosted, hence the low numbers). The table below shows the number of survey respondents in each ethnic group, for each year, used in modelling the estimates for ethnic groups:

   
                    Year
Ethnic Group        1998    1999    2000    2001    Total
White              18019   10437    8851   17322    54629
Black Caribbean      183    2029     143     296     2651
Black African        143      73      98     172      486
Indian               321    1909     203     287     2720
Pakistani            198    2148      91     225     2662
Bangladeshi           73    1905      64      83     2125
Chinese               39     961      17      37     1054
Total              18796   19462    9467   18422    66327

Only the main adult sample, and not the oversampled ‘special populations’, was included in the modelling process for the whole population. For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost sample were used.

Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts, logistic model was run, with level one being the individual i, level two the primary sampling unit j and level three the LAD k. Covariates from within the survey, shown in lower case, and LAD level data, shown in upper case, were used to predict the individual level behaviour.

$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$

where P is a vector of probabilities associated with individual i in Primary Sampling Unit (PSU) j within LAD k, B is a vector of regression coefficients, X is a matrix of covariates associated with the individual, measured within the survey, U is a random vector of area effects associated with the PSU, V is a random vector of area effects associated with the LAD, and E is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. The covariates and their estimated coefficients are given in the tables below:

2001 Total Population - Blood Pressure
    Covariate   Coefficient
  Constant -5.276
Individual effects 20-24 years 0.747
  25-29 years 0.516
  30-34 years 1.388
  35-39 years 1.875
  40-44 years 2.522
  45-49 years 3.017
  50-54 years 3.491
  55-59 years 3.754
  60-64 years 4.063
  65-69 years 4.344
  70-74 years 4.710
  75+years 4.798
  Male 0.111
  Social class I, II and IIIA -0.099
  Income Support recipient 0.107
LAD area effects Proportion higher social class -0.936
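
The size of the age coefficients in the table above is easier to judge on the probability scale. The sketch below converts the constant plus each age coefficient into a probability. It assumes, without confirmation from the source, that all other covariates and both random effects are zero (that is, a woman in the reference social class, not receiving Income Support, in an area where the LAD covariate is zero), so the absolute values are illustrative only. The names used are hypothetical.

```python
import math

# Constant and age coefficients copied from the
# "2001 Total Population - Blood Pressure" table.
CONSTANT = -5.276
AGE_COEF = {
    "20-24": 0.747, "25-29": 0.516, "30-34": 1.388, "35-39": 1.875,
    "40-44": 2.522, "45-49": 3.017, "50-54": 3.491, "55-59": 3.754,
    "60-64": 4.063, "65-69": 4.344, "70-74": 4.710, "75+": 4.798,
}


def anti_logit(eta: float) -> float:
    """Inverse of the logit link: map a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def probability_by_age_band(constant: float, age_coef: dict) -> dict:
    """Predicted probability per age band, with all other terms set to zero."""
    return {band: anti_logit(constant + beta) for band, beta in age_coef.items()}


probs = probability_by_age_band(CONSTANT, AGE_COEF)
for band, p in probs.items():
    print(f"{band:>6}: {p:.3f}")
```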

 

2001 Ethnic Groups - Blood Pressure
    Covariate   Coefficient
  Constant -5.669
Individual effects Bangladeshi 0.058
Black African 0.656
Black Caribbean 0.357
Chinese -0.076
Indian 0.568
Pakistani 0.338
20-24 years 0.738
25-29 years 0.703
30-34 years 1.392
35-39 years 1.969
40-44 years 2.641
45-49 years 3.103
50-54 years 3.605
55-59 years 3.857
60-64 years 4.21
65-69 years 4.476
70-74 years 4.844
75+years 4.953
Male 0.125

 

2003 Total Population - Blood Pressure
    Covariate   Coefficient
  Constant -5.950
Individual effects 20-24 years 1.404
25-29 years 1.105
30-34 years 2.001
35-39 years 2.489
40-44 years 3.144
45-49 years 3.548
50-54 years 4.076
55-59 years 4.252
60-64 years 4.438
65-69 years 4.749
70-74 years 5.026
75+years 5.308
Male 0.092
Higher social class -0.121
Income Support recipient 0.054
LAD area effects Proportion higher social class -0.523

Step 2
The fixed effects part of the model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The probabilities were then summed and averaged over the LAD to produce a vector of synthetic LAD level estimates:

$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{antilogit}\left(X_{ijk} B + V_k\right)$$

This method does not use weighting to remove bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates. In this case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not predictors of Y, the sample design would not bias the standard estimators of the parameters.

Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.


Component SH2_3: Cholesterol

Definition Modelled estimate of the proportion with high cholesterol, defined as a valid cholesterol result >= 6.5 mmol/l
Source

2001, 2001 Ethnic: Health Survey for England, 1998 to 2001, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health (See: Health Survey for England)

2003: Health Survey for England, 2003, Joint Survey Unit of the National Centre for Social Research and the Department of Epidemiology and Public Health, University College London / Department of Health (See: Health Survey for England)
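
The cholesterol definition above applies only to valid results. The sketch below shows one way to apply the threshold while skipping missing values. The source does not define what makes a result valid, so representing an invalid or missing result as None is an assumption made here, and the function names are hypothetical. The proportion computed is a raw sample proportion, not the modelled estimate described under Additional details.

```python
from typing import Iterable, Optional

THRESHOLD_MMOL_L = 6.5


def has_high_cholesterol(result_mmol_l: Optional[float]) -> Optional[bool]:
    """Return True/False for a valid result, or None when no valid result exists."""
    if result_mmol_l is None:
        return None
    return result_mmol_l >= THRESHOLD_MMOL_L


def proportion_high(results: Iterable[Optional[float]]) -> float:
    """Proportion with high cholesterol among respondents with a valid result."""
    flags = [has_high_cholesterol(r) for r in results]
    valid = [flag for flag in flags if flag is not None]
    if not valid:
        raise ValueError("no valid cholesterol results")
    return sum(valid) / len(valid)


# Hypothetical results in mmol/l; None marks a missing blood sample.
print(proportion_high([6.5, 5.0, None, 7.2]))
```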

Additional details

In the absence of any suitable administrative or census data, survey data was the only source of information available to construct an indicator of high cholesterol. However, there are a number of problems associated with using survey data to produce Local Authority District (LAD) estimates, including small or non-existent samples in some areas, which lead to large variances and unstable estimates, and biases introduced by particular sampling strategies.

A great deal of work, particularly in the last twenty years, has gone into addressing these issues. Although a number of different approaches have been used, all the methods tend to fall somewhere on a continuum between using direct estimates, suitably weighted for sample design, and a modelling approach using local area covariates to estimate the indicator of interest. Some are based on only one or other of the methods. However the two methods each have their own particular problems. Direct estimates, weighted as necessary, are unbiased but may have large variances; on the other hand the modelled estimates will have small variances but will be biased. Hence many estimates attempt to combine information from both in order to solve the common problem of minimising the Mean Square Error of the final estimate.

The method used in the HPI required that a well-fitted micro level model could be identified. It also assumed that the important ways in which a group may have been over-sampled in a survey sample can be captured by covariates available in the survey and at a small area level. It involved combining all surveys available for the required year with the necessary dependent and independent variables (e.g. socio-economic status, age, gender and ethnicity). Data were gathered from the Health Survey for England (HSE) (1998 – 2003) for estimates for both the whole population and for the ethnic groups.

In 1999, the HSE focused on the health of minority ethnic groups, with the aim of increasing understanding by monitoring trends that enable predictions to be made. For this purpose, a boost sample was designed to yield interviews with members of the six most populous minority ethnic groups: Black Caribbean, Indian, Pakistani, Bangladeshi, Chinese, and Irish. For the purposes of this estimate, Irish is included under White, and the Black African group has been added (although this sample was not boosted, hence the low numbers). The table below shows the number of survey respondents in each ethnic group, for each year, used in modelling the estimates for ethnic groups:

   
                    Year
Ethnic Group        1998    1999    2000    2001    Total
White              18019   10437    8851   17322    54629
Black Caribbean      183    2029     143     296     2651
Black African        143      73      98     172      486
Indian               321    1909     203     287     2720
Pakistani            198    2148      91     225     2662
Bangladeshi           73    1905      64      83     2125
Chinese               39     961      17      37     1054
Total              18796   19462    9467   18422    66327

Only the main adult sample, and not the oversampled ‘special populations’, was included in the modelling process. For the ethnic population estimates, the adult sample and the 1999 ethnic minority boost sample were used. Cholesterol levels were derived from the blood samples taken on the nurse visit.

Step 1
Using combined survey data, with LAD geocoding, a multi-level, variable intercepts, logistic model was run, with level one being the individual i, level two the primary sampling unit j and level three the LAD k. Covariates from within the survey, shown in lower case, and LAD level data, shown in upper case, were used to predict the individual level behaviour.

$$\operatorname{logit}(P_{ijk}) = X_{ijk} B + U_{jk} + V_k + E_{ijk}$$

where P is a vector of probabilities associated with individual i in Primary Sampling Unit (PSU) j within LAD k, B is a vector of regression coefficients, X is a matrix of covariates associated with the individual, measured within the survey, U is a random vector of area effects associated with the PSU, V is a random vector of area effects associated with the LAD, and E is a vector of independent random ‘noise’ elements. The matrix of covariates included PSU area measures, based on aggregated individual level survey counts within the PSU. The covariates and their estimated coefficients are given in the tables below:

2001 Total Population - Cholesterol
    Covariate   Coefficient
  Constant -3.915
Individual effects 20-24 years 0.451
  25-29 years 1.406
  30-34 years 1.862
  35-39 years 1.957
  40-44 years 2.194
  45-49 years 2.667
  50-54 years 2.955
  55-59 years 3.105
  60-64 years 3.289
  65-69 years 3.511
  70-74 years 3.529
  75+years 3.356
  Male -0.276
  Income Support recipient 0.147
PSU area effects Proportion Black -0.783

 

2001 Ethnic Groups - Cholesterol
    Covariate   Coefficient
  Constant -3.957
Individual effects Bangladeshi -0.435
Black African -0.5
Black Caribbean -0.656
Chinese -0.69
Indian -0.308
Pakistani -0.672
20-24 years 0.453
25-29 years 1.488
30-34 years 1.869
35-39 years 2.006
40-44 years 2.255
45-49 years 2.626
50-54 years 3.002
55-59 years 3.108
60-64 years 3.264
65-69 years 3.474
70-74 years 3.488
75+years 3.308
Male -0.205

 

2003 Total Population - Cholesterol
    Covariate   Coefficient
  Constant -4.136
Individual effects 20-24 years 1.325
  25-29 years 1.747
  30-34 years 2.114
  35-39 years 2.560
  40-44 years 2.813
  45-49 years 2.991
  50-54 years 3.265
  55-59 years 3.686
  60-64 years 3.726
  65-69 years 3.702
  70-74 years 3.654
  75+years 3.486
  Male -0.075
  Income Support recipient 0.076

Step 2
The fixed effects part of the model was then taken and applied to the matrix of small area covariates X held by SDRC for 100% of individuals and LADs across England, the random LAD area effect was added (where it was available for an LAD), and the anti-logit was applied. The probabilities were then summed and averaged over the LAD to produce a vector of synthetic LAD level estimates:

$$Y_k = \frac{1}{N_k} \sum_{i,j} \operatorname{antilogit}\left(X_{ijk} B + V_k\right)$$

This method does not use weighting to remove bias in the parameter estimators introduced by unequal selection probabilities in the survey sampling schemes. Instead, important characteristics of the sample are included in the model as covariates. The sample indicator variable S will therefore be unrelated to Y conditional on these covariates. In this case the sample can be viewed as uninformative and ignorable. There is little conflict in including these covariates because they are, by definition, predictors of Y and so should be included in the model. If they were not predictors of Y, the sample design would not bias the standard estimators of the parameters.

Included in our models are measures of non-manual social classes and a ‘level’ for the primary sampling unit. Together these will capture, to a great extent, the unequal selection probabilities associated with the sample design. Other variables such as age will ensure that where a question or measure was taken of only a particular age group in a specific survey year, the estimates will not be biased.


Component SH2_4: Low birth weight

Definition Number of singleton live births under 2500 grams as a percentage of total live births
Source

2001, 2001 Ethnic: Annual District Birth Extract, 1999, 2000, 2001, ONS

2003: Annual District Birth Extract, 2001, 2002, 2003, ONS

Additional details

Births without a stated birth weight, births with extreme birth weight values (less than 500g or more than 6,000g), and stillbirths have been excluded.
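
The definition and exclusions above can be sketched as follows. The record structure and function names are hypothetical, and the example births are invented. Following the wording of the definition, the sketch assumes the denominator is all included live births (singleton and multiple) while the numerator counts only singleton births under 2500 grams. The source does not spell this out further.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Birth:
    weight_g: Optional[int]  # None when the birth weight is not stated
    live: bool               # False for stillbirths
    singleton: bool          # False for multiple births


def is_included(birth: Birth) -> bool:
    """Apply the exclusions: stillbirths, unstated weights, extreme weights."""
    return (
        birth.live
        and birth.weight_g is not None
        and 500 <= birth.weight_g <= 6000
    )


def low_birth_weight_percentage(births: Iterable[Birth]) -> float:
    """Singleton live births under 2500 g as a percentage of included live births."""
    included = [b for b in births if is_included(b)]
    if not included:
        raise ValueError("no births remain after exclusions")
    low = sum(1 for b in included if b.singleton and b.weight_g < 2500)
    return 100.0 * low / len(included)


births = [
    Birth(2400, True, True),   # counted as low birth weight
    Birth(3200, True, True),
    Birth(2300, True, False),  # multiple birth: in denominator only
    Birth(None, True, True),   # excluded: weight not stated
    Birth(450, True, True),    # excluded: under 500 g
    Birth(2000, False, True),  # excluded: stillbirth
    Birth(3500, True, True),
]
print(low_birth_weight_percentage(births))  # 25.0
```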

 

Further Information

The HPI tool is in the third phase of development. We would welcome your feedback.

Please remember to reference the project if you use the data or charts from this site.

Dibben, C., Sims, A., Watson, J., Barnes, H., Smith, T., Sigala, M., Hill, A. and Manley, D. (2004) The Health Poverty Index. South East Public Health Observatory, Oxford.
