Dr. Paul Pronyk, Director of Monitoring and Evaluation for the Millennium Villages Project (joined by Prof. Jeffrey D. Sachs, Director of the MVP, and Dr. Prabhjot Singh, Director of System Design of the MVP)
As the Director of Monitoring and Evaluation at the Centre for Global Health and Economic Development at The Earth Institute, I thought I might offer some reflections on this vigorous discussion around the optimal methods for evaluating the Millennium Villages project. First, I would like to thank those who have contributed to these debates. I truly hope they help move us all forward in our collective understanding of how best to evaluate complex multi-country development initiatives such as the Millennium Villages.
On site selection and randomization
The history and evolution of the project played an important role in shaping decisions regarding evaluation design. When the project was initiated in 2005, there were just two sites – one in Kenya and another in Ethiopia. The aim was to capitalize on the momentum of the UN Millennium Project (2005), which had identified several investment pathways to the MDGs. The UN General Assembly adopted the main recommendations of the Millennium Project in September 2005. The goal of the Millennium Villages Project, in essence, is to show how those recommendations and agreements can be translated into practical realities at scale in low-income and especially least-developed countries, and notably in the rural sector dominated by agriculture-centered economies. The principal aim of the project is to achieve the MDGs in a well-documented, replicable, and scalable manner. With just two sites at the start of the project, the MVP's initial M&E systems were therefore set up to track changes over time, and to compare them with trends at the sub-national and national levels, so that the pathways to the MDGs in these sites could be better understood and could inform scale-up efforts.
The project expanded very quickly to an additional 10 sites in 2006, with the full establishment of these sites extending into 2007. Those supporting the work – the scientists, UN system, donors, and governments – wanted to see what implementation systems could and should be designed to achieve the MDGs across a highly diverse range of national contexts, where farming systems, disease profiles, levels of infrastructure, and human capacity are likely to differ substantially. The project's sites currently represent over 90% of the agro-ecological zones in Africa.
Within participating countries, host governments and many UN staff worked with Earth Institute scientists to identify new MV project areas. There were three overarching criteria. First, the site had to be representative of one of the main agro-ecological zones, so that the MVP as a whole would cover the range of Africa's main farming systems. Second, communities had to be rural and have high levels of extreme poverty – with at least 20% of children undernourished. The goal was to intervene in regions with a high prevalence of extreme poverty, hunger, and disease. This ruled out privileged areas, such as regions with extensive high-value cash crops, regions close to urban areas, and regions otherwise favored by distinctive infrastructure (such as highway, rail, power, and other network access). Third, the communities had to express interest in the project and a willingness to participate.
Given these considerations, establishing a systematic process of random selection with 10 host country governments was felt to be politically unworkable and operationally infeasible. Had the MVP been operating in a single country, say in a Progresa-type project, particularly where the intervention would be scaled up at the national level, village-level randomization might have been possible. Even in the case of Progresa, randomization was politically feasible only because all communities were informed that Progresa would be scaled up nationally within a short period of time.
We recognize that the Millennium Villages are not and cannot be representative of all villages, since there is a range of village circumstances within a country. On one hand, the MV sites that were chosen are less well off than average when measured against a range of indicators such as poverty, health and nutrition status, and access to basic infrastructure. On the other hand, they rarely represent the most extreme isolation in the country either.
How to evaluate?
Given the complex multi-component, multi-sector nature of the MVP, the extensive learning-by-doing anticipated in the project, and the 10-12 sites across ten countries with a 10-year evaluation period, what is the most effective approach to program evaluation? As an evaluator, the starting point for the design of any evaluation has to be the key questions you are trying to answer. In the case of the MVP, these questions include:
- Can the MDGs be achieved in very poor, rural areas across a range of agro-ecological zones?
- What are the most effective systems for delivering key public interventions in health, agriculture, education, infrastructure, and business development, taking into account that these will be designed in a community-based learning-by-doing environment?
- Are the observed changes a result of the program interventions?
- How are systems adapted to local contexts and what are the barriers and facilitators to implementation?
- What is the cost of implementation, and the financial implications for replicability and scalability?
- Finally, what is the sustainability of the suite of interventions identified as the top priorities of the project?
My personal background is in public health and HIV/AIDS in Africa, using cluster randomized trials to assess integrated and adaptive interventions similar to the MVP. Many of the tools and systems we adopted in the MVP mirror lessons learned from working in these contexts.
Establishing counterfactuals: We established comparison sites after the 2006-7 expansion. Once sites in all 10 countries were in place, we felt that comparison villages would be helpful in addressing whether observed differences over time in the Millennium Villages were due to the project or might have taken place on their own.
We chose the comparison sites at random from among three matched candidates. These aren't single villages but rather clusters of villages – mirroring the MV sites. Questions have been raised regarding the matching criteria, how they are weighted, and so forth. It's safe to say we matched on pretty much everything we could think of, and finding villages that were as poor as the MV sites was a major challenge. The relative importance of various criteria differs between project settings. Using multiple comparison clusters per MV site was not something we felt was feasible – the additional statistical power didn't seem to justify the cost, time, and community relations challenges we'd have to endure. I can tell you from many long meetings with village elders that gaining their support to participate in a 10-year evaluation process, without direct tangible benefits from the project for their villages, was no easy task. The sketch below illustrates the selection step.
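For readers unfamiliar with this design, the selection step itself is simple to illustrate. Here is a minimal sketch in Python of drawing one comparison cluster at random from a shortlist of matched candidates per MV site; the site names, candidate lists, and seed are hypothetical, and this is an illustration of the design rather than the project's actual selection procedure or code.

```python
# Minimal sketch: draw one comparison cluster at random from a
# shortlist of candidates already matched to each MV site.
# Site names and candidate lists are hypothetical.
import random

matched_candidates = {
    "MV_site_1": ["cluster_1a", "cluster_1b", "cluster_1c"],
    "MV_site_2": ["cluster_2a", "cluster_2b", "cluster_2c"],
}

rng = random.Random(42)  # fixed seed so the draw can be audited later
comparison_sites = {
    site: rng.choice(candidates)
    for site, candidates in matched_candidates.items()
}
print(comparison_sites)
```

The matching keeps each paired cluster broadly comparable at baseline, while the random draw within the matched shortlist guards against cherry-picking a favorable comparison site.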
We also recognize the complexity and limitations of interpreting data from such comparison sites. Through the efforts of governments and NGOs, many of the same kinds of interventions as those being administered in the project would very likely be introduced in these sites. Documenting the full range of inputs and spending in areas where we are not working is an extremely challenging undertaking. One NGO intervention, for example, might lead to an even better financed system in health, education, or some other area compared with the Millennium Village site. While comparison sites were established at a 'safe distance' (40 km away on average) to match for local ecological conditions and governance issues, over a decade there will certainly be many spillovers from the MVs and complex independent changes – such as the adoption of new farming techniques or crop varieties. Alongside these local changes, the MVP is also spurring national policy changes – for example, universal access to bed nets in most project countries, national fertilizer subsidies in a growing number of countries, and the introduction of salaried community health workers. (See, for example, the statement of Minister Charity Ngilu, explaining that the Millennium Villages Project was crucial in the national scale-up of malaria control in Kenya.) Thus, the comparison sites are inevitably experiencing spillovers from the MVs, as are most villages in each country in at least some areas of policy. This suggests that difference-in-differences-type comparisons will tend to underestimate the measured impact of the villages.
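To spell out the logic of that last point, using notation of my own rather than the project protocol's: for an outcome Y measured at baseline and follow-up, the difference-in-differences contrast is

\[
\hat{\delta} = \left(\bar{Y}^{\mathrm{MV}}_{t_1} - \bar{Y}^{\mathrm{MV}}_{t_0}\right) - \left(\bar{Y}^{\mathrm{C}}_{t_1} - \bar{Y}^{\mathrm{C}}_{t_0}\right),
\]

where MV denotes the Millennium Village clusters, C the comparison clusters, and t0 and t1 the baseline and follow-up rounds. If spillovers or MVP-influenced national policies raise outcomes in the comparison clusters, the second bracket grows and the estimate shrinks, understating the true program effect.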
Notwithstanding the challenges of their selection and the limitations in interpreting the results, we went ahead with paired comparisons to glean as much information as possible. In the end, we hope that any observed differences between groups will provide a conservative estimate of program effects.
In addition to the comparison sites, we also plot observed changes against national and sub-national data. While these data are collected infrequently, and may not reflect what is happening at the local level in as much detail as we’d like, they provide additional important information on long-term trends that can be helpful in understanding attribution and program effects.
Finally, to enhance our ability to assess whether observed changes were a result of the program, we assess changes along pre-specified impact pathways, and use real-time performance monitoring data and innovations in mobile technology to see whether changes correspond to the timing and sequence of interventions.
Complementary methods: The most exciting questions being explored by the project are, in fact, the 'how-to' questions. How can delivery systems be designed and implemented in primary health care, community-based disease control, ICT in the schools, off-grid solar power, agricultural micro-lending, farmer-based cooperatives, and so forth? These are complicated questions crucial to achieving the MDGs.
To better understand the local adaptation of the model, what’s working well, what’s not and why, the project employs detailed implementation research – conducting periodic qualitative assessments with site teams, communities, government partners and others. When viewed alongside survey data, these complementary methods help paint a picture of both process and outcomes, and help to document lessons learned from the project. Finally, an economic costing assessment is ongoing to understand spending by site, stakeholder and sector over the full project duration.
On objectivity and oversight
There has been much discussion regarding whether an evaluation that is not 'independent' can be truly rigorous. These perspectives were surprising to me, coming from the public health field – where virtually all primary research is conducted by the investigators themselves. The amount of oversight within the MV project is, in fact, quite striking. We are independently overseen by 11 institutional review boards, to which we report annually.
To ensure additional transparency, we took a number of steps that are routine in clinical medicine and public health, but are perhaps less common in other disciplines. For scientific oversight, we submitted our evaluation protocol for scientific peer review at The Lancet – perhaps the world's most eminent medical journal. A protocol is only accepted and registered if it survives detailed scrutiny by multiple experts unknown to the project. The MVP's evaluation protocol has been accepted. We also registered the evaluation protocol with the US National Institutes of Health. What this means in practice is that all tools and methods have been vetted and are publicly accessible, and that both the selection of outcome indicators and the analysis plans were defined in advance of data collection and final analysis. We encourage others to subject their own evaluations to the same degree of oversight and transparency.
Project time horizon and reporting
There also seem to be concerns regarding limited impact reporting from the project to date. Just to clarify: as we're working across a large number of sites, our evaluation process has been staggered. Each assessment round involves over 10,000 surveys, a similar number of biological specimens and growth measurements, as well as crop-yield assessments and market surveys. For sites starting in 2006 and extending into 2007, the end-of-year-3 data collection process finishes at the end of 2010, with data cleaning and analysis taking approximately an additional 6 months (through 2011). We did release very limited before-and-after findings from 4 sites last year to enhance public understanding of the program in advance of the UN MDG Summit; these were not intended to be received as an evaluation report. We are just now in the process of writing up early scientific findings. An impact paper examining nutrition outcomes across all sites and comparing these to national trends will be published in the American Journal of Clinical Nutrition this week. Several other cross-site impact papers that examine changes relative to comparison sites are under review.
Open data access
There have also been calls for making the MV data publicly available that warrant brief comment. Open data access is a complex issue that is difficult to summarize briefly. While there are a number of publicly available data sets, such as the Demographic and Health Surveys, that anyone can access and use for secondary analyses, making primary research data publicly available has been unusual for a number of reasons:
– Intellectual investment: The basic principle here is that the scientists who spend years on fundraising, intervention design, data collection, and analysis should be the first to analyze the results of their efforts. In academia, data is our currency – it is the fruit of innovation and exploration, and why many of us come to work in the morning. In many universities, the publications that result from the scientific process govern who gets grants, underpin career advancement and how promotions are tendered, and determine the level of government support received by institutions. In the MV context, there are additional sensitivities. All too often, African scientists who are the primary implementers and researchers on projects are left to gather data while academics elsewhere, with resources and time, do the analyses and take the credit. Within the MVP, we are committed to respecting the intellectual investment of these scientists, and to ensuring that adequate time is provided for them to conduct the primary analyses of data that they have spent many years collecting.
– Confidentiality and Institutional Review Boards: Maintaining the confidentiality of participants and communities is of the utmost importance. Community confidentiality is a particular concern for the MV sites given their very ‘public’ profile. In addition, the release of data to those not listed on study protocols is a violation of Institutional Review Board stipulations that govern human subjects research.
– Data archiving systems: Anyone who has worked with complex longitudinal data sets knows the amount of time, energy, and resources required to transform a data set into a public resource. The MVP is currently working with more than 1,000 datasets. Data anonymization, linking data sets to one another, ensuring data meet existing metadata standards to ease interpretation, establishing archiving systems, setting out data sharing and licensing agreements, and many other steps are enormous undertakings, and are generally underfunded by donors.
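To give a flavor of just one of these steps, here is a minimal sketch of pseudonymizing participant identifiers while preserving the ability to link records across survey rounds. The field names, values, and salting scheme are illustrative assumptions, not the project's actual pipeline.

```python
# Illustrative sketch: replace raw participant IDs with stable
# pseudonyms so records remain linkable across datasets.
# Field names, values, and the salting scheme are hypothetical.
import hashlib

SALT = "keep-this-secret-and-separate"  # never released with the data

def pseudonymize(raw_id: str) -> str:
    """Deterministically map a raw ID to an opaque pseudonym."""
    return hashlib.sha256((SALT + raw_id).encode("utf-8")).hexdigest()[:12]

baseline = {"participant_id": "SITE1-0042", "hemoglobin_g_dl": 11.2}
followup = {"participant_id": "SITE1-0042", "hemoglobin_g_dl": 12.6}

# The same raw ID always yields the same pseudonym, preserving linkage.
for record in (baseline, followup):
    record["participant_id"] = pseudonymize(record["participant_id"])
print(baseline, followup)
```

Even this simplest of steps involves decisions about salts, key custody, and re-identification risk, which is part of why public archiving of longitudinal data is so resource-intensive.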
Finally, the project is just mid-way through a 10-year research initiative. While we respect the important principles of transparency and stand behind the scientific integrity of the project, it would be irresponsible to make mid-stream data sets available to those not involved in their collection before the systematic process of vetting data quality, conducting consistency checks, performing basic analyses, and subjecting these processes to peer review to ensure the findings and their interpretation are clear and accurate.
Summary
In all program evaluations, there are trade-offs and limitations – and the need to design systems that are appropriate to the real-world questions and challenges faced by the project. Evaluators should never start with a methodology and define their intervention around it. Rather we have to start with the challenges at hand, and construct the best possible methods to understand what works and why. With this in mind, we have done our best to pull together a suite of analytical methods for learning, documentation, monitoring, evaluation, and scaling.
The Millennium Villages are offering a wealth of knowledge about the systems needed to achieve the MDGs. We are noting not only the key successes – such as the reduction of malaria, the mobilization of community health workers, the development of pre-paid electricity systems, and more – but are also learning about the challenges, costs, human resource needs, strategies for community leadership, methods of national policy scale-up, and much more. The tools and methods that are in place, and the new systems being developed over time, allow for the measurement of specific outcomes while simultaneously providing insight into how real-time systems of public services and investment can be replicated and scaled.
I look forward to ongoing discussion, debate, and constructive new ideas around these and other issues, and welcome colleagues to meet with us at The Earth Institute or in the Villages themselves to understand the project and these complex systems and design challenges first hand.
Dr. Paul Pronyk is the Director of Monitoring and Evaluation for the Millennium Villages Project at the Earth Institute, Columbia University. He is based in New York.
Dr. Jeffrey Sachs is the Director of The Earth Institute, Quetelet Professor of Sustainable Development, and Professor of Health Policy and Management at Columbia University. Dr. Sachs is also President and Co-Founder of Millennium Promise.
Dr. Prabhjot Singh is Assistant Professor of International and Public Affairs and leads the System Design Group of the Millennium Villages Project. He is based at the Center on Globalization and Sustainable Development, the Earth Institute, Columbia University, New York.