The case for randomized trials in public policy
Policy needs to focus on results first, and RCTs enable us to do so.
The following is an excerpt from my second book Promoting Progress: A Radical New Agenda to Create Abundance for All. You can order e-books at a discounted price at my website, or you can purchase full-price ebooks, paperback, or hardcovers on Amazon.
Other books in my “From Poverty to Progress” book series:
See also my other articles on reforming the policy-making process:
The case for randomized trials in public policy (this article)
In my previous post, I made the case that government policies typically fail to accomplish their desired results for two key reasons:
Our political system has not identified policies that actually work.
People involved in politics do not care.
The problem is not bad people. The problem is a bad implementation and evaluation process. Our political system is very poor at implementing solutions that actually work and then iterating based on results. The American people know it and are gradually losing confidence in our governing institutions.
Fortunately, there is another path.
Experimentation
We need to figure out better ways of actually implementing solutions to the problems that people care about. At its most basic, we need to develop a methodology for implementing the best possible solution, given our resources in a highly complex society. Modern societies are so complex that no one truly understands the results of many of our policies, the incentives those policies create, and the second-order behaviors of people as a result of those incentives.
This is a tough nut to crack that gets to the heart of trying to use policies to get desired results in extraordinarily complex societies. Fortunately, we can copy a methodology that is quite common in other domains.
Medical researchers also face very high levels of complexity within biological systems that they do not fully understand. They have developed methodologies to partially overcome that problem.
Congress funds medical research but does not tell researchers what works. Congress gives medical researchers funding to solve a problem, usually involving curing a specific disease or ailment. Congress also gives medical researchers the latitude to conduct research, develop options, and experiment with each of those options while forcing each to use a rigorous methodology.
Imagine if we did that for every issue!
Randomized Controlled Trials
As Jim Manzi pointed out in his book Uncontrolled, a time-tested methodology for determining the most effective policies is Randomized Controlled Trials (RCTs). Widely used in the medical field and rapidly catching on in business, RCTs are an experimental form of impact evaluation in which potential recipients are randomly sorted into two groups: an experimental group, which receives the treatment, and a control group, which does not.
In this book series, I have applied the concept of evolution to a number of areas. I believe that the concept of evolution is extremely useful for understanding complex interactions.
One can think of RCTs as a controlled experiment in evolution. If the government systematically performed RCTs using many different policy options, we would create the variation necessary to fuel evolution. By using clear metrics of the results, we would be measuring outcomes in the same way that evolution “measures” results by the probability that a variation leads to survival and reproduction.
Let’s use the testing of a Covid vaccine as an example. Health researchers might post a call for volunteers to participate in a study. The volunteers might receive a small financial reward and a warning of potential risks. All those who volunteer are randomly sorted into two groups: one that receives a Covid vaccine, and another that receives an injection of a placebo. After a period of time, the researchers look at the difference between the two groups in the number of participants who actually got Covid. If the difference is statistically significant, and if there were no adverse effects, then the vaccine is deemed effective.
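As a rough illustration of what "statistically significant" means in the vaccine example above, here is a minimal Python sketch of a two-proportion z-test, one common way to compare infection rates between two groups. The function name and the participant counts are hypothetical and chosen only for illustration; they are not results from any actual trial, and real vaccine trials use more elaborate pre-registered analyses.

```python
import math


def two_proportion_z_test(cases_treated, n_treated, cases_control, n_control):
    """Return (z, two-sided p-value) for the difference in infection rates
    between a treated group and a control group (normal approximation)."""
    rate_treated = cases_treated / n_treated
    rate_control = cases_control / n_control
    # Pooled rate under the null hypothesis that the vaccine has no effect.
    pooled = (cases_treated + cases_control) / (n_treated + n_control)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_treated + 1 / n_control))
    if std_error == 0:
        raise ValueError("no variation in outcomes; the test is undefined")
    z = (rate_treated - rate_control) / std_error
    # Two-sided tail probability of a standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: 10,000 volunteers per group.
z, p_value = two_proportion_z_test(
    cases_treated=10, n_treated=10_000,
    cases_control=100, n_control=10_000,
)
# Share of infections avoided in the vaccinated group relative to placebo.
efficacy = 1 - (10 / 10_000) / (100 / 10_000)

print(f"z = {z:.2f}, p-value = {p_value:.2e}, efficacy = {efficacy:.0%}")
```

A small p-value (conventionally below 0.05) says that a gap this large would be very unlikely if the vaccine did nothing. It does not by itself say anything about adverse effects, which have to be tracked separately.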
If done rigorously and with a large enough sample size, RCTs are the gold standard in identifying which public policies work. Unfortunately, they are rarely used in the field of public policy.
We have run enough RCTs to know that most government programs fail to achieve positive results. Unfortunately, we have not run enough RCTs to identify which policies actually do work.
Widespread use of RCTs that force the government to focus on results would do much to rebuild the confidence of the American people in their government. Citizens intuitively understand that both sides in the partisan wars just want to win, ram through their policies, and ignore results.
Negative public attitudes towards our institutions are not solely based on populism or irrational distrust of experts. The people intuitively understand what most policy experts know but rarely talk about.
A Better Process
I believe that we should implement a better policy-making process that is more likely to yield positive results. The new process would look something like this:
1. When Congress passes legislation, instead of implementing a program, they clearly define the problem, determine how much money will be spent identifying the best solution to the problem, establish metrics of success, and identify possible solutions that need to be tested. No actual programs would be implemented at this time.
2. A newly-created Bureau of Policy Assessment designs a series of RCTs that will test each of the proposed solutions.
3. The methodology is posted on the internet as open for comment.
4. The methodology is vetted by a newly-created Congressional Committee on Policy Assessment and independent experts on RCTs.
5. Once approved, the executive bureaucracy runs the series of RCTs. For example, the U.S. Department of Housing and Urban Development would run RCTs on policies related to housing.
6. The numeric results of the study are posted to the internet, and they are analyzed by the Bureau of Policy Assessment, the Congressional Committee on Policy Assessment, and independent experts.
7. For any programs that achieve positive initial results, steps #2-#6 are repeated with different groups of a larger sample size to verify the results and home in on which specific characteristics of the solution work best.
8. Based on the results, Congress could pass additional legislation that slowly scales up a new program and increases the number and size of the RCTs.
9. All beneficiaries of the new program are required to participate in RCTs so that the program can keep improving.
The above process is obviously a radical departure from current practice. Rather than Congress or state legislatures implementing a program, they should instead decide on which problem to solve and how much funding should be devoted to finding a solution. Federal or state bureaucrats would then be required to run hundreds of small-scale RCTs trying to identify the policy that most cost-effectively solves the problem.
Ideally, every reasonable policy option that neither conflicts with fundamental constitutional rights nor is prohibitively expensive should be tested. The policy that produces the best results relative to the cost should go on to further rounds of RCTs with larger sample sizes and different populations.
Some will probably complain that this new process is slow and cumbersome. They might claim that it will make it difficult for the government to react to sudden emergencies.
I would argue, however, that domestic issues rarely suddenly emerge. Think of homelessness, drug addiction, poverty, health care, education, etc. They are problems that have existed for generations, if not all of human history. Though political activists define every issue as a crisis, policy problems almost always have a long history.
I would also argue that it is far better to wait a few months or even years to implement a policy that actually works rather than rushing through a policy that is highly likely to fail. Our track record of implementing programs that show positive, long-term results is quite poor. Virtually every problem that we have today would be in a better place if the government had gone through this process five years ago.
It is important to remember that shutting down a program that does not work frees up additional funding resources for programs that do work. So a program that is determined to be a failure early in the implementation process should be viewed as a positive achievement. The alternative is wasting money for decades, while programs that actually do work are being starved for funding.
If a problem is a true emergency, then Congress can appropriate large amounts of money for a large number of rapid RCTs. Covid, for example, was such a sudden, unanticipated emergency in 2020. If all viable interventions had been tested using RCTs in 2020, we would know far more about what works today.
It is also possible to blend quick action with long-term RCTs. For example, we could have had a lockdown for the elderly and nursing homes, which were obviously the most vulnerable sectors, while subjecting vaccines, mask mandates, school closings, and business closings to RCTs. Fortunately, we did do RCTs for vaccines, but not for the other policies.
Of course, it is impossible in practice to test all possible ideas. Foreign policy, for example, is not very conducive to RCTs. Nor are policies with long delays between funding and receiving benefits, such as Social Security. It is also much more difficult to run RCTs on policies that have already been implemented, particularly if they are defined as an entitlement.
But we should experiment with a few of the policies that seem most likely to have positive results while carving out a domain for continual experimentation even after programs have been scaled up. After all, the best policy in 2020 may not be the best policy in 2030.
Most importantly, Congress and state legislatures should not authorize large-scale funding for a new program until it has passed many levels of rigorous RCTs. This would be a radical departure from current practice. Many elected officials would hate this because it undercuts their ability to force through a program regardless of results.
Doing so would effectively limit the power of elected officials and bureaucrats to decide how the government solves problems. In this sense, it would decentralize power. I know that this is not using the word “decentralization” in its usual sense, but my proposed process does undermine the power of political elites to determine how the government solves problems. They could only fully fund a new policy if it has already been rigorously tested and thereby proven to show net positive results.
Of course, Congress would still have the power to choose which problems need solving and the budget devoted to finding those solutions. Rather than just implementing a solution as they do today, bureaucrats should be required to conduct large numbers of small controlled experiments to see which policy option best solves the problem.
Rigorous RCTs
Not all RCTs are equally good. To be run properly, RCTs need to fulfill a number of rigorous requirements, including:
The methodology used in the study must be fully vetted and approved by independent experts before the study commences.
A large sample size must be used to ensure statistically valid results.
Random sorting of participants into either the test group or the control group must be done after they decide to participate to eliminate selection bias.
Double-blind experiments must be used where even those conducting the experiment do not know which group each participant has been sorted into.
To save costs, widely available data should be used (getting the data is typically the biggest cost of running an RCT).
A wide variety of interventions should be assessed. This may be within the same trial or in separate trials.
Results must be made transparent by posting them on the internet in understandable language.
Tests should be short in duration, if possible, to enable rapid iteration.
Initial trials should be followed up by long-term assessments to see if the impact lingers.
There should be multiple rounds of RCTs in different geographical locations and with different demographic groups. This helps to verify the initial results and identify sub-populations with differing results.
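Two of the requirements above, a large enough sample and random sorting after enrollment, can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function names, the example outcome rates, and the coded group labels "A" and "B" are hypothetical, and the sample-size formula is the standard normal approximation for comparing two rates, not a method prescribed in this article.

```python
import math
import random
from statistics import NormalDist


def required_sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate participants needed in each group to detect a difference
    between two outcome rates (two-sided test, normal approximation)."""
    if p_control == p_treatment:
        raise ValueError("the two rates must differ")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment
    return math.ceil((z_alpha + z_power) ** 2 * variance / effect ** 2)


def assign_after_enrollment(enrolled_ids, seed):
    """Randomly split people who have ALREADY agreed to participate.

    Returns (assignment, key). The assignment uses coded labels "A" and "B"
    so that field staff need not know which group is which; the key that
    decodes the labels would be held by a separate party until the analysis.
    """
    rng = random.Random(seed)
    shuffled = list(enrolled_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    codes = ["A", "B"]
    rng.shuffle(codes)  # which letter means "treatment" is itself random
    key = {codes[0]: "treatment", codes[1]: "control"}
    assignment = {pid: codes[0] for pid in shuffled[:half]}
    assignment.update({pid: codes[1] for pid in shuffled[half:]})
    return assignment, key


# Hypothetical example: detect a drop in some bad outcome from 10% to 8%.
n_per_group = required_sample_size_per_group(0.10, 0.08)
assignment, key = assign_after_enrollment(range(2 * n_per_group), seed=2024)
print(n_per_group, "participants needed per group")
```

The smaller the effect one hopes to detect, the larger the required sample, which is one reason many small policy effects go undetected in undersized studies. Coded labels are only a partial stand-in for double-blinding, which is often harder to achieve for policy interventions than for pills or injections.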
The federal government and state governments should establish a Bureau of Policy Evaluation whose sole purpose is to run evaluation studies on specific policy problems. Congress should also establish its own Congressional Office of Policy Evaluation as a potential check. Those who staff these new organizations must learn how to effectively run RCTs from medical researchers and business analysts. They should also work closely with academic experts in policy evaluation.
These new institutions should establish best practices for running RCTs and post those practices on the internet for all to see and comment on. As much as possible, these departments should also be insulated from partisan politics, interest groups, and the bureaucratic agencies that will actually implement the best policy in the future.
An Example
Let me give an example of how this might work, focusing on teaching methods. Education is the perfect domain for widespread RCTs. With roughly 100,000 schools and even more classrooms, the sample sizes could be in the millions. And there is no reason to wait for annual standardized tests. The testing could be delivered weekly, daily, or even hourly while the students learn.
Rather than teachers and administrators arguing over which teaching methods work, try them all at scale. Teachers in some classes can use one teaching methodology, while other teachers use different methodologies. To a certain extent, this is already done, but what if all students' test scores are matched to the teaching methodology? What if the content is delivered via computer along with assessments?
All subjects could be divided up into small ten-minute online modules with quick assessments to see if the student grasped the concepts. If the student fails to grasp the concepts, then a student could repeat the same content using another teaching methodology, to see if a different teaching method works better. Oh, by the way, this is good teaching practice anyway, so it would not interfere with student learning.
And this does not undermine our ability to have standards. All institutions are caught between trying to enforce certain minimum standards for all while still maintaining enough variation to allow the standards to improve.
RCTs are a perfect mechanism for doing this. With huge sample sizes, it would be simple for the vast majority of students to receive what is currently considered to be the most effective teaching methodology, while 20% get what other experts think might be a better method. Thus 80% of students would effectively be the control group, while the other 20% could be many different experimental groups.
That 20% might be randomly divided into ten different teaching methodologies. With digital technology, we could easily get meaningful results in just a few weeks. If one methodology outperformed the current standard, then it could become the new standard for the next group of students.
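The 80/20 split described above can be sketched as a simple random allocation. The following Python snippet is a minimal illustration, not a description of any existing system: the function name, the arm labels such as "method_1", and the student count are all hypothetical.

```python
import random
from collections import Counter


def allocate_students(student_ids, n_experimental_arms=10,
                      experimental_share=0.20, seed=0):
    """Randomly give most students the current standard method and spread
    the remaining share evenly across several experimental methods."""
    rng = random.Random(seed)
    ids = list(student_ids)
    rng.shuffle(ids)  # random order, so group membership is random
    n_experimental = round(len(ids) * experimental_share)
    allocation = {}
    # Deal the experimental students out to the arms like a deck of cards.
    for i, student in enumerate(ids[:n_experimental]):
        allocation[student] = f"method_{i % n_experimental_arms + 1}"
    # Everyone else is the control group and gets the current standard.
    for student in ids[n_experimental:]:
        allocation[student] = "current_standard"
    return allocation


# Hypothetical population of 100,000 students.
allocation = allocate_students(range(100_000))
group_sizes = Counter(allocation.values())
print(group_sizes["current_standard"], group_sizes["method_1"])
```

With 100,000 students, this gives 80,000 in the control group and 2,000 in each of the ten experimental groups. Whether 2,000 students per method is enough depends on how large an improvement one hopes to detect.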
Constantly Improving Standards
Contrary to what most people in government think, uniformity is not a good thing; it is often a bad thing. Rather than thinking of a standard as what every person needs to receive, we should see it as what most people receive because it is currently the most impactful intervention.
We do not just want standards; we want constantly improving standards. And we want those standards to be based upon results, not politics, interest groups, or personal preference. RCTs at scale would enable the vast majority of students to get the best possible results, while still constantly trying to improve that standard.
If you add on all the benefits of federalism that I discussed earlier, I can foresee local and state governments conducting a vast number of RCTs. Federal aid to poorer state governments could be tied to states running RCTs. The federal government would give them the financial incentive by paying for these trials and the expertise for how to do so. If all levels of government are running thousands of different RCTs in dozens of different policy domains, imagine the learning that would take place.