I finally got so tired of everyone talking about how concerned they were about the lack of diversity in STEM that I decided to study the question, and my conclusion is that I have to call BS. For what it is worth, here is the book I wrote with the results of my research, my recommendations, and plenty of commentary. THERE ARE NOW TWO VERSIONS BELOW: ONE COMPLETE, AND ONE WITH THE MATH AND CODE REMOVED, WHICH IS HALF THE SIZE.
I made the version without the math and code because I have noticed that some otherwise intelligent people don't like math, and these sections were not essential to the main point. In fact, my favorite bit of feedback was, "Love the book, despite the math!"
As a side benefit, as I said above, this had the effect of making the book about half the size, so it is now a faster and easier read since there is no need to consciously decide what to skip.
Of course, there are plenty of people who like to read, and who may even be interested in the ideas presented here, but just don't have the time, so I will list the key takeaways. I will try to confine this list to the conclusions that were a bit surprising. In other words, conclusions like "STEM is not very diverse", which we confirmed, are important but not really interesting, because everybody already knows that.
The biggest surprise for me was that math may actually be less diverse when it comes to darker people than other parts of STEM. We assumed that all STEM subjects were approximately equally bad, but we found a significant difference between math and computer science. I say "significant" loosely: we did not do any follow-up hypothesis testing, so I will leave that to others. The difference I am referring to is the observation that math appeared to be markedly worse than computer science at replicating the diversity of its graduate students relative to its faculty. This raises all sorts of interesting questions. Is computer science better than math because it is a newer subject without the same legacy of the disparagement of Africans and mathematics? Are the computer science numbers perhaps just better due to the large numbers of South Asians excelling in this area? Would we have found similar numbers if we had compared mathematics with, say, physics? I only picked math and computer science since these were the areas I was most familiar with. Of course, this means that, in retrospect, my initial assumption that math and CS were good "proxies" for all of STEM, which I stated in the first paragraph of the book, is not exactly correct, and the title should have been "On the Lightness Distribution of Math vs CS". That doesn't have quite the same ring to it though.
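For anyone who wants to take up that follow-up hypothesis testing, one standard option is a two-proportion z-test comparing the share of some group in math against the same share in computer science. The sketch below is a minimal illustration under assumptions, not the analysis used in the book: the function name `two_proportion_z_test` is hypothetical, and the counts in the example are made-up placeholders, not data from the book.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided pooled z-test of H0: p1 == p2.

    x1 of n1 people in field 1 belong to the group of interest,
    x2 of n2 people in field 2 do. Returns (z, p_value).
    Relies on the normal approximation, so it is only reasonable
    when the counts are not tiny.
    """
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion under the null hypothesis that p1 == p2.
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


if __name__ == "__main__":
    # PLACEHOLDER counts for illustration only; not from the book.
    z, p = two_proportion_z_test(x1=10, n1=100, x2=20, n2=100)
    print(f"z = {z:.3f}, two-sided p = {p:.4f}")
```

With the small counts typical of a single department, the normal approximation is shaky, and an exact test (Fisher's, for example) would probably be the safer choice.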
In the original version of this document, I actually performed the same analysis with gender. The preliminary conclusion I came to was that, as a group, women were doing significantly better than darker Black, Latinx, and Native American students. I expected some difference, but it was larger than I had anticipated. I decided that I might not have enough data and that it might be prudent to stay in my gender lane. My suspicion, however, is that the progress of women in recent years was mainly confined to white and Asian women. While this is certainly a good thing, one does have to ask the unpleasant question about the relative resources being allocated to various populations. Keep in mind, I am now confining these conjectures to math and computer science; hopefully biology, for example, is doing a better job.
Over the past 70 years, since Brown v. Board of Education, plenty of time and money has been spent allegedly working on the diversity problem in math (I am going to stop saying that my conclusions can be extrapolated outside of math). I think it is safe to say that the results have been pretty paltry and that these resources were therefore allocated inefficiently at best. Whenever I see this sort of persistent incompetence and failure to achieve objectives, I start to wonder what incentives are operating and which parties are benefitting from the status quo. This has led me to a really terrible conclusion: many of the diversity efforts in mathematics of the past 70 years have been largely performative and were never really expected to lead to any significant change in the diversity of math faculty. Moreover, the primary beneficiaries were always just the faculty and universities involved. The grant system, apparently, simply isn't up to the task of change on a large geographic and temporal scale. All those years of me saying, "Hey, why don't we just find groups of young talented students, including the underrepresented, and teach them? That's what worked with me and every other smart mathematician I know" to anybody who would listen were futile, since it wasn't clear how it could be turned into a grant (diversity problem = nail, grant = hammer). The caveat to include the underrepresented is crucial, since there are actually many such programs, but they just don't seem to have many Black, Latinx, and Native American kids. For that matter, I suspect that they don't even have many poor white kids.
This second-to-last point was a surprise to me, but perhaps it should not have been. Suppose you are the parent of a mathematically talented underrepresented child and you want to send them off to a college which has a good math reputation. Suppose further that you don't have lots of money, so you really can't afford to make any mistakes, since poor kids don't get endless fail-up "do-over" opportunities. It turns out to be extremely difficult to find an answer to a simple question like "What percentage of the graduates of your math department are X?" where X represents the identity of the underrepresented child. This information should be transparently available at the level of departments, not merely universities: students go to universities, but they major in subjects housed by departments. Taking a microeconomic view, how can optimal decisions be made in the absence of this information? And taking a macroeconomic view, how can an efficient market be created? From this perspective, it is really no surprise that we have seen no progress. I address my efforts to find this information in the book.
This last point is less a conclusion and more of a recommendation. It has been obvious for many years that college costs are out of control and are pricing out poor students. As a result, many students are looking for alternative paths in general, and more career-focused education and training in particular. Online learning modalities, supported by AI, show some promise in this arena, with some caveats. The first caveat is obvious: keep it affordable. Yes, AI training is power-hungry and expensive, but inference is relatively cheap, and training costs cannot be an excuse for absurd inflation in prices. The second caveat is apparently not obvious at all: make sure that online classes do not simply replicate the same negative messages about underrepresented students that they sometimes encounter in in-person education. This was an entirely unexpected issue, and I address it in a case study in the book.
Penultimately, as I have said elsewhere, this document is really a warning to current and future students and a kind of "Hail Mary pass" into the future. The current math and math education establishment can't afford to agree with any of this, since it is obvious that I am talking about them when I say beneficiaries. If it wasn't obvious, it should be now. To anyone who tries to address this problem in the future: try to avoid repeating the mistakes of the past.
Finally, if you are someone who has worked in this area, you feel that your results are great, and you are offended, I'm sorry. I am not talking about you personally. Obviously, I am making broad statistical statements that don't necessarily apply to individuals or even to particular programs. By all means, if underrepresented graduates of your program are consistently going on to graduate from math and computer science doctoral programs, I would urge you to find a way to scale up your efforts and to better publicize the results.