Monday, January 4, 2016

Genetics, Race, and White Privilege

Stephanie Gogarten has a PhD in Astronomy but currently works as a staff scientist in the Department of Biostatistics at the University of Washington. She lives on an island near Seattle with her wife and three young children.

I recently read the book Seeing White, recommended by John Johnson. As an astronomer turned statistical geneticist (Career profile), I spend a fair amount of time at work thinking about genetic ancestry and how that relates to the social construct of race. As a person with some African-American heritage who looks white, I have also struggled with how to define my own race: other people see me as white, but how do I see myself?

In reading the book, I was especially struck by the stories of people whose identities cross the boundary between white and non-white. In particular, I was interested in the story of Gregory Williams, author of Life on the Color Line: The True Story of a White Boy Who Discovered He Was Black. I still remember the dinner table conversation when I was a young child in which my parents explained to me that although my family appeared white, my father's identity was mixed race: European American, African American, and Native American. Due to pale skin and a strong resemblance to my white mother, I am seen as white by everyone I meet. My sister has darker skin and more closely resembles my father, so she is more frequently seen as mixed race (at least, people are not shocked when they learn she is mixed race, as they are with me). In the shuffling of genetic traits from one generation to the next, I got more of the external features usually associated with whiteness -- a chance occurrence that has shaped both our identities.

One of the central arguments of Seeing White is that whiteness is all in how one appears to others; if I am seen as white, I am white. However, the book also contains a discussion of "passing," which (in the context of whiteness) is when a light-skinned person with non-European ancestry deliberately presents as white or allows him- or herself to be seen as white. One of my great-uncles "passed" by cutting ties with the rest of his family for much of his life, in order to pursue career opportunities that would have been denied to him had his African-American ancestry been known. I have never wanted to try to hide my heritage, because there is a certain degree of shame commonly associated with passing. But I feel that I am not really Black enough to qualify as passing. Am I white because other people see me that way?

Throughout my life, I have experienced a certain degree of angst every time I have to fill out a form that asks me to check a box for my race. On forms that ask for one box that best describes me from a limited list, I check "white". But if I am given the option of multiple boxes, or if there is a "mixed" or "other" category, I usually try to decide what the form is asking me. If it is for something like college admissions, I assume what they really want to know is if I have white privilege. I certainly have that in spades, so I check "white" to avoid being given any affirmative action benefits that I don't deserve. If the form is for a census or related to my health, I usually check multiple boxes or "other" to give more complete information. But this is still not satisfying; I really want a box that says, "mostly white, but not entirely."

My frustrations are not unique, and illustrate the problem with race: it is a social construction that attempts to fit complex, multidimensional data into a few discrete boxes. My work in genetics, as well as the availability of direct-to-consumer DNA testing, allowed me to develop a much more nuanced view of genetic ancestry and my own identity.

I sent a saliva sample to 23andMe, which gave me for the first time a way to quantify my identity: 11% of my DNA is African, 87% is European, and less than 1% is Native American. Describing myself as (with rounding) 90% white and 10% black is much more satisfying than being forced to choose between the categories. When forms allow a write-in answer for race, I have taken to supplying percentages.

I also learned that although my grandmother identified strongly as Native American rather than African American, and although I identified with that part of my heritage as a child, the DNA evidence indicates that the Native part of my family tree is small at best. It perhaps says something about the relative cultural standing of Black and Native Americans that some members of my family chose to identify that way.

In addition to providing their best estimates of ancestry percentages, 23andMe also allows customers to download raw data. Since I am now a geneticist, I can employ the methods I use at work every day to analyze my own genome.

The genotyping array used by 23andMe at the time I submitted my sample looked at half a million positions on the genome where there is common variation between people. (The genomes of any two humans are 99.9% identical, and we are 99% identical to chimpanzees, so the differences between humans are explained by a very small fraction of our DNA.) In a technique familiar to many astronomers, we can use Principal Component Analysis (PCA) to find a set of orthogonal components that describe the variance in the data. Plotting each person's values on the first few principal components separates people into groups that have certain genotype variants in common, i.e., that share a common ancestry.
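To make the idea concrete, here is a minimal sketch of PCA on genotype data, using only the Python standard library. Everything in it is an assumption for illustration: the two simulated populations, their allele frequencies, and the function names are made up, and the first component is found by simple power iteration rather than by the software used for the analysis in this post.

```python
import random

def simulate_genotypes(rng, allele_freqs, n_people):
    """Draw allele counts (0, 1, or 2) for each person at each variant."""
    return [[(rng.random() < p) + (rng.random() < p) for p in allele_freqs]
            for _ in range(n_people)]

def first_principal_component(genotypes, n_iter=200, seed=1):
    """Return (PC1 score per sample, fraction of total variance on PC1)."""
    n = len(genotypes)
    # Center each variant (column) so PCA describes variation around the mean.
    means = [sum(col) / n for col in zip(*genotypes)]
    x = [[g - m for g, m in zip(row, means)] for row in genotypes]
    # Sample-by-sample Gram matrix; its top eigenvector gives the PC1 scores.
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in x] for r1 in x]
    start_rng = random.Random(seed)
    v = [start_rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):  # power iteration toward the top eigenvector
        w = [sum(g * vi for g, vi in zip(row, v)) for row in gram]
        norm = sum(wi * wi for wi in w) ** 0.5
        v = [wi / norm for wi in w]
    eigenvalue = sum(vi * sum(g * vj for g, vj in zip(row, v))
                     for vi, row in zip(v, gram))
    scores = [vi * eigenvalue ** 0.5 for vi in v]
    total_variance = sum(gram[i][i] for i in range(n))
    return scores, eigenvalue / total_variance

# Hypothetical data: two populations whose allele frequencies have drifted apart.
rng = random.Random(0)
n_variants = 500
freqs_a = [rng.uniform(0.1, 0.9) for _ in range(n_variants)]
freqs_b = [min(0.95, max(0.05, p + rng.gauss(0, 0.2))) for p in freqs_a]
people = (simulate_genotypes(rng, freqs_a, 30)
          + simulate_genotypes(rng, freqs_b, 30))

scores, explained = first_principal_component(people)
group_a, group_b = scores[:30], scores[30:]
print(f"PC1 range, population A: {min(group_a):.1f} to {max(group_a):.1f}")
print(f"PC1 range, population B: {min(group_b):.1f} to {max(group_b):.1f}")
print(f"Fraction of variance on PC1: {explained:.2f}")
```

In this toy example the two simulated populations land at opposite ends of the first component, even though that component accounts for only a modest share of the total variance. A real analysis would use many more variants and samples and a dedicated linear algebra library.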

A resource commonly used by geneticists is the HapMap project, which contains the genotypes of people from around the world. By combining my data with HapMap, I can do PCA to see how similar my DNA is to people from many different populations. Below is a plot of the first two principal components from this analysis, showing me along with several of the populations from HapMap.

The first two PCs define a triangle in which one corner (left) is African, one corner (top) is European, and one corner (bottom right) is Asian. People of Mexican ancestry living in Los Angeles occupy the space between European and Asian, since they are the result of mixing between the European (mostly Spanish) colonizers of Mexico and the indigenous population. (Native Americans are more genetically similar to Asians than to Europeans or Africans. There is no Native American population in HapMap, but if we included them as a separate group in the PCA we would get a three-dimensional structure with four vertices.) African Americans, whose ancestry includes both African and European, mostly fall on the line between the African and European clusters, though a few have Asian ancestry as well. And where am I? The black triangle near the top, about 10% of the way from the European to the African cluster.
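One rough way to put a number on "how far along the line" a point sits is to project it onto the segment joining the two cluster centers. The sketch below does that with made-up coordinates (not values read from the plot) and hypothetical function names; it is a geometric illustration only, not the method 23andMe uses to estimate ancestry percentages.

```python
def centroid(points):
    """Average position of a cluster of (PC1, PC2) points."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))

def fraction_along(point, start, end):
    """Fraction of the way from start to end at which point projects."""
    direction = [e - s for s, e in zip(start, end)]
    offset = [p - s for s, p in zip(start, point)]
    dot = sum(o * d for o, d in zip(offset, direction))
    return dot / sum(d * d for d in direction)

# Made-up (PC1, PC2) coordinates for two reference clusters and one sample.
european = [(0.0, 1.0), (0.1, 0.9), (-0.1, 1.1)]
african = [(-1.0, 0.0), (-1.1, 0.1), (-0.9, -0.1)]
sample = (-0.1, 0.9)

fraction = fraction_along(sample, centroid(european), centroid(african))
print(f"{fraction:.2f}")  # roughly 0.10 for these made-up numbers
```

A value of 0 would place the sample at the European cluster center and a value of 1 at the African one.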

If we tried to identify the white people on this plot, it's not obvious where the dividing line should be. My best guess might look something like this:

However, we should keep in mind that the people who contributed their DNA to the HapMap project were selected because they belong to relatively homogeneous populations. This data is useful to geneticists because it serves as a reference: for example, in order to figure out how much of my DNA is African and how much is European, we need to compare my genetic markers to people whose ancestry is completely African and completely European. If we fill in this plot with a more representative sample of the population, the clear separation between different groups disappears. 

The plot below shows the first two principal components from PCA of ~55,000 people genotyped as part of the Gene Environment Association Studies (GENEVA) consortium.

The scales of the axes and orientation of the triangle defining the three main ancestry groups differ between the two sample sets, but the overall shape of the distribution, and its meaning, remains the same. The colors indicate how the participants in the studies self-identified their race or ethnicity. There is significant overlap between people identifying as white (gray points) and people identifying as Black (pink points). If you had to draw a line between white and non-white on this plot, where would you put it?

It is important to keep in mind that the proportion of variance explained by even the first few principal components is a small fraction of the total variance between any two people. PCA works by identifying a set of variants shared by a particular group of people, but two people within one group likely have about as many total differences between their genomes as two people from different groups. Likewise, there are no genes found in some people but not in others; all humans have the same set of genes, with variation in the way they are expressed. Try taking this quiz; the answers may surprise you.

Scientists at 23andMe recently published a paper looking at the genetic ancestry of self-identified African Americans, Latinos, and European Americans in the United States. They found that the percentages of African, European, and Native American ancestry in these groups vary according to geography.

So where does this analysis leave us? The way race is defined in our society often has little to do with our genes or even our thoughts about ourselves, but comes down to a few external characteristics that allow people to be quickly sorted into "white" and "non-white", "privileged" and "not privileged". I can identify as mixed race, but I can't give up my white privilege.