1 Introduction

Gender differences in behavior under competitive pressure, on the one hand, and in attitudes towards competition, on the other, have been recognized for over a decade now (Gneezy et al. 2003; Niederle and Vesterlund 2007; Balafoutas and Sutter 2012; Wozniak et al. 2014; Brandts et al. 2015). The existing literature has predominantly focused on one particular dimension of competition, namely rivalry for resources (Stigler 1987). There is, however, another dimension that seems to have escaped scholarly attention. Competition typically entails a ranking of relative performance, since the highest-ranked performances determine the winner(s) in competitive environments. If socially recognized, such a performance ranking yields a ranking in terms of social status, as defined by Ball et al. (2001). This ‘social’ aspect of the recognition is important: if a ranking is only privately known, no social status is involved.

Competition often creates a social-status ranking amongst the competitors. For example, competition for highly regarded jobs or promotions involves a rivalry for resources in which some people succeed and others do not; but it also ranks applicants relative to one another, with the successful applicant obtaining higher social status than those who did not get the job or promotion. This ranking is social because the successful applicant is known and the employer (and often others) knows who did not succeed.Footnote 1

In this paper we study gender differences in the reaction to status ranking in isolation from the rivalry dimension of competition. In particular, we focus on how performance is affected by the anticipation that one will be compared to others by a peer and compare this to cases where no such social ranking takes place.

Little is known about the consequences of the status-ranking dimension of competition as such for gender inequality. In natural environments, rivalry for resources and status ranking are typically interlinked. However, in many settings the status ranking is much more salient than the rivalry for resources. This holds, for example, in professions that are at least partially protected from the market, such as the judiciary, the military, NGOs, churches and universities. In these organizations, people in high positions typically enjoy high status, whereas the payoff differences relative to people in lower positions are often small.Footnote 2

Even if one cannot distinguish between rivalry for resources and status ranking in the field, the two are in principle distinct phenomena that can have different effects and, hence, could affect men’s and women’s behavior differently. A better understanding of gender differences in performance therefore requires an analytical distinction between the two dimensions. Beyond improving our understanding, this distinction is also highly relevant from a policy perspective. Whether a policy aiming to diminish the gender gap succeeds may depend very much on whether rivalry for resources or status ranking is causing the performance differences. Consider a policy that reduces the rivalry for resources while maintaining the social ranking of performance. An organization could, for example, relax restrictions on the number of promotions and base promotions solely on merit, independently of how many others are promoted. Being promoted then increases one’s social status without reducing other high performers’ chances, i.e., without any rivalry for resources. If the gender gap is caused by the effects of social ranking, this policy is likely to fail.

It is well documented that attitudes towards status differ by gender, with men usually found to attribute more importance to status than women (Frank 1999; Carlsson et al. 2009; Mujcic and Frijters 2013), though the reverse has also been reported (Johansson-Stenman et al. 2002; Alpizar et al. 2005). Here, we do not focus on the importance attributed to status per se. Instead, we address the complementary question of whether men and women perform differently when they know that a performance comparison reflecting their status ranking will take place. The anticipation of a social ranking has been shown to affect performance (De Botton 2004; Wilkinson and Pickett 2010), but gender differences in this effect have not been addressed. What we do know is that, when men and women are ‘forced’ to compete for resources and the ranking of performance is not made salient (i.e., the status dimension of competition is not obvious), performance differs by gender in some environments but not in others (Niederle and Vesterlund 2011). This leaves open the question of whether performance differs between men and women when the status-ranking dimension is more salient.

We use laboratory experiments to isolate the effects of status ranking. Our design makes it possible to hold constant the rivalry-for-resources dimension of competition and to vary the dimension we are interested in. Our experimental design has two treatments, differing only in the second of three parts.Footnote 3 For both treatments, part 1 consists of a task where participants’ monetary payoff is based purely on the individual score (i.e., performance), so that there is no competitive aspect to the incentive scheme. There are two groups of participants. One group does the task of part 1 and then skips part 2. Their performance on the task serves as a benchmark to which we compare that of the participants in the other group. The participants in the other group also do the task in part 1 and then in part 2 have to report their scores to a peer seated in a separate office. This peer does not know what task was undertaken.

We conduct two treatments in a between-subjects design. In the ‘Status Ranking’ treatment (SR), each participant in part 2 individually and privately reports to the same peer and (truthfully) reads aloud his/her score as well as his/her rank among the participants in the group. This allows the peer to compare performances in the task. This particular way of making the rank public (‘social’) aims to create social recognition by making the ranking salient and tangible to participants. As argued above, status ranking does not stem from receiving feedback about one’s relative position; it is the recognition of one’s ranking by others that creates social status.

In the ‘Conformity’ treatment (CF), each participant reports to a different peer and (truthfully) reads aloud the score, but not the rank. This treatment distinction uses the fact that status is inherently positional to separate the mere effect of having to report one’s result to a stranger from the effect of social-status ranking, i.e., being compared to others by a stranger (a possibility pointed out by Heffetz and Frank 2008). Importantly, in both treatments all participants who have to report to a peer are informed about this before starting the summation task.

Our results show markedly distinct outcomes for men and women. For participants who do not have to report to a peer and for those under conformity (CF), gender differences in performance are small and insignificant. In contrast, under status ranking (SR) men attempt many more summations (significantly so) and solve many more correctly than women. In this sense, men perform better than women. Moreover, when women know beforehand that their performance will be socially ranked, they reduce the number of attempted summations. We can unequivocally attribute the observed performance differences to the social ranking, because no gender difference is observed when participants do not report their score, nor under conformity, where participants report their score to a third party who cannot compare it to those of others.

The remainder of this paper is organized as follows. The next section briefly reviews the literatures on gender differences in preferences for competition, stereotype threat, and status rankings and relates them to this study. Section 3 presents our experimental design and procedures, and Sect. 4 describes our results. A concluding discussion is offered in Sect. 5.

2 State of the art

There is by now an extensive literature on gender differences in behavior in relation to competition (for overviews, see Croson and Gneezy 2009; Niederle and Vesterlund 2011). This has addressed both performance differences when men and women compete and gender differences in the willingness to enter a competitive environment. In this literature, the focus is on the rivalry-for-resources aspect of competition. A competitive environment typically involves one or a few of the best performers obtaining a monetary prize, whereas the other participants do not earn anything.

Regarding gender differences in behavior under competitive pressure, an early influential study (Gneezy et al. 2003) shows, for a maze-solving task, that women forced to compete for resources do not perform better than in a non-competitive environment where earnings are based solely on individual performance. In contrast, such competition strongly improves men’s performance. This result is only observed, however, when men and women participate in a mixed-gender competition. A similar effect is observed when 10-year-olds compete in running contests (Gneezy and Rustichini 2004).Footnote 4 With respect to gender differences in attitudes towards competition, the seminal work by Niederle and Vesterlund (2007) establishes that women are less willing to enter competition than men.

The first studies linking experimental measures of competitiveness to actual education and labor market outcomes have only recently started to appear. These show that (differences in) competitiveness help explain why women sort out of jobs with competitive compensation regimes (Flory et al. 2015); predict whether Chinese students choose to participate in a competitive entry exam for prestigious universities (Zhang 2013); predict future salary expectations of American college students (Reuben et al. 2017); and can partially explain gender differences in academic career choices of Dutch high school students (Buser et al. 2014).Footnote 5

To the best of our knowledge, nothing is yet known about the differential gender impact of the status-ranking dimension of competition. In previous studies, the status ranking aspect of competition was in a sense ‘hidden’, with the focus being primarily on the possibility of winning a monetary prize by being amongst the best performers.

The psychology literature has offered various explanations for effects of status ranking per se and for gender differences in these effects. We briefly discuss the related concepts of ‘social evaluative threat’ and ‘stereotype threat’. A conceivable effect of status ranking for both men and women is that it creates anxiety about an anticipated comparison. This anxiety can be caused by ‘social evaluative threats’, i.e., situations in which the social self is endangered. Such threats elicit large cortisol responses due to a fear of failure in the eyes of others (Dickerson and Kemeny 2004). There is, however, no evidence that these physiological responses are gender related.Footnote 6 This, together with the lack of previous studies on gender-specific performance effects of (social) status anxiety, is somewhat surprising, because there is ample evidence of gender differences in ‘stereotype threat’, i.e., threat arising from cultural beliefs about gender-specific performance. Such stereotype threats can cause distinct social evaluative threats for men and women and may therefore differentially affect performance. Indeed, stereotype threat is considered an important cause of gender differences in self-assessed ability and career aspirations (with men scoring higher on both; Correll 2004; Thébaud 2010; Reuben et al. 2012).

Stereotype threat may lead to evaluation anxiety when conducting tasks that are considered to be negatively associated with one’s gender (Steele 1997). Simply knowing that a negative gender stereotype exists may be sufficient to cause anxiety (Goffman 1963; Howard and Hammond 1985; Steele and Aronson 1995), which inhibits performance (Sarason 1972; Hunt and Hillery 1973; Michaels et al. 1982; Wigfield and Eccles 1989; O’Brien and Crandall 2003). Hence, stereotype threat could conceivably cause gender differences, both in performance under competition for resources and in the effects of anticipated status ranking. Our study is designed to rule out this possibility. We are careful not to prime stereotype threat, and our design allows us to isolate any effects of pre-existing gender-related stereotype threat. The results show no evidence of such effects.

There are a few non-laboratory studies that look at the effects of giving workers ranking information without pecuniary consequences. Blanes-i-Vidal and Nossol (2011) study personnel records for warehouse workers of a German wholesale and retail organization, in which workers were paid piece rates and received private ranking information on their pay and productivity. Using a quasi-experimental research design, they find that providing this information leads to a large increase in workers’ productivity. In contrast, Barankay (2012) finds a negative effect of providing ranking feedback. He presents results from a randomized controlled trial with furniture salespeople who are privately informed about their performance rank, and finds that privately providing rank information without any pecuniary consequences considerably decreases sales for men, but not for women. Note that, because the ranking information is private in both of these field studies, they do not measure the impact of social-status ranking that we are interested in.

3 Experimental procedures and design

The experiment was run at the laboratory of the Universitat Pompeu Fabra (UPF) in Barcelona between April 2014 and May 2016. There were six sessions with 13 participants and six with 18, for a total of 186 participants; 144 were ‘active’ participants (A- and B-players; see below), while the remaining 42 had a passive role (C-players; see below). All participants were recruited on a voluntary basis from the UPF subject pool using the ORSEE recruitment software (Greiner 2004). If more volunteers showed up than were needed for the session, participants were randomly selected and the remainder were sent away with a €7 show-up fee.

The experiment was partly computerized.Footnote 7 Instructions were handed out on paper and are reproduced in part I of the Electronic Supplementary Material (ESM). The experiment consists of three parts. In part 1 (computerized), participants undertake an individual task. In part 2 (not computerized), some active participants (the A-players) are required to report their result to otherwise inactive players (the C-players). Part 3 (computerized; discussed in Appendix 2) involves pairs of participants playing dictator games. Instructions for parts 2 and 3 were distributed after completion of the previous part.

Sessions lasted approximately 50 min. At the end of each session, participants were paid their earnings (which were contingent on their performance and decisions in parts 1 and 3) in private. For active participants, average earnings including the €7 show-up fee were €23.47 when excluding the two outliers (as explained below) and €24.08 when including them. Inactive participants received a €20 participation fee.

3.1 Player types

Before entering the laboratory, participants are randomly allocated to three player types, denoted A, B and C. Only types A and B enter the laboratory and do the tasks described below. C-players are taken to separate rooms and remain inactive throughout the experiment. In every session there are six A-players and six B-players. Depending on the treatment (see below), there is either one C-player (Status Ranking) or six (Conformity).
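The participant counts reported at the start of this section follow directly from this composition. The Python sketch below is a minimal arithmetic check; it assumes, as implied by the design, that the 13-person sessions are the Status Ranking sessions (one C-player) and the 18-person sessions are the conformity sessions (six C-players). The names `SESSIONS` and `session_size` are hypothetical, introduced only for illustration.

```python
# Session composition: an illustrative check of the participant counts.
# Assumption: 13-person sessions = SR (one C-player), 18-person sessions =
# conformity treatments (six C-players), as implied by the design description.
A_PER_SESSION = 6  # active players who report to a C-player
B_PER_SESSION = 6  # active players who do not report

SESSIONS = {
    "SR": {"n_sessions": 6, "c_players": 1},
    "CF": {"n_sessions": 6, "c_players": 6},
}


def session_size(c_players: int) -> int:
    """Number of participants in one session with the given number of C-players."""
    return A_PER_SESSION + B_PER_SESSION + c_players


total = sum(s["n_sessions"] * session_size(s["c_players"]) for s in SESSIONS.values())
active = sum(s["n_sessions"] * (A_PER_SESSION + B_PER_SESSION) for s in SESSIONS.values())
passive = total - active

print(session_size(1), session_size(6))  # 13 18
print(total, active, passive)            # 186 144 42
```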

3.2 Task

Part 1 is the same in all sessions and is taken from Weber and Schram (2017). Participants are presented with a sequence of pairs of 10 × 10 matrices filled with two-digit numbers. These matrices appear in the lower half of their computer monitor (Fig. 1).

Fig. 1

Screenshot part 1. Notes The instructions inform participants that the numbers in the cells were ‘randomly generated’ (cf. SM). Drawing from a uniform distribution would have led to a high probability of very high sums. To avoid this, for each cell, we first drew a random number between 40 and 99, say X. Then, we drew a random number (uniformly) between 10 and X. This gives a far lower probability of high numbers (the chance of a number being 75 or more is approximately 0.06)
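The two-stage draw described in the notes to Fig. 1 can be sketched as follows. This is a minimal Python illustration under our own assumptions, not the software used in the experiment: we assume integer draws with inclusive bounds, and the helper names (`draw_cell`, `prob_at_least`, `make_matrix`) are hypothetical. Under these assumptions the exact probability of a cell value of 75 or more is about 0.066, in line with the approximately 0.06 reported in the notes (a continuous-uniform version of the same scheme gives roughly 0.06).

```python
import random
from fractions import Fraction

LOW, X_MIN, X_MAX = 10, 40, 99  # bounds taken from the Fig. 1 notes


def draw_cell(rng: random.Random) -> int:
    """Two-stage draw: X uniform on {40..99}, then the cell uniform on {10..X}.

    Assumption: integer values with inclusive bounds.
    """
    x = rng.randint(X_MIN, X_MAX)
    return rng.randint(LOW, x)


def prob_at_least(threshold: int) -> Fraction:
    """Exact P(cell >= threshold) under the integer two-stage scheme."""
    n_x = X_MAX - X_MIN + 1  # 60 equally likely values of X
    total = Fraction(0)
    for x in range(X_MIN, X_MAX + 1):
        support = x - LOW + 1                       # values {10, ..., X}
        hits = max(0, x - max(threshold, LOW) + 1)  # values {threshold, ..., X}
        total += Fraction(hits, support * n_x)
    return total


def make_matrix(rng: random.Random, size: int = 10) -> list[list[int]]:
    """A size-by-size matrix of two-digit numbers drawn cell by cell."""
    return [[draw_cell(rng) for _ in range(size)] for _ in range(size)]


print(round(float(prob_at_least(75)), 3))  # 0.066
```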

For each pair of matrices, each participant has to find, individually, the highest number in the left matrix and the highest number in the right matrix, and calculate the sum of these two numbers. This sum must be entered in the window at the top center of the monitor.Footnote 8 Each correct answer yields one euro. After a number has been entered, two new matrices appear, regardless of whether the sum was correct. The task continues for 15 min. We apply this piece-rate remuneration in all of our treatments; it aims to minimize the rivalry for resources, so that any treatment differences that occur can be attributed to the social-status dimension of competition.
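The scoring rule of the summation task can be written down compactly. The following Python sketch is illustrative only: the function names are hypothetical, and small 2 × 2 matrices stand in for the 10 × 10 matrices used in the experiment.

```python
def correct_answer(left: list[list[int]], right: list[list[int]]) -> int:
    """Highest number in the left matrix plus highest number in the right matrix."""
    return max(max(row) for row in left) + max(max(row) for row in right)


def piece_rate_score(submissions, euros_per_correct: float = 1.0):
    """Return (attempts, correct sums, piece-rate earnings in euros).

    `submissions` is a list of (left, right, entered_sum) tuples. Every entry,
    right or wrong, counts as an attempt, mirroring the fact that new matrices
    appear after each answer regardless of correctness.
    """
    attempts = len(submissions)
    correct = sum(1 for left, right, entered in submissions
                  if entered == correct_answer(left, right))
    return attempts, correct, correct * euros_per_correct


left = [[12, 40], [33, 27]]
right = [[55, 18], [10, 61]]
print(correct_answer(left, right))  # 101
print(piece_rate_score([(left, right, 101), (left, right, 95)]))  # (2, 1, 1.0)
```

Keeping attempts and correct sums separate matches the two outcome measures analyzed in Sect. 4.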

B-players are instructed about the summation task and perform it without further interaction with other players. A-players are informed before the task that they will be required to report their performance to a C-player after completion. Performance is measured as the number of correct summations. The A-player instructions also emphasize the importance of doing well on this task by mentioning that performance on this type of task has been shown to correlate positively with success in professional life.Footnote 9 Participants were told that we would provide evidence for this claim upon request after the experiment. For this purpose, we had copies available of Koedel and Tyhurst (2012), a resume study linking math skills to labor market outcomes.

After finishing the instructions, each A-player is individually taken to a C-player and reads aloud a text stating that s/he will return after the task to report her/his score (i.e., performance). This is done to create the anticipation of having to later report to the C-player. The text used is given in SM. The experimenters taking the A-players to see the corresponding C-player were always a man and a woman.

3.3 Treatments

We start with the distinction between two treatments that differ only in whether C-players are able to compare the performance of A-players. These are denoted as the ‘Status Ranking’ (SR) treatment and the ‘Conformity’ treatment (CF-NR, for ‘Conformity-No Ranking’). In SR, there is only one C-player. In part 2 of the experiment, each A-player reports (one at a time) to this C-player and reads aloud his or her number of correct summations and rank amongst the A-players (cf. the upper panel of Fig. 2).

Fig. 2

Experimental design. Notes A- and B-players individually do the summation task. Then A-players report privately to C-player(s) (indicated by arrows). Panel A shows the Status Ranking (SR) treatment where each A-player individually goes to the (same) C-player and reports his or her own score and rank amongst A-players. Panel B shows the Conformity (CF) treatment where each A-player individually goes to his or her ‘own’ C-player and reports the score

The conformity treatment was designed with the idea that simply reporting one’s score to a peer might already induce social evaluative threat and affect behavior. In Sect. 3.3 of their excellent overview of the literature on status, Heffetz and Frank (2008) write: “Indeed if we assume that status depends on actions, status-seeking individuals are expected to change their behavior in predictable ways depending on whether their actions are visible to others. The observation that they often do, however, is consistent not only with preferences for status, but also with any preferences where others’ opinions are important (e.g. because of considerations of reputation, shame, fear of punishment, etc.). This should be borne in mind when interpreting the evidence below”. In other words, our status-ranking treatment may confound the effects of social status with other effects related to a wish to ‘conform’ to a peer’s opinions.Footnote 10 To study the status effect in SR, we use CF to isolate such other effects.

To control for such ‘conformity’ effects, we use the CF-NR treatment, where there are six C-players, each seated in a separate room. Each A-player in this treatment reports (one at a time) to a different C-player and reads aloud the number of correct summations, but does not report anything related to the player’s ranking (see the lower panel of Fig. 2). When reporting, A-players use printed (truthful) texts provided by us (cf. SM). In both SR and CF-NR, B-players do not report to C-players. Their performance serves as a behavioral benchmark of isolated play without reporting.

Note, however, that CF-NR and SR differ in two respects. In SR, the social ranking is known not only to others (i.e., the C-player) but also to the A-players themselves. In CF-NR, A-players do not know (and, hence, cannot report) their ranking. To separate the effect of the ranking being socially known from that of privately knowing one’s own rank, we add a treatment in which each A-player is informed about his or her own rank but knows that every A-player will report to a distinct C-player, i.e., there is no social ranking. We denote this treatment CF-PR (‘Conformity-Private Ranking’).
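The logic of the three treatments can be summarized as two binary design dimensions: whether an A-player privately learns his or her own rank, and whether a single C-player hears all six scores and can therefore rank the A-players. The Python sketch below encodes this reading of the design (the class and field names are our own, hypothetical labels) and lists the dimensions on which two treatments differ.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Treatment:
    """Reporting condition for A-players (hypothetical encoding of the design)."""
    name: str
    rank_known_to_self: bool  # A-player is told his/her own rank
    rank_known_to_peer: bool  # one C-player hears all six scores and can rank them

    @property
    def c_players(self) -> int:
        # A shared C-player makes the ranking social; otherwise one C-player per A-player.
        return 1 if self.rank_known_to_peer else 6


SR = Treatment("SR", rank_known_to_self=True, rank_known_to_peer=True)
CF_NR = Treatment("CF-NR", rank_known_to_self=False, rank_known_to_peer=False)
CF_PR = Treatment("CF-PR", rank_known_to_self=True, rank_known_to_peer=False)


def differences(a: Treatment, b: Treatment) -> list[str]:
    """Design dimensions on which two treatments differ."""
    return [f.name for f in fields(Treatment)
            if f.name != "name" and getattr(a, f.name) != getattr(b, f.name)]


print(differences(CF_NR, SR))  # ['rank_known_to_self', 'rank_known_to_peer']
print(differences(CF_PR, SR))  # ['rank_known_to_peer']
```

Under this encoding, CF-PR differs from SR only in whether the ranking is social, which is why it provides the most direct within-gender comparison for the effect of status ranking.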

In all treatments, C-player instructions inform them that they will be told the result of either one (CF-NR/CF-PR) or six (SR) participants. They are not informed about the task, but are told that high scores indicate better performance than low scores.Footnote 11 A-players know that the C-players do not know the task. After all A-players have reported their scores, C-players are paid €20 and dismissed.

The choice to induce social ranking via a face-to-face encounter with a peer deserves further discussion. Of crucial importance is that, as argued above, social status requires that the ranking is public (i.e., socially recognized).Footnote 12 An alternative would have been to organize the interaction between the A- and the C-player through the computer. This would, however, have seriously reduced the salience of the social aspect of status in the SR treatment. A disadvantage of our approach may be that face-to-face interaction introduces various possible channels through which our main results might emerge. We have tried to limit the number of such channels by allowing only minimal contact between the two participants involved (the A-player reads aloud a one-line text prepared by us and the C-player is not allowed to respond). A further investigation of the channels through which this face-to-face encounter might cause treatment effects is an interesting topic for future research.Footnote 13

3.4 Pilot

Before running the 12 sessions of this experiment, we organized four pilot sessions (in March 2014). These differed from the final experiment in two respects. First, participants were given 10 min instead of 15 min to do the summation task; we increased the time to create more leeway for differences in performance. Second, A-players did not visit the C-players between reading the instructions for part 1 and starting the summation task; we introduced this visit to make the upcoming report to a peer more prominent.

4 Results

Our presentation of the results focuses on gender differences in performance across treatments, distinguishing between attempted summations and performance (i.e., the number of correct summations). Because all tests reflect pairwise comparisons between independent samples of individuals, we apply two-sided permutation (a.k.a. randomization) t tests using Monte Carlo resampling with 5000 repetitions (henceforth, PtT) throughout the analysis.Footnote 14 PtT make no assumptions about the underlying distributions, and the number of observations needed for trustworthy inference is (much) lower than for the tests more commonly used in experimental work. For example, Moir’s (1998) study in this journal already shows the success of these tests with as few as eight observations per treatment cell. Our numbers of observations per cell vary between 16 and 52 (note that by design we have more observations for players of type B), and all tests of our main hypotheses are based on 26–72 observations. We discuss our tests further in Appendix 3, which also provides supporting evidence for our results using data from related experiments in Amsterdam (cf. fn. 6).
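For readers unfamiliar with the PtT, the following Python sketch shows the general procedure: compute a two-sample t statistic, then compare it with the distribution of the same statistic over random reallocations of the pooled observations to the two groups. It is a minimal illustration under our own assumptions, not the code used for the analysis: we use the pooled-variance (Student) t statistic and the common (b + 1)/(m + 1) Monte Carlo p-value estimate, neither of which is specified above, and the function names and example data are hypothetical.

```python
import math
import random
import statistics


def pooled_t(x, y) -> float:
    """Two-sample Student t statistic with pooled variance."""
    nx, ny = len(x), len(y)
    diff = statistics.mean(x) - statistics.mean(y)
    sp2 = ((nx - 1) * statistics.variance(x)
           + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    if sp2 == 0:  # no within-group variation at all
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / math.sqrt(sp2 * (1 / nx + 1 / ny))


def permutation_t_test(x, y, reps: int = 5000, seed: int = 0) -> float:
    """Two-sided Monte Carlo permutation t test; returns an estimated p-value."""
    rng = random.Random(seed)
    observed = abs(pooled_t(x, y))
    pooled = list(x) + list(y)
    nx = len(x)
    extreme = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # random reallocation of observations to the two groups
        if abs(pooled_t(pooled[:nx], pooled[nx:])) >= observed - 1e-12:
            extreme += 1
    # (b + 1) / (m + 1) keeps the estimated p-value strictly positive.
    return (extreme + 1) / (reps + 1)


# Hypothetical numbers of attempted summations for two small groups.
group_1 = [30, 32, 35, 31, 33, 34, 36, 29]
group_2 = [20, 22, 19, 21, 23, 18, 24, 20]
print(permutation_t_test(group_1, group_2))
```

With the pooled-variance statistic, this test gives the same p-value as a permutation test on the absolute difference in means, because for a fixed pooled sample |t| increases monotonically with the absolute mean difference.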

In presenting our results, we first investigate whether privately knowing one’s own rank affects the number of attempts and performance. We then consider the effects of social-status ranking on the number of attempts and performance. An overview of our summary statistics is presented in Appendix 1, and Appendix 2 reports the effects of experienced status ranking on choices in the dictator game.

4.1 The effects of private ranking information

To check whether knowing one’s relative position (without anyone else knowing) has an effect, we compare the CF-NR and CF-PR treatments. Figure 3 compares attempts and performance across gender for these two treatments. It shows that the ordering between men and women on both measures is reversed when subjects know that they will be privately provided with information about their ranking amongst the A-players. Differences are small, however. None of the within-gender differences in attempts or performance between CF-NR and CF-PR are statistically significant (PtT; all p > 0.24; N = 20 for women, N = 16 for men). More importantly, there are no significant gender differences in attempted summations or performance for either conformity treatment (PtT; attempts: in CF-NR p = 0.374, N = 18; in CF-PR p = 0.292, N = 18; performance: in CF-NR p = 0.242, N = 18; in CF-PR p = 0.509, N = 18). For this reason, we pool the data for the CF-PR and CF-NR treatments from here onwards, unless indicated otherwise.

Fig. 3

Attempts and performance in conformity treatments. Notes Bars show number of attempts at calculating summations (left) and performance (number of correct summations, right), separately for women and men. CF-NR: Conformity treatment without knowing own rank; CF-PR: Conformity treatment with knowing own rank. Error bars show 95% confidence intervals

4.2 The effects of anticipated status ranking

When further analyzing the data, we leave out two outliers in the SR treatment with more than 100 attempted summations (see SM, part II). Including them would further strengthen our results. Figure 4 presents the main results of this paper. The results for type B show that women make insignificantly more attempts and have insignificantly lower performance than men when they do the summation tasks without having to visit a C-player (PtT; p = 0.757 for attempts, p = 0.887 for performance; in both cases N = 72). This is an important benchmark indicating that for this task our participants experience no unaccounted-for stereotype threat related to gender (cf. Sect. 2).Footnote 15

Fig. 4

Attempts and performance. Notes Bars show the number of attempts at calculating summations (left panel) and performance (number of correct summations, right panel), separately for women and men. CF Conformity (CF-NR and CF-PR pooled); SR Status Ranking. Error bars show 95% confidence intervals

In the conformity treatments, i.e., when participants know that they will report their result to a peer but also know that this C-player will not be able to compare it to others’ performance, the differences between men and women are very small and statistically insignificant (PtT; p = 0.951 for attempts, p = 0.658 for performance; in both cases N = 36).

The most remarkable result is observed for the treatment where A-players report to a C-player and know that this peer will be able to compare their performance to others (SR). Here, women make many fewer attempts and have much lower performance than men and these gender differences are highly significant for both attempts (PtT; p < 0.001; N = 34) and performance (PtT; p < 0.001; N = 34). The observed gender difference in performance in SR is a direct consequence of the difference in attempts because the fraction of attempted summations that is correct does not differ between men and women in SR (PtT; p = 0.789; N = 34).Footnote 16
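The accuracy measure used here (the fraction of attempted summations that is correct) and the exclusion rule stated above (more than 100 attempted summations) are simple per-participant transformations. The Python sketch below illustrates them; the record layout and function names are hypothetical, the data in the usage line are made up, and assigning an accuracy of 0.0 to a participant with no attempts is our own assumption. The resulting per-participant values for women and men would then be compared with a permutation test, as elsewhere in this section.

```python
from dataclasses import dataclass

MAX_ATTEMPTS = 100  # exclusion threshold from the text (>100 attempts = outlier)


@dataclass(frozen=True)
class Participant:
    gender: str     # "F" or "M"
    attempts: int   # number of summations entered
    correct: int    # number of correct summations


def drop_outliers(people: list[Participant]) -> list[Participant]:
    """Keep participants with at most MAX_ATTEMPTS attempted summations."""
    return [p for p in people if p.attempts <= MAX_ATTEMPTS]


def accuracy(p: Participant) -> float:
    """Fraction of attempted summations that is correct (assumed 0.0 if none attempted)."""
    return p.correct / p.attempts if p.attempts else 0.0


def accuracy_by_gender(people: list[Participant]) -> dict[str, list[float]]:
    """Per-participant accuracy, grouped by gender, after excluding outliers."""
    groups: dict[str, list[float]] = {}
    for p in drop_outliers(people):
        groups.setdefault(p.gender, []).append(accuracy(p))
    return groups


sample = [Participant("F", 20, 15), Participant("M", 30, 24), Participant("M", 120, 40)]
print(accuracy_by_gender(sample))  # {'F': [0.75], 'M': [0.8]}
```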

The ‘diff-in-diff’ result shown in Fig. 4 is a direct consequence of the different ways in which men and women react to the introduction of conformity or status ranking. When conformity is introduced in CF (having to report to others without being compared), women slightly increase their attempts but have lower performance (compared to B-players, who do not report). These differences are far from statistically significant, however (PtT; p = 0.548 for attempts, p = 0.692 for performance; in both cases N = 72).Footnote 17 Men slightly increase their number of attempts and have almost the same performance; again, these effects are statistically insignificant (PtT; p = 0.496 for attempts, p = 0.933 for performance; in both cases N = 36).Footnote 18

When social-status ranking is introduced in SR, a comparison with the ‘non-reporting’ B-players shows that women reduce their number of attempts and performance, while men strongly increase attempts and performance. For women, the first effect is statistically significant (PtT; p = 0.044, N = 68), while the effect on performance is insignificant (PtT; p = 0.172, N = 68). For men, both effects are statistically significant (PtT; p = 0.003 for attempts, p = 0.050 for performance; in both cases N = 38). These results allow us to conclude that the gender difference we observe when anticipated status ranking may affect behavior is caused by men increasing the number of attempted summations and women decreasing it.

To investigate the effect of status ranking within gender, the most direct comparison is between treatments CF-PR and SR. Recall that A-players know their own rank in both treatments; the difference is that each participant reports his or her score to a different peer in CF-PR, whereas all six participants report to the same C-player (who can therefore compare them) in SR. The effects we find are remarkable. Anticipated status ranking makes women significantly reduce the number of summations they attempt (PtT for attempts; p = 0.010, N = 26), although the reduction in performance is not significant (PtT for correct; p = 0.221, N = 26). For men, the number of attempts and performance both increase significantly (PtT; p = 0.047 for attempts, p = 0.028 for correct; in both cases N = 26).

5 Conclusions

Our experimental study abstracts from rivalry for resources and focuses on the effects of social status resulting from the social ranking of performances. We find that men make more attempts and increase their performance in anticipation of status ranking. Women, on the other hand, make fewer attempts and perform more poorly when they know they will be compared to others. This results in a large and statistically highly significant gender gap.

Our findings suggest that anticipated ranking of social status alone is an important element in observed gender differences in real-world competitive environments. Previous studies have shown that women tend to ‘opt out’ of competitive situations (Niederle and Vesterlund 2007). Our results imply that finding ways to make women ‘opt in’ may not suffice to bridge the gender gap. In fact, our study shows that—if the status ranking inherent to competition is salient—forcing an opt-in will make women slow down in trying to perform their task and will make men excel.

Though being compared to others is particularly disadvantageous to women, the aggregate effect across men and women may not be negative. In our experiments, total productivity (measured by the total number of correct summations for one man and one woman) is on average 22.8 for participants who do not report to anyone, 22.5 for those in conformity and 23.7 for those anticipating status ranking. This suggests that such ranking has negative effects on gender equality without negatively affecting economic efficiency. Efficiency and equity could both be enhanced if one could diminish the effect of social-status ranking on women while maintaining the stimulating effect it has on men.
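The productivity comparison above amounts to adding up average performance for one woman and one man in each condition. The short Python sketch below makes this explicit under our reading of the measure (the sum of the mean number of correct summations of women and of men in a condition); the function names are hypothetical, and the three totals are the ones reported above. Relative to participants who do not report, the totals change by about −1.3% under conformity and about +3.9% under anticipated status ranking.

```python
import statistics


def pair_productivity(correct_women: list[int], correct_men: list[int]) -> float:
    """Correct summations of one 'average' woman plus one 'average' man."""
    return statistics.mean(correct_women) + statistics.mean(correct_men)


def pct_change(value: float, baseline: float) -> float:
    """Percentage change relative to a baseline."""
    return 100 * (value - baseline) / baseline


# Totals as reported in the text (one man plus one woman).
reported = {"no report": 22.8, "conformity": 22.5, "status ranking": 23.7}
for condition, total in reported.items():
    print(f"{condition}: {total:.1f} ({pct_change(total, reported['no report']):+.1f}%)")
```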

Our focus in this paper has been on generating causal evidence on the gender effects of social-status ranking. This raises the question of which mechanisms underlie this phenomenon. Tentative interpretations of our findings are that women either choke under status pressure or are demotivated by status ranking among peers. The fact that women make fewer attempts in anticipation of status ranking hints at the latter, though it raises the follow-up question of why this demotivation occurs. It is also possible that women simply become more careful in performing the task, in the sense of pondering their answers more before submitting them to the computer. Finally, the observability inherent in social-status comparison might induce in women a desire to conform to a gender norm (similar to the ‘acting wife’ phenomenon reported in Bursztyn et al. 2017). At this stage, however, it is unclear why this would appear in our Status Ranking treatment and not in Conformity. A solid explanation of the effects we find is beyond the scope of this paper, but it deserves further investigation in future research.

Given the increasing labor force participation of women, such gender differences and the ‘hidden’ factor of social-status ranking under competition need to be addressed. A first step would be to reduce the extent to which women’s performance is compared with that of others in work environments. This can be done, for example, via fixed promotion standards based on individual performance without comparison to peers. An example of this practice is that in many North American universities, tenure decisions are not made by direct comparison to other candidates who are simultaneously up for tenure, but against a set of standards expected for a tenured position. Our results suggest that, if this procedure reduces the salience of status ranking, it would lead to better performance by women than in universities where candidates have to apply and compete for vacant tenure positions (as is often the case in Europe).

Finally, to the best of our knowledge, this is the first study that isolates the effect of social-status ranking from the rivalry dimension of competition. More research is needed to establish the consequences of what we find here: even when rivalry for resources is held constant, simply being compared to others has opposite effects on men and women, leading to gender differences in performance and resource allocations. An interesting direction for future research would be to make both dimensions of competition salient. Our hunch is that they would reinforce each other in creating more advantageous environments for men than for women.