Computer analysis of face beauty: a survey

The human face conveys to other human beings, and potentially to computer systems, information such as identity, intentions, emotional and health states, attractiveness, age, gender and ethnicity. In most cases analyzing this information involves computer science as well as the human and medical sciences. The most studied multidisciplinary problems are analyzing emotions, estimating age and modeling aging effects. An emerging area is the analysis of human attractiveness. The purpose of this paper is to survey recent research on the computer analysis of human beauty. First, we present results in human sciences and medicine pointing to a largely shared and data-driven perception of attractiveness, which is a rationale for computer beauty analysis. After discussing practical application areas, we survey current studies on the automatic analysis of facial attractiveness aimed at: i) relating attractiveness to particular facial features; ii) assessing attractiveness automatically; iii) improving the attractiveness of 2D or 3D face images. Finally, we discuss open problems and possible lines of research.

confirmation ( [26], [178], [170], [97]). For expressionless faces, the problem is essentially one of 3D object recognition, at least on short time intervals. Unrestricted identification may require the analysis of additional elements such as expressions and aging, involving much more complicated face models and a multidisciplinary approach.
Other face analysis problems are intrinsically multidisciplinary and strictly related to human sciences and medicine. The most important are estimating age and modeling face aging, capturing and understanding human expressions, and analyzing face attractiveness. Face age synthesis and estimation, surveyed in [59], have possible applications in entertainment, forensics, security controls, and cosmetology. Computer analysis of human expressions is a much studied problem. A currently well-established application is capturing human expressions in order to animate the faces of virtual characters for entertainment or to reduce video-transmission bandwidth. A much more challenging problem is interpreting facial expressions, i.e. mapping expressions onto emotional states ([46]). One difficulty is that there is no full agreement in psychophysiology about a model of the human emotions and of their effects on facial features ([128], [53]). The computer analysis research in this area has been surveyed in ([177], [120], [128], [53]). Today, research is mostly focused on affective computing, i.e. investigating new affect-sensitive paradigms of man-machine interaction.
The computer analysis of face attractiveness is an emerging research area. What produces the human perception of beauty is a long-standing problem in human sciences and, more recently, in medical areas such as plastic surgery and orthodontics. In the last few decades, several thousand papers and books on this subject have been published. A survey of the recent research on attractiveness in human sciences can be found in [89]. The researchers involved in these studies are social and developmental psychologists (concerned with the effects of attractiveness on human interaction), cognitive psychologists and neuroscientists (who investigate the mechanisms we use in assessing attractiveness) and evolutionary psychologists and biologists (who study the connection between the morphological characteristics associated with facial attractiveness and other human qualities such as health, fitness and, based on Darwin's theories of natural and sexual selection [35] [36], reproductive value; these works are surveyed in [63] [99]). The human perception of attractiveness is also related to face identification. Experiments have shown that the recognition rate is higher for attractive and ugly faces, and lower for faces of average attractiveness [178].
Studying beauty with pattern analysis and computer vision techniques is a relatively new research field. The purpose of this paper, which extends the preliminary material presented in [16], is to survey the rationale, techniques, results, applications, open problems and new possible lines of research in this emerging area. To the best of our knowledge, these topics have only been surveyed briefly before in [66].
The content of the paper is as follows. In Section 2, we summarize some results about facial attractiveness presented in human sciences and medical areas. Particularly relevant are the results showing that the perception of human beauty appears largely shared by people of different culture, ethnicity and age, and is thus supposedly data-driven. These findings are a rationale for computer techniques attempting to emulate the human perception on the basis of objective facial features. In Section 3, we present the applications of automatic beauty analysis. In Section 4, we briefly survey some issues relative to the representation of faces and the extraction of facial features. In Section 5, we survey the recent research on computer beauty analysis, and in particular: i) relating attractiveness and facial features; ii) automatically assessing face beauty; iii) improving the attractiveness of 2D face images or 3D face scans. Finally, in Section 6, we discuss new areas of research and the open problems.

2 Beauty in human sciences and medicine

Research on attractiveness: a short history
What is beauty? Philosophers, scientists and artists have debated the problem for centuries. A controversial, long-lasting question is whether, according to an often quoted sentence of the writer Margaret Wolfe Hungerford (1878), "Beauty is in the eye of the beholder", i.e. whether beauty is purely subjective or not. Important figures, such as Immanuel Kant (1790), have supported the former thesis or, as David Hume (1741), the latter.
In any case, from ancient Greek culture to our times, no one has denied the strong influence of beauty on human life. A number of recent studies, as well as everyday common experience, show that face and body harmony is extremely important in general social life ([24][4] [47]). In a society that is virtually obsessed by beauty, looking unpleasant or different can deeply affect self-esteem and result in social isolation, depression and serious psychological disorders ([24][133] [21][127] [101]). Attractive people are also likely to be regarded as better than unattractive people in a broad social sense. For instance, experimental research shows that better-looking candidates are more likely to be hired than equally qualified but less attractive people ([104][77] [30]). Thus, it is not surprising that more money is spent annually in the US on beauty-related items or services than on both education and social services [4].

Beauty Canons
Since ancient times, the supporters of the objective and measurable nature of beauty have attempted to state ideal proportions, or beauty canons, for the human body and its parts. The Greek sculptor Polycleitus was the first to define aesthetics in mathematical terms in his "Kanon" treatise. He also pointed out the importance of symmetry in the human shape. In order to define an aesthetically pleasing face, Marcus Vitruvius, a Roman architect, introduced the concept, still largely used in medicine and anthropometry, of facial trisection, or facial thirds, in which a face can be divided by horizontal lines passing through the hairline, the glabella, the subnasale and the menton (Fig. 1, left). Renaissance artists, such as Leonardo da Vinci, Leon Battista Alberti, Albrecht Duerer and Piero della Francesca, reformulated and documented the classic canons (Fig. 1, right). The classic and neoclassic canons ([13] [166][51] [9]) have been used for centuries by sculptors and painters, and are a rough working guide for plastic surgeons. The same idea is at the base of some of today's beauty assessing techniques, which are based solely on geometric features computed from the position of 2D/3D facial landmarks.

The Golden Ratio
A long-lasting idea, also stemming from the classic concept of ideal proportions, is the relevance of the golden ratio to facial beauty. The golden ratio is an irrational number, approximately 1.618, related to geometric entities, such as pentagons, and to mathematical entities, such as the Fibonacci sequence [100] [44]. From ancient times, it has been used explicitly, or claimed later to have been used, by a score of sculptors, painters, architects and composers, ranging from Phidias to Le Corbusier, Dalí, Mondrian and Béla Bartók, to construct aesthetically attractive shapes and even sounds [100]. Nowadays, the idea of a universal standard of beauty based on the golden ratio (Fig. 2) [106]). For instance, it was found that patients who were considered to be more attractive after orthognathic surgery were equally likely to move away from or towards the golden proportions [10] and a wide range of cephalometric values was found in a 3D study of professional models [115].

In this section we present some results relevant to the computer analysis of attractiveness, and in particular to the construction of computer systems able to emulate the human perception of attractiveness.
Three kinds of results support the thesis that the human perception of attractiveness is largely shared and appears to have a biological, "hard-wired" basis.
• A large number of empirical tests showing high beauty rating congruence over ethnicity, social class, age and sex;
• Brain activity patterns due to explicit attractiveness judgments, showing a strong correlation with beauty and ugliness;
• The apparently innate capability of young babies to appreciate attractiveness.

Cross cultural consistency of beauty judgments
A large number of experiments, based on various groups of both human raters and subjects, investigated the cross-cultural consistency of the judgments of attractiveness. For instance, Pearson correlations greater than 0.9 were obtained in [33] for groups of Asian, Hispanic, Black and White Americans, male and female, both as subjects and judges. Similar close agreements were obtained with Greek men as subjects and European and Asian females as raters [157], male/female White and Cruzans (natives of the Virgin Islands) as raters and Cruzans as subjects [103], Asian-American and Caucasian females [168], South African and American males and females [114], and Black and White judges rating males and females of both groups [32]. Other experiments used synthetic faces ([125][124]), once again obtaining cross-cultural congruence of judgments between Japanese and Caucasian raters. In addition, high correlation of intra-class ratings between different professional groups was recorded. For instance, in [87], the correlation of ratings of clinicians specialized in orthodontics and normal hospital clerks was, on average, 0.73 for female subjects and 0.93 for male subjects, with a significantly higher correlation between ratings of professional evaluators. The temporal consistency of raters was also tested. The overall correlation between ratings given at a 4-week interval was on average 0.86 for female and 0.83 for male subjects, and, again, significantly higher for professional groups.
The conclusion of these and many other empirical studies is that substantial beauty rating congruence exists over ethnicity, social class, age, and sex. Beautiful faces of different groups can have quite different shapes, but they are largely recognized as beautiful by individuals of other groups. Furthermore, it has been found that rating congruence is stronger for very unattractive and very beautiful faces ([125] [67]), in agreement with the analysis of brain activity patterns. These findings, although very largely shared, are in part contested in [72], where the thesis is that, due to systematic misinterpretation of experiments, individual tastes matter almost as much as shared tastes.
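As an illustration of how agreement figures of this kind are computed, the following minimal sketch correlates the mean ratings given by two rater groups to the same faces. The rating values and the names `group_a`, `group_b` and `pearson` are hypothetical and are not taken from the cited studies.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation coefficient between two 1-D score vectors."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

# Hypothetical mean attractiveness ratings of six faces by two rater groups.
group_a = [2.1, 3.4, 5.0, 6.2, 4.1, 1.8]
group_b = [2.5, 3.0, 5.4, 6.0, 4.4, 2.0]

r = pearson(group_a, group_b)
print(f"inter-group Pearson correlation: {r:.3f}")
```

A value close to 1 indicates that the two groups rank the faces in nearly the same way, even if their absolute scores differ.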

Brain activity patterns
An important result in psychophysiology and neuropsychology is the detection of the brain areas where the assessment of facial beauty is processed. Activity patterns related to an explicit attractiveness judgment of 2D face images have been measured with MRI (Magnetic Resonance Imaging) techniques and correlated with the beauty score of the faces. Brain patterns showed a non-linear response profile, with a greater response to highly attractive and unattractive faces. In addition, the response is greater for beautiful faces of the opposite sex ( [174][2] [76]).
These experiments have been substantially confirmed using near-infrared spectroscopy (NIRS) [111] and event-related brain potentials (ERP) [141]. These results, although preliminary, are important, since they confirm those of the previous section, and could eventually supply "objective" measures of the intensity of the perception of attractiveness, without the need for marks and scales whose levels are necessarily ill-defined.

Very young babies prefer beautiful faces
Another empirical finding seems to indicate that appreciating beauty is an innate, "hard-wired" human capability.
Newborns were found to be able to distinguish between faces previously rated as attractive or unattractive by adult raters, as shown by analyzing the time the babies spent looking at each face ([91][139] [42]). These preferences are not likely to have been produced by stereotypes of contemporary culture, contrary to those shown among children toward their peers from 3 or 4 years of age [41].

Applications of machine beauty analysis
The ability to rank attractiveness automatically, and to suggest how to improve the attractiveness of a particular face, is at the basis of many applications in scientific, professional and end-user areas. Some of these applications, like other face analysis applications such as identifying emotional and health states, could raise ethical concerns.
Human sciences. Much research in human sciences requires rating face attractiveness. Automatic ratings can avoid the use of human panels, which are cumbersome to set up and manage.
Social life. Several Internet sites already offer beauty ranking, or beauty ranking programs. Choosing the best photographs for home albums, social networks or personal Web sites is a possible application in this area.
Professional applications. These include automatically retouching and deblemishing images for advertising, magazine covers, motion pictures and special effects, preparing professional portfolios or CVs, and screening applicants for specific jobs, such as entertainment and modeling, where attractiveness is a basic requirement.
Selecting make-up and hairstyle. We have underlined the large amount of money currently spent on improving female attractiveness. Hence, applications able to suggest make-up and/or hairstyles that are not only fashionable, but also fit a particular face, could be very successful.

4 Face feature extraction and the face space paradigm
In this section we will briefly review some concepts related to face image analysis that are also relevant to the case of attractiveness analysis.
Holistic and feature based facial data. The computer approaches to facial feature extraction can be roughly divided into holistic and feature based. Both approaches are aimed at extracting, for a given problem, the most useful data from the huge amount of information provided by 2D images or 3D scans. The difference is that holistic techniques perform an automatic extraction of the data, usually from the whole face, on the basis of some general rule (PCA, LDA, Gabor wavelets, etc.). The precise meaning of the data obtained, a complex combination of the original image data, is not intuitive, and is difficult to relate to the usual facial features. In the feature based approach the significant features are selected a priori (e.g., nose width, interocular distance).  [142].
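The difference between the two families of features can be sketched as follows. This is only an illustration under stated assumptions: random arrays stand in for normalized face images, PCA is computed through an SVD, and the four landmarks (two eye centers and the two sides of the nose) are hypothetical coordinates. The function names are chosen for this example and do not come from the surveyed papers.

```python
import numpy as np

def pca_features(images, k):
    """Holistic features: project flattened images onto the top-k principal axes."""
    X = images.reshape(len(images), -1).astype(float)
    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes ("eigenfaces") of the centered data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return (X - mean) @ Vt[:k].T

def geometric_features(landmarks):
    """Feature-based data: two distances chosen a priori from 2D landmarks."""
    eye_l, eye_r, nose_l, nose_r = landmarks
    return np.array([
        np.linalg.norm(eye_r - eye_l),    # interocular distance
        np.linalg.norm(nose_r - nose_l),  # nose width
    ])

rng = np.random.default_rng(0)
images = rng.random((10, 8, 8))  # stand-in for 10 normalized face images
holistic = pca_features(images, k=3)

# Hypothetical landmark coordinates (x, y) in pixels.
landmarks = np.array([[30.0, 40.0], [70.0, 40.0], [42.0, 65.0], [58.0, 65.0]])
geometric = geometric_features(landmarks)
```

The holistic coefficients have no direct anatomical meaning, while each geometric feature corresponds to a named facial measure.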
The face space. A useful tool for face analysis applications is the "face space" paradigm, used both in human science, for modeling human perception, and in computer analysis. In the computer science area it was first introduced in [150]. The general idea proposed was that, after proper normalization, the pixel array of a face image can be represented as a point, or a vector, in an image space. The same concept applies to other 2D or 3D face data spaces, such as depth maps, 3D textured points and deformable faces. A vast portion of the vectors in these spaces does not represent faces. Human faces, constrained by symmetry and general structure, belong to a manifold, called the face space, or face manifold, whose dimensionality is much lower than that of the representation space, but whose shape is highly nonlinear [97]. Investigating the properties of the face space in relation to various face analysis problems requires techniques referred to as manifold learning ([156] [164][39] [163]). The relation of manifold learning to computational attractiveness will be discussed in more detail in Section 6.
Human science researchers, and perceptual psychologists in particular, propose norm-based models of the perceptual face space ([162][63]), where each face is described by encoding its differences from a prototype face, obtained as the average face. Such representations have been used for modeling the perception of identity ([135]), expressions ([116]) and attractiveness ([129] [130]). Since computer face models, such as that of Blanz and Vetter [15], are currently used to populate the perceptual face space, and various quantitative techniques such as MDS (multidimensional scaling) are applied to data analysis, the ideas of face space of computer scientists and psychologists are currently converging ([129], [25], [74]).
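A norm-based encoding of the kind described above can be sketched in a few lines. The sketch assumes that faces are already available as normalized vectors in some face space. The tiny two-dimensional vectors and the names `norm_based_code` and `distinctiveness` are hypothetical and serve only as an illustration.

```python
import numpy as np

def norm_based_code(faces):
    """Encode each face as its deviation from the average (prototype) face."""
    prototype = faces.mean(axis=0)
    deviations = faces - prototype
    # Distance of each face from the norm, a simple measure of distinctiveness.
    distinctiveness = np.linalg.norm(deviations, axis=1)
    return prototype, deviations, distinctiveness

# Three hypothetical face vectors in a 2-D face space.
faces = np.array([[1.0, 2.0], [3.0, 2.0], [2.0, 5.0]])
prototype, deviations, distinctiveness = norm_based_code(faces)
```

In this representation the prototype sits at the origin of the deviation vectors, and averageness corresponds to a small distance from it.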

Computer-based beauty analysis
The recent papers on attractiveness that make use of image processing, computer vision and pattern analysis techniques surveyed in this section are divided into three main groups:
- papers aimed at relating attractiveness to general facial features, such as texture, shape, symmetry, averageness and sexual dimorphism;
- papers essentially proposing automatic beauty rating systems aimed at emulating human judgment;
- papers describing techniques to beautify 2D or 3D face images.

Shape and texture
Shape and texture convey different information and several studies have been aimed at investigating their relative relevance to attractiveness. It should be observed that facial texture also supplies 3D shape information to the human vision system, since reflected light also depends on the surface normal. In [56], different skin textures obtained from photographs of 169 women were applied to a common 3D face model and rendered with the same illumination in order to (partially) decouple texture from original shape. Experiments showed that the 3D models textured with the images of younger subjects were rated as more attractive than those textured with images of older subjects. Several other results support the importance of skin color texture for attractiveness [57][80], especially in intersex evaluation, a thesis also put forward by Darwin [36]. An interesting, although obviously extreme, example of the importance of texture is reported in Fig. 3, where the same 3D surface can completely change its 2D appearance when different textures are applied. In any case, we recall that both textures retain cues of the original 3D shapes.
A consequence of these findings is that simply relying on geometric dimensions related to landmarks appears insufficient for a full attractiveness analysis. This also questions the effectiveness of canons or golden ratios as efficient beauty predictors ( [48], [144]).

Symmetry and averageness
According to evolutionary theories, symmetry may reflect the potential of an individual's genome to resist disease and maintain normal development in the face of environmental perturbations, thus being a potential element of sexual mating selection [126] [153]. A pioneer in these studies was Sir Francis Galton, Darwin's cousin, who in 1879 created photographs where the images of different faces were superimposed ([60]). Today researchers use image processing techniques to find the sagittal (symmetry) plane, locate facial landmarks, measure asymmetry, and create artificial symmetrical, morphed and average faces.
Faces are more or less asymmetric around the sagittal plane (see Fig. 4). Usually, their Total Asymmetry (TA) is decomposed into a Fluctuating Asymmetry (FA), concerning random deviations, and a Directional Asymmetry not seem to affect attractiveness and some research even found a negative correlation between symmetry and attractiveness [153].
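A very simple landmark-based asymmetry measure can be sketched as follows. It is not the decomposition of asymmetry used in the cited studies: it assumes a frontal 2D view in which the sagittal plane projects onto a vertical line `x = midline_x`, and paired left/right landmarks whose coordinates are hypothetical.

```python
import numpy as np

def asymmetry_score(left, right, midline_x):
    """Mean distance between right landmarks and mirrored left landmarks."""
    mirrored = left.copy()
    mirrored[:, 0] = 2.0 * midline_x - left[:, 0]  # reflect x about the midline
    return float(np.linalg.norm(mirrored - right, axis=1).mean())

# Hypothetical paired landmarks (x, y): an eye corner and a mouth corner.
left = np.array([[30.0, 40.0], [35.0, 70.0]])
right = np.array([[70.0, 40.0], [68.0, 74.0]])

score = asymmetry_score(left, right, midline_x=50.0)
print(score)
```

A perfectly symmetric configuration gives a score of zero, and larger values indicate larger deviations from bilateral symmetry.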
The effect of averageness on attractiveness perception is a much studied problem, but the results presented are controversial. According to theories of evolutionary biology and cognitive psychology, evolutionary pressure operates against the extremes of the population, and average facial prototypes should be preferred by conspecifics faces, the ratings of composite faces were better than those of the original faces. However, as pointed out in [14] and [5], composites are more symmetrical and fairly free of facial blemishes. For male faces, composites were found to be less attractive than normal faces ([62]). In [179], using the 2D normalized positions of 68 landmarks, the average face of about 500 samples was computed. Then, each test face was warped both toward and in the opposite direction of the average face. Human raters preferred faces closer to the average face, and moving faces that are far from the average towards the mean point was found to be an effective beautification technique [28]. However, this result is questioned by [5] and [125], according to which average faces are attractive, but very attractive faces are not average. Attractive composites were found to be more attractive when the shape differences from the sample means were exaggerated ([125]).
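The idea of moving a face toward or away from the average can be sketched on landmark coordinates alone. This is a simplification made for illustration: the image warping step used in the cited work is omitted, and the tiny landmark sets and the parameter `alpha` are hypothetical.

```python
import numpy as np

def move_relative_to_average(face, average, alpha):
    """alpha > 0 moves landmarks toward the average, alpha < 0 away from it."""
    return face + alpha * (average - face)

# Two hypothetical faces, each described by two normalized 2D landmarks.
faces = np.array([[[0.0, 0.0], [4.0, 0.0]],
                  [[2.0, 2.0], [6.0, 2.0]]])
average = faces.mean(axis=0)

closer = move_relative_to_average(faces[0], average, alpha=0.5)
farther = move_relative_to_average(faces[0], average, alpha=-0.5)
```

With `alpha = 1` the face coincides with the average, while negative values exaggerate its differences from the average, as in the caricature-like composites mentioned above.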
A 3D analysis of the influence on attractiveness of averageness of both 3D shape and 2D texture, using the Blanz and Vetter morphable model technique [15], is described in [118]. The head shapes and the face textures of 100 young adult males and females were separately averaged, and artificial face images were created in two different ways, first by texturing the individual heads with the average texture, and then morphing individual textures onto the average 3D head. Renderings of the original, the texture-normalized and the shape-normalized 3D models were rated by a human panel, showing higher attractiveness scores than those of original faces for texture-normalized and even higher for shape-normalized images.

Sexual dimorphism
Sexual dimorphism is the difference in facial features due to sex. While there is widespread agreement in human science that for female faces femininity is attractive, contradictory results have been presented for male faces (see [138] for a list of relevant papers). However, some recent papers using computer techniques provide results which appear to clarify this controversial point. In [145], 2D faces were morphed from masculinity to femininity along the eigenvectors, derived from 128 facial landmarks, that were most significant for sexual dimorphism. The attractiveness of a set of male samples rated by a human panel was found to be unrelated to masculinity. On the contrary, a significant negative correlation was found for a set of female samples. In addition, skin color was significantly correlated with male attractiveness, showing the different roles of shape and texture as attractiveness cues.
A further investigation ([138]) was performed in a densely populated 3D face space with 50 dimensions, 25 related to shape and 25 to surface reflectivity. The ratings of four thousand synthetic male and female faces were used to build a nonlinear attractiveness regressor, which showed predictions in agreement with the human panel.
The sexual dimorphism direction was taken as the vector joining the average male and average female faces, i.e. the direction of masculinity, or, equivalently, as the opposite vector, i.e. the direction of femininity. The gradient of female attractiveness was found to be almost parallel to the direction of femininity, while for males the gradient was almost orthogonal to it. A further analysis, carried out by separating shape and reflectivity, showed that for females the gradients due to both features pointed in the direction of femininity, while for males the reflectivity components pointed towards masculinity and the shape components towards femininity. This finding might explain the contradictory effects of masculinity on male attractiveness reported in previous work.
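The geometric relation between an attractiveness gradient and the dimorphism direction can be checked with a cosine of the angle between the two vectors. The sketch below uses a three-dimensional toy face space, and all vectors, including the two gradients, are hypothetical values chosen to mimic the qualitative finding, not data from [138].

```python
import numpy as np

def cosine(u, v):
    """Cosine of the angle between two vectors (1 parallel, 0 orthogonal)."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical average faces in a 3-D face space.
mean_male = np.array([1.0, 0.0, 2.0])
mean_female = np.array([0.0, 0.0, 1.0])

masculinity = mean_male - mean_female  # from average female to average male
femininity = -masculinity

# Hypothetical attractiveness gradients estimated from a fitted regressor.
grad_female = np.array([-2.0, 0.1, -2.0])
grad_male = np.array([1.0, 3.0, -1.0])

cos_female = cosine(grad_female, femininity)
cos_male = cosine(grad_male, femininity)
```

A cosine near 1 corresponds to the "almost parallel" case reported for female faces, and a cosine near 0 to the "almost orthogonal" case reported for male faces.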

Assessing beauty
Several papers have been aimed at automatically rating face attractiveness. Since these systems compute a single attractiveness score, their purpose, even if not explicitly stated, is to approach the average score supplied by human raters, or in other words, the "average" beauty. So far, little investigation of personal preferences has been reported ([173], [6]).
The general approach is: i) collecting a training set of images rated for attractiveness by human panels, ii) extracting from the images, with various techniques, data relevant to attractiveness, and iii) using this data to construct an automatic rater, which is then compared on a test set with human ratings, assumed to be the ground truth.
Attractiveness estimation can in principle be considered as a classification or a regression problem. Human raters are asked to rate face attractiveness with some integer number. Each number can be considered as the label of a class, thus making attractiveness estimation a classification problem. Classification accuracy can be estimated as the percentage of the test samples assigned to the classes chosen by human raters. Since the attractiveness ratings of the human judges usually do not coincide, the ground truth class is taken as the average or median score. Accuracy of estimation can also be evaluated as the Pearson correlation between automatic and human ratings over the test set, or as the Mahalanobis distance between ground truth and predicted class.
On the other hand, the beauty level can be seen as the dependent variable of some kind of regression, where the feature vector components are the independent variables. Accuracy can again be evaluated as the correlation with human ratings or through the coefficient of determination R^2. The relevance to attractiveness of the different features used can be studied in various ways, such as evaluating the explanatory power of independent variables, or testing the correlation with human judgment of different variable sets.
In the following discussion, the approaches presented have been divided into three groups. The first two are the approaches working with a relatively small dataset and based on, respectively, geometric features and holistic or mixed features. The last group includes the approaches working with large datasets collected over the Internet.
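The evaluation measures mentioned above can be sketched on synthetic data. This is only an illustration: the feature vectors and the "mean panel scores" are generated at random, the regressor is an ordinary least-squares fit (the surveyed papers use a variety of other models), and class labels are obtained here simply by rounding scores to the nearest integer.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination R^2."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def class_accuracy(y_true, y_pred):
    """Fraction of samples whose rounded prediction equals the rounded truth."""
    return float(np.mean(np.rint(y_true) == np.rint(y_pred)))

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))  # hypothetical feature vectors
# Hypothetical mean panel scores: a linear function of the features plus noise.
y = 4.0 + X @ np.array([1.0, -0.5, 0.3]) + rng.normal(0.0, 0.3, size=200)

# Fit a linear regressor on the training split, evaluate on the test split.
A = np.column_stack([np.ones(len(X)), X])
w, *_ = np.linalg.lstsq(A[:150], y[:150], rcond=None)
y_test, y_pred = y[150:], A[150:] @ w

r2 = r_squared(y_test, y_pred)
pearson = float(np.corrcoef(y_test, y_pred)[0, 1])
accuracy = class_accuracy(y_test, y_pred)
```

Note that the three figures answer different questions: correlation and R^2 measure how well the continuous scores are tracked, while accuracy only counts exact agreement on the discretized level.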
The most relevant data on several of these papers are summarized in Table 1, namely:
• dataset used
• score levels, i.e. levels of attractiveness used by human raters
• number of raters
• facial features used
• classification/regression techniques
• validation method (training and test sets)
• comparison with human raters
Some papers also present analyses of the rating consistency. For instance, in [67], using a ten point scale, a standard deviation of 1.6 was found for the average ratings distribution on a particular subject. Lower spreads resulted for high and low marks. In [82], an average correlation of 0.92 was found by dividing at random many times the ratings into two groups, and similar results were obtained in [45]. The consistencies are in good agreement with those found by human science researchers and reported in Section 2.2.
of the referees and some classical beauty canons was also analyzed. For instance, in [68] it was found that female scores are likely to be higher for male faces and that vertical face proportions play a more significant role than horizontal proportions. In [48], the authors reported that the faces obtaining the top scores did not respect the Golden Proportions and, in [144], neither classic geometrical rules nor symmetry were found to be effective beauty predictors.
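The split-half consistency check mentioned above for [82], in which the ratings are repeatedly divided at random into two groups, can be sketched as follows. The ratings matrix is synthetic (a shared score per face plus independent rater noise), and the function name and parameters are assumptions made for this example rather than the procedure of the cited paper.

```python
import numpy as np

def split_half_consistency(ratings, n_splits, rng):
    """Average Pearson correlation between the mean ratings of two random halves.

    ratings: array of shape (n_raters, n_faces).
    """
    n_raters = ratings.shape[0]
    correlations = []
    for _ in range(n_splits):
        order = rng.permutation(n_raters)
        half_a = ratings[order[: n_raters // 2]].mean(axis=0)
        half_b = ratings[order[n_raters // 2:]].mean(axis=0)
        correlations.append(np.corrcoef(half_a, half_b)[0, 1])
    return float(np.mean(correlations))

rng = np.random.default_rng(0)
shared = rng.uniform(1.0, 7.0, size=40)                   # hypothetical shared score per face
ratings = shared + rng.normal(0.0, 1.0, size=(20, 40))    # 20 noisy raters

consistency = split_half_consistency(ratings, n_splits=100, rng=rng)
```

The larger the shared component is relative to the individual noise, the closer the consistency gets to 1.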

Small data-bases: geometric features
Other approaches, such as [81] and [102], relied on geometric facial measures only. Some feature dimensions, such as lower lip thickness, were found in [81] to be positively associated with attractiveness, while others, such as nose size, were negatively associated. However, the results obtained were not fully consistent across different contributive analysis algorithms. In [102], it was also found that, classifying images into 4 categories, the highest attractiveness level reached very high accuracy (96.8% correct classification with respect to ground truth scores) and the other levels a lower or much lower one, which is consistent with the higher labeling confidence of human raters towards highly attractive and unattractive faces.

Small data-bases: mixed or holistic features
Other attractiveness rating approaches used data extracted with holistic or mixed techniques. Skin textural
The results presented in these papers about the relative relevance of geometric, textural and holistic attributes to attractiveness are rather controversial and do not provide a shared view on this point. Geometry-based beauty prediction performed better than holistic prediction in [45] and than texture-based prediction in [22], while textural features outperformed geometric ones in [29]. The integration of features of different natures, textural and geometric, provided better accuracies than the individual features ([45] [29]). Some attempts to analyze the relative relevance of data extracted with holistic techniques were performed. In [45], it is reported that the eigenfaces showing higher significance for the human attractiveness ratings do not correspond to the highest eigenvalues, which provide a general description of hair and face contours, but to the intermediate and smaller ones, which contain clearer details of facial features like nose, eyes and lips. On the contrary, the PLS factors found more relevant in [22] appeared related to averageness, symmetry and sexual dimorphism.

Internet collected images
Several researchers have attempted to use larger training databases, collecting images from the Internet. In this case, a major challenge is the large variability and low quality (i.e., different resolutions, orientations, illuminations and expressions) of the images used. Another relevant issue is obtaining human ratings for such a large number of images. To solve this problem, researchers often relied on images taken from hotornot.com, a site that allows users to rate, on a 10 point scale, the attractiveness of photos submitted voluntarily by others. In particular, a selection and rectification of the best images among 30,000 rated samples downloaded from this site was performed and described in [172]. This set contains 2097 female and 1527 male images and, among others, was also used in [37] for constructing an average face model that evolves as a function of attractiveness score and allows analyzing the differences of the facial traits at different beauty levels.
As for small databases, different approaches to assessing attractiveness have been proposed, often resulting in contrasting findings. In [152], geometric ratios and eigenfaces were tested with different classifiers, showing a better accuracy for eigenfaces. Different data, including eigenfaces, edge information and data extracted by various layers of local filter banks, were used in [64] to train various regression models. The top correlation with the attractiveness scores was obtained using multilevel filter banks, outperforming eigenface results by far. Another study on female face portraits ([34]) analyzed different facial features (i.e. geometric features, color characteristics, and non-permanent traits such as make-up and expression) and photographic aesthetics (i.e., image format and resolution, illumination and so on). Face shape and cues of the person's weight and baby-facedness were found to be more relevant than non-permanent traits, while image quality was marginal.
In [173], rather than attempting to develop a "universal" attractiveness predictor, the authors tried to build an [...]

[...] (N)o), number of score levels used by raters, number of raters of the Human Panel, and short description of the features used, the classification or regression algorithm, its validation method and results achieved.

Discussion
In general, the computer scores presented appear to relate fairly well to average human scores, and interesting results have been obtained, mainly with regression techniques, on the relevance of particular facial features to attractiveness. However, some comments on the limits of these studies should be made. Consider Pearson correlation, for instance. Since the consistency of human judgment has been found to be low for average faces and high for very beautiful or very ugly faces, a test set with many average faces would yield "noisy" judgments for these faces, and thus a lower correlation than a test set containing mostly extreme cases. Similar arguments apply to other figures of merit. It should also be noted that, for this and other reasons, not only the accuracy measures but also the usual statistical significance tests presented by some authors have rather limited meaning. Finally, it is interesting to observe that problems due to heterogeneous databases also affected the earliest research on expression analysis, as stressed in [121].
-Panel scores are intrinsically "noisy". A limit to computing the "average" beauty score is the "noisiness" of human judgments. This is due not only to human preferences, but also to the ill-defined meaning of the various levels of the beauty scales. This also shows that the computer analysis of individual preferences is not easy, since their effects are difficult to distinguish from those due to ill-defined beauty levels.
-Test sets lack beautiful faces. The lack of faces rated at top beauty levels is a problem that affects most of the training and test sets used. In [82] seven attractiveness levels were used, but the top score was only 5.75, and in [22] only 4.77 for male and 5.02 for female faces, again on seven levels. In [144] and [102], several movie actors were added to the database to populate the top beauty levels. Also observe that, where explicitly reported, as in [1] and [102], the highest accuracy is obtained for the top beauty levels. Judgments of average attractiveness are much more uncertain for both automatic and human ratings, in agreement with brain activity patterns.
-Most papers investigate female faces. Most of the databases used contain only female images. As a matter of fact, according to various results presented in human sciences, such as those stating that qualities like averageness and symmetry are more closely related to female than male beauty, computer analysis of female beauty is likely to be easier than that of male beauty.
-Training sets are too small. As already mentioned, results from human sciences point to a large number of beauty prototypes, especially for male faces. This is in agreement with the results presented in [45], where the accuracy of classification is reported to increase without saturation with the cardinality of the training set. It seems that most training sets were too small, in particular for top beauty levels. Internet collected images, although potentially countless, are shot in uncontrolled orientation and lighting; thus image normalization, a necessary preliminary for most feature extraction techniques, introduces further "noisiness" into the process.
In conclusion, given the above reasons and the fact that the material used consisted only of 2D images, the research aimed at computing an average beauty score appears rather preliminary.
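
The dependence of Pearson correlation on the composition of the test set, noted in the discussion above, can be illustrated with a small simulation. The sketch below is not taken from any of the surveyed papers: the 1-7 scale, the noise levels and the score ranges are all assumed values chosen only for illustration. Note that a test set of extreme faces also spans a wider range of scores, which by itself raises the correlation, so a control case (average faces rated by a hypothetical low-noise panel) is included to separate the two effects.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def simulate(score_ranges, panel_sigma, n=2000, predictor_sigma=0.3, seed=0):
    """Correlation between simulated panel scores and simulated computer scores.

    score_ranges: intervals of the (assumed) 1-7 scale from which the latent
                  attractiveness of each test face is drawn.
    panel_sigma:  standard deviation of the panel's rating noise (assumed).
    """
    rng = random.Random(seed)
    panel, computer = [], []
    for _ in range(n):
        lo, hi = rng.choice(score_ranges)
        latent = rng.uniform(lo, hi)
        panel.append(latent + rng.gauss(0.0, panel_sigma))
        computer.append(latent + rng.gauss(0.0, predictor_sigma))
    return pearson(panel, computer)


# Average faces with inconsistent human judgments (high panel noise).
r_average_noisy = simulate([(3.0, 5.0)], panel_sigma=1.0)
# Control: the same faces rated by a hypothetical, more consistent panel.
r_average_clean = simulate([(3.0, 5.0)], panel_sigma=0.3)
# Mostly very ugly or very beautiful faces, rated consistently.
r_extreme = simulate([(1.0, 2.0), (6.0, 7.0)], panel_sigma=0.3)

print(f"average faces, noisy panel:      r = {r_average_noisy:.2f}")
print(f"average faces, consistent panel: r = {r_average_clean:.2f}")
print(f"extreme faces, consistent panel: r = {r_extreme:.2f}")
```

The simulated predictor has the same error in all three cases, so any difference in correlation comes from the test set and the panel, not from the quality of the automatic scoring.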

Enhancing attractiveness of face images
As discussed in Section 3, enhancing attractiveness is one of the most important application areas of beauty analysis. Here we survey the papers aimed at improving the attractiveness of 2D images and 3D scans. Summary data on some of these papers are presented in Table 2.

2D images
One idea for automatically enhancing the attractiveness of 2D facial images, in agreement with the relevance of texture to attractiveness, is to correct face pixel colors. For instance, in [7] the authors describe a system aimed at beautifying face images by removing wrinkles and spots, while preserving natural skin roughness, using a bank of non-linear filters.
Another approach to 2D beautification uses geometric data to warp the image texture toward more attractive shapes. This approach is effective for beautifying faces through relatively small changes that do not affect the perception of identity, and it is aimed at professional retouching for posters, magazine covers and fashion photography services. A system based on this idea, working on frontal color photographs, has been presented in [96]. The application triangulates the face starting from 84 landmark points, obtaining a representative vector of 234 normalized lengths. The vector of the face to beautify is compared, using various techniques, with the vectors of the beautiful faces. These comparisons suggest how to warp the triangulation of the original face toward those of the beautiful faces that are more similar to the original (see Fig. 6). The resulting beauty scores show better results for the beautification of female faces (the raters found all the modified female samples, but only 69 percent of the male samples, more attractive), probably due to the larger female training set. Other 2D face image beautification systems using similar techniques are described in [109] and [151]. Similar ideas for reshaping the face and filtering its texture are at the basis of the commercial PortraitProfessional software.
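
The comparison step described above can be sketched as follows. This is only an illustration under stated assumptions, not the method of [96]: the normalization (dividing by the sum of the lengths), the use of a distance-weighted k-nearest-neighbor average, and the names `edge_lengths`, `knn_target` and `blend` are all hypothetical choices. The sketch stops at the target length vector; finding landmark positions that realize these lengths and warping the image texture accordingly are not shown.

```python
import math


def edge_lengths(landmarks, edges):
    """Vector of triangulation edge lengths, normalized to sum to 1 (assumed normalization)."""
    lengths = [math.dist(landmarks[i], landmarks[j]) for i, j in edges]
    total = sum(lengths)
    return [length / total for length in lengths]


def knn_target(query, beautiful, k=2):
    """Distance-weighted average of the k beautiful-face vectors nearest to the query."""
    nearest = sorted(beautiful, key=lambda v: math.dist(query, v))[:k]
    weights = [1.0 / (math.dist(query, v) + 1e-9) for v in nearest]
    wsum = sum(weights)
    return [
        sum(w * v[d] for w, v in zip(weights, nearest)) / wsum
        for d in range(len(query))
    ]


def blend(query, target, alpha=0.5):
    """Move the query vector a fraction alpha of the way toward the target."""
    return [(1.0 - alpha) * q + alpha * t for q, t in zip(query, target)]


# Toy example: 4 landmarks and 5 edges instead of 84 landmarks and 234 lengths.
landmarks = [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)]
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
face = edge_lengths(landmarks, edges)

# Hypothetical length vectors of faces rated as beautiful.
beautiful_faces = [
    edge_lengths([(0.0, 0.0), (2.0, 0.0), (2.0, 1.2), (0.0, 1.2)], edges),
    edge_lengths([(0.0, 0.0), (1.0, 0.0), (1.0, 3.0), (0.0, 3.0)], edges),
    edge_lengths([(0.0, 0.0), (2.1, 0.0), (2.1, 1.1), (0.0, 1.1)], edges),
]

target = knn_target(face, beautiful_faces, k=2)
beautified = blend(face, target, alpha=0.5)
print([round(x, 3) for x in beautified])
```

A small `alpha` keeps the change to the face modest, which matches the stated goal of beautifying without affecting the perception of identity.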

3D scans
Three systems for automatic beautification of face 3D scans have been proposed, based on three different approaches. The system described in [85] attempts to enhance attractiveness by restoring local or global facial symmetry. Significant preference of human observers toward the symmetrized faces is reported.
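
As a rough illustration of what restoring symmetry means geometrically, the sketch below symmetrizes a set of 3D landmarks about a sagittal plane. It is not the algorithm of [85]: it assumes that the face has already been aligned so that the symmetry plane is x = 0, that left/right landmark correspondences are known, and that symmetry is restored by averaging each point with the mirror image of its partner. Restricting `pairs` to one facial region gives a local correction, while listing all pairs gives a global one.

```python
def symmetrize(points, pairs, strength=1.0):
    """Move paired landmarks toward a configuration symmetric about the plane x = 0.

    points:   list of (x, y, z) landmark coordinates.
    pairs:    list of (left_index, right_index); a midline point uses (i, i).
    strength: 0 leaves the face unchanged, 1 makes the listed pairs fully symmetric.
    """
    out = [list(p) for p in points]
    for i, j in pairs:
        xi, yi, zi = points[i]
        xj, yj, zj = points[j]
        # Symmetric target for point i: average of point i and the mirror of point j.
        target_i = ((xi - xj) / 2.0, (yi + yj) / 2.0, (zi + zj) / 2.0)
        # The target for point j is the mirror image of the target for point i.
        target_j = (-target_i[0], target_i[1], target_i[2])
        out[i] = [a + strength * (t - a) for a, t in zip(points[i], target_i)]
        out[j] = [a + strength * (t - a) for a, t in zip(points[j], target_j)]
    return out


# Toy example: two eye corners and a nose tip that is slightly off the midline.
landmarks = [(-3.1, 1.0, 0.2), (2.9, 1.2, 0.0), (0.3, -1.0, 2.0)]
pairs = [(0, 1), (2, 2)]
symmetric = symmetrize(landmarks, pairs)
print(symmetric)
```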
Another system for the global enhancement of 3D scans is aimed not only at correcting asymmetry, but also at approaching the proportions suggested by Neoclassic Canons, the golden ratio and aesthetic criteria for the face profile ([98]). Human raters preferred the enhanced 3D faces in 78.33% of the cases. One reported result is that asymmetry correction is more effective than approaching ideal frontal or profile proportions, although it is their combination that produces the best results.
The previous systems perform a global 3D face warping and have possible applications in areas such as computer graphics and animation, avatar modeling, interactive 3D e-commerce, tele-conferencing and entertainment. However, other important application areas such as plastic surgery or orthodontics deal with real faces, and can only correct facial features locally. A system for planning plastic surgery, which automatically chooses the shape of the feature to be modified that aesthetically best fits the patient's face, is described in [18]. The basic idea of the paper is that, in general, there is no unique prototype of a beautiful facial feature (e.g. mouth, nose, chin), but there are different shapes that are perceived as attractive, depending on their integration with the rest of the face [94]. The system searches a database of beautiful faces, using anisotropic ICP, for the scan most similar to the patient's, excluding the feature to be modified. Then, it suitably applies the corresponding feature of the selected beautiful face to the patient's scan. Plastic surgery treatments of the chin, nose and mouth have been simulated. Their evaluation by a human panel shows not only an obvious attractiveness improvement, but also that the more similar the selected database scan is to the patient, the better its feature fits aesthetically. An example of simulated rhinoplasty is shown in Fig. 7.
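
The search-and-replace idea described above can be sketched in a strongly simplified form. The following is not the system of [18]: instead of anisotropic ICP on raw scans, it assumes that all scans are already aligned and share the same point ordering, measures similarity as the RMS distance over the points outside the feature, and simply copies the feature points rather than blending them smoothly into the patient's scan. The function names are hypothetical.

```python
import math


def masked_rms(scan_a, scan_b, feature_idx):
    """RMS distance between corresponding points, ignoring the feature to be modified."""
    skip = set(feature_idx)
    squared = [
        math.dist(p, q) ** 2
        for i, (p, q) in enumerate(zip(scan_a, scan_b))
        if i not in skip
    ]
    return math.sqrt(sum(squared) / len(squared))


def plan_feature(patient, database, feature_idx):
    """Pick the most similar beautiful face and transplant its feature points."""
    best = min(database, key=lambda scan: masked_rms(patient, scan, feature_idx))
    result = list(patient)
    for i in feature_idx:
        result[i] = best[i]
    return result, best


# Toy example: 4-point "scans" in which point 3 plays the role of the nose tip.
patient = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.9)]
database = [
    [(0.0, 0.0, 0.0), (1.1, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.5, 0.6)],
    [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.5, 0.5, 0.9)],
]
nose = [3]
planned, donor = plan_feature(patient, database, nose)
print(planned)
```

Because the feature itself is excluded from the similarity measure, the second database face is not selected even though its nose tip coincides with the patient's; the choice depends only on the rest of the face.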

Selecting make-up and hair style
The first approaches presented in this potentially very popular application area were aimed at merging an image of the subject with another image presenting a particular make-up or hairstyle. In [40] a system for automatically fitting a given hairstyle to a face is described. Another paper ([69]) deals with the automatic transfer of cosmetic styles from one face to another. However, both the initial choice and the evaluation of the result are left to the user.
Scherbaum et al. [143] recently proposed a system that automates the task of choosing the make-up best fitting a given face using a learning-by-example approach (Fig. 8). The system relies on a reference database of 3D scans of 56 female faces without make-up and with professional make-up. For each of these samples, its appearance (defined as the collection of geometric, textural and surface material information) and its make-up (the change of appearance after make-up) are computed. These data are then used to perform several operations, such as determining the best fitting make-up for a new subject, make-up transfer, automatic rating of make-up and even the generation of the face shape best fitting a given make-up (which could be used for didactic purposes). Perceptual studies show that the computer-suggested make-ups are appreciated by a panel of human raters as much as the professional make-ups.

In the following subsections, we discuss some open problems and potential areas of research.

Local analysis of facial details could improve attractiveness analysis
Several of the reviewed papers analyzing or rating beauty are based on identifying facial landmarks and constructing some representative geometric feature vector. This technique appears convenient for capturing the general harmony of the face, a prerequisite of attractiveness, but facial texture and small details, important elements of beauty, are essentially lost. Holistic techniques appear better suited to capturing the texture. However, neither the usual facial landmarks nor holistic techniques are able to efficiently capture small shape details of particularly important areas, such as the mouth and eyes. Detailed local analysis of particularly significant areas is considered important for other face analysis problems, such as face and intention recognition (see for instance [112]), and could substantially improve attractiveness analysis.

3D research is required
The papers surveyed are mostly based on 2D frontal images of individuals, sometimes monochromatic and of medium quality. Most of the 3D research performed used synthetic faces. In 2D images much valuable information relevant to attractiveness is lost, for instance the exact 3D shape of the nose and the chin. Moreover, important applications, such as supporting plastic surgery, are essentially 3D and must deal with 3D face scans. We believe that significant advances in facial beauty analysis require research performed on 3D scanned faces, as in other face analysis areas. For instance, using very high resolution 2D images or 3D medium quality scans, the FRVT 2006 face recognition contest showed an increase of one order of magnitude in recognition efficiency compared to FRVT 2002 [58]. A review of the currently available 3D face databases, along with available information regarding scanning device, resolution, precision and annotations, is reported in [181].

Beautiful face databases should be constructed
Most automatic beauty rating systems, as well as those attempting to beautify faces, rely, for training or comparisons, on samples of faces rated at different levels of attractiveness. Therefore, a problem for effective 2D or 3D beauty research and beautification techniques is the lack of databases also containing such samples. In particular, few generic 3D face databases exist, containing relatively small numbers of elements. For instance, the 3D medium quality FERET 2006 database contained scans of 330 different subjects, while the 2D low resolution database contains images of 36,000 subjects. A further problem is that most of the subjects are of average attractiveness. Hence, in order to carry out further studies on attractiveness, 2D and 3D databases populated by faces rated at all levels of beauty should be constructed.

Assessing beauty: setting up a standard test protocol
As discussed in Section 5.2, it is impossible to compare the different approaches for automatically assessing beauty. Similar problems affected other face analysis applications. In the field of identity recognition, to compare identification accuracy it was necessary to set up the FERET contest, where a number of software producers volunteered to comply with a standard test frame. We believe that a necessary step forward in computational attractiveness is to set up a standard test protocol, based on a common database of samples rated on a common scale, including high quality 2D images and 3D scans. However, this would not completely solve the problem, given the ill-defined meaning of the scale levels.

Beauty analysis as a manifold learning problem
Most beauty analysis research and applications require an adequate sampling of faces rated for attractiveness, or samples of beautiful faces only. If the sampling is not sufficiently dense, these applications could fail, since the face to rate or to beautify might fall in zones that are under-sampled or not sampled at all. This raises another problem: how many samples are required for an adequate sampling of the face space from the point of view of attractiveness?
We already noted that results of the 2D research hint at an under-sampling of the face space, in particular for very beautiful faces. We can forecast that sampling the 3D face space for beauty analysis could require a rather large number of samples. This appears to agree with the idea of many beauty prototypes, which possibly form unconnected manifolds and span many populations of different age, sex, and ethnicity. Results such as those of Perrett et al. [125] also point to the need for many samples in order to cover unusual but attractive faces.
The problem of adequate sampling is part of the more general problem of learning the beauty-related face manifolds. Manifold learning techniques have also been found useful for other face analysis problems, from identity recognition to human age estimation by regression in a low-dimensional subspace [70]. It should be observed that different face analysis problems refer to different manifolds in the face space. For instance, face identification deals with the manifolds relative to particular individuals. Determining ethnicity from faces deals with a few multi-individual manifolds that are likely to have low intrinsic dimensionality.
Analyzing the face elements relevant to attractiveness could be reformulated as the problem of learning these manifolds, i.e. understanding their Intrinsic Dimensionality (ID, [163]), and finding data reduction techniques able to transform the face space into a lower dimensionality subspace that preserves attractiveness distances. As for the [...] The ID estimated was around 10 ([17]) and the discrimination of the two manifolds was effectively performed with data that was reduced to the ID with various techniques.
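
To make the notion of intrinsic dimensionality more concrete, the sketch below estimates the ID of a point cloud from nearest-neighbor distance ratios, in the spirit of the TwoNN estimator. This choice is an assumption made for illustration; the estimators actually used in [163] and [17] are not described here and may differ. For each sample, the ratio mu = r2 / r1 of the distances to its second and first nearest neighbors is computed, and the maximum-likelihood estimate of the dimension is N / sum(log mu). The test data are synthetic: points drawn from a 2D parameter space and linearly embedded in 5D, standing in for face vectors that lie on a low-dimensional manifold.

```python
import math
import random


def two_nn_id(points):
    """Estimate intrinsic dimensionality from ratios of nearest-neighbor distances."""
    log_ratios = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        r1, r2 = dists[0], dists[1]
        if r1 > 0.0:  # skip exact duplicates, for which the ratio is undefined
            log_ratios.append(math.log(r2 / r1))
    return len(log_ratios) / sum(log_ratios)


def embed(u, v):
    """Linear embedding of a 2D parameter pair into a 5D 'feature' space."""
    return (u, v, u + v, u - v, 2.0 * u)


rng = random.Random(0)
samples = [embed(rng.random(), rng.random()) for _ in range(400)]
estimated_id = two_nn_id(samples)
print(f"ambient dimension: 5, estimated intrinsic dimension: {estimated_id:.2f}")
```

The estimate depends only on local distances, so it also illustrates the sampling issue raised above: with too few samples per manifold, nearest neighbors are no longer local and the estimate becomes unreliable.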

Considering expressions in attractiveness research
Up to now, most attractiveness research has been carried out on expressionless face images. However, the face is highly deformable and, in other well-established areas of human face image processing, e.g. face identification, expressions, i.e. the 2D and 3D changes in face geometry, have been considered very relevant. Recent research in human science has found that expressions are relevant to attractiveness perception too ([113], [158]), although the results presented in these papers for male and female expressions are not consistent. The analysis of brain activity patterns has also shown that attractiveness is affected by expressions ([117]). Hence, extending computer attractiveness analysis to facial expressions appears to be a promising new area of research.

Studying dynamic beauty
Human attractiveness is also related to movements, since static and moving stimuli convey different types of information that can lead to different attractiveness ratings. Recent studies in human science have dealt with the relation between attractiveness perception and the motion of the body ([75]) and of the face ([88]). To the authors' knowledge, no sound attempts have yet been made to perform an automatic analysis of the attractiveness of facial movements.