Scientific Abstract | Fertility and Sterility

Clinical evaluation of a machine learning model for embryo selection: A double-blinded randomized comparative reader study

October 14th, 2023


Objective: To evaluate the performance of a machine learning model for ranking blastocyst stage embryos for transfer using a double-blinded randomized comparative reader study.

Materials and Methods: In previous work, a machine learning model was developed to predict the likelihood of clinical pregnancy from embryo morphology grades assigned by embryologists using the Gardner classification and from the day of development (5, 6, or 7). This model was trained on data from over 12,000 single-blastocyst transfer cycles performed at multiple U.S. IVF clinics between 2014 and 2021. To independently test the model, a retrospective, double-blinded, randomized comparative reader study was performed. The study included data from 438 single-blastocyst transfers from 10 different IVF clinics in the U.S. that were not part of previous model development or testing. From these data, a set of 1,257 simulated (virtual) patient panels was created. Each virtual patient panel included 2 to 5 embryos matched by age (18-29, 30-34, 35-37, or ≥38 years), race (white, non-white, or unknown), and PGT status (untested or euploid transfers). Five embryologists (readers) with varying levels of experience each selected their top embryo for transfer from each virtual patient panel (control arm), and the machine learning model selected a top embryo for transfer from each panel (treatment arm). The clinical pregnancy rates (CPR) of the top-selected embryos in the two arms were compared using a comparison of proportions with a 2-sided type-1 error rate of 5% (α=0.05).
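The panel construction described above can be illustrated with a minimal Python sketch. The abstract does not describe how embryos were drawn into panels, so the random sampling scheme, the function names (`age_group`, `build_panels`), and the record keys (`age`, `race`, `pgt`, `pregnant`) are all hypothetical; only the strata and the panel size of 2 to 5 embryos come from the text.

```python
import random
from collections import defaultdict

def age_group(age):
    """Map patient age to the abstract's strata: 18-29, 30-34, 35-37, >=38."""
    if age < 30:
        return "18-29"
    if age < 35:
        return "30-34"
    if age < 38:
        return "35-37"
    return ">=38"

def build_panels(transfers, n_panels, seed=0):
    """Draw virtual patient panels of 2-5 embryos from a single matching stratum.

    Each transfer is a dict with hypothetical keys 'age', 'race', 'pgt', 'pregnant'.
    Random sampling within a stratum is an assumption, not the study's stated method.
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for t in transfers:
        strata[(age_group(t["age"]), t["race"], t["pgt"])].append(t)
    # Only strata holding at least 2 embryos can form a panel.
    eligible = [members for members in strata.values() if len(members) >= 2]
    if not eligible:
        raise ValueError("no stratum has two or more embryos")
    panels = []
    for _ in range(n_panels):
        members = rng.choice(eligible)
        size = rng.randint(2, min(5, len(members)))
        panels.append(rng.sample(members, size))  # sample without replacement
    return panels
```

Because every panel is drawn from a single stratum, the embryos within a panel share the same age group, race category, and PGT status, which is the matching property the study relies on.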

Results: The average CPR of the control arm (embryos selected by embryologists) was 61.0% (individual rates of 58.9%, 59.6%, 61.5%, 61.6%, and 63.3%), and the CPR of the treatment arm (embryos selected by the machine learning model) was 62.1%, demonstrating non-inferiority (p<.001). The embryologists did not all select the same top embryo in 35% of cases, and this inter-embryologist variability increased to 44% when there were 3 or more embryos to choose from. When all 5 readers agreed (65% of the time), the machine learning model selected the same top embryo in 99% of cases, showing very high concordance with the group consensus.
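As a rough illustration of how such a comparison of proportions might be framed, the Python sketch below averages the five reported reader rates and runs a one-sided non-inferiority z-test. The abstract does not state the non-inferiority margin, the per-arm sample size used in the test, or how correlation between readers scoring the same panels was handled. The 5-percentage-point margin, the use of 1,257 independent observations per arm, and the function name `noninferiority_z` are therefore assumptions, and the result should not be read as a reproduction of the study's analysis.

```python
from math import sqrt
from statistics import NormalDist, mean

# Individual reader CPRs (%) as reported in the abstract.
reader_cpr = [58.9, 59.6, 61.5, 61.6, 63.3]
control_cpr = mean(reader_cpr)  # about 60.98, reported as 61.0%

def noninferiority_z(p_treat, n_treat, p_ctrl, n_ctrl, margin):
    """One-sided z-test of H0: p_treat - p_ctrl <= -margin (unpooled standard error)."""
    se = sqrt(p_treat * (1 - p_treat) / n_treat + p_ctrl * (1 - p_ctrl) / n_ctrl)
    z = (p_treat - p_ctrl + margin) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value

# Assumptions (not from the abstract): independent arms of 1,257 panels each
# and a hypothetical non-inferiority margin of 5 percentage points.
z, p = noninferiority_z(0.621, 1257, control_cpr / 100, 1257, margin=0.05)

# A 2-sided alpha of 0.05 corresponds to a 1-sided threshold of 0.025.
non_inferior = p < 0.025
```

A different margin or a paired analysis that accounts for the shared panels would change the test statistic, so the margin should be fixed before looking at the data.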

Conclusions: The machine learning model was non-inferior to manual embryo selection overall. When there was group consensus on the top embryo for transfer, the machine learning model agreed in nearly all cases.
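The consensus and concordance figures can be computed from per-panel selections along the following lines. This is a minimal Python sketch, assuming each selection is recorded as an embryo identifier. The function name `agreement_summary` and the input layout are hypothetical and are not taken from the study.

```python
def agreement_summary(reader_picks, model_picks):
    """Summarize reader consensus and model concordance across panels.

    reader_picks: one list per panel holding each reader's chosen embryo id.
    model_picks:  the model's chosen embryo id for each panel, in the same order.
    """
    if len(reader_picks) != len(model_picks):
        raise ValueError("reader_picks and model_picks must cover the same panels")
    n_panels = len(reader_picks)
    # A panel has consensus when every reader chose the same embryo.
    consensus = [
        (picks[0], model)
        for picks, model in zip(reader_picks, model_picks)
        if len(set(picks)) == 1
    ]
    consensus_rate = len(consensus) / n_panels
    matches = sum(1 for reader_choice, model in consensus if reader_choice == model)
    return {
        "consensus_rate": consensus_rate,
        "variability_rate": 1 - consensus_rate,
        "model_agreement_on_consensus": matches / len(consensus) if consensus else float("nan"),
    }
```

Here `variability_rate` corresponds to the share of panels in which at least one reader chose a different embryo, and `model_agreement_on_consensus` corresponds to the model's concordance within the unanimous panels.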

Impact Statement: The machine learning model selected the top embryo for transfer with performance similar to that of experienced embryologists. Such a model could allow for objective and consistent embryo selection, potentially supporting standardization across laboratory practices and networks.