There is a global need to identify chemicals of emerging concern (CECs) in order to reduce chemical harm and prevent future environmental damage. We have developed a methodology that searches the up-to-date scientific literature for indicators of emerging concern using Natural Language Processing, the first time AI has been used to alert stakeholders that a product may be more harmful than previously thought. Candidate databases were evaluated against criteria including the relevance and comprehensiveness of the research covered, practical considerations for data extraction (e.g., API availability), and licensing considerations. Metadata are downloaded from the selected database and used to locate publishers’ landing pages, from which abstract texts are extracted. To provide confidence that the literature pertinent to a given chemical is captured, we iteratively expanded the search to encompass as many chemical name variants and synonyms as possible. To identify emerging concern, texts are evaluated to determine whether they satisfy two principal criteria (for each chemical discussed):
The proportions of records satisfying each of the criteria are used to calculate an
‘emerging concern score’, relative to chemical alternatives. Our results show that
the huge numbers of publications released each day can quickly be scanned and
scored, and we have integrated the emerging concern dimension into an existing
chemical management framework. Validation shows that the results do reflect
the concern inferred, i.e., that emerging concern is not coincidental
and that the results correctly identify known CECs. This new methodology overcomes the overwhelming, if not impossible, task of manually vetting an ever-increasing influx of new literature.
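
The scoring step described above can be illustrated with a minimal sketch. The abstract does not state the scoring formula or name the two criteria, so everything specific here is an assumption made for illustration: the `Record` type, the `meets_criterion_a` and `meets_criterion_b` flags, the equal weighting of the two proportions, and the normalisation against the highest-scoring alternative. It should not be read as the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One literature record for one chemical (hypothetical structure)."""
    chemical: str
    meets_criterion_a: bool  # placeholder for the first principal criterion
    meets_criterion_b: bool  # placeholder for the second principal criterion


def criterion_proportions(records, chemical):
    """Return the proportion of a chemical's records satisfying each criterion."""
    own = [r for r in records if r.chemical == chemical]
    if not own:
        return (0.0, 0.0)
    n = len(own)
    prop_a = sum(r.meets_criterion_a for r in own) / n
    prop_b = sum(r.meets_criterion_b for r in own) / n
    return (prop_a, prop_b)


def emerging_concern_scores(records):
    """Score each chemical relative to its alternatives.

    Assumption: the raw score is the unweighted mean of the two proportions,
    and the relative score divides by the highest raw score in the group.
    """
    chemicals = sorted({r.chemical for r in records})
    raw = {c: sum(criterion_proportions(records, c)) / 2 for c in chemicals}
    top = max(raw.values(), default=0.0)
    return {c: (v / top if top > 0 else 0.0) for c, v in raw.items()}


if __name__ == "__main__":
    sample = [
        Record("X", True, True),
        Record("X", True, True),
        Record("X", False, False),
        Record("X", False, False),
        Record("Y", True, False),
        Record("Y", False, False),
        Record("Y", False, False),
        Record("Y", False, False),
    ]
    print(emerging_concern_scores(sample))
```

Normalising against the group maximum is only one way to express a score "relative to chemical alternatives"; ranking or comparison against the group mean would fit the same description.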