The AMF would like to thank all those who took part in its Entity Name Matching challenge on Codalab
21 April 2021

From 13 January to 12 March 2021, the AMF challenged coding enthusiasts to devise an effective method for identifying market participants using natural language processing techniques. The winner of the challenge was Robert Stanca, an IT student from the Politehnica University of Bucharest.

On 13 January 2021, the AMF launched its Entity Name Matching challenge on Codalab, an open-source platform specialised in data science competitions.

The challenge asked participants to propose an efficient method for establishing a market participant’s identity. Specifically, they had to find a participant’s unique identification code, its Legal Entity Identifier (LEI), whenever the participant was named or alluded to in a document. The proposed solutions could help the AMF compare its various data sources more effectively and obtain a single consolidated view.

The challenge ended on 12 March 2021. The AMF would like to thank all those who took part, and in particular the winner, Robert Stanca. Using natural language processing (NLP) techniques, he proposed an approach that calculates the similarity between the name of an entity mentioned in a document and the records in the global Legal Entity Identifier database.

To calculate this similarity, the text (such as the entity’s name) must first be converted into numerical values, a step known as encoding. This conversion cannot be arbitrary: the text is mapped into a vector space in a way that preserves its properties. For example, the vectors for the words "house" and "flat" should lie fairly close to each other because their meanings are related. The vectors for "participate" and "participant" should also be fairly close because the two words share a common root.
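The article does not say which encoding the winning solution used. As a minimal sketch, assuming a simple character n-gram encoding (chosen here for illustration, not necessarily Robert Stanca's method), the Python below turns each word into a sparse vector of three-letter fragments and compares two vectors with cosine similarity. This captures the shared-root case ("participate" and "participant"). It would not place "house" and "flat" close together, because that meaning-based closeness requires embeddings learned from a large text corpus, which this sketch does not attempt. The helper names `char_ngrams` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Encode text as a bag of character n-grams (a sparse vector).

    The text is lower-cased and padded with spaces so that the start
    and end of the word also produce n-grams.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 to 1.0)."""
    # Counter returns 0 for missing keys, so only shared n-grams add to the dot product.
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    related = cosine_similarity(char_ngrams("participate"), char_ngrams("participant"))
    unrelated = cosine_similarity(char_ngrams("participate"), char_ngrams("house"))
    print(f"participate / participant: {related:.2f}")
    print(f"participate / house: {unrelated:.2f}")
```

Here "participate" and "participant" share 8 of their 11 padded trigrams, giving a score of about 0.73, while "participate" and "house" share none and score 0.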

The algorithm then searches the reference data for the row whose name is closest to the name of the participant sought. Since that row also contains the corresponding LEI, the participant is identified.
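The lookup step can be sketched as a nearest-match search over the reference table. The snippet below is self-contained and assumes a character-trigram encoding with cosine similarity, which is an illustrative choice rather than the winning solution's documented method. The reference table, its entity names and placeholder codes such as `HYPOTHETICAL-LEI-0001` are invented, not real LEI records, and `find_lei` is a hypothetical helper name.

```python
import math
from collections import Counter

# Hypothetical reference table of (entity name, LEI). These names and codes
# are invented placeholders, not real LEI records.
REFERENCE = (
    ("Example Asset Management SA", "HYPOTHETICAL-LEI-0001"),
    ("Sample Bank Europe PLC", "HYPOTHETICAL-LEI-0002"),
    ("Demo Insurance Group", "HYPOTHETICAL-LEI-0003"),
)


def encode(text: str, n: int = 3) -> Counter:
    """Encode a name as a bag of lower-cased character n-grams (assumed encoding)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse count vectors; 0.0 if either is empty."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def find_lei(mention: str, reference=REFERENCE) -> tuple[str, str, float]:
    """Return (name, LEI, score) of the reference row whose name is closest to `mention`.

    Hypothetical helper: a plain linear scan that scores every row of the table.
    """
    query = encode(mention)
    scored = [(cosine(query, encode(name)), name, lei) for name, lei in reference]
    score, name, lei = max(scored)  # highest similarity wins
    return name, lei, score


if __name__ == "__main__":
    print(find_lei("sample bank europe"))
```

A linear scan is enough to show the idea, but the full LEI database is large, so a real system would more likely precompute the reference vectors and use a nearest-neighbour index. It would probably also set a minimum score below which a mention is treated as unmatched rather than forced onto the closest row.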