Scholarly knowledge evolves constantly, and research findings are adjusted or enriched daily. Wikimedia projects, particularly Wikipedia and Wikidata, aim to capture a significant share of human knowledge, yet they struggle to cover major aspects of notable information because of the limits of human effort. This difficulty is worsened by a shortage of library and information science skills, such as reference support, and by the biased composition of the Wikimedia community, which underrepresents women and people from the Global South. Bibliographic databases such as PubMed provide large-scale data about millions of scholarly publications, covering many aspects of scholarly knowledge. These resources can therefore be leveraged to capture insights into human findings using techniques ranging from natural language processing and knowledge engineering to machine learning and graph embeddings. In this presentation, I will highlight the research project of the Data Engineering and Semantics Research Unit in Tunisia and the Sisonkebiotik African Biomedical Machine Learning Community, which uses PubMed, a bibliographic database for the biomedical domain curated by the NCBI and NIH, to verify, enrich and validate clinical information in Wikidata. The project relies on intuitive algorithms that consider the bibliographic metadata of scholarly publications when retrieving information to update Wikidata in the biomedical context. These algorithms are driven by Biopython, a Python library for managing the PubMed Entrez API, together with PyTorch and scikit-learn as machine learning libraries.
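To make the workflow concrete, the following is a minimal sketch, not the project's actual pipeline, of how Biopython's Entrez module can retrieve PubMed bibliographic metadata and how matching Wikidata items can be looked up through their PubMed IDs (property P698) for later verification or enrichment. The search term, contact email, and SPARQL query are illustrative assumptions.

```python
# Sketch: fetch PubMed metadata with Biopython and find the corresponding
# Wikidata items via their PubMed ID (P698). Assumed topic and query values.
from Bio import Entrez, Medline
import requests

Entrez.email = "your.name@example.org"  # required by NCBI Entrez usage policy

# 1. Search PubMed for publications on a clinical topic (term is an example).
handle = Entrez.esearch(db="pubmed", term="hypertension[MeSH Terms]", retmax=5)
pmids = Entrez.read(handle)["IdList"]
handle.close()

# 2. Fetch bibliographic metadata (title, journal, publication date) per PMID.
handle = Entrez.efetch(db="pubmed", id=",".join(pmids),
                       rettype="medline", retmode="text")
records = list(Medline.parse(handle))
handle.close()
for rec in records:
    print(rec.get("PMID"), rec.get("TI"), rec.get("DP"))

# 3. Retrieve Wikidata items carrying these PubMed IDs so that their
#    statements can be checked against the retrieved metadata.
query = """
SELECT ?item ?pmid WHERE {
  VALUES ?pmid { %s }
  ?item wdt:P698 ?pmid .
}
""" % " ".join('"%s"' % p for p in pmids)
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
    headers={"User-Agent": "pubmed-wikidata-sketch/0.1"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["pmid"]["value"], "->", row["item"]["value"])
```

In practice, the retrieved MEDLINE fields would feed the verification and enrichment algorithms described above rather than being printed directly.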
Agenda available here.