Dear all,
You are cordially invited to a talk on “Text Mining Biodiversity Literature” by Dr. Nguyễn Thị Hồng Nhung (Department of Knowledge Engineering, currently a postdoctoral researcher in the UK).
Time: 14:00 to 16:00, Wednesday, 29 March 2017
Venue: Room B11A
Language of presentation: Vietnamese.
We look forward to your attendance.
Text Mining Biodiversity Literature
Biodiversity, a contraction of “biological diversity”, is concerned with the study of living entities on Earth at every level, from genes to ecosystems. It plays a central role in our daily lives, given its implications for ecological resilience, food security, the endangerment of species and subspecies, and natural sustainability. To support the advancement of biodiversity research, several efforts to store and share biodiversity knowledge have been undertaken over the past few years, resulting in digital resources such as the Biodiversity Heritage Library (BHL), the Catalogue of Life, the Encyclopedia of Life, and the Global Biodiversity Information Facility. BHL is an open-access repository containing millions of digitised pages of legacy literature on biodiversity. It currently holds nearly 100,000 titles and over 170,000 volumes in many languages, a large body of text with over 150 million species mentions. The English subset alone amounts to more than 24 million pages of text.
In this talk, I will present recent text mining results in the domain of biodiversity, obtained at the National Centre for Text Mining, University of Manchester, United Kingdom. Firstly, I will describe the automatic construction of a biodiversity terminological inventory. This inventory was created by applying distributional semantic models to the English subset of BHL. It contains over 288,000 species names. For each species name in the inventory, the 20 most semantically related names are provided, together with their similarity scores. To evaluate the inventory from a more practical point of view, we implemented a visual search interface that uses the term inventory to enable automatic query expansion. Secondly, I will present our approach to constructing a knowledge repository on Philippine biodiversity. The repository will integrate different types of information (e.g., taxonomic, occurrence, ecological, biomolecular and biochemical), giving users a comprehensive view of species of interest that will allow them to (1) carry out predictive analysis of species distributions, and (2) investigate potential medicinal applications of natural products derived from Philippine species.
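To make the inventory lookup and query expansion described above concrete, here is a minimal sketch. It is not the implementation used in the work: the abstract does not say which distributional model or similarity measure was applied, so cosine similarity is assumed, and the vectors, the species chosen, and the function names (`top_related`, `expand_query`) are all hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real inventory would use vectors learned from the BHL text.
TOY_VECTORS = {
    "Panthera leo": [0.9, 0.1, 0.0],
    "Panthera tigris": [0.8, 0.2, 0.1],
    "Felis catus": [0.6, 0.4, 0.1],
    "Quercus robur": [0.0, 0.1, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_related(name, vectors, k=20):
    """Return up to k (name, score) pairs most similar to `name`."""
    target = vectors[name]
    scored = [
        (other, cosine(target, vec))
        for other, vec in vectors.items()
        if other != name
    ]
    # Highest similarity first.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def expand_query(name, vectors, k=2):
    """Expand a species query with its k most related names."""
    return [name] + [other for other, _ in top_related(name, vectors, k)]


if __name__ == "__main__":
    for related, score in top_related("Panthera leo", TOY_VECTORS):
        print(f"{related}\t{score:.3f}")
    print(expand_query("Panthera leo", TOY_VECTORS))
```

In this sketch a search for one species name would also retrieve documents mentioning its nearest neighbours, which is the general idea behind automatic query expansion; the default `k=20` mirrors the 20 related names stored per entry in the inventory.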
Dr. Nhung Nguyen is currently a research associate at the National Centre for Text Mining, University of Manchester, United Kingdom. She obtained her PhD in Information Science from the Japan Advanced Institute of Science and Technology in 2014. Her main research topic is the extraction of relations between entities in biomedical and biodiversity literature based on predicate-argument structure patterns. She has also worked on the automatic construction of a terminological inventory using distributional semantic models.