8.22 MSR Talk & Lab Meeting Notes

Microsoft Research Talk

Machine Reading for Precision Medicine

Annotation Bottleneck

Document-Level N-ary Relation Extraction

Self-supervised Learning

Knowledge Base

Distant supervision: relations from a database/knowledge graph are used as noisy labels.

Knowledge -> Probabilistic Logic -> Distantly Supervised Labels -> Deep Learning
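A minimal sketch of the labeling step in this pipeline, with a toy knowledge base (the entity names and relation here are illustrative, not from the talk):

```python
KB = {("gefitinib", "EGFR"): "inhibits"}  # toy knowledge-base fact

def distant_labels(sentence, kb=KB):
    """Label a sentence with every KB relation whose two entities co-occur in it."""
    text = sentence.lower()
    labels = []
    for (e1, e2), relation in kb.items():
        if e1.lower() in text and e2.lower() in text:
            labels.append((e1, relation, e2))  # noisy: co-occurrence != assertion
    return labels

print(distant_labels("Gefitinib inhibits EGFR in lung cancer cells."))
# -> [('gefitinib', 'inhibits', 'EGFR')]
```

The labels are noisy precisely because co-occurrence does not imply the relation is asserted; that is why the pipeline filters them through probabilistic logic before deep learning.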

Neural Architecture

Graph LSTM

Graph -> Exploit rich linguistic structures
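A minimal sketch of the graph such a model runs over, assuming word-adjacency plus dependency edges (a common Graph LSTM construction; the exact edge set in the talk may differ):

```python
def build_doc_graph(tokens, dep_edges):
    """tokens: list of words; dep_edges: (head_idx, dep_idx, label) triples."""
    edges = [(i, i + 1, "next") for i in range(len(tokens) - 1)]  # word adjacency
    edges += [(h, d, f"dep:{lbl}") for h, d, lbl in dep_edges]    # linguistic structure
    return edges

tokens = ["gefitinib", "inhibits", "EGFR"]
print(build_doc_graph(tokens, [(1, 0, "nsubj"), (1, 2, "dobj")]))
# [(0, 1, 'next'), (1, 2, 'next'), (1, 0, 'dep:nsubj'), (1, 2, 'dep:dobj')]
```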

Multi-scale Representation Learning for Document-Level Relation Extraction

Input text -> mention-level representations -> entity-level representation -> final predictions

“Document-Level N-ary Relation Extraction with Multiscale Representation Learning” (NAACL 2019)
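A minimal sketch of the mention-to-entity step, assuming logsumexp pooling as in the paper above; the vectors are toy stand-ins for encoder outputs:

```python
import numpy as np

def entity_representation(mention_reprs):
    """Pool several mention-level vectors into one entity-level vector.
    logsumexp acts as a smooth max: the strongest mention dominates,
    but every mention still contributes."""
    m = np.stack(mention_reprs)        # (num_mentions, hidden_dim)
    mx = m.max(axis=0)                 # subtract max for numerical stability
    return mx + np.log(np.exp(m - mx).sum(axis=0))

mentions = [np.array([0.1, 2.0]), np.array([1.5, 0.3])]
print(entity_representation(mentions))  # one entity-level vector
```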

Application

Machine reading for assisted curation (supporting human curators).

  • molecular tumor board
  • EMR: 60%–80% of the information is in unstructured text

Some Discussion

BERT is too general; prior knowledge is very useful and could be injected while pre-training BERT (see the sketch below).
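The discussion did not specify a mechanism; one hypothetical way to inject prior knowledge is whole-entity masking during BERT's masked-LM pre-training (in the spirit of ERNIE), sketched here with made-up tokens and spans:

```python
import random

def mask_entities(tokens, entity_spans, mask_token="[MASK]", p=0.15):
    """Mask whole entity spans so the model must recover them from context.
    entity_spans could come from a knowledge base or entity linker (assumption)."""
    out = list(tokens)
    for start, end in entity_spans:
        if random.random() < p:
            out[start:end] = [mask_token] * (end - start)
    return out

tokens = ["gefitinib", "inhibits", "EGFR", "signaling"]
print(mask_entities(tokens, [(0, 1), (2, 3)], p=1.0))
# ['[MASK]', 'inhibits', '[MASK]', 'signaling']
```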

CoAI Lab Discussion

Pre-training Language Models

Among pre-training methods, auto-regressive language models perform about as well as non-autoregressive auto-encoders.
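To make the comparison concrete, a toy sketch of the two training signals (illustrative tokens, no real model):

```python
tokens = ["the", "drug", "inhibits", "the", "kinase"]

# Auto-regressive: predict token t from tokens < t (left-to-right factorization).
ar_examples = [(tokens[:t], tokens[t]) for t in range(1, len(tokens))]

# Auto-encoding (BERT-style): corrupt positions, predict originals from full context.
masked = list(tokens)
masked[2] = "[MASK]"
ae_example = (masked, {2: tokens[2]})

print(ar_examples[1])  # (['the', 'drug'], 'inhibits')
print(ae_example)      # (['the', 'drug', '[MASK]', 'the', 'kinase'], {2: 'inhibits'})
```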