Automatically assigned DDC number: 0063

Manually assigned DDC number: 00635

Number of references: 0

Title: Selective Sampling In Natural Language Learning



Subject: Ido Dagan, Sean P. Engelson Selective Sampling In Natural Language Learning

Description: Many corpus-based methods for natural language processing are based on supervised training, requiring expensive manual annotation of training corpora. This paper investigates reducing annotation cost by selective sampling. In this approach, the learner examines many unlabeled examples and selects for labeling only those that are most informative at each stage of training. In this way it is possible to avoid redundantly annotating examples that contribute little new information. The paper first analyzes the issues that need to be addressed when constructing a selective sampling algorithm, arguing for the attractiveness of committee-based sampling methods. We then focus on selective sampling for training probabilistic classifiers, which are commonly applied to problems in statistical natural language processing. We report experimental results of applying a specific type of committee-based sampling during training of a stochastic part-of-speech tagger, and demonstrate substantially improv...

Contributor: The Pennsylvania State University CiteSeer Archives

Publisher: unknown

Date: 1995-08-22

Pubyear: 1995

Format: ps



Language: en

Rights: unrestricted


<?xml version="1.0" encoding="UTF-8"?>


  <rec ID="SELF" Type="SELF" CiteSeer_Book="SELF" CiteSeer_Volume="SELF" Title="Selective Sampling In Natural Language Learning">

    <identifier Org="ISBN:3790814369" Paper_ID="SELF" Extracted="3790814369" DDC="006.3" Normalized_DDC="0063" Normalized_Weight="1.0" />