%0 Journal Article
%@ 18650929
%A Zamin, N.
%A Oxley, A.
%C Kuantan
%D 2011
%F scholars:1993
%J Communications in Computer and Information Science
%K Computational costs; Document Representation; Entity identification; Essential component; Filtering technique; Gazetteer; Information Extraction; Named entity recognition; Natural language processing; Rule based; Stumbling blocks, Computational linguistics; Learning algorithms; Software engineering, Natural language processing systems
%N PART 2
%P 73-80
%R 10.1007/978-3-642-22191-0_6
%T Building a corpus-derived gazetteer for named entity recognition
%U https://khub.utp.edu.my/scholars/1993/
%V 180 CC
%X Gazetteers, or entity dictionaries, are an important element of Named Entity Recognition. Named Entity Recognition is an essential component of Information Extraction. Gazetteers work as specialized dictionaries to support initial tagging. They provide quick entity identification, thus creating a richer document representation. However, the compilation of such gazetteers is sometimes mentioned as a stumbling block in Named Entity Recognition. Machine learning, rule-based, and look-up-based approaches are often used to perform this process. In this paper, we present a gazetteer developed from MUC-3 annotated data for the 'person' named entity type. The process has a low computational cost. We combine rule-based grammars and a simple filtering technique to induce the gazetteer automatically. We conclude with experiments comparing the content of the induced gazetteer with a manually crafted one. © 2011 Springer-Verlag.
%Z Cited by 5; Conference of 2nd International Conference on Software Engineering and Computer Systems, ICSECS 2011; Conference Date: 27 June 2011 through 29 June 2011; Conference Code: 85603