Journal of Northeastern University (Natural Science), 2007, Vol. 28, Issue (1): 44-48.

• Research Article •

On the WCSW: Website classification system wrapper

Gao, Ke-Ning (1); Wang, Bo (1); Zhang, Bin (1); You, Zhen (1)

  (1) School of Information Science and Engineering, Northeastern University, Shenyang, Liaoning 110004, China
  • Received: 2013-06-27 Revised: 2013-06-27 Online: 2007-01-15 Published: 2013-06-24
  • Contact: Gao, K.-N.
  • Supported by:
    National "Tenth Five-Year Plan" Key Technologies R&D Program of China (2004BA721A05)

Abstract: A website organizes its information through its own navigation system, and that navigation system carries the semantic features of a classification. To enable effective extraction of Web information from these classification systems, this paper proposes WCSW (website classification system wrapper), a wrapper built on an HTML page-blocking algorithm. WCSW treats the whole website as the object to be wrapped: building on the blocking algorithm and an analysis of the semantic features of each block, it applies extraction rules to the navigation information blocks that carry classification semantics. Experimental results show that the website classification hierarchies extracted this way have relatively high accuracy and that the approach is practical.

Keywords: Web classification, wrapper, Web page blocking, semantic feature analysis, WCSW rules
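The abstract does not spell out WCSW's blocking algorithm, its block semantic features, or its extraction rules, so the following is only a rough sketch of the general idea under stated assumptions. It swaps in a common link-density heuristic: a page is split into blocks at a hypothetical set of block-level tags, blocks made up mostly of short anchor texts are treated as navigation (category) blocks, and their links are followed through an in-memory `site` dictionary to assemble a category tree. `BLOCK_TAGS`, the thresholds, `navigation_blocks`, `build_hierarchy`, and `SITE` are all illustrative assumptions, not names or parameters from the paper.

```python
from html.parser import HTMLParser

# Hypothetical block boundaries; the paper's own blocking rules are not reproduced here.
BLOCK_TAGS = {"div", "ul", "ol", "table", "tr", "td", "nav", "p"}


class BlockSegmenter(HTMLParser):
    """Assign visible text and links to the innermost open block element."""

    def __init__(self):
        super().__init__()
        self.blocks = []  # closed blocks, in the order they were closed
        self.stack = []   # blocks that are still open
        self.href = None  # href of the <a> currently being read, if any
        self.anchor = []  # text fragments of that <a>

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.stack.append({"tag": tag, "text": 0, "link_text": 0, "links": []})
        elif tag == "a":
            self.href = dict(attrs).get("href")
            self.anchor = []

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            label = " ".join(self.anchor)
            if self.stack and label:
                self.stack[-1]["links"].append((label, self.href))
                self.stack[-1]["link_text"] += sum(len(f) for f in self.anchor)
            self.href = None
        elif tag in BLOCK_TAGS and self.stack and self.stack[-1]["tag"] == tag:
            self.blocks.append(self.stack.pop())

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        self.stack[-1]["text"] += len(text)  # count characters of visible text
        if self.href is not None:
            self.anchor.append(text)


def navigation_blocks(html, min_links=3, min_density=0.8, max_label=20):
    """Return the link lists of blocks that look like category menus.

    Assumed heuristic (thresholds are hypothetical): at least `min_links`
    links, link text making up at least `min_density` of the block's text,
    and an average anchor label no longer than `max_label` characters.
    """
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    result = []
    for block in parser.blocks:
        links = block["links"]
        if len(links) < min_links or block["text"] == 0:
            continue
        density = block["link_text"] / block["text"]
        avg_label = block["link_text"] / len(links)
        if density >= min_density and avg_label <= max_label:
            result.append(links)
    return result


def build_hierarchy(site, url, seen=None, depth=0, max_depth=3):
    """Follow navigation links to build a nested [(label, subtree), ...] tree.

    `site` maps URL -> HTML, a hypothetical stand-in for a crawler.
    """
    if seen is None:
        seen = {url}
    if depth >= max_depth or url not in site:
        return []
    # Collect all new links on this page before recursing, so a global menu
    # repeated on sub-pages is not filed under the first category visited.
    children = []
    for links in navigation_blocks(site[url]):
        for label, href in links:
            if href not in seen:
                seen.add(href)
                children.append((label, href))
    return [(label, build_hierarchy(site, href, seen, depth + 1, max_depth))
            for label, href in children]


# Toy two-page "site" (hypothetical data) with a global menu and one sub-menu.
MENU = ('<ul><li><a href="/news">News</a></li><li><a href="/sports">Sports</a></li>'
        '<li><a href="/tech">Technology</a></li></ul>')
SITE = {
    "/": MENU,
    "/news": MENU + '<ul><li><a href="/news/world">World</a></li>'
                    '<li><a href="/news/china">China</a></li>'
                    '<li><a href="/news/local">Local</a></li></ul>',
}

if __name__ == "__main__":
    print(build_hierarchy(SITE, "/"))
```

Run as a script, this prints `[('News', [('World', []), ('China', []), ('Local', [])]), ('Sports', []), ('Technology', [])]`. A body paragraph containing a few links inside long running text falls below the density threshold and is not treated as navigation, which is the kind of distinction the block semantic analysis in the abstract is meant to draw.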
