A user (administrator) creates a catalogue structure and defines the list of categories.
Reference (etalon) documents are selected for each category or subcategory, and the Categorizer forms a fingerprint on the basis of these documents. Each category has its own fingerprint.
A fingerprint is a set of main concepts (or terms) and the relations between them, extracted from the category's reference documents.
Documents to be categorized are parsed by the Linguistic Processor, which extracts syntactic and semantic relations from them.
Each document, represented as a semantic tree, is compared with the fingerprint of every category. From this comparison, Intellexer Categorizer estimates the document's degree of proximity to each category.
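The fingerprint and comparison steps above can be illustrated with a heavily simplified sketch. It is not Intellexer's implementation: it assumes a plain bag of words in place of the concepts and semantic relations extracted by the Linguistic Processor, and cosine similarity as the proximity measure. All function names and sample texts are hypothetical.

```python
import math
import re
from collections import Counter


def terms(text):
    """Crude stand-in for linguistic processing: lowercase word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_fingerprint(reference_docs):
    """Aggregate term counts over a category's reference (etalon) documents."""
    fingerprint = Counter()
    for doc in reference_docs:
        fingerprint.update(terms(doc))
    return fingerprint


def proximity(doc_text, fingerprint):
    """Cosine similarity between a document's term counts and a fingerprint."""
    doc = Counter(terms(doc_text))
    dot = sum(count * fingerprint[term] for term, count in doc.items())
    doc_norm = math.sqrt(sum(c * c for c in doc.values()))
    fp_norm = math.sqrt(sum(c * c for c in fingerprint.values()))
    norm = doc_norm * fp_norm
    return dot / norm if norm else 0.0


# Hypothetical categories with made-up reference documents.
fingerprints = {
    "finance": build_fingerprint(["bank loan interest rate", "stock market shares"]),
    "sports": build_fingerprint(["football match goal", "tennis match player"]),
}

# One proximity score per category for a single incoming document.
scores = {
    name: proximity("the bank raised its interest rate", fp)
    for name, fp in fingerprints.items()
}
```

In this toy setup the document shares terms only with the "finance" fingerprint, so it scores above zero there and exactly zero for "sports". A real fingerprint would also encode relations between concepts, which a word count cannot capture.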
For each document, the Categorizer determines:
Highly relevant categories
Low-relevance categories
Irrelevant categories
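One way to picture this three-way split is to apply two thresholds to the proximity scores. The sketch below is only an illustration: the threshold values, the function name, and the sample scores are assumptions, and the text does not say how Intellexer Categorizer actually draws the boundaries.

```python
def bucket_categories(scores, high=0.5, low=0.1):
    """Split categories into three relevance groups by proximity score.

    The `high` and `low` thresholds are hypothetical, chosen for illustration.
    """
    buckets = {"high": [], "low": [], "irrelevant": []}
    for category, score in scores.items():
        if score >= high:
            buckets["high"].append(category)
        elif score >= low:
            buckets["low"].append(category)
        else:
            buckets["irrelevant"].append(category)
    return buckets


# Made-up proximity scores for a single document.
example_scores = {"finance": 0.72, "economics": 0.25, "sports": 0.03}
result = bucket_categories(example_scores)
```

With these sample values, "finance" lands in the high group, "economics" in the low group, and "sports" in the irrelevant group.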
To see how Intellexer Categorizer works in practice, please visit the Online Demo.
Read More
For more information about Intellexer Categorizer, see also: