Intellexer Categorizer

Intellexer Categorizer is a semantic tool that automatically classifies documents by content and sorts them into the categories that best fit your company's structure and processes. Categories might include Human Resources, Research and Development, Finance, Customer Feedback, or Newsletters. You define the categories freely; the categorization algorithms place no restrictions on them.

Example of automatic document categorization

Intellexer Categorizer is an ideal solution for:

Intellexer Categorizer is available as a desktop application or as a component of the Intellexer SDK.

Intellexer Categorizer algorithms are based on machine learning. Document classification runs in two stages: a training phase and a prediction stage.

In the training phase, Intellexer Categorizer builds a classifier by learning from a set of model documents for each category. The learning algorithm uses a wide range of semantic features extracted from the document texts:

  • Words with part-of-speech tags;
  • Noun phrases and the syntactic dependencies between them;
  • Complex semantic relations detected by the custom Intellexer Linguistic Processor.
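
The training phase can be pictured with a minimal sketch. It is not the Intellexer implementation: as a stand-in for the semantic features listed above, it assumes plain lowercase words and adjacent word pairs, and the names extract_features, train, and model_documents are hypothetical.

```python
import re
from collections import Counter


def extract_features(text):
    """Stand-in feature extractor (assumption): lowercase words plus adjacent word pairs.

    A real system would use part-of-speech tags, noun phrases, and
    semantic relations instead of these surface features.
    """
    words = re.findall(r"[a-z]+", text.lower())
    pairs = [f"{a} {b}" for a, b in zip(words, words[1:])]
    return words + pairs


def train(model_documents):
    """Build one feature-count profile per category from its model documents."""
    profiles = {}
    for category, documents in model_documents.items():
        profile = Counter()
        for text in documents:
            profile.update(extract_features(text))
        profiles[category] = profile
    return profiles


# Hypothetical model documents, two categories chosen by the user.
model_documents = {
    "Finance": ["Quarterly budget report", "Annual budget forecast"],
    "Human Resources": ["New hire onboarding checklist"],
}
profiles = train(model_documents)
```

Each category ends up with a weighted feature profile; categories are simply the keys the user supplies, so nothing in the training step restricts how they are named or how many there are.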

In the prediction stage, Intellexer Categorizer uses the vector space model to classify documents. Each input text is compared with the semantic features of each category model, and the degree of proximity between them is calculated. The document is assigned to the category with the highest relevance value.
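
As an illustration of vector space classification, the sketch below scores an input text against each category profile with cosine similarity and picks the highest score. It rests on assumptions: the proximity measure used by Intellexer Categorizer is not specified here, cosine similarity is a common choice that we swap in, and the profiles, weights, and function names are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse feature vectors (dict-like)."""
    dot = sum(weight * b.get(feature, 0) for feature, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text, profiles):
    """Return the best-matching category and the relevance score of every category."""
    # Assumption: whitespace-separated lowercase words stand in for semantic features.
    vector = Counter(text.lower().split())
    scores = {category: cosine(vector, profile) for category, profile in profiles.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical category profiles with made-up feature weights.
profiles = {
    "Finance": Counter({"budget": 3, "invoice": 2, "forecast": 1}),
    "Human Resources": Counter({"hiring": 2, "onboarding": 2, "benefits": 1}),
}
category, scores = classify("budget forecast for next quarter", profiles)
print(category, scores)
```

Here the input shares "budget" and "forecast" with the Finance profile and nothing with Human Resources, so Finance receives the only non-zero relevance value and the document is assigned to it.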

For evaluation, we used the Reuters-21578 dataset. On this collection we achieved a text categorization F-measure of over 87% (typical competitor results: 78-82%).
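
For readers unfamiliar with the metric, the F-measure combines precision and recall into a single score. The sketch below shows the standard formula; which variant and averaging scheme (micro or macro, beta value) underlie the figures above is not stated, so the balanced F1 default is an assumption, and the counts in the example are made up and are not Intellexer results.

```python
def f_measure(true_positives, false_positives, false_negatives, beta=1.0):
    """Standard F-measure; beta=1.0 gives the balanced F1 score."""
    predicted = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Made-up counts for one category: 80 correct assignments,
# 10 wrong assignments, 30 documents that were missed.
score = f_measure(true_positives=80, false_positives=10, false_negatives=30)
print(f"{score:.2f}")
```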
