Named entity recognition is the task of identifying proper names in text and classifying them into a set of predefined categories of interest. Most commercially available software packages detect proper names that refer to people, places and companies.
Intellexer Named Entity Recognizer identifies personal names, names of organizations and geographical locations, and also extracts entities such as positions/occupations, nationalities, dates, ages, durations and names of events. Its results can be of great value to organizations of all kinds that work with large volumes of information, especially banks, finance companies, publishers and governments.
In order to provide the highest quality results, Intellexer Named Entity Recognizer combines different algorithms:
- A statistical model, based on a Hidden Markov Model and trained on a part-of-speech annotated corpus of business transaction articles, news articles and web pages (more than 500 thousand token-tag pairs);
- Machine learning algorithms that automatically generate named entity recognition patterns, using a set of semantic dictionaries and a tagged corpus as training data;
- Expert rules that improve the results of the statistical algorithms. Intellexer Named Entity Recognizer contains more than 500 rules created manually by our linguists.
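As an illustration only, the sketch below shows how a Hidden Markov Model can assign tags to a token sequence using Viterbi decoding. It is not the Intellexer implementation: the ToyHmm type, the MakeToyModel and Decode functions, the two tags, the two observation classes (0 = capitalized word, 1 = lowercase word) and all probabilities are assumptions invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy HMM. Every number in this example is invented for illustration.
struct ToyHmm {
    std::vector<std::string> tags;                // hidden states (entity labels)
    std::vector<double> start;                    // P(tag at position 0)
    std::vector<std::vector<double>> transition;  // transition[i][j] = P(tag j | tag i)
    std::vector<std::vector<double>> emission;    // emission[i][k] = P(observation k | tag i)
};

// Hypothetical model: observation 0 = capitalized word, 1 = lowercase word.
ToyHmm MakeToyModel() {
    ToyHmm hmm;
    hmm.tags = {"PERSON", "OTHER"};
    hmm.start = {0.3, 0.7};
    hmm.transition = {{0.5, 0.5}, {0.2, 0.8}};
    hmm.emission = {{0.9, 0.1}, {0.2, 0.8}};
    return hmm;
}

// Viterbi decoding: returns the most probable tag sequence for the observations.
std::vector<std::string> Decode(const ToyHmm& hmm, const std::vector<std::size_t>& obs) {
    assert(!obs.empty());
    const std::size_t n = hmm.tags.size();
    const std::size_t len = obs.size();
    std::vector<std::vector<double>> score(len, std::vector<double>(n, 0.0));
    std::vector<std::vector<std::size_t>> back(len, std::vector<std::size_t>(n, 0));

    // Initialization: start probability times emission of the first observation.
    for (std::size_t s = 0; s < n; ++s)
        score[0][s] = hmm.start[s] * hmm.emission[s][obs[0]];

    // Recursion: keep the best-scoring predecessor for every tag at every position.
    for (std::size_t t = 1; t < len; ++t) {
        for (std::size_t s = 0; s < n; ++s) {
            for (std::size_t p = 0; p < n; ++p) {
                const double cand =
                    score[t - 1][p] * hmm.transition[p][s] * hmm.emission[s][obs[t]];
                if (cand > score[t][s]) {
                    score[t][s] = cand;
                    back[t][s] = p;
                }
            }
        }
    }

    // Termination: pick the best final tag, then follow the back-pointers.
    std::size_t best = 0;
    for (std::size_t s = 1; s < n; ++s)
        if (score[len - 1][s] > score[len - 1][best]) best = s;

    std::vector<std::string> path(len);
    for (std::size_t t = len; t-- > 0;) {
        path[t] = hmm.tags[best];
        best = back[t][best];
    }
    return path;
}
```

With this toy model, Decode(MakeToyModel(), {0, 0, 1, 1}), which could stand for "Eyal Shaked was appointed", yields PERSON, PERSON, OTHER, OTHER. A production tagger would normally work with log probabilities to avoid numeric underflow on long sequences.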
To evaluate the effectiveness of Intellexer Named Entity Recognizer, we created a dataset of news articles from different domains (science, sport, business, information technology, etc.). In these experiments our named entity recognition technique achieved 88-97% accuracy. It can be applied in various document management and analytical systems:
- Information security solutions that trace suspicious activity in corporate mail, social networks and other means of communication by detecting mentions of entities of interest and relations between them;
- Marketing and PR solutions for analyzing media events and entities;
- Social media solutions for monitoring the behavior of entities of interest.
For developers and integrators
Intellexer Named Entity Recognizer can be easily integrated into custom document and knowledge management systems using C/C++ or C#. Our SDK contains all the include files and import libraries needed to link user applications with the Named Entity Recognizer module.
Here is a C++ example of how to add Named Entity Recognizer to your application:
#include <cstring>   // strlen
#include <iostream>
#include <NERCore.h>
#include <LPXml.h>
using std::cout;
using std::cerr;
using namespace NsSemSDK;
// Print every extracted entity: text, normalized text, type, start and end offsets.
void PrintNer(NsSemSDK::INEREntityContainer* piNERSet)
{
    const INEREntity *piNER = NULL;
    piNERSet->Reset();
    while (piNERSet->Next(piNER))
    {
        int nStartOffset = 0;
        int nEndOffset = 0;
        // The entity spans from the start of its first word to the end of its last word.
        if (piNER->GetWordCount()) {
            nStartOffset = piNER->GetWord(0)->GetStartOffset();
            nEndOffset = piNER->GetWord(piNER->GetWordCount() - 1)->GetEndOffset();
        }
        cout << "NER: " << piNER->GetText() << " : " << piNER->GetNormalizedText()
             << " : " << piNER->GetType() << " : " << nStartOffset << " : " << nEndOffset << "\n";
    }
}
int main(int argc, char* argv[])
{
    try
    {
        char szDBPath[] = "../../LDB"; // path to ldb
        char szLPluginsPath[] = "../../LPlugins"; // path to plugins
        // sample sentence
        char szSentence[] = "Eyal Shaked was appointed General Manager of the Optical Networks Division in October 2005.";
        // provide path to the license file
        SetNERLicensePath("../../ISDK_License.xml");
        SetLPXMLLicensePath("../../ISDK_License.xml");
        // create database interface
        CInterfacePtr<INERDB> pDB(CreateNERDB());
        // create extractor interface
        CInterfacePtr<INERExtractor> pNERExtractor(CreateNERExtractor());
        // initialize database interface
        pDB->Setup(szDBPath, szLPluginsPath);
        // initialize extractor interface
        pNERExtractor->Setup(pDB.Get());
        // process sample text and print result
        pNERExtractor->Process(NULL, szSentence, (int)strlen(szSentence));
        INEREntityContainer* piNERSet = pNERExtractor->GetNERs();
        PrintNer(piNERSet);
    }
    catch (const CSemBaseException& x)
    {
        // Handle exceptions.
        cerr << x.what() << "\n";
    }
    return 0;
}
As a result, you get every extracted named entity in its original and normalized forms, together with its type and its start and end offsets in the text.
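One possible way to use such results is to mark the entities inline in the source text. The sketch below is independent of the SDK and rests on assumptions: EntitySpan and Annotate are hypothetical names, the type is represented as a plain string label, and offsets are taken to be zero-based positions with an exclusive end. The SDK's actual type values and offset conventions should be checked before adapting it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical record with the fields the sample prints for each entity.
struct EntitySpan {
    std::size_t start;  // assumed: zero-based offset of the first character
    std::size_t end;    // assumed: offset one past the last character
    std::string type;   // hypothetical string label, e.g. "PERSON"
};

// Wraps every span in "[TYPE ...]" markers.
// Spans must be sorted by start offset and must not overlap.
std::string Annotate(const std::string& text, const std::vector<EntitySpan>& spans) {
    std::string out;
    std::size_t pos = 0;
    for (const EntitySpan& s : spans) {
        assert(pos <= s.start && s.start <= s.end && s.end <= text.size());
        out.append(text, pos, s.start - pos);        // plain text before the entity
        out += "[" + s.type + " ";
        out.append(text, s.start, s.end - s.start);  // the entity itself
        out += "]";
        pos = s.end;
    }
    out.append(text, pos, std::string::npos);        // plain text after the last entity
    return out;
}
```

For example, annotating "Eyal Shaked was appointed General Manager" with the spans {0, 11, "PERSON"} and {26, 41, "POSITION"} produces "[PERSON Eyal Shaked] was appointed [POSITION General Manager]".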