There is a huge and rapidly expanding amount of digital text. With the growth of Web 2.0, online communities are contributing content to the web, causing an explosion of online media. Who consumes this content, and how can they benefit from it? An ordinary web surfer consumes such information to carry out daily tasks or for infotainment. Businesses, however, can build a great deal of intelligence from textual data, whether it comes from the World Wide Web or from their intranets. Information Extraction (IE) technologies help businesses do exactly that.
IE technologies are designed to make sense of huge amounts of content that is relevant to the organization. So what is Information Extraction? Information Extraction is the process of identifying relevant information, where the criteria for relevance are predefined by the user in the form of a template to be filled. Typically, the template pertains to events or situations and contains slots that denote who did what to whom, when, where, and possibly why: in other words, what event occurred and what its details are. Templates are usually designed around the use case, that is, around what the user is looking for. When defining a template, its designer has to anticipate what data will interest the user and choose the slots and selection criteria accordingly. If extraction succeeds, IE delivers the template filled with the appropriate values found in the text(s).
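To make the idea of a template concrete, here is a minimal Python sketch of a hypothetical acquisition-event template, with slots for who did what to whom and when, together with a single hand-written pattern that fills it. The class `AcquisitionTemplate`, the regular expression, and `fill_template` are illustrative assumptions, not SETU's engine; practical IE systems rely on many patterns or on trained models rather than one regex.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class AcquisitionTemplate:
    """Hypothetical event template: who did what to whom, and when."""
    acquirer: Optional[str] = None   # who
    event: str = "acquisition"       # did what
    target: Optional[str] = None     # to whom
    date: Optional[str] = None       # when (optional slot)


# One illustrative extraction pattern: a capitalized name, an acquisition
# verb, another capitalized name, and an optional ISO date ("on YYYY-MM-DD").
PATTERN = re.compile(
    r"(?P<acquirer>[A-Z][\w&]*(?: [A-Z][\w&]*)*) "
    r"(?:acquired|acquires|has acquired) "
    r"(?P<target>[A-Z][\w&]*(?: [A-Z][\w&]*)*)"
    r"(?: on (?P<date>\d{4}-\d{2}-\d{2}))?"
)


def fill_template(sentence: str) -> Optional[AcquisitionTemplate]:
    """Return a filled template if the sentence matches the pattern, else None."""
    m = PATTERN.search(sentence)
    if m is None:
        return None  # nothing relevant found: no template is delivered
    return AcquisitionTemplate(
        acquirer=m.group("acquirer"),
        target=m.group("target"),
        date=m.group("date"),  # None when the date slot is not found
    )


print(fill_template("Acme Corp acquired Globex Inc on 2009-03-15."))
```

A matching sentence yields a template whose slots hold the extracted values, while an irrelevant sentence returns `None`; slots the text does not mention (here, possibly the date) simply stay empty.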
For example, you may want to keep track of all the news in your business area and automatically alert the right people in your organization. If you are a stock trading firm, you may be interested in specific events such as mergers, acquisitions and product launches. If you are a pharmaceutical company, you may be interested in new drug launches, intellectual property rights (IPR) issues and legal battles in your space. Such use cases can be found in every industry today, since most industries are knowledge-driven. IE engines can address them by extracting structured information from large volumes of periodically updated data, such as news or blog feeds.
Simply put, an IE engine generates structured, unambiguous information that is ready for machine processing from unstructured, ambiguous text written in human language. SETU's IE engine does exactly this: it automatically fills templates and identifies concepts and events, together with their associated details, in the given text. Once such templates are filled at a very large scale (millions of concepts), one can say that the IE engine enables Web 3.0.
SETU's IE engine can currently tag millions of concepts and makes their metadata (ontological information) available for further system building. This ontological information can be used to link the tagged data to your organizational ontology, so our engine fits well with any Web 3.0 initiative in your organization.
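As a rough illustration of what tagging concepts and exposing their ontological metadata can mean, the following Python sketch uses a small dictionary (gazetteer) lookup: each recognized mention is tagged with a concept ID and an ontology class, which is then mapped to a class in an organization's own ontology. The gazetteer entries, concept IDs, class names and the `ORG_ONTOLOGY` mapping are all hypothetical; this is not SETU's implementation, and a system handling millions of concepts would also need to resolve ambiguity and scale far beyond a linear scan.

```python
import re
from typing import Dict, List, NamedTuple


class Concept(NamedTuple):
    concept_id: str      # hypothetical identifier in the engine's ontology
    ontology_class: str  # hypothetical class the concept belongs to


# Hypothetical gazetteer: lowercase surface form -> concept with metadata.
GAZETTEER: Dict[str, Concept] = {
    "aspirin": Concept("C001", "Drug"),
    "acme corp": Concept("C002", "Company"),
    "merger": Concept("C003", "CorporateEvent"),
}

# Hypothetical mapping from engine classes to an organization's ontology.
ORG_ONTOLOGY: Dict[str, str] = {
    "Drug": "org:Product/Pharmaceutical",
    "Company": "org:Party/Organization",
    "CorporateEvent": "org:Event/Corporate",
}


class Tag(NamedTuple):
    start: int         # character offset where the mention begins
    end: int           # character offset just past the mention
    text: str          # the mention as it appears in the text
    concept: Concept   # metadata from the gazetteer
    org_class: str     # linked class in the organizational ontology


def tag_concepts(text: str) -> List[Tag]:
    """Tag whole-word, case-insensitive gazetteer matches, ordered by position."""
    tags: List[Tag] = []
    for surface, concept in GAZETTEER.items():
        pattern = r"\b" + re.escape(surface) + r"\b"
        for m in re.finditer(pattern, text, re.IGNORECASE):
            org_class = ORG_ONTOLOGY.get(concept.ontology_class, "org:Unmapped")
            tags.append(Tag(m.start(), m.end(), m.group(0), concept, org_class))
    return sorted(tags, key=lambda t: t.start)


for tag in tag_concepts("Acme Corp announced a merger involving aspirin sales."):
    print(tag.text, tag.concept.concept_id, tag.org_class)
```

Keeping the engine's classes separate from the organization's classes, joined only by a mapping table, is one simple way to let tagged data flow into an existing ontology without changing either side.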