
Big Data. Zetta Discovery – search solution
What is big data? Why is a search solution needed?
Organisations can search both their internal infrastructure and external sources for information and obtain contextually relevant results.
Searching large volumes of documents and information is appropriate when the information is unstructured, and also when the data is gathered from various sources.
It is an ambitious project, a solution on a different level from information management, and one designed specifically for the Romanian language.
Although the Big Data concept has been talked about for a few years, the market is in the process of adopting such solutions. It is clearly becoming a competitive advantage, especially for companies that exploit large volumes of unstructured information (banks, insurance, utilities, telecom, media, (non-)governmental organisations, pharma, etc.).
People’s behaviour and current needs are forcing companies to offer more personalised services, to adapt to the market, to work with information quickly so as to be proactive and efficient, and to stand out from the competition.
It is a powerful solution for content analysis in Romanian. Its architecture is based on complex open-source technologies, on concepts and ideas tested for over 15 years in research labs, and on solutions proven in dozens of public and private applications.
Zetta Discovery – Big Data solution
The key element of the Zetta Discovery solution, “from text to relevant information”, is its orientation towards the search result.
More precise answers can be obtained, and information that has a contextual link with the search itself can be identified. Specific, complex problems that classical search cannot solve can also be addressed, including cases where certain information structures could not otherwise be correlated and filtered.
Companies can build and distribute very large knowledge bases, providing employees/partners with relevant lists of information through instant access, sorting by various contextual criteria, essential elements and links to the resulting documents.
Data sources can include emails, databases, websites and web resources, social networks, and documents in various text formats on file servers.
Main benefits of the solution
- Ability to automatically generate taxonomies of words or expressions contained in materials, based on any type of data
- Automatically identify and extract entities such as name, location, department, date, content source, specific metadata, author, etc. from document content and classify information by providing additional filters
- Filtering of results in the navigation menu is dynamic: it refines the search, gives the user more precise and controllable results, and is effective across a wide range of data sources and content
- The results are classified by the entities associated with the information found. Dynamic result categories are refined using a large number of metadata fields
- Similar searches can be made for special terms that are used in the organisation. Results are accurate whether or not documents contain diacritics
- Connectors can be called to retrieve information from different databases, email, shared directories
- Security can be configured to restrict search to certain predefined users. Integration with an organisation’s LDAP directory applies user-specific credentials to data access and privacy
- Query actions are logged, giving the administrator tools to optimise the solution
- Search results can be tuned by adjusting the search parameters
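The diacritics benefit listed above can be illustrated with a minimal Python sketch. It is not Zetta Discovery’s implementation, which this text does not describe; it only shows one common way to make matching insensitive to diacritics, by decomposing characters with Unicode normalisation and dropping the combining marks. The function names `fold_diacritics` and `matches` are hypothetical, chosen for illustration.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Hypothetical helper: strip combining marks and ignore case.

    NFD normalisation splits a letter such as 'ă' into 'a' plus a
    combining mark; the marks are then filtered out.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query: str, document: str) -> bool:
    """Return True if the folded query occurs in the folded document."""
    return fold_diacritics(query) in fold_diacritics(document)


# A query typed without diacritics still finds text written with them,
# and the other way round.
print(matches("informatii", "Căutare de informații relevante"))
print(matches("informații", "cautare de informatii relevante"))
```

A production search engine would usually apply this kind of folding at indexing time as well as at query time, so that both sides are compared in the same normalised form.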
Companies continue to produce and store information and to use data in their day-to-day business. The volume keeps increasing, and data management and analysis require more investment and grow in complexity.
Unstructured data leads to lower company productivity
Leading analyst firms say that 80% of a company’s data is still unstructured content held on personal computers or in centralised file systems, and that content volume is growing at over 200% per year. Working with these volumes of information to make a decision, and telling useful and relevant information from the rest, is increasingly difficult. Even the process of searching for and finding information affects employee productivity, as working with information is essential for business.
Beyond the accumulation of information itself, finding information that comes from multiple sources and formats (intranet, file systems, DM/CRM/ERP applications, external internet sources with business information, the email system, other applications) and that sits in different environments is almost impossible, or yields irrelevant results at a high cost in time and resources. The outcome is high, measurable costs that affect company efficiency and productivity and lead to lost opportunities.
Classic scenario of working with documents/information:
- Document development, management and versioning on a local workstation, on the network, or in specialised applications
- Exchange of correspondence internally and with partners via email
- Structured input of data into specialised applications
- Collection of information that is useful to the business and becomes the company’s knowledge base
- Use of external sources/media that handle classified or unclassified information of value to the business
- Specialised apps/websites providing timely information
Each information handling channel may have its own search engine to retrieve a document that is related to a particular keyword. Specialised applications offer different search filters especially for structured data. Web search engines bring relevant information to the user from the perspective of the search algorithms used.
The result is different data sources, information managed in different formats, and different resources.
Working with large volumes of information is becoming increasingly challenging for all parties involved: IT, marketing, managers, scientists. Implementing ideas that simplify finding and retrieving information in a way that suits the organisation has become critical for companies and IT departments.
Fortunately, there are solutions that bring efficiency to this area: they assist users in the search process, give them the information they need, and speed up decision making.
Structuring data reduces unnecessary expenditure
IDC, an independent market research firm that continually surveys the market, reports that at least 20% of working time each month is spent searching for and gathering information. A simple scenario helps a company see the potential of such a solution. Take a typical company (leaving aside organisations where searching is the main activity) with 100 employees, and count 50% of them (50 employees) at an average salary of EUR 15,000/year including taxes: the cost of this process alone is EUR 150,000/year (50 × 15,000 × 20%).
According to studies, an efficient search solution can improve the process by at least 50% (53.4%, according to IDC estimates), so this company can save EUR 75,000 annually, money that can be used for other purposes.
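The arithmetic of this scenario can be reproduced with a short Python sketch. The figures (50 employees counted, EUR 15,000/year, 20% of time spent searching, 50% or 53.4% improvement) come from the text above; the function and variable names are hypothetical, and the calculation is a back-of-the-envelope estimate rather than a costing model.

```python
def annual_search_cost(employees: int, salary_eur: int, search_time_pct: int) -> int:
    """Yearly salary cost of the time spent searching, in EUR.

    Percentages are passed as whole numbers so the arithmetic stays in integers.
    """
    return employees * salary_eur * search_time_pct // 100


# Scenario from the text: 50 employees, EUR 15,000/year, 20% of time searching.
cost = annual_search_cost(employees=50, salary_eur=15_000, search_time_pct=20)

# Conservative estimate: the search solution improves the process by 50%.
saving_conservative = cost * 50 // 100

# Using the 53.4% improvement attributed to IDC in the text.
saving_idc = round(cost * 53.4 / 100)

print(cost)                 # 150000
print(saving_conservative)  # 75000
print(saving_idc)           # 80100
```

Changing the inputs (number of employees, average salary, share of time spent searching) lets a company adapt the estimate to its own situation.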
Where the Zetta Discovery solution can be used
Through the results it delivers, Zetta Discovery gives organisations an efficient, powerful and controllable framework for competing more effectively, for working with diverse resources and large volumes of data, and for discovering and analysing new insights in the information they gather.

The solution is widely applicable wherever large volumes of information are involved.
Depending on the organisation’s domain, more precise answers can be obtained, and information that has a contextual link to the search itself can be identified. Specific, complex problems that classical search cannot solve can also be addressed, including cases where certain information structures could not otherwise be correlated and filtered.
Advanced correlation techniques and event conditions are used.
Some industries handle huge amounts of data (telecommunications, media, utilities, healthcare, education and research, financial-banking, insurance, etc.). In these industries, analysing information from multiple data sources and obtaining accurate answers can improve existing processes, environmental factors, and the time needed to identify or resolve problems. It can also reduce the risks of fraud, abuse and problem transactions, help monitor the health of action plans, increase the accountability of governance, etc.
We seek to give new meaning to the idea of leveraging data to its full potential: data from various sources, internal content, the organisation’s own knowledge, and content from external sources.
The Zetta Discovery solution can index millions of documents and can scale up to billions of documents on an extended hardware architecture.
It is capable of indexing over 100 types of text documents, including HTML, PDF, office documents and more.
There are connectors for different databases, and others can be developed, depending on the customer’s specific needs.
Act like top companies!
Enter the world of high-volume information search with Zetta Discovery, alongside leading products: Google Search Enterprise (Appliance), IBM Data Explorer, Microsoft Fast, Lucene, LucidWorks, Oracle Endeca Information Discovery, HP Universal Search!
Companies continue to produce and store information and to use data in day-to-day business decisions. The volume of data is growing dramatically, data sources are varied, and data management and analysis are increasing in complexity.
Zetta = unit prefix used to measure storage capacity (1 ZB = 1,000,000,000,000,000,000,000 bytes = 1 billion terabytes)
- The space on all the world’s hard drives was estimated in 2009 to be about ½ zettabyte
- The total amount of data worldwide in 2012 was 2.7 zettabytes, up 48% from 2011 (IDC)
- In 2013 the World Wide Web reached 4 zettabytes, an increase of approx. 1.6 zettabytes/year
- By 2020 the total global volume of digital information is expected to be over 35 zettabytes
- 90% of the data that exists in the world today was generated in the last two years
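To put the unit in perspective, the following Python sketch converts zettabytes to terabytes using the SI definition of the zetta prefix (10^21) and derives the 2011 figure implied by the 2012 statistic above. The constant and function names are hypothetical, and the derived 2011 value is only what the quoted 48% growth implies, not an independently sourced number.

```python
ZETTABYTE = 10**21  # bytes, SI prefix zetta
TERABYTE = 10**12   # bytes, SI prefix tera


def zettabytes_to_terabytes(zettabytes: int) -> int:
    """Convert a whole number of zettabytes to terabytes."""
    return zettabytes * (ZETTABYTE // TERABYTE)


# One zettabyte is one billion terabytes.
print(zettabytes_to_terabytes(1))  # 1000000000

# 2012 total of 2.7 ZB, described as up 48% from 2011, implies
# a 2011 total of roughly 2.7 / 1.48 ZB (about 1.8 ZB).
implied_2011_zb = 2.7 / 1.48
print(round(implied_2011_zb, 2))
```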