This module/technology converts unstructured data, in context, into structured data for analysis in the system. It can read both written text and speech and convert them into a structure that the client and the system can more readily use for analysis, investigations, and accurate decision-making. This process occurs in real time and feeds directly into the system.

As an example, if unstructured data is fed into the system from any data source, the system will run its NLP module against the data and the result will be fed directly into the system for trend analysis and predictive analytics.

Optical Character Recognition

Images and scanned documents will be processed using this module/technology to capture data embedded in the images for storage, analysis, and link resolution.

An example of this is where clients store document images. These images can be fed into the system and used for present and future analytics in concert with other data streams.

Data Ingestion & Validation

The platform provides a cognitive data ingestion layer. This layer handles the collection or receipt of data for analysis in the CST platform. As mentioned, this data can be structured or unstructured. In addition to the main division between batch and streaming data, any platform with multiple data sources must accommodate diversity in other key factors, each of which affects the reliability and complexity of the ingestion phase:

  • Data source type—for example, databases, event streams, files, log files, web services, and external feeds.
  • Data transport protocol
  • Update semantics of the incoming data—for example, append, changeset, and replace data elements.
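
The three update semantics listed above can be illustrated with a minimal sketch. It is not the CST implementation: the function name apply_update, the "id" key, and the "_deleted" marker are hypothetical, and treating "append" as "add only records not already present" is an assumption made for the example.

```python
from typing import Any

Record = dict[str, Any]

def apply_update(store: dict[str, Record], mode: str, items: list[Record]) -> dict[str, Record]:
    """Apply one incoming batch to `store`, keyed by each record's "id" field."""
    if mode == "append":
        # Append: add new records; existing records are left untouched (assumption).
        for item in items:
            store.setdefault(item["id"], item)
    elif mode == "changeset":
        # Changeset: merge changed fields into existing records.
        # A hypothetical "_deleted" flag removes the record.
        for item in items:
            if item.get("_deleted"):
                store.pop(item["id"], None)
            else:
                store[item["id"]] = {**store.get(item["id"], {}), **item}
    elif mode == "replace":
        # Replace: the batch is the complete new state of the data set.
        store.clear()
        store.update({item["id"]: item for item in items})
    else:
        raise ValueError(f"unknown update mode: {mode}")
    return store
```

The point of the sketch is that the same batch produces different stored results depending on the declared semantics, which is why the ingestion layer needs to know them for each source.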


The CST ingestion service also provides other functions that are often performed as part of the ingestion process: data validation, data cleansing, transformation, and routing of data items to their destinations. The ingestion stage is a critical point for ensuring reliability of service and data quality. As a result, CST uses a cognitive entity resolution (data resolver) service to “clean” and make sense of the data BEFORE real analysis begins.
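
As an illustration only, the ingestion functions described above (validation, cleansing, transformation, and routing) can be sketched as a small pipeline. All names here (ROUTES, source_type, the "lake" default destination) are hypothetical, and the sketch cleanses before validating, which is an assumption rather than a statement about how the CST service orders its steps.

```python
from typing import Optional

# Hypothetical routing table: source type -> destination store.
ROUTES = {"sensor": "timeseries", "document": "documents"}

def cleanse(record: dict) -> dict:
    """Trim whitespace from string values and drop empty fields."""
    cleaned = {}
    for key, value in record.items():
        if isinstance(value, str):
            value = value.strip()
        if value is not None and value != "":
            cleaned[key] = value
    return cleaned

def validate(record: dict) -> Optional[str]:
    """Return a rejection reason, or None if the record is acceptable."""
    if "id" not in record:
        return "missing id"
    if not isinstance(record.get("source_type"), str):
        return "missing source_type"
    return None

def transform(record: dict) -> dict:
    """Normalize the source type so routing is case-insensitive."""
    return {**record, "source_type": record["source_type"].lower()}

def ingest(records: list[dict]) -> tuple[dict[str, list[dict]], list[tuple[dict, str]]]:
    """Return (records grouped by destination, rejected records with reasons)."""
    routed: dict[str, list[dict]] = {}
    rejected: list[tuple[dict, str]] = []
    for record in records:
        clean = cleanse(record)
        reason = validate(clean)
        if reason:
            rejected.append((record, reason))  # keep the original for auditing
            continue
        clean = transform(clean)
        destination = ROUTES.get(clean["source_type"], "lake")
        routed.setdefault(destination, []).append(clean)
    return routed, rejected
```

Rejected records are returned together with a reason rather than silently dropped, so that data quality problems remain visible at the ingestion stage.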

Note: The CST data ingestion module can handle multiple terabytes of data in under a second because it uses streaming technologies that compress data streams for easy ingestion.

The idea is to provide our clients with a data lake that holds all data as a single point of truth. This does NOT mean that client data is removed from legacy systems. Instead, the CST platform will integrate with existing data pipelines to “hoover” the data into the CST document ingestion/storage platform.

The architecture of the platform is focused on the ability to analyze, predict, and report on data from multiple sources. Data can be fed into the system through queues, JSON, XML, streams, ETL, and service calls. In addition, the system can PULL data from a variety of sources, including relational and document-based databases, and even oil pipelines using sensor streams. Each of these sources is configured using a system adapter. Note that the platform prefers to pull data from its sources, as this assigns the job of data collection to the platform and eliminates external points of failure.
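
The adapter-per-source idea above can be sketched as follows. This is a hypothetical illustration, not the CST adapter API: the class names SourceAdapter, JsonLinesAdapter, and CsvAdapter and the function pull_all are invented for the example, and in-memory text stands in for real databases, queues, or sensor streams.

```python
import csv
import io
import json
from abc import ABC, abstractmethod
from typing import Iterator

class SourceAdapter(ABC):
    """Common interface: every source is pulled the same way."""

    @abstractmethod
    def pull(self) -> Iterator[dict]:
        ...

class JsonLinesAdapter(SourceAdapter):
    """Reads one JSON object per line (for example, a sensor feed)."""

    def __init__(self, text: str):
        self.text = text

    def pull(self) -> Iterator[dict]:
        for line in self.text.splitlines():
            if line.strip():
                yield json.loads(line)

class CsvAdapter(SourceAdapter):
    """Reads rows from CSV text with a header line."""

    def __init__(self, text: str):
        self.text = text

    def pull(self) -> Iterator[dict]:
        yield from csv.DictReader(io.StringIO(self.text))

def pull_all(adapters: dict[str, SourceAdapter]) -> list[dict]:
    """Pull every configured source and tag each record with its source name."""
    records = []
    for name, adapter in adapters.items():
        for record in adapter.pull():
            records.append({**record, "source": name})
    return records
```

Because the platform drives every pull through the same interface, adding a new source in this sketch means adding one adapter class without touching the collection loop.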

The cognitive technology approach provides access to metadata across multiple channels in real time. This function is what sets the platform apart from others. It uses cutting-edge artificial intelligence techniques to access data from multiple streams and can analyze and learn from this data as it comes in. This is an important distinction: the difference between analyzing data as it arrives and analyzing it when it is at rest can be the difference between thwarting a threat or crime and merely learning from it after the fact.

The ability to collect data from both structured (metadata) and unstructured sources (social media, communication content, and verbal reports from the field) and unify the data for creative analysis is key to the platform's support of intelligence and investigation strategy and decision-making.

The platform uses real-time data streaming technologies to access data at multiple locations and can collect and index the data and make it searchable in real time. The advantage is that analysis can be performed across data boundaries, and relationships in the data are quickly identified and fed into real-time analysis.

As an example, the system can collect (pull) data from multiple communications carriers and analyze that data in real time to show relationships between communications across these carriers. It can also use natural language processing to detect patterns in communication. This function can be performed while the data is being collected into a central data lake.
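
A minimal sketch of the cross-carrier example above is shown below. It assumes each communication record carries hypothetical "carrier", "from", and "to" fields, and it uses a simple in-memory index that is updated one record at a time, to mirror analysis performed while data is still being collected. It is not the platform's link-resolution algorithm.

```python
from collections import defaultdict

class LinkIndex:
    """Incrementally tracks who communicates with whom, and on which carriers."""

    def __init__(self):
        self.neighbors = defaultdict(set)  # party -> parties it has contacted
        self.carriers = defaultdict(set)   # unordered pair -> carriers seen

    def add(self, call: dict) -> None:
        """Index one record as it arrives from any carrier feed."""
        a, b = call["from"], call["to"]
        if a == b:
            return  # ignore self-links
        self.neighbors[a].add(b)
        self.neighbors[b].add(a)
        self.carriers[frozenset((a, b))].add(call["carrier"])

    def cross_carrier_pairs(self) -> list[tuple]:
        """Pairs of parties observed communicating on more than one carrier."""
        return sorted(
            tuple(sorted(pair))
            for pair, seen in self.carriers.items()
            if len(seen) > 1
        )
```

The index can be queried at any point during collection, so relationships that span carriers become visible as soon as the second matching record is ingested.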

The use of artificial intelligence software to automate this analysis is what sets the CST platform apart from other intelligence-gathering platforms. It is no longer enough to simply receive pushed data into a repository before analysis. Data can now be analyzed, and value derived, while it is being ingested in real time.

Note that organizations will provide the legal framework for access to data from enterprise and other sources.

Link Resolution

Problem Identification: Fundamental changes in the global hydrocarbon markets are shifting more production and exploration toward shale and other less-accessible deposits. The oil and gas industry must increase CAPEX investment to identify and extract those new deposits while simultaneously reducing the environmental, health, and safety risks of bringing that resource to market.


The CST platform supports a mature reporting and analytics engine. It gives users a 360-degree view of the data and lets them save any analysis as a template and later use it as the basis for reporting. It also allows users to set criteria for event-based reporting in the system. In effect, any analysis performed in the system is eligible for both time-based (reporting) and event-based (alerting) reporting. Note that this reporting (and the data reported upon) is tied to what a user is authorized to see and analyze. As an example, a client may want to schedule reports across its departments, while individual users will only be able to alert and report on data they are allowed to see. User permissions will be completely controlled by the client.
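
The relationship between a saved analysis, time-based reporting, event-based alerting, and user visibility can be sketched as follows. The names Template, run_report, and check_alert, and the use of a "department" field as the visibility boundary, are assumptions made for illustration and do not describe the platform's actual permission model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Template:
    """A saved analysis: a name plus the criteria that define it."""
    name: str
    predicate: Callable[[dict], bool]

def run_report(template: Template, rows: list[dict], allowed: set[str]) -> list[dict]:
    """Time-based reporting: evaluate the template over all rows the user may see."""
    return [
        row for row in rows
        if row["department"] in allowed and template.predicate(row)
    ]

def check_alert(template: Template, new_row: dict, allowed: set[str]) -> bool:
    """Event-based alerting: evaluate the same template on one row as it arrives."""
    return new_row["department"] in allowed and template.predicate(new_row)
```

In this sketch the same template drives both the report and the alert, and the visibility filter is applied before the criteria in each case, so a user never receives a result derived from data outside their permitted scope.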

The platform also provides granular report management and configuration tools. CST employs a team of UX designers during development to make sure that all functionality in its systems is easy to use and, more importantly, does not take up users’ time unnecessarily. The reporting module follows this principle and provides an easy-to-use configuration and management interface. The design intent is that users should be able to configure granular reports and schedule their delivery over time. Once scheduling is complete, the specified audience will receive the reports as configured by the user. This will NOT require further human interaction. In effect, the system can run its service in the background without the need for the user to log in.

Report Management

The scheduling engine of the CST platform is proprietary. It is based on industry-standard scheduling techniques and provides the ability to configure both real-time and back scheduling of reports, alerts, and data loads. This functionality will be exposed as a service for use by other client services where required. Note that in addition to reporting on data, the system also uses its reporting engine to report on system stability, user interaction, and even resource usage in the system.

The CST core platform supports the following report delivery options: secure email, notifications to mobile devices, and queues to customer integration points.

The platform supports automatic report refreshing in three ways:

  • A user can set criteria for auto refresh directly in the screen view of the platform. This will auto refresh the reports shown on screen in real time.
  • The user can also schedule a report to repeat at configurable intervals and send the report to recipients. This will refresh the report from the last time it was sent.
  • While analyzing data, the system pushes data updates to screen reports in real time. There is no need to refresh in this case.
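
The scheduled-repeat option above (refreshing a report from the last time it was sent) can be sketched in a few lines. The CST scheduling engine is proprietary, so this is only an assumption-laden illustration; the function names due_runs and rows_since and the "timestamp" field are hypothetical.

```python
from datetime import datetime, timedelta

def due_runs(last_sent: datetime, interval: timedelta, now: datetime) -> list[datetime]:
    """Run times that fall after `last_sent` and no later than `now`."""
    if interval <= timedelta(0):
        raise ValueError("interval must be positive")
    runs = []
    next_run = last_sent + interval
    while next_run <= now:
        runs.append(next_run)
        next_run += interval
    return runs

def rows_since(rows: list[dict], last_sent: datetime) -> list[dict]:
    """Rows that arrived after the report was last sent."""
    return [row for row in rows if row["timestamp"] > last_sent]
```

Keeping the last-sent time alongside each schedule is what allows a repeated report to pick up exactly where the previous delivery left off.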

Report Configuration

The following report export options are supported: CSV, PDF, Word, Excel, JSON, and XML. All report delivery formats except JSON and XML will contain both data and graphics. The platform will also support exporting these report types through an API.
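
For the data-only formats, an export step might look like the sketch below, which uses only the Python standard library. The function name export_report and the XML element names "report" and "row" are hypothetical, and the sketch assumes that column names are valid XML element names. PDF, Word, and Excel output, which also carry graphics, are outside its scope.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

def export_report(rows: list[dict], fmt: str) -> str:
    """Serialize report rows as JSON, CSV, or XML text."""
    if fmt == "json":
        return json.dumps(rows)
    if fmt == "csv":
        buffer = io.StringIO()
        fieldnames = list(rows[0]) if rows else []
        writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buffer.getvalue()
    if fmt == "xml":
        root = ET.Element("report")
        for row in rows:
            element = ET.SubElement(root, "row")
            for key, value in row.items():
                # Assumes each key is a valid XML element name.
                ET.SubElement(element, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported export format: {fmt}")
```

The same function could sit behind an API endpoint, with the requested format passed as a parameter.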

Many business intelligence tools promote the use of complicated tables and tabs to create business reports. The use of tabs and pivot tables when creating a report is exactly what the CST platform aims to eliminate. These ideas are embedded in older BI tools that require a lot of proprietary knowledge to use.

Users should be able to create complicated reports on screen using widgets and screen controls and export these in any format required. These reports will contain not only data but also graphics that provide an at-a-glance summary of the data. The look and feel of the reports can be reconfigured to suit the receiving audience. This is an important shift in BI terms, as it allows business users to concentrate on analyzing the data rather than spending time configuring tabs and tables.

Both drill-down and drill-across functionality are supported. The user can select drill-down criteria from a set of configured data to look specifically at the data in the context of other data. This is achieved by presenting the user with ALL the data required and allowing the user to select which drills are required, both on screen and in reports generated from the drill-down analysis.

Key Performance Indicator (KPI) reporting is supported. While OLAP is the mainstay of many BI tools, modern technology has advanced this concept to remove the need for IT to constantly monitor and change OLAP data. The CST platform supports this new approach by creating an abstraction layer over the data to make it instantly searchable and provide fast relationship-based access. KPI reporting is thus supported by giving users instantaneous views of OLAP data without the need for complex MDX queries. Details of which KPI measurements are required will be fleshed out in detailed design and will depend on available data.

Note that the platform imposes NO limits on the number of records in a report. External limitations may still apply, however; for example, the amount of data that can be sent over email. The system will provide configuration and warnings in areas where these external limitations may exist.

Reports can be viewed in the system or exported in various formats for offline viewing. The architecture of the platform screens is based on this principle: every user click or data manipulation creates a report on screen. It is this report that can then be scheduled, saved, or alerted upon.

Word Report Snippet