Data analytics

CST Analytics

Robust big data analytics is fully supported. The CST platform uses both data at rest and data in motion (streams) to provide real-time analysis of data trends. The platform is designed to give clients deep-dive insights into their data, and trend analysis is a major part of delivering on that promise. Data from multiple sources is combined to analyze and report on trends. Because time is a critical component of trend analysis, the platform organizes all of its data storage formats around time. This makes it easier for users to see and report on trends across disparate, federated data in real time.

Note that trend analysis is completely time-based and can be performed at ANY level of granularity.
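To make "time-based at any granularity" concrete, the following minimal sketch buckets timestamped events into fixed-width windows and sums a value per window; changing the window width changes the granularity of the trend. The `bucket_trend` helper and the (timestamp, value) event layout are hypothetical and are not the CST platform's storage format. Buckets are aligned to the Unix epoch, so daily buckets start at UTC midnight.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def bucket_trend(events, granularity):
    """Sum event values per time bucket of width `granularity` (a timedelta).

    `events` is an iterable of (timestamp, value) pairs using timezone-aware
    datetimes. Hypothetical helper for illustration only.
    """
    step = granularity.total_seconds()
    totals = defaultdict(float)
    for ts, value in events:
        # Floor the timestamp to the start of the bucket it falls into.
        offset = (ts - EPOCH).total_seconds()
        bucket_start = EPOCH + timedelta(seconds=(offset // step) * step)
        totals[bucket_start] += value
    # Chronological order, so the result reads as a trend line.
    return sorted(totals.items())


utc = timezone.utc
events = [
    (datetime(2024, 1, 1, 10, 5, tzinfo=utc), 3),
    (datetime(2024, 1, 1, 10, 40, tzinfo=utc), 2),
    (datetime(2024, 1, 1, 11, 10, tzinfo=utc), 7),
]
hourly = bucket_trend(events, timedelta(hours=1))  # two hourly buckets
daily = bucket_trend(events, timedelta(days=1))    # one daily bucket
```

The same events produce an hourly trend (5 then 7) or a single daily total (12) purely by changing the `granularity` argument.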

The analysis, trend-learning, and predictive analytics capabilities are supported by HDFS (the Hadoop Distributed File System), which enables real-time distributed analysis of data. This also applies to unstructured data, which is processed by a CST adapter before being ingested into the system's HDFS. Both during ingestion (using streaming technologies) and once it is stored in HDFS, unstructured data can be analyzed and reported on in the same way as structured data. In some cases, the CST adapter may need to apply natural language processing to unstructured data for context analysis.
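As a rough illustration of what an adapter step might do before ingestion, the sketch below turns one raw text line into a structured, time-keyed record so it can be stored and analyzed like structured data. The `adapt_unstructured` function, the assumed line layout (an ISO-8601 timestamp with `Z` or a UTC offset, followed by free text), and the simple term counts standing in for real language processing are all hypothetical; this is not the CST adapter itself.

```python
import re
from collections import Counter
from datetime import datetime

# Hypothetical input layout: ISO-8601 timestamp (Z or +hh:mm offset), then free text.
LINE_PATTERN = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(?:Z|[+-]\d{2}:\d{2}))\s+(?P<text>.*)$"
)
WORD_PATTERN = re.compile(r"[a-z']+")


def adapt_unstructured(line):
    """Turn one raw text line into a structured, time-keyed record.

    Returns None for lines that do not match the assumed layout.
    """
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    # fromisoformat() does not accept a trailing "Z" on older Pythons.
    ts = datetime.fromisoformat(match["ts"].replace("Z", "+00:00"))
    text = match["text"]
    return {
        "timestamp": ts,  # time key, so the record fits time-based storage
        "text": text,
        # Simple term counts stand in for real language processing.
        "terms": Counter(WORD_PATTERN.findall(text.lower())),
    }


record = adapt_unstructured(
    "2024-03-01T09:30:00Z Customer reported late delivery; delivery rescheduled"
)
```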

Performance, Scalability & Monitoring:

The platform supports caching at several application levels and in several components. For example, the web application uses standard HTTP caching headers so that clients do not repeatedly request the same resources. The data storage layer also caches data to improve response times, and ALL CST web services implement caching, where applicable, for slow-moving data in the system. Every cache is monitored to ensure that no cached data becomes stale.
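The caching pattern described above can be sketched as a time-to-live (TTL) cache that treats entries older than a fixed age as stale, reloads them, and counts stale evictions so a monitor could report them, plus the standard `Cache-Control: max-age` response header that lets HTTP clients reuse a response. The `TTLCache` class and `cache_control_header` helper are hypothetical names used for illustration; the TTL values are assumptions, not CST settings.

```python
import time


class TTLCache:
    """Minimal time-to-live cache: entries older than `ttl_seconds` are stale.

    `clock` defaults to time.monotonic and can be replaced in tests.
    """

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}  # key -> (value, stored_at)
        self.stale_evictions = 0  # simple counter a monitor could report

    def put(self, key, value):
        self._entries[key] = (value, self.clock())

    def get(self, key, loader):
        """Return a fresh cached value, calling `loader()` on a miss or stale hit."""
        entry = self._entries.get(key)
        if entry is not None:
            value, stored_at = entry
            if self.clock() - stored_at < self.ttl:
                return value
            # Entry is stale: drop it and record the eviction for monitoring.
            del self._entries[key]
            self.stale_evictions += 1
        value = loader()
        self.put(key, value)
        return value


def cache_control_header(ttl_seconds):
    """HTTP header allowing clients to reuse a response for `ttl_seconds`."""
    return {"Cache-Control": f"public, max-age={int(ttl_seconds)}"}


# Usage with a controllable clock so the expiry is deterministic.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
calls = []


def load_reference_data():
    calls.append(1)
    return "reference data"


cache.get("regions", load_reference_data)  # miss: loader runs
cache.get("regions", load_reference_data)  # hit: served from cache
now[0] = 61.0
cache.get("regions", load_reference_data)  # stale: reloaded
```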

System resources are monitored in real time and allocated automatically. The CST platform uses Kubernetes, a container orchestration system, to manage all resource allocation and to monitor the entire platform cluster, improving resource consumption and performance. The platform is deployed in a distributed environment where each container runs its own JVM. This approach isolates the deployed containers and allows resources to be allocated to each one separately, which in turn improves performance.

Platform Cluster Monitor

The CST platform stores data on disk and loads it into memory by memory-mapping the data files using standard operating system functions. Performance is further improved through the use of non-blocking I/O. This in-memory processing is constantly monitored and reported against thresholds, and the system automatically allocates resources depending on cluster load.
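Memory-mapping, as described above, can be illustrated with Python's standard `mmap` module, which wraps the operating system's own mapping calls: the file is mapped into the process's address space and the OS pages in only the regions that are actually read. This minimal sketch covers memory-mapping only (not non-blocking I/O); the `read_record` helper and the file layout (an 8-byte header followed by a record) are hypothetical.

```python
import mmap
import os
import tempfile


def read_record(path, offset, length):
    """Read `length` bytes at `offset` from a memory-mapped data file.

    The OS pages data in on demand, so only the touched region is loaded.
    """
    with open(path, "rb") as f:
        # Length 0 maps the whole file; read-only access for analysis.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return mapped[offset:offset + length]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "segment.dat")
    with open(path, "wb") as f:
        f.write(b"HEADER--" + b"2024-01-01,42\n")  # hypothetical layout
    record = read_record(path, offset=8, length=13)
```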

All CST components use in-memory structures and processing where applicable. For example, the components responsible for data processing and machine learning load data from distributed physical storage into memory, so data streams can be processed much faster than with traditional ETL tools. Similarly, the retrieval and analytics layer uses techniques such as in-memory Bloom filters and bitsets to ensure that query performance meets users' needs.
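A Bloom filter answers "is this key possibly present?" using only a bitset: it may return false positives but never false negatives, so a query layer can safely skip any data segment whose filter says "no". The minimal sketch below uses a Python integer as the bitset and derives its bit positions by double hashing a SHA-256 digest. The class name and sizes (1,024 bits, 3 hashes) are illustrative assumptions, not the CST implementation.

```python
import hashlib


class BloomFilter:
    """Bloom filter backed by a Python int used as a bitset.

    May report false positives, never false negatives.
    """

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, item):
        # Double hashing: position_i = h1 + i * h2 (mod num_bits), with both
        # values taken from one SHA-256 digest. h2 is forced odd.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        # All k bits must be set; any unset bit proves the item was never added.
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


bf = BloomFilter()
for key in ["customer-17", "customer-42"]:
    bf.add(key)
```

A `False` from `might_contain` is definitive, which is what makes the filter useful for pruning; a `True` still requires checking the underlying data.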

A full monitoring dashboard is provided that covers not only overall performance but also CPU, memory, and ingestion performance and status. In addition, every request to and response from each application component is monitored in real time by an audit component. Performance degradation automatically triggers alerts to configured recipients and is shown in the monitoring tools. In the CST environment, Apache JMeter is used to measure performance throughout the development lifecycle and along the environment deployment path.

Monitoring and automatic resolution are performed at all levels of the application.

Container Memory Monitoring

Cluster Memory Monitoring

Server Monitoring

10.240.0.5

  • CPU - 8%
  • Memory - 3.22 GB / 3.89 GB
  • Filesystem #1 - 5.51 GB / 105.55 GB

10.240.0.6

  • CPU - 16%
  • Memory - 3.30 GB / 3.89 GB
  • Filesystem #1 - 4.07 GB / 105.55 GB

10.240.0.3

  • CPU - 5%
  • Memory - 3.34 GB / 3.89 GB
  • Filesystem #1 - 3.38 GB / 105.55 GB

10.240.0.4

  • CPU - 7%
  • Memory - 3.44 GB / 3.89 GB
  • Filesystem #1 - 4.94 GB / 105.55 GB
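Threshold-based reporting of the kind described for in-memory processing can be illustrated with the memory figures listed above: compute each server's utilization and flag any that exceed a limit. The 85% threshold and the `memory_alerts` helper are hypothetical; real thresholds are configuration-specific.

```python
# Memory figures (used GB, total GB) taken from the Server Monitoring list.
NODES = {
    "10.240.0.5": (3.22, 3.89),
    "10.240.0.6": (3.30, 3.89),
    "10.240.0.3": (3.34, 3.89),
    "10.240.0.4": (3.44, 3.89),
}

# Hypothetical alert threshold (fraction of total memory in use).
MEMORY_ALERT_THRESHOLD = 0.85


def memory_alerts(nodes, threshold):
    """Return (node, utilization) pairs whose memory use exceeds `threshold`."""
    alerts = []
    for node, (used, total) in sorted(nodes.items()):
        utilization = used / total
        if utilization > threshold:
            alerts.append((node, round(utilization, 3)))
    return alerts


alerts = memory_alerts(NODES, MEMORY_ALERT_THRESHOLD)
```

With these figures, 10.240.0.3 (about 85.9%) and 10.240.0.4 (about 88.4%) exceed the assumed 85% limit, while the other two servers stay below it.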

Multi-CPU processing is supported out of the box through containerized deployment, which isolates each container in its own JVM (Java Virtual Machine). CPU management is handled by the platform's deployment architecture using Kubernetes and Docker.

All components within the system can be load balanced where appropriate to improve resiliency. Load balancing is implemented using Kubernetes Services, each of which provides a virtual IP (VIP) in front of a set of component instances. Clients simply access the service through its VIP.
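Conceptually, a service VIP gives clients one stable address while requests are spread across several instances behind it. The sketch below models that idea with simple round-robin selection. It is only a conceptual illustration: the `VirtualIP` class, the address, and the instance names are hypothetical, and it does not describe how Kubernetes (kube-proxy) actually selects endpoints.

```python
import itertools


class VirtualIP:
    """Conceptual model of a service VIP: one stable address, many backends.

    Round-robin is used for simplicity; real Kubernetes endpoint selection
    depends on the proxy mode and is not modeled here.
    """

    def __init__(self, address, backends):
        if not backends:
            raise ValueError("a service needs at least one backend")
        self.address = address
        self._cycle = itertools.cycle(list(backends))

    def route(self):
        """Pick the backend instance that should handle the next request."""
        return next(self._cycle)


vip = VirtualIP("10.96.0.20", ["analytics-0", "analytics-1", "analytics-2"])
targets = [vip.route() for _ in range(4)]
```

Clients only ever see `vip.address`; instances can be added or replaced behind it without clients changing anything.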

CST Kubernetes Management of Docker Containers

Development Environment

The platform will provide an analysis environment in which users can analyze data. Note that the system treats the result of any user interaction on screen as a potential report. In effect, users can slice and dice the data on screen and save templates that can later be used as the basis for both reporting and alerting.
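One way to picture "every on-screen interaction is a potential report" is to capture the user's filters and grouping as a saved template that can be re-run later, either to produce a report or to evaluate an alert condition. The `ViewTemplate` structure, its fields, and the sample rows below are hypothetical and only illustrate the idea; they are not the CST template format.

```python
from dataclasses import dataclass, field


@dataclass
class ViewTemplate:
    """Hypothetical saved 'slice and dice' view: filters plus a grouping."""
    name: str
    group_by: str
    measure: str
    filters: dict = field(default_factory=dict)  # column -> required value

    def apply(self, rows):
        """Re-run the saved view against rows (dicts), summing `measure` per group."""
        totals = {}
        for row in rows:
            if all(row.get(col) == val for col, val in self.filters.items()):
                key = row[self.group_by]
                totals[key] = totals.get(key, 0) + row[self.measure]
        return totals


template = ViewTemplate(
    "EMEA sales by product", group_by="product", measure="amount",
    filters={"region": "EMEA"},
)
rows = [
    {"region": "EMEA", "product": "A", "amount": 10},
    {"region": "EMEA", "product": "B", "amount": 5},
    {"region": "APAC", "product": "A", "amount": 99},
    {"region": "EMEA", "product": "A", "amount": 1},
]
report = template.apply(rows)                 # reporting use
alert_fired = report.get("A", 0) > 10         # alerting use (hypothetical rule)
```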

The CST platform includes an SDK that can be used to interact with the platform programmatically. The platform exposes all of its functionality as a set of RESTful services. Data can be fed into the system as XML, JSON, OLAP, or DT (data streams). In addition, the platform can use its adapters to ingest data in the formats listed above from a wide range of sources, including relational databases, NoSQL databases, social media, and even flat files.

The platform exposes RESTful services that can be used for analysis, reporting, and data queries. A full specification of the exposed services will be provided, along with training on how to use the services to best effect.
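Because the actual service specification is delivered separately, the following is only a sketch of what submitting JSON records to a RESTful ingestion endpoint could look like, using Python's standard `urllib.request`. The base URL, the `/datasets/{dataset}/records` path, the payload shape, and the bearer-token header are all assumptions. The request is built but not sent; sending it would use `urllib.request.urlopen(req)`.

```python
import json
import urllib.request

# Placeholder base URL: the real endpoint comes from the service specification.
BASE_URL = "https://cst.example.com/api"


def build_ingest_request(dataset, records, token):
    """Build (but do not send) a POST request that submits JSON records.

    The path layout and payload shape are hypothetical.
    """
    body = json.dumps({"dataset": dataset, "records": records}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/datasets/{dataset}/records",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


req = build_ingest_request(
    "sales", [{"region": "EMEA", "amount": 10}], token="example-token"
)
# To send: with urllib.request.urlopen(req) as resp: print(resp.status)
```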

Users of the system will use visual tools to analyze the data. This is an important user experience point. At CST, it is important not only that our solutions provide real business benefits but also that users have an unforgettable experience when using the platform, irrespective of industry.

The platform does NOT require any knowledge of query languages to interact with and analyze the data; removing that requirement is exactly what the platform is designed to do. Users interact with visual objects and controls that allow for more granular analysis of the data. The flexibility of HOW users interact with these objects is controlled by the customer. We understand that this is a paradigm shift from older BI technologies; however, this approach means that only knowledge of the data is required, and it shifts the burden of developing artifacts onto the engine using modern tools.

A full debug screen is provided as part of the platform, and all platform services support robust debugging. The idea is that CST (and indeed the first line of support) wants to know about problems before they affect users. To that end, the platform continuously monitors the system for issues before users report them. The platform also provides monitoring screens for application-level debugging that show at a glance when issues occur and send notifications where appropriate and configured. Technical users can use the system's graphical tools to drill down into the application logs for troubleshooting.

All development is done in the CST environment and then deployed to the client test environment. ALL configuration and testing should be performed in a client QA/test environment. Because the deployment is clustered, multiple instances of the application run in this environment, so configuration and user management should be tested there along with functionality. In some cases, clients provide a UAT (user acceptance testing) environment before the production environment so that business users can become familiar with new functionality before PROD deployment. Once all tests are complete, the configuration and metadata can be promoted to the PROD environment.

Licensing

Unlimited and per-CPU licensing are supported. CST has a full-featured R&D department that is constantly improving its platform. Product roadmaps are shared with clients each year to inform them of upcoming changes and/or upgrades. While some upgrades may be free to clients (e.g., security updates), in some cases CST will ask clients to pay for functional upgrades. This revenue feeds the CST R&D effort to continuously improve the offering.

There is no limit to how many users can log into the platform concurrently. Because the platform is distributed, load is handled without affecting the user experience. The only limitations are those that may be imposed by infrastructure resources, and in this regard CST always takes a future-proof approach to infrastructure needs and setup. Note also that many BI and CRM tools impose concurrency limits because of their technical architecture and its limitations: a monolithic application uses the same resources for every part of its processing.

The type of technology under the hood also matters! The CST platform is a completely distributed application that segregates its resources by container, which makes it better able to react to load and even to failures. The most recent tests of the platform were run against 150,000 concurrent users with no issues found. Load testing is part of the environment progression methodology at CST. The idea is to find performance issues during development, when they can easily be traced to the code changed since the last build.

Hardware Architecture

The CST R&D team has done extensive research into distributed infrastructure to determine the best infrastructure platform for today's large data collection profiles. CST uses this research as the basis for its infrastructure consulting with clients. The following is what CST recommends for large organizations looking to future-proof their infrastructure:

Desc | Hardware | RAM (GB) | CPU | Boot Disk | Additional Storage | Notes | Count
Build | HP DL380 | 256 | 2*12 core | 2*80 GB SSD | 2*480 GB SSD; 2*2.1.2TB 10k HDD | Test, Prod, DR | 4
Application | HP DL360 | 128 | 2*10 core | 2*80 GB SSD | 2*480 GB SSD | 2 Prod, 2 DR | 4
Analytics | HP DL380 | 256 | 2*12 core | 2*80 GB SSD | 2*480 GB SSD; 2*2.1.2TB 10k HDD | 3 Prod, 3 DR | 6
Slave Nodes | HP DL380 | 256 | 2*12 core | 2*80 GB SSD | 2*480 GB SSD; 2*2.1.2TB 10k HDD | 3 Prod, 3 DR | 6
Name Nodes | HP DL360 | 128 | 2*10 core | 2*80 GB SSD | 2*480 GB SSD | 1 Prod, 1 DR | 2
Test | HP DL380 | 384 | 2*12 core | 2*80 GB SSD | 2*480 GB SSD; 2*2.1.2TB 10k HDD | All Test | 1

Switches

Desc | Requirements | Notes | Count
Data/Application Network | 4*10GB switch, min. 16 ports | 2 Prod, 2 DR | 4
Build/Support Network | 2*1GB switch, min. 16 ports | 1 in each data center | 2
Firewall/Load Balancer | Client supplied | Client supplied |