z/TPF REST Service Monitoring Dashboard

Problem Statement: How do I know which services are in use on my system and how much of the system's resources they are using?
Idea: A dashboard shows a list of the services available on the system. For each service it shows: calls per second, success/failure rates, average response time, average CPU used, average existence time, average FINDs, and average FILEs. The table is sortable by each column and is updated in real time.
IBM Solution Thoughts: The solution could leverage name-value pair collection (NVPC) for service name, service version, and service user. As such, metrics would be available through the real-time runtime metrics collection (RTMC) mechanism. Open source sample dashboards would be provided as a starting point for customer implementation.
Value: More easily diagnose service usage issues and system resource issues caused by REST services.
Question: What are the most important metrics to show?
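
The per-service table described in this idea could be computed along these lines. This is only a minimal sketch in Python: the record layout (`service`, `ok`, `response_ms`, `cpu_ms`, `finds`, `files`) and the `summarize` function are hypothetical names chosen for illustration, not the actual format of data emitted by the runtime metrics collection mechanism.

```python
from collections import defaultdict


def summarize(records, window_seconds, sort_by="calls_per_sec"):
    """Aggregate per-call records into one dashboard row per service.

    `records` is a list of dicts in a hypothetical layout (an assumption,
    not the real metrics format). `window_seconds` is the length of the
    collection interval the records cover.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec["service"]].append(rec)

    rows = []
    for service, recs in groups.items():
        n = len(recs)
        failures = sum(1 for r in recs if not r["ok"])
        rows.append({
            "service": service,
            "calls_per_sec": n / window_seconds,
            "failure_rate": failures / n,
            "avg_response_ms": sum(r["response_ms"] for r in recs) / n,
            "avg_cpu_ms": sum(r["cpu_ms"] for r in recs) / n,
            "avg_finds": sum(r["finds"] for r in recs) / n,
            "avg_files": sum(r["files"] for r in recs) / n,
        })

    # Sort descending so the heaviest consumers appear first.
    return sorted(rows, key=lambda row: row[sort_by], reverse=True)


# Made-up sample data covering a one-second window.
sample = [
    {"service": "booking", "ok": True, "response_ms": 10.0, "cpu_ms": 2.0, "finds": 4, "files": 1},
    {"service": "booking", "ok": False, "response_ms": 30.0, "cpu_ms": 4.0, "finds": 6, "files": 3},
    {"service": "pricing", "ok": True, "response_ms": 5.0, "cpu_ms": 1.0, "finds": 2, "files": 0},
]
table = summarize(sample, window_seconds=1)
```

Calling `summarize(sample, 1, sort_by="avg_cpu_ms")` re-sorts the same rows by a different column, which is the behavior a sortable table would need.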

Problem Statement: How can I learn more about a particular service?
Idea: For a specified service, a dashboard shows tables and graphs updated in real time:
date stamp of service activation/deployment. date stamp of last recorded usage.
show all components (descriptors, DFDL, etc.) that make up a service, along with the date stamp of load/deployment, the loadset name, and so on. This could be sent through RTMC at loadset activation/deactivation time. This is very useful for helping operations determine what to do about an issue.
history of calls per second and other key trends.
expanded current metrics such as calls/second, success/failure rates, response time, internal vs external calls, CPU, memory usage, DF usage, etc.
a list of all users of a service and summary metrics of their usage (calls/second, CPU, etc).
a list of code packages called and summary metrics of their usage (calls/second, CPU, etc).
IBM Solution Thoughts: As before, leverage NVPC/RTMC and open source sample dashboards.
Value: More easily diagnose service usage issues and system resource issues caused by REST services.
Question: What are the most important details to show?

Problem Statement: How can I learn more about a particular user's usage of a service?
Idea: For a specified user and a specified service, a dashboard shows tables and graphs updated in real time:
history of calls per second and other key trends.
expanded current metrics such as calls/second, success/failure rates, response time, internal vs external calls, CPU, memory usage, DF usage, etc.
a list of code packages called and summary metrics of their usage (calls/second, CPU, etc).
IBM Solution Thoughts: As before, leverage NVPC/RTMC and open source sample dashboards.
Value: More easily diagnose service usage issues and system resource issues caused by a given user's use of a REST service.
Question: What are the most important details to show?

Problem Statement: How can I quickly understand if a service is causing a resource issue on my system?
Idea: A dashboard shows the results of data science analysis relating the details of a service to overall system state, such as actual CPU, in-use ECBs, etc. For example: a correlation analysis between calls per second for each service and the actual CPU of the system.
IBM Solution Thoughts: This builds upon the predictive analytics initiatives leveraging similar techniques on the NVPC/RTMC data in open source sample dashboards.
Value: More quickly diagnose service usage issues and system resource issues caused by REST services.
Question: What are the most important analytics to perform? What are the most significant metrics to analyze?
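
One simple form of the correlation analysis mentioned in this idea is a Pearson correlation between each service's calls-per-second series and the system CPU series sampled at the same times. The sketch below is an illustration under stated assumptions: the function names, the service names, and the sample values are all hypothetical, and Pearson correlation is just one candidate technique, not necessarily the one IBM would use.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant series carries no trend; report no correlation.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_services(calls_by_service, system_cpu):
    """Return (service, correlation) pairs, strongest relationship first."""
    scores = {
        name: pearson(series, system_cpu)
        for name, series in calls_by_service.items()
    }
    return sorted(scores.items(), key=lambda kv: abs(kv[1]), reverse=True)


# Made-up samples taken at the same four points in time.
system_cpu = [20.0, 40.0, 60.0, 80.0]          # percent busy (hypothetical)
calls_by_service = {
    "pricing": [5.0, 5.0, 5.0, 5.0],           # flat load
    "booking": [10.0, 20.0, 30.0, 40.0],       # rises with CPU
}
ranking = rank_services(calls_by_service, system_cpu)
```

A high coefficient only flags a service as worth investigating; it does not by itself show that the service is the cause of the CPU rise.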

Problem Statement: How can I proactively be notified of REST service behaviors of potential interest? For example:
Service X is not being called today, which differs from its historical trend.
Service X called by user Y is experiencing a high failure rate.
Service X is using more CPU, which corresponds to a rise in overall system CPU.
Idea: Perform various analyses, prioritize the items of interest, and surface them on an alerts dashboard page. Potentially provide indications of where to look next.
IBM Solution Thoughts: This builds upon the predictive analytics initiatives leveraging similar techniques on the NVPC/RTMC data in open source sample dashboards.
Value: More quickly diagnose service usage issues and system resource issues caused by REST services.
Question: What are the most important analytics to perform? What are the most significant metrics to analyze?
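
Two of the example notifications in this idea (a service that has stopped being called, and a high failure rate for one user of a service) can be expressed as simple rules over summarized data. The following is a rough sketch only: the input layout, the `find_alerts` name, the priority numbers, and the 20% failure threshold are all assumptions made for illustration. The CPU-related example is omitted because it needs a time-series comparison rather than a single-day summary.

```python
def find_alerts(today, history_avg_calls, failure_threshold=0.2):
    """Return alert messages, highest priority first.

    today: {service: {user: {"calls": int, "failures": int}}} for the
        current day (hypothetical layout).
    history_avg_calls: {service: historical average calls per day}.
    failure_threshold: failure rate at or above which a user/service
        pair is flagged (an arbitrary illustrative default).
    """
    alerts = []  # (priority, message); lower number means more urgent

    # Rule 1: a service with historical traffic has no calls today.
    for service, avg_calls in history_avg_calls.items():
        calls = sum(u["calls"] for u in today.get(service, {}).values())
        if avg_calls > 0 and calls == 0:
            alerts.append((
                1,
                f"{service} has not been called today "
                f"(historical daily average: {avg_calls:.0f})",
            ))

    # Rule 2: a specific user sees a high failure rate on a service.
    for service, users in today.items():
        for user, stats in users.items():
            if stats["calls"] == 0:
                continue
            rate = stats["failures"] / stats["calls"]
            if rate >= failure_threshold:
                alerts.append((
                    2,
                    f"{service} called by {user} has a failure rate of {rate:.0%}",
                ))

    return [message for _, message in sorted(alerts)]


# Made-up data: "pricing" has history but no traffic today.
today = {
    "booking": {
        "userA": {"calls": 100, "failures": 50},
        "userB": {"calls": 100, "failures": 1},
    },
}
history_avg_calls = {"booking": 200, "pricing": 150}
alerts = find_alerts(today, history_avg_calls)
```

Attaching a numeric priority to each rule is one way to support the prioritization step; a real implementation would likely derive thresholds from historical data instead of fixed constants.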

Idea priority

High
