Announcing Pulsar Reporting: Near-Real-Time Metrics Reporting Framework

We are excited to announce the first open-source release of Pulsar Reporting.

Earlier this year, we announced http://gopulsar.io, an open-source project that included Pulsar Pipeline, a real-time analytics platform and stream processing framework. One of the frequently requested features for Pulsar has been integration with a metrics store for visualizing the near-real-time metrics. We’ve provided this feature with this release, which adds the Pulsar Reporting API and the Pulsar Reporting UI Framework under the same license terms. The public GitHub repository is https://github.com/pulsarIO.

What is Pulsar Reporting?

Pulsar Reporting is an extensible data visualization and reporting framework designed to provide real-time insights from Pulsar Pipeline. The framework includes a rich set of charting widgets and a visual reporting editor for users to easily create reports. It has a robust data query engine that can be extended to support many different types of data sources. With the Pulsar Reporting Framework, users can quickly create multi-dimensional and interactive reports that include drill-down and slice-and-dice capabilities.

Features

  • Near-real-time reports – Building reports based on near-real-time data that auto-refreshes at specified intervals
  • Visual reporting editor – Generating reports without writing any code
  • Rich charting widgets – Creating multiple chart types: line, bar, histogram, pie, stack, datatable, etc.
  • Reporting API – Querying data with human-friendly SQL or program-friendly structured JSON
  • Dynamic data source management – Adding or removing data sources with no down time
  • Security and permissions – Managing authentication and access control
  • Druid Kafka extension – Ingesting real-time data from Kafka into Druid
  • AngularJS-based hierarchical UI framework – Easily adding and extending reports
  • Bootstrap-based responsive design – Using Pulsar Reporting on screens of different sizes

Why Pulsar Reporting?

The Pulsar Reporting Framework complements Pulsar, an open-source, real-time analytics platform and stream processing framework. Pulsar generates huge amounts of data, and visualization is the best way to provide intuitive and meaningful insights into that data. However, building dashboards and reports for big data from scratch is cumbersome and error-prone. The Pulsar Reporting Framework allows users to create reports easily and quickly without writing complex data processing and UI logic.

Architecture

The raw events and session events from Pulsar Pipeline flow to Kafka through the Pulsar Kafka channel. The Druid cluster then ingests both the raw events and the sessions from Kafka topics into two tables, one for sessions and one for events. Both tables are indexed at one-second granularity to enable real-time reporting. The Pulsar Reporting API provides an abstraction layer for accessing the tables. The Reporting UI gets the data from the API to build different charts.

[Figure: Pulsar Reporting architecture]

Sample API requests

    • Get session metrics using the SQL API:
      Endpoint: http://<API_Server>/prapi/v2/sql
      Method: POST
      Body: {"sql" : "SELECT (count(session) - sum(retvisitor)) * 1.0 / count(session) newSessionRate, sum(sessionDuration) * 1000 totalSessionDurations, count(session) sessions, sum(sessionDuration) totalSessions, sum(totalpagect) totalPages, country, trafficSource FROM pulsar_session WHERE site=0 and country='usa' GROUP BY country, trafficSource ORDER BY sum(totalpagect) ASC limit 20",
      "intervals": "2015-10-11 03:00:32/2015-10-18 01:00:32",
      "granularity": "day"}
    • Get page views by traffic source using the structured JSON API:
      Endpoint: http://<API_Server>/prapi/v2/realtime
      Method: POST
      Body: {"metrics" : [ "pageviews" ], "dimensions" : [ "trafficsource" ], "filter" : "site=0" }
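The two sample requests above could be issued from a script along the following lines. This is a minimal sketch rather than an official client: the host in API_SERVER is a hypothetical stand-in for <API_Server>, the helper names are invented for illustration, and the Content-Type header and JSON response format are assumptions that the sample requests do not confirm.

```python
import json
import urllib.request

# Hypothetical host; the sample requests only give the placeholder <API_Server>.
API_SERVER = "http://localhost:8080"


def build_request(path, payload):
    """Build (but do not send) a POST request carrying a JSON body."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=API_SERVER + path,
        data=body,
        # Assumption: the API expects a JSON content type.
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def sql_request(sql, intervals, granularity):
    """Request for the SQL API, using the body fields from the first sample."""
    return build_request(
        "/prapi/v2/sql",
        {"sql": sql, "intervals": intervals, "granularity": granularity},
    )


def realtime_request(metrics, dimensions, filter_expr):
    """Request for the structured JSON API, using the fields from the second sample."""
    return build_request(
        "/prapi/v2/realtime",
        {"metrics": metrics, "dimensions": dimensions, "filter": filter_expr},
    )


def send(request):
    """Send a prepared request and decode the reply (assumed to be JSON)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, realtime_request(["pageviews"], ["trafficsource"], "site=0") prepares the page-views request shown above, and passing the result to send() would submit it to a running API server. Keeping request construction separate from sending makes it possible to inspect or log the payload before any network call.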

What’s next?

We have open-sourced the Pulsar Reporting Framework, and we plan to continue developing the code in the open. We welcome your suggestions and contributions. Here are some of the features we are thinking about.

  • Pathing and funnels
  • Exporting reports
  • Expanding support to additional data sources based on community interest
  • Integrating with Pulsar.js, a client-side JavaScript library to generate Pulsar events for the web

Please visit http://gopulsar.io for source code, documentation, and more information.

The Team

Tracking-COE2