May 11, 2022

Using GCP BigQuery Data QnA at Twitter

In a recent blog post, the Twitter Engineering Team shared the architectural details of its in-house data analytics platform, Qurious, and its benefits for real-time analytics. Designed for business customers, the platform lets users analyze Twitter's BigQuery data with natural language queries and build dashboards. The goal of the project is to increase agility when creating actionable insights from streaming data. The team points out that Qurious is a step forward in reducing the cost and time required to generate such reports.

The system is deployed in containers on GKE. Its main component is the GCP Data QnA service, which translates natural language sentences into SQL commands that can be executed in BigQuery (a Python client is available on GitHub). The system also includes GCP Cloud SQL to store peripheral data, Cloud Load Balancer, and the GCP Cloud Logging service. It is augmented with a cache for frequently asked questions and a suggestion module.
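The role of the cache for frequently asked questions can be illustrated with a minimal Python sketch. This is not Twitter's implementation: the `QuestionCache` class, the `normalize` helper, the `fake_translate` stand-in for the translation service, and the table name are all hypothetical, and the real system may key and expire its cache differently.

```python
from typing import Callable, Dict


def normalize(question: str) -> str:
    """Collapse case and whitespace so trivially different phrasings share a key."""
    return " ".join(question.lower().split())


class QuestionCache:
    """Remembers the SQL produced for questions that were asked before."""

    def __init__(self, translate: Callable[[str], str]) -> None:
        self._translate = translate  # called only on a cache miss
        self._sql_by_question: Dict[str, str] = {}
        self.hits = 0
        self.misses = 0

    def to_sql(self, question: str) -> str:
        key = normalize(question)
        if key in self._sql_by_question:
            self.hits += 1
        else:
            self.misses += 1
            self._sql_by_question[key] = self._translate(question)
        return self._sql_by_question[key]


def fake_translate(question: str) -> str:
    # Hypothetical stand-in for the natural-language-to-SQL service;
    # it returns a fixed query against an assumed table.
    return "SELECT COUNT(*) FROM `project.dataset.events`"


cache = QuestionCache(fake_translate)
first = cache.to_sql("How many events were logged?")
second = cache.to_sql("  how many events WERE logged? ")
```

In this sketch the second call is served from the cache because both questions normalize to the same key, so the translation step runs only once.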

BigQuery is a popular data warehouse for OLAP applications provided by Google Cloud Platform. It supports ANSI SQL:2011, and its serverless design separates storage from compute. Its Data QnA service, currently in private alpha, is based on the Analyza paper. It aims to lower the barrier to analytical processing by converting natural language statements into SQL code snippets that can be executed against BigQuery data. The system has also been tried with Google Sheets to generate formulas automatically.

Implementing a natural language interface for databases is a long-standing problem in database engineering (see, for example, a related 1995 study and a theoretical framework paper). Designing such a system means balancing predictability and reliability against intelligence. Analyza's architecture is not based on machine learning; instead, the system is improved continuously through a curator and a knowledge base/graph. This keeps the system predictable while making it intelligent enough for production use. As use of the product generates data, it may also become possible to integrate machine learning models into the system in the future.
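To make the predictability side of that trade-off concrete, here is a toy Python sketch of a curated, rule-based translator. It does not reproduce Analyza's actual design: the `METRICS` and `DIMENSIONS` lexicons, the `question_to_sql` function, and the table and column names are all invented for illustration. The point is only that a curated vocabulary gives the same SQL for the same question every time, and that the system can decline to answer when it does not recognize the terms.

```python
from typing import Dict, Optional

# Hypothetical curated lexicon: a curator maps business terms to SQL fragments.
METRICS: Dict[str, str] = {
    "impressions": "SUM(impressions)",
    "users": "COUNT(DISTINCT user_id)",
}
DIMENSIONS: Dict[str, str] = {
    "country": "country",
    "day": "event_date",
}


def question_to_sql(
    question: str, table: str = "`project.dataset.events`"
) -> Optional[str]:
    """Translate a question deterministically, or return None if no metric is known."""
    words = question.lower().replace("?", "").split()
    metric = next((METRICS[w] for w in words if w in METRICS), None)
    if metric is None:
        return None  # unknown vocabulary: refuse instead of guessing
    dimension = next((DIMENSIONS[w] for w in words if w in DIMENSIONS), None)
    if dimension is None:
        return f"SELECT {metric} FROM {table}"
    return f"SELECT {dimension}, {metric} FROM {table} GROUP BY {dimension}"


by_country = question_to_sql("Impressions by country?")
unknown = question_to_sql("What is the weather like?")
```

Extending the lexicon is a curation task rather than a retraining task, which is one way to read the curator's role described above.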

For more information about BigQuery Data QnA, see the case study featured at Google Cloud Next.


