BigDAWG

Welcome to BigDAWG documentation

Introduction

The Intel Science and Technology Center for Big Data is developing an open-source reference implementation of a Polystore database. The BigDAWG (Big Data Working Group) system supports heterogeneous database engines, multiple programming languages and complex analytics for a variety of workloads.

Figure: BigDAWG Architecture

This BigDAWG release contains our initial prototype of a polystore middleware, as well as support for three database engines: PostgreSQL, SciDB, and Accumulo. The architecture for this release is shown above.

Our goal with this release is to give end users and database researchers a sense of what a Polystore database looks like. We hope that you will download the release, experiment with the data we distribute, and write your own queries. Please reach out to us if you have bigger goals in mind or run into any issues while using this release; we are happy to help.

A simple example

Before we get into the details of what BigDAWG is, here is a very simple example. The following command sends a relational island query, written in the BigDAWG language, to a polystore storing MIMIC II data:

curl -X POST -d "bdrel(select * from mimic2v26.d_patients limit 4;)" http://localhost:8080/bigdawg/query/

Output:

subject_id  sex  dob                    dod                    hospital_expire_flg
1039        M    3063-10-05 00:00:00.0  3147-04-05 00:00:00.0  Y
1010        F    2620-12-07 00:00:00.0  2688-07-30 00:00:00.0  Y
1000        M    2442-05-11 00:00:00.0  2512-03-02 00:00:00.0  Y
1038        M    2747-06-02 00:00:00.0  2807-11-13 00:00:00.0  N
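If you plan to send many queries, it can help to wrap the curl call in a small shell function. The sketch below assumes only what the example shows: the request body is an island prefix (here `bdrel`) followed by the query in parentheses, POSTed to `http://localhost:8080/bigdawg/query/`. The names `BIGDAWG_URL`, `bigdawg_payload`, and `bigdawg_query` are hypothetical helpers for illustration, not part of BigDAWG.

```shell
# Hypothetical helpers (not part of BigDAWG) for submitting island queries.

# Endpoint from the example above; override it by exporting BIGDAWG_URL.
BIGDAWG_URL="${BIGDAWG_URL:-http://localhost:8080/bigdawg/query/}"

# Build the request body: island prefix, then the query wrapped in parentheses,
# e.g. bdrel(select * from mimic2v26.d_patients limit 4;)
bigdawg_payload() {
    printf '%s(%s)' "$1" "$2"
}

# POST the payload to the middleware and print the response body.
# -s hides the progress meter, -S still reports connection errors.
bigdawg_query() {
    curl -sS -X POST -d "$(bigdawg_payload "$1" "$2")" "$BIGDAWG_URL"
}
```

With these definitions loaded, the query above becomes `bigdawg_query bdrel "select * from mimic2v26.d_patients limit 4;"`. Keeping the payload builder separate from the curl call makes it easy to inspect exactly what will be sent before contacting the server.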

For further details on what islands are, see the Introduction and Overview section or any of our publications describing BigDAWG.

Get the code

Everything you need to get started is in the Getting Started section.

For (future) reference, the short version is:

The source code is available on GitHub.

From within the Docker Toolbox, go into the provisions directory of the repository and run setup_bigdawg_docker.sh:

./setup_bigdawg_docker.sh

This should start the three databases and the middleware. You should then be able to execute a query such as the one above from a separate window.
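Because the containers take some time to come up, a query sent immediately after setup may be refused. The sketch below polls the query endpoint until something accepts HTTP connections on it. It assumes the middleware listens at `http://localhost:8080/bigdawg/query/`, as in the example above; the function name `wait_for_bigdawg`, the `BIGDAWG_URL` variable, and the retry defaults are hypothetical. Note that it only checks that the server answers, not that every database engine is ready.

```shell
# Hypothetical readiness check (not part of BigDAWG): poll until the
# middleware accepts HTTP connections, or give up after a number of tries.

BIGDAWG_URL="${BIGDAWG_URL:-http://localhost:8080/bigdawg/query/}"

wait_for_bigdawg() {
    tries="${1:-30}"   # number of attempts (hypothetical default)
    delay="${2:-2}"    # seconds to sleep between attempts (hypothetical default)
    i=1
    while [ "$i" -le "$tries" ]; do
        # Without -f, curl exits 0 once the server answers with any HTTP status,
        # and non-zero while the connection is refused or the request times out.
        if curl -s -o /dev/null --max-time 5 "$BIGDAWG_URL"; then
            echo "BigDAWG middleware is accepting connections"
            return 0
        fi
        sleep "$delay"
        i=$((i + 1))
    done
    echo "BigDAWG middleware did not respond after $tries attempts" >&2
    return 1
}
```

For example, `wait_for_bigdawg && curl -X POST -d "bdrel(select * from mimic2v26.d_patients limit 4;)" http://localhost:8080/bigdawg/query/` runs the sample query only once the endpoint is reachable.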

Contributing

We hope that you find this area of research as interesting as we do, and we look forward to community involvement. If you are interested in contributing, please let us know.

We have many ideas where new contributors could help, such as adding new engines and islands or improving the middleware's capabilities. If this sounds interesting, let us know and we can set up a time to chat.

Website: http://bigdawg.mit.edu

The project mailing list is hosted on Google Groups: http://groups.google.com/group/bigdawg

To contact the BigDAWG developers, email bigdawg-help@mit.edu.