How to add primary keys to an already existing table defined with/without primary keys in WSO2 DAS 310

Kasun Siyambalapitiya
3 min readDec 30, 2018

--

Image Source: https://stocksnap.io/photo/vintage-keys-GRRCFFNXRE

This was an issue I faced when deploying a certain Capp in one of our DAS setup. While trying to resolve the above, I was able to find a bit internals on how primary keys works on wso2das-3.1.0 and though it might be useful to post it in case if anyone face a similar situation.

Running spark scripts is the way for performing batch processing analytics when working with WSO2 DAS 3.1.0 [1]. In above approach there may be times that you have defined your tables without proper primary key/keys or you have missed out a field or two (more) that should be within the primary key constraint. According the behaviour of how primary key works in WSO2 DAS 3.1.0 , appending primary keys ( both completely new and left over fields) at a later stage to an existing table could results in duplicate records as explained below.

When redefining tables with relevant primary keys in any of the above mentioned scenarios(adding a completely new primary key or appending new fields for the existing primary key constraint), the introduced primary key constraint doesn’t check already available records in the existing table when inserting new records. This leads to the insertion of duplicate records (duplicates of records which existed prior to defining of primary key constraints) even though relevant primary keys are defined.

In simple terms let’s say the that Table A is having `NIC` and `Name` as its fields and no primary keys are defined. Let’s assume that the following records also exists in the above table and you have used the following query for inserting data into table TableA

TableA
CREATE TEMPORARY TABLE TableA USING CarbonAnalytics OPTIONS (tableName “TableA”, schema “NIC string -i, Name string -i”, mergeSchema “false”);
INSERT INTO TABLE TableA
SELECT NIC, Name
FROM sampleStream;

As per the requirement, the inserting query is redefined as follows with the relevant primary key (NIC ) been appended.

CREATE TEMPORARY TABLE TableA USING CarbonAnalytics OPTIONS (tableName "TableA", schema "NIC string -i, Name string -i", primaryKeys "NIC", mergeSchema "false");
INSERT INTO TABLE TableA
SELECT NIC, Name
FROM sampleStream;

Upon executing above, already existing records (98579324V and 98721232V) again get inserted into table without considering the primary key constraint defined and the newly defined constraint gets effective only on the latter insertions of the same record.

According wso2 analyticsteam the behaviour of wso2das-3.1.0 on primary keys is as follows,

The behaviour of the DAS regarding the primary key is, it contains the primary key of the temporary table as a metadata within the DAS and defining a primary key for the data when inserting it into the database. These primary keys are not defined in the physical database. Because of this behaviour of the DAS, if we changed the schema (adding a primary key) in the middle, it will not consider the already inserted data for the primary key and insert those data again.

So in order to avoid adding duplicates, it required to INSERT OVERWRITE the relevant tables (in this case TableA) first with relevant primary keys defined (new/ altered primary key constraints) as follows,

CREATE TEMPORARY TABLE TableA USING CarbonAnalytics OPTIONS (tableName "TableA", schema "NIC string -i, Name string -i", primaryKeys "NIC", mergeSchema "false");
INSERT OVERWRITE TABLE TableA
SELECT NIC, Name
FROM sampleStream;

Upon performing the above, TableA will be freshly populated with non duplicate data and then replace the INSERT OVERWRITE insert query with INSERT INTOwith the same primary keys defined in order to gain a considerable performance gain as the use of INSERT INTO with relevant primary keys defined is more efficient and faster than using INSERT OVERWRITE which will delete the entire table and repopulate it with given data which affects the overall performance of wso2das-3.1.0 .

Hope this helps :)

--

--

Kasun Siyambalapitiya
Kasun Siyambalapitiya

No responses yet