Add batch size option, replace DevRev Typescript SDK with Axios for uploading and bugfixes #30

Merged 3 commits on Apr 11, 2025
1 change: 1 addition & 0 deletions .devrev/repo.yml
@@ -0,0 +1 @@
deployable: true
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,2 @@
# Code owners (if set required to merge PR)
* @radovanjorgic @navneel99 @samod @patricijabrecko @devrev/airdrop
290 changes: 7 additions & 283 deletions README.md
@@ -2,6 +2,13 @@

## Release Notes

### v1.2.5

- Add batch size option.
- Replace DevRev Typescript SDK requests with Axios for uploading and downloading artifacts.
- Remove unnecessary postState from default workers.
- Fix bugs related to attachment streaming.

### v1.2.4

- Do not fail the extraction of attachments if streaming of single attachment fails.
@@ -126,286 +133,3 @@ It provides features such as:
```bash
npm install @devrev/ts-adaas
```

# Usage

ADaaS Snap-ins can import data in both directions: from external sources to DevRev and from DevRev to external sources. Both directions are composed of several phases.

From external source to DevRev:

- External Sync Units Extraction
- Metadata Extraction
- Data Extraction
- Attachments Extraction

From DevRev to external source:

- Data Loading

Each phase has its own requirements for task processing, timeout handling, and error handling.

The ADaaS library exports `processTask` to structure the work within each phase. Its `task` callback does the work, and its `onTimeout` callback handles timeouts.

### ADaaS Snap-in Invocation

Each ADaaS snap-in must handle all the phases of ADaaS extraction. In a Snap-in, you typically define a `run` function that iterates over events and invokes workers per extraction phase.

```typescript
import { AirdropEvent, EventType, spawn } from '@devrev/ts-adaas';

interface DummyExtractorState {
issues: { completed: boolean };
users: { completed: boolean };
attachments: { completed: boolean };
}

const initialState: DummyExtractorState = {
issues: { completed: false },
users: { completed: false },
attachments: { completed: false },
};

function getWorkerPerExtractionPhase(event: AirdropEvent) {
let path;
switch (event.payload.event_type) {
case EventType.ExtractionExternalSyncUnitsStart:
path = __dirname + '/workers/external-sync-units-extraction';
break;
case EventType.ExtractionMetadataStart:
path = __dirname + '/workers/metadata-extraction';
break;
case EventType.ExtractionDataStart:
case EventType.ExtractionDataContinue:
path = __dirname + '/workers/data-extraction';
break;
}
return path;
}

const run = async (events: AirdropEvent[]) => {
for (const event of events) {
const file = getWorkerPerExtractionPhase(event);
await spawn<DummyExtractorState>({
event,
initialState,
workerPath: file,
options: {
isLocalDevelopment: true,
},
});
}
};

export default run;
```
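
In the `run` example above, `getWorkerPerExtractionPhase` returns `undefined` for any event type without a `case`, and that value is then passed to `spawn` as `workerPath`. The following is a minimal, self-contained sketch of one way to make the mapping explicit and fail early. The string keys are assumed stand-ins for the `EventType` values, and `WORKER_BY_EVENT` and `resolveWorkerPath` are hypothetical names, not part of the library.

```typescript
// Hypothetical lookup table; the keys are assumed stand-ins for EventType values.
const WORKER_BY_EVENT = new Map<string, string>([
  ['EXTRACTION_EXTERNAL_SYNC_UNITS_START', 'external-sync-units-extraction'],
  ['EXTRACTION_METADATA_START', 'metadata-extraction'],
  ['EXTRACTION_DATA_START', 'data-extraction'],
  ['EXTRACTION_DATA_CONTINUE', 'data-extraction'],
]);

// Resolve the worker file for an event type, or throw if none is registered.
function resolveWorkerPath(eventType: string, baseDir: string): string {
  const worker = WORKER_BY_EVENT.get(eventType);
  if (worker === undefined) {
    throw new Error(`No worker registered for event type: ${eventType}`);
  }
  return `${baseDir}/workers/${worker}`;
}
```

Throwing on an unknown event type is a design choice for this sketch; a Snap-in might instead log the event and skip it.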

## Extraction

The ADaaS snap-in extraction lifecycle consists of three main phases: External Sync Units Extraction, Metadata Extraction, and Data Extraction. Each phase is defined in a separate file and is responsible for fetching the respective data.

The ADaaS library provides a repository management system to handle artifacts in batches. The `initializeRepos` function initializes the repositories, and each repository's `push` function uploads artifacts to that repository.
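
To illustrate what handling artifacts in batches can mean, here is a minimal sketch of a buffer that uploads items in fixed-size chunks. It is not the library's repository implementation: `BatchingRepo`, its `upload` callback, and the `flush` method are hypothetical, and the sketch is synchronous for brevity, whereas the real `push` is awaited.

```typescript
// Hypothetical stand-in for a repository that uploads items in fixed-size batches.
class BatchingRepo<T> {
  private buffer: T[] = [];
  private readonly upload: (batch: T[]) => void;
  private readonly batchSize: number;

  constructor(upload: (batch: T[]) => void, batchSize: number) {
    if (!Number.isInteger(batchSize) || batchSize < 1) {
      throw new RangeError('batchSize must be a positive integer');
    }
    this.upload = upload;
    this.batchSize = batchSize;
  }

  // Buffer new items and upload every full batch.
  push(items: T[]): void {
    this.buffer.push(...items);
    while (this.buffer.length >= this.batchSize) {
      this.upload(this.buffer.splice(0, this.batchSize));
    }
  }

  // Upload whatever is left, for example when the phase finishes.
  flush(): void {
    if (this.buffer.length > 0) {
      this.upload(this.buffer.splice(0));
    }
  }
}

// Usage: seven items with a batch size of three are uploaded as batches of 3, 3 and 1.
const uploadedBatches: number[][] = [];
const repo = new BatchingRepo<number>((batch) => {
  uploadedBatches.push(batch);
}, 3);
repo.push([1, 2, 3, 4, 5, 6, 7]);
repo.flush();
```

A smaller batch size means more, smaller uploads; a larger one means fewer uploads but more items held in memory at once.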

State management lets an ADaaS Snap-in keep track of how far an extraction task has progressed. The `postState` function posts the current state of the extraction task. The state is stored in the adapter and can be read through the `adapter.state` property.
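
As a rough sketch of how a worker might use such state, the snippet below decides which item types still need extraction. A plain object stands in for `adapter.state`, the state shape matches the `DummyExtractorState` from the invocation example, and `pendingItemTypes` is a hypothetical helper, not a library function.

```typescript
// Same shape as the DummyExtractorState used in the invocation example.
interface ExtractorState {
  issues: { completed: boolean };
  users: { completed: boolean };
  attachments: { completed: boolean };
}

type ItemType = keyof ExtractorState;

// Hypothetical helper: list the item types that still need to be extracted.
function pendingItemTypes(state: ExtractorState): ItemType[] {
  const all: ItemType[] = ['issues', 'users', 'attachments'];
  return all.filter((itemType) => !state[itemType].completed);
}

// Plain object standing in for `adapter.state` after issues were extracted.
const savedState: ExtractorState = {
  issues: { completed: true },
  users: { completed: false },
  attachments: { completed: false },
};

const remaining = pendingItemTypes(savedState);
```

Here `remaining` contains only `users` and `attachments`, so a later invocation can skip the issues that were already extracted.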

### 1. External Sync Units Extraction

This phase is defined in `external-sync-units-extraction.ts` and is responsible for fetching the external sync units.

```typescript
import {
ExternalSyncUnit,
ExtractorEventType,
processTask,
} from '@devrev/ts-adaas';

const externalSyncUnits: ExternalSyncUnit[] = [
{
id: 'devrev',
name: 'devrev',
description: 'Demo external sync unit',
item_count: 2,
item_type: 'issues',
},
];

processTask({
task: async ({ adapter }) => {
await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsDone, {
external_sync_units: externalSyncUnits,
});
},
onTimeout: async ({ adapter }) => {
await adapter.emit(ExtractorEventType.ExtractionExternalSyncUnitsError, {
error: {
message: 'Failed to extract external sync units. Lambda timeout.',
},
});
},
});
```

### 2. Metadata Extraction

This phase is defined in `metadata-extraction.ts` and is responsible for fetching the metadata.

```typescript
import { ExtractorEventType, processTask } from '@devrev/ts-adaas';
import externalDomainMetadata from '../dummy-extractor/external_domain_metadata.json';

const repos = [{ itemType: 'external_domain_metadata' }];

processTask({
task: async ({ adapter }) => {
adapter.initializeRepos(repos);
await adapter
.getRepo('external_domain_metadata')
?.push([externalDomainMetadata]);
await adapter.emit(ExtractorEventType.ExtractionMetadataDone);
},
onTimeout: async ({ adapter }) => {
await adapter.emit(ExtractorEventType.ExtractionMetadataError, {
error: { message: 'Failed to extract metadata. Lambda timeout.' },
});
},
});
```

### 3. Data Extraction

This phase is defined in `data-extraction.ts` and is responsible for fetching the data. Attachment metadata is also extracted in this phase.

```typescript
import { EventType, ExtractorEventType, processTask } from '@devrev/ts-adaas';
import { normalizeAttachment, normalizeIssue, normalizeUser } from '../dummy-extractor/data-normalization';

const issues = [
{ id: 'issue-1', created_date: '1999-12-25T01:00:03+01:00', ... },
{ id: 'issue-2', created_date: '1999-12-27T15:31:34+01:00', ... },
];

const users = [
{ id: 'user-1', created_date: '1999-12-25T01:00:03+01:00', ... },
{ id: 'user-2', created_date: '1999-12-27T15:31:34+01:00', ... },
];

const attachments = [
{ url: 'https://app.dev.devrev-eng.ai/favicon.ico', id: 'attachment-1', ... },
{ url: 'https://app.dev.devrev-eng.ai/favicon.ico', id: 'attachment-2', ... },
];

const repos = [
{ itemType: 'issues', normalize: normalizeIssue },
{ itemType: 'users', normalize: normalizeUser },
{ itemType: 'attachments', normalize: normalizeAttachment },
];

processTask({
task: async ({ adapter }) => {
adapter.initializeRepos(repos);

if (adapter.event.payload.event_type === EventType.ExtractionDataStart) {
await adapter.getRepo('issues')?.push(issues);
await adapter.emit(ExtractorEventType.ExtractionDataProgress, { progress: 50 });
} else {
await adapter.getRepo('users')?.push(users);
await adapter.getRepo('attachments')?.push(attachments);
await adapter.emit(ExtractorEventType.ExtractionDataDone, { progress: 100 });
}
},
onTimeout: async ({ adapter }) => {
await adapter.postState();
await adapter.emit(ExtractorEventType.ExtractionDataProgress, { progress: 50 });
},
});
```

### 4. Attachments Streaming

The ADaaS library handles attachment streaming to improve efficiency and reduce complexity for developers. During the extraction phase, developers only need to provide metadata in a specific format for each attachment, and the library manages the streaming process.

The Snap-in should provide attachment metadata following the `NormalizedAttachment` interface:

```typescript
export interface NormalizedAttachment {
url: string;
id: string;
file_name: string;
author_id: string;
parent_id: string;
}
```
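
A minimal sketch of producing that metadata from an external record follows. The `RawAttachment` shape and the `toNormalizedAttachment` function are hypothetical; a real external system will expose different field names.

```typescript
interface NormalizedAttachment {
  url: string;
  id: string;
  file_name: string;
  author_id: string;
  parent_id: string;
}

// Hypothetical shape of an attachment as returned by an external system.
interface RawAttachment {
  attachmentId: number;
  downloadUrl: string;
  fileName: string;
  uploaderId: number;
  issueId: number;
}

// Map the external record onto the fields of NormalizedAttachment.
function toNormalizedAttachment(raw: RawAttachment): NormalizedAttachment {
  return {
    url: raw.downloadUrl,
    id: String(raw.attachmentId),
    file_name: raw.fileName,
    author_id: String(raw.uploaderId),
    parent_id: String(raw.issueId),
  };
}

const exampleAttachment = toNormalizedAttachment({
  attachmentId: 1,
  downloadUrl: 'https://example.com/files/report.pdf',
  fileName: 'report.pdf',
  uploaderId: 7,
  issueId: 42,
});
```

Numeric identifiers are converted to strings because every field of the interface is a `string`.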

## Loading phases

### 1. Loading Data

This phase is defined in `load-data.ts` and is responsible for loading the data to the external system.

Loading is done by providing an ordered list of itemTypes to load and their respective create and update functions.

```typescript
processTask({
task: async ({ adapter }) => {
const { reports, processed_files } = await adapter.loadItemTypes({
itemTypesToLoad: [
{
itemType: 'tickets',
create: createTicket,
update: updateTicket,
},
{
itemType: 'conversations',
create: createConversation,
update: updateConversation,
},
],
});

await adapter.emit(LoaderEventType.DataLoadingDone, {
reports,
processed_files,
});
},
onTimeout: async ({ adapter }) => {
await adapter.emit(LoaderEventType.DataLoadingProgress, {
reports: adapter.reports,
processed_files: adapter.processedFiles,
});
},
});
```

The loading functions `create` and `update` load records into the external system. They denormalize each record to the external system's schema and make the HTTP calls to that system. Both functions must handle errors and the external system's rate limiting.

Each function returns the ID and modified date of the record in the external system or, if the record could not be created or updated, a rate-limiting offset or an error.
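
The three outcomes described above (success, rate limiting, error) can be sketched as a pure mapping from an HTTP response to a result object. The `LoadResult` and `ExternalResponse` shapes, their field names, and the 30-second fallback are assumptions made for illustration; they are not the library's types.

```typescript
// Assumed result shape for a loading function; field names are illustrative.
interface LoadResult {
  id?: string;
  modifiedDate?: string;
  delaySeconds?: number;
  error?: string;
}

// Hypothetical, simplified view of an HTTP response from the external system.
interface ExternalResponse {
  status: number;
  retryAfterSeconds?: number;
  record?: { id: string; updated_at: string };
}

function toLoadResult(response: ExternalResponse): LoadResult {
  if (response.status === 429) {
    // Rate limited: report how long to wait (the fallback value is arbitrary).
    return { delaySeconds: response.retryAfterSeconds ?? 30 };
  }
  if (response.status >= 200 && response.status < 300 && response.record) {
    // Success: report the external ID and modification date.
    return {
      id: response.record.id,
      modifiedDate: response.record.updated_at,
    };
  }
  // Anything else is reported as an error for this record.
  return { error: `External system responded with status ${response.status}` };
}
```

A `create` or `update` function could make its HTTP call, reduce the response to an `ExternalResponse`, and return the output of a helper like this.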

### 2. Loading Attachments

This phase is defined in `load-attachments.ts` and is responsible for loading the attachments to the external system.

Loading is done by providing the create function to create attachments in the external system.

```typescript
processTask({
task: async ({ adapter }) => {
const { reports, processed_files } = await adapter.loadAttachments({
create,
});

await adapter.emit(LoaderEventType.AttachmentLoadingDone, {
reports,
processed_files,
});
},
onTimeout: async ({ adapter }) => {
await adapter.postState();
await adapter.emit(LoaderEventType.AttachmentLoadingProgress, {
reports: adapter.reports,
processed_files: adapter.processedFiles,
});
},
});
```

The loading function `create` makes the API calls that create attachments in the external system, and it must handle errors and the external system's rate limiting.

The function returns the ID and modified date of the record in the external system or, if the attachment could not be created, specifies a rate-limiting back-off or logs the error.
2 changes: 1 addition & 1 deletion package.json
@@ -1,6 +1,6 @@
{
"name": "@devrev/ts-adaas",
- "version": "1.2.4",
+ "version": "1.2.5",
"description": "DevRev ADaaS (AirDrop-as-a-Service) Typescript SDK.",
"type": "commonjs",
"main": "./dist/index.js",
16 changes: 6 additions & 10 deletions src/deprecated/uploader/index.ts
@@ -1,4 +1,4 @@
- import { axiosDevRevClient } from '../../http/axios-devrev-client';
+ import { axiosClient } from '../../http/axios-client';
import { betaSDK, client } from '@devrev/typescript-sdk';
import fs, { promises as fsPromises } from 'fs';
import { createFormData } from '../common/helpers';
@@ -108,15 +108,11 @@ export class Uploader {
): Promise<any | null> {
const formData = createFormData(preparedArtifact, fetchedObjects);
try {
- const response = await axiosDevRevClient.post(
- preparedArtifact.url,
- formData,
- {
- headers: {
- 'Content-Type': 'multipart/form',
- },
- }
- );
+ const response = await axiosClient.post(preparedArtifact.url, formData, {
+ headers: {
+ 'Content-Type': 'multipart/form-data',
+ },
+ });

return response;
} catch (error) {
34 changes: 0 additions & 34 deletions src/http/axios-devrev-client.ts

This file was deleted.
