173 - duplicate datasets
Created by: ri-pandey
Description
Allow duplicate datasets to be ingested into Bioloop, and allow authorized users to accept or reject them.
Related Issue(s)
Closes #173
Changes Made
List the main changes made in this PR. Be as specific as possible.
- Feature added
- Bug fixed
- Code refactored
- Documentation updated
Checklist
Before submitting this PR, please make sure that:
- Your code passes linting and coding style checks.
- Documentation has been updated to reflect the changes.
- You have reviewed your own code and resolved any merge conflicts.
- You have requested a review from at least one team member.
- Any relevant issue(s) have been linked to this PR.
Additional Information
Documentation of the process: https://github.com/IUSCA/bioloop/blob/173-duplicates/docs/dataset_duplication.md
Summary of features:
- Bioloop can now register duplicate datasets for a given dataset.
- Multiple duplicates of a given dataset can coexist in the system, with version numbers assigned to concurrent duplicates (this feature is currently disabled). A versioning sketch is included at the end of this description.
- Alerts are shown about the state of a dataset when it has a duplicate, or is itself a duplicate. These alerts appear in the following places:
  - project dataset modal
  - project dataset table
  - dataset page
  - file browser
- Operators and admins are shown notifications to make them aware of the duplication, and the alerts include buttons to accept/reject the duplicate dataset. Regular users only see the alerts.
- Both the API and the worker layers are involved in accepting or rejecting a duplicate. For a detailed explanation of the steps, see the linked documentation.
- During acceptance/rejection, state locks are placed on the original and/or duplicate datasets before the operation begins, and the locks are released once the acceptance/rejection steps complete successfully (a lock sketch is included at the end of this description).
- State locks ensure that:
  - Workflows can’t be kicked off
    - Pending workflows will start failing due to the API locks
  - Worker scripts that try to write to the dataset will fail
  - Dataset information can’t be written to via the API
  - The dataset's state can’t be modified via the API while acceptance/rejection is in progress
  - Acceptance and rejection can’t be initiated on the same dataset at the same time
  - UI controls are disabled while a dataset is being accepted/rejected
- Checks have been added at the API layer so that certain operations (such as adding a dataset to a project) are forbidden for duplicate datasets. The API/UI code has also been updated to omit duplicate datasets from the existing UI controls that list datasets (such as the Project-dataset search). A filtering sketch is included at the end of this description.
- At the end of the acceptance workflow, the original dataset's filesystem resources are purged, and the duplicate dataset replaces the original at both the database and filesystem levels. The original dataset is (soft-)deleted in the database.
- At the end of the rejection workflow, the duplicate dataset is (soft-)deleted in the database, and its filesystem resources are purged. A sketch of these final steps is included at the end of this description.
- Acceptance and rejection are irreversible operations. Users see a confirmation modal before they accept/reject a dataset.
- After a file download is initiated, the dataset is re-fetched (without a page refresh), so the user can be made aware that a duplicate of the dataset they are downloading is currently being integrated by the system.
- Duplicate datasets can be seen in the `/duplicateDatasets` view.
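
The sketches below are illustrative only: every name, field, and function in them is a hypothetical stand-in chosen for this description, not actual Bioloop code. First, a minimal sketch of how a version number could be assigned to a newly registered duplicate, assuming each existing duplicate record carries a `version` field:

```python
def next_duplicate_version(existing_duplicates):
    """Return the version number to assign to a newly registered duplicate.

    `existing_duplicates` is assumed to be a list of records (dicts), each
    carrying the `version` already assigned to a previously registered
    duplicate of the same original dataset. The first duplicate gets 1.
    """
    if not existing_duplicates:
        return 1
    return max(d["version"] for d in existing_duplicates) + 1


# Example: two duplicates already exist, so the next one becomes version 3.
print(next_duplicate_version([{"version": 1}, {"version": 2}]))  # -> 3
```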
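
Next, a sketch of the state-lock behaviour described above, using a simple in-memory lock registry; the real locks are enforced by the Bioloop API and worker layers as described in the linked documentation:

```python
# In-memory sketch only; `locked_datasets`, `DatasetLockedError`, and the
# function names below are illustrative, not real Bioloop identifiers.

locked_datasets: dict[str, str] = {}


class DatasetLockedError(Exception):
    pass


def lock_dataset(dataset_id: str, reason: str) -> None:
    # Acceptance and rejection can't both be initiated on the same dataset.
    if dataset_id in locked_datasets:
        raise DatasetLockedError(
            f"{dataset_id} is already locked: {locked_datasets[dataset_id]}"
        )
    locked_datasets[dataset_id] = reason


def unlock_dataset(dataset_id: str) -> None:
    locked_datasets.pop(dataset_id, None)


def write_dataset(dataset_id: str, changes: dict) -> None:
    # Writes (and workflow launches) against a locked dataset are rejected.
    if dataset_id in locked_datasets:
        raise DatasetLockedError(
            f"{dataset_id} is locked for {locked_datasets[dataset_id]}"
        )
    print(f"writing {changes} to {dataset_id}")


def accept_duplicate(original_id: str, duplicate_id: str) -> None:
    # Lock both datasets before the acceptance steps run...
    lock_dataset(original_id, "DUPLICATE_ACCEPTANCE")
    lock_dataset(duplicate_id, "DUPLICATE_ACCEPTANCE")
    # ...run the acceptance steps here (see the linked documentation)...
    # ...and release the locks once those steps succeed.
    unlock_dataset(original_id)
    unlock_dataset(duplicate_id)
```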
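
A sketch of the API-layer checks for duplicate datasets, assuming each dataset record carries a hypothetical `is_duplicate` flag:

```python
# Illustrative only: dataset and project records are plain dicts here, and
# `is_duplicate` is an assumed flag, not the actual Bioloop schema.

class DuplicateDatasetError(Exception):
    pass


def search_datasets(datasets: list[dict], query: str) -> list[dict]:
    """Return name matches, omitting duplicate datasets from the results."""
    return [
        d
        for d in datasets
        if query.lower() in d["name"].lower() and not d.get("is_duplicate")
    ]


def add_dataset_to_project(project: dict, dataset: dict) -> None:
    """Forbid adding a duplicate dataset to a project."""
    if dataset.get("is_duplicate"):
        raise DuplicateDatasetError("Duplicate datasets cannot be added to a project")
    project.setdefault("dataset_ids", []).append(dataset["id"])
```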
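
Finally, a sketch of the finalization steps at the end of the acceptance and rejection workflows; `purge_path` and the record fields are stand-ins for whatever the worker layer actually does:

```python
from datetime import datetime, timezone


def purge_path(path: str) -> None:
    # Stand-in for removing a dataset's filesystem resources.
    print(f"purging filesystem resources at {path}")


def finalize_acceptance(original: dict, duplicate: dict) -> None:
    # Purge the original dataset's filesystem resources...
    purge_path(original["path"])
    # ...let the duplicate take the original's place on the filesystem...
    duplicate["path"] = original["path"]
    # ...and soft-delete the original at the database level.
    original["is_deleted"] = True
    original["deleted_at"] = datetime.now(timezone.utc).isoformat()


def finalize_rejection(duplicate: dict) -> None:
    # Soft-delete the duplicate and purge its filesystem resources.
    duplicate["is_deleted"] = True
    duplicate["deleted_at"] = datetime.now(timezone.utc).isoformat()
    purge_path(duplicate["path"])
```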