173 - duplicates

ryanlong requested to merge 173-duplicates-new into main

Created by: ri-pandey

Description

Allow duplicate datasets to be ingested into Bioloop, and allow authorized users to accept or reject them.

Related Issue(s)

Closes #173

Changes Made

List the main changes made in this PR. Be as specific as possible.

  • Feature added
  • Bug fixed
  • Code refactored
  • Documentation updated

Checklist

Before submitting this PR, please make sure that:

  • Your code passes linting and coding style checks.
  • Documentation has been updated to reflect the changes.
  • You have reviewed your own code and resolved any merge conflicts.
  • You have requested a review from at least one team member.
  • Any relevant issue(s) have been linked to this PR.

Additional Information

Documentation of the process: https://github.com/IUSCA/bioloop/blob/173-duplicates-new/docs/dataset_duplication.md

Summary of features:

  1. Bioloop can now register duplicate datasets for a given dataset.
  2. Multiple duplicate datasets can coexist in the system, with versions being assigned to concurrent duplicates (this feature is disabled at the moment).
  3. Alerts about the state of a dataset are shown (only to operators and admins) when it has a duplicate or is itself a duplicate. This is done in 3 places:
    1. project dataset modal
      1. project dataset table
      2. dataset page
    2. file browser
  4. Operators and admins are shown notifications to make them aware of the duplication, and buttons to accept/reject datasets in alerts. Users only see alerts.
  5. Both the API and the worker layers are involved in accepting or rejecting a duplicate. For detailed explanation of steps, see the linked documentation.
  6. Checks have been added to the API layer so that certain operations are forbidden for duplicate datasets (like adding them to a project). The API/UI code has also been updated to omit duplicate datasets from the existing UI controls that show datasets (like the Project-dataset search).
  7. After acceptance, the duplicate dataset replaces the original dataset at both the database level and the filesystem level. The original dataset is soft-deleted in the database.
  8. After rejection, the duplicate dataset is soft-deleted in the database, and its filesystem resources are purged.
  9. Acceptance/rejection are irreversible operations. Users see a modal to confirm before they accept/reject a dataset.
  10. After a file download is initiated, the dataset is re-fetched (without a page refresh), so that the user can be made aware that a duplicate of the dataset they are downloading is currently being integrated by the system.
  11. Duplicate datasets can be seen in the /duplicateDatasets view.
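
For reviewers, the rules in items 6 to 9 can be sketched as a small in-memory model. This is only an illustration under stated assumptions, not the code in this PR: the `Dataset` fields (`is_duplicate`, `duplicated_from`, `is_deleted`, `path`), the helper names, and the idea that an accepted duplicate takes over the original's path are all hypothetical stand-ins for the real API and worker logic described in the linked documentation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Dataset:
    id: int
    name: str
    is_duplicate: bool = False             # hypothetical flag
    duplicated_from: Optional[int] = None  # hypothetical: id of the original
    is_deleted: bool = False               # soft-delete marker
    path: Optional[str] = None             # hypothetical filesystem location


class DuplicateError(Exception):
    """Raised when an operation is not allowed for the given dataset."""


def add_to_project(project: List[int], ds: Dataset) -> None:
    # Item 6: duplicates cannot be added to a project.
    if ds.is_duplicate:
        raise DuplicateError(f"dataset {ds.id} is a duplicate")
    project.append(ds.id)


def searchable(datasets: Iterable[Dataset]) -> List[Dataset]:
    # Item 6: duplicates (and soft-deleted datasets) are omitted from searches.
    return [d for d in datasets if not d.is_duplicate and not d.is_deleted]


def _pending_duplicate(store: Dict[int, Dataset], dup_id: int) -> Dataset:
    # Item 9: a duplicate that was already accepted or rejected cannot be
    # processed again.
    dup = store[dup_id]
    if not dup.is_duplicate or dup.is_deleted:
        raise DuplicateError(f"dataset {dup_id} is not a pending duplicate")
    return dup


def accept(store: Dict[int, Dataset], dup_id: int) -> Dataset:
    # Item 7: the duplicate replaces the original; the original is soft-deleted.
    dup = _pending_duplicate(store, dup_id)
    original = store[dup.duplicated_from]
    dup.path = original.path        # assumption: duplicate takes over the path
    original.path = None
    original.is_deleted = True
    dup.is_duplicate = False
    dup.duplicated_from = None
    return dup


def reject(store: Dict[int, Dataset], dup_id: int) -> Dataset:
    # Item 8: the duplicate is soft-deleted and its files are purged.
    dup = _pending_duplicate(store, dup_id)
    dup.is_deleted = True
    dup.path = None                 # stands in for purging filesystem resources
    return dup
```

In this sketch, calling `accept` or `reject` a second time on the same dataset raises `DuplicateError`, which mirrors the irreversibility described in item 9.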
