Detect when a duplicate dataset is added to samba
Created by: charlesbrandt
Currently, when a dataset added to the shared samba storage has the same name as an existing dataset in the bioloop system, it stays on the drive until an operator cleans it up manually.

In practice, that manual cleanup is not happening. With multiple operators adding data to the shared location, no single operator can tell whether a given dataset is a duplicate or part of an active job. Ideally, operators would check for duplicates via the web UI before adding data to the shared location, but that is not happening either.
If the dataset on samba is identical to the dataset already in bioloop, it is safe to remove the copy on samba: compare it with the record in the database, and if it matches, delete the files (they have already been archived).
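A minimal sketch of that comparison, assuming the database stores a content digest for each archived dataset (the helper names `directory_digest` and `is_safe_to_delete` are hypothetical, not existing bioloop code):

```python
import hashlib
from pathlib import Path


def directory_digest(root: Path) -> str:
    """Digest relative paths plus file contents, so two trees with
    identical layout and bytes produce the same hash."""
    h = hashlib.sha256()
    for path in sorted(root.rglob("*")):
        if path.is_file():
            h.update(str(path.relative_to(root)).encode())
            h.update(path.read_bytes())
    return h.hexdigest()


def is_safe_to_delete(samba_dir: Path, archived_digest: str) -> bool:
    """The samba copy may be removed only when it matches the
    digest recorded for the archived dataset."""
    return directory_digest(samba_dir) == archived_digest
```

For large datasets, a cheaper first pass (file count plus total size) could rule out most mismatches before hashing.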
If the two differ, there are a few options:

- generate a message for operators:
  - put it on the dashboard
  - get confirmation from the operator
- move the conflicting dataset to another directory on samba (e.g. `/duplicates/raw_data` and `/duplicates/data_products`)
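The quarantine-move option could be sketched like this, assuming the `/duplicates/<type>/` layout suggested above (the function name `quarantine_duplicate` is hypothetical):

```python
import shutil
from pathlib import Path


def quarantine_duplicate(dataset_dir: Path, duplicates_root: Path,
                         dataset_type: str) -> Path:
    """Move a conflicting dataset into <duplicates_root>/<dataset_type>/,
    suffixing the name if a quarantined copy already exists there."""
    dest_dir = duplicates_root / dataset_type
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / dataset_dir.name
    n = 1
    while target.exists():
        target = dest_dir / f"{dataset_dir.name}.{n}"
        n += 1
    shutil.move(str(dataset_dir), str(target))
    return target
```

Keeping the quarantine directory on the same samba volume makes the move a cheap rename rather than a copy, and operators can still inspect the files before final deletion.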
It may be viable to implement #169 as part of this ticket.
FYI @rperigo, @deepakduggirala, @ryanlong89