
Delete Collections Recursively

Overview

Delete one or multiple collection hierarchies.

Parameters:

  • collection_names (Union[str, List[str]]): One collection name or a list of names. Required.
  • safe_delete (Optional[str]): Name of the client variable used in the printed safe-delete (rollback) commands, for example "client". Default: None.
  • also_delete_first_collection (bool): If True, also deletes the starting collection along with its child collections. Default: False.
  • force_actual_name (bool): Edge case. Set to True when multiple collections share the same friendly name and one collection's actual name is the name passed in. Default: False.
  • delete_assets (bool): If True, deletes all assets from every collection in the hierarchy. Default: False.
  • delete_assets_timeout (int): When delete_assets is True, the timeout in minutes for deleting the assets. Default: 30.
  • api_version (Optional[str]): API version to use. If None, defaults to "2019-11-01-preview". Default: None.

Returns:

  • None. Prints the collections as they are deleted.

Source code in purviewautomation/collections.py
def delete_collections_recursively(
    self,
    collection_names: Union[str, List[str]],
    safe_delete: Optional[str] = None,
    also_delete_first_collection: bool = False,
    delete_assets: bool = False,
    delete_assets_timeout: int = 30,
    force_actual_name: bool = False,
    api_version: Optional[str] = None,
) -> None:  # TODO need to update this and add force_actual_name
    """Delete one or multiple collection hierarchies.

    Args:
        collection_names: One or multiple names.
        safe_delete: Client name to be used when printing
            the safe delete commands.
        also_delete_first_collection: Deletes the start collection
            along with the children collections.
        force_actual_name: Edge case. Set to True when multiple
            collections share a friendly name and one of their
            actual names is the name passed in.
        delete_assets: If True, deletes all assets from every
            collection in the hierarchy.
        delete_assets_timeout: If delete_assets is True, the
            timeout in minutes for deleting the assets.
            Defaults to 30 minutes.
        api_version: If None, default is "2019-11-01-preview".

    Returns:
        None. Will print out the collections being deleted.
    """
    if not api_version:
        api_version = self.collections_api_version

    if not isinstance(collection_names, (str, list)):
        raise ValueError("The collection_names parameter has to either be a string or a list.")
    elif isinstance(collection_names, str):
        collection_names = [collection_names]

    for name in collection_names:
        delete_list = []
        recursive_list = []
        coll_name = self.get_real_collection_name(name, force_actual_name=force_actual_name)
        child_collections_check = self.get_child_collection_names(coll_name)
        if child_collections_check["count"] == 0:
            err_msg = (
                f"The collection '{name}' has no child collections. Can only delete collections that have children. "
                "To delete collections with no children, "
                f"use: delete_collections('{name}')"
            )
            raise ValueError(err_msg)

        self._recursive_append(coll_name, delete_list)
        if delete_list[0] is not None:
            for item in delete_list:
                self._recursive_append(item, recursive_list)
            for item2 in recursive_list:
                if item2 is not None:
                    delete_list.append(item2)
                    self._recursive_append(item2, recursive_list)

        if safe_delete:
            if also_delete_first_collection:
                self._safe_delete_recursivly(delete_list, safe_delete, coll_name, True)
            else:
                self._safe_delete_recursivly(delete_list, safe_delete, coll_name)

        if delete_list[0] is not None:
            if also_delete_first_collection:
                # Use the current hierarchy's start name, not always the first one passed in
                delete_list.insert(0, name)
            for coll in delete_list[::-1]:  # starting from the most child collection
                if delete_assets:
                    # Remove the collection's assets before deleting the collection itself
                    self.delete_collection_assets(collection_names=coll, timeout=delete_assets_timeout)
                self.delete_collections([coll])
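The loops above gather every descendant of the start collection and then delete the list in reverse, so each child is removed before its parent. The sketch below shows that ordering idea on a hypothetical in-memory map of parent name to child names (the real method queries Purview through helpers such as `_recursive_append` instead); `deletion_order` and the `hierarchy` dict are illustrative assumptions, not part of the library.

```python
from collections import deque
from typing import Dict, List


def deletion_order(
    hierarchy: Dict[str, List[str]],
    start: str,
    also_delete_first_collection: bool = False,
) -> List[str]:
    """Return collections in the order they would be deleted (deepest first).

    `hierarchy` is a hypothetical map of parent name -> child names.
    """
    found: List[str] = []
    queue = deque(hierarchy.get(start, []))
    # Breadth-first walk: a parent is always recorded before its children.
    while queue:
        name = queue.popleft()
        found.append(name)
        queue.extend(hierarchy.get(name, []))
    if also_delete_first_collection:
        found.insert(0, start)
    # Reversing the list puts every child ahead of its parent.
    return found[::-1]


tree = {
    "My-Collection": ["Sub Collection 1"],
    "Sub Collection 1": ["Sub Collection 2"],
    "Sub Collection 2": ["Sub Collection 3"],
}
print(deletion_order(tree, "My-Collection"))
# ['Sub Collection 3', 'Sub Collection 2', 'Sub Collection 1']
print(deletion_order(tree, "My-Collection", also_delete_first_collection=True))
# ['Sub Collection 3', 'Sub Collection 2', 'Sub Collection 1', 'My-Collection']
```

Deleting deepest-first matters because a parent collection cannot be removed while it still has children.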

Important

  • This method only deletes collections that have children (subcollections). To delete collections that have no children, see Delete Collections.
  • Collection names are case sensitive: My-Company is different from my-Company.
  • To delete collections that also contain assets, set the delete_assets parameter to True (see the Delete Assets section).
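The first two notes describe a precondition: the start collection must exist under its exact, case-sensitive name and must have at least one child. A minimal sketch of that guard follows, assuming a hypothetical parent-to-children dict; `check_can_delete_recursively` is illustrative only (the real method checks the child count returned by `get_child_collection_names` and raises a ValueError).

```python
from typing import Dict, List


def check_can_delete_recursively(hierarchy: Dict[str, List[str]], name: str) -> None:
    """Raise ValueError unless `name` exists in the hierarchy and has children."""
    # Exact, case-sensitive lookup: "my-Collection" does not match "My-Collection".
    if name not in hierarchy:
        raise ValueError(f"The collection '{name}' was not found.")
    if not hierarchy[name]:
        raise ValueError(
            f"The collection '{name}' has no child collections. "
            f"Use delete_collections('{name}') instead."
        )


tree = {"My-Collection": ["Sub Collection 1"], "Sub Collection 1": []}
check_can_delete_recursively(tree, "My-Collection")  # passes: it has a child
for bad_name in ("Sub Collection 1", "my-Collection"):
    try:
        check_can_delete_recursively(tree, bad_name)
    except ValueError as err:
        print(err)
```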

Examples

Delete One Collection Hierarchy

Given the following Purview hierarchy:

(Screenshot: the collection hierarchy under My-Collection before running the code)

To delete all of the collections under My-Collection:

client.delete_collections_recursively(collection_names="My-Collection")

The output printed to the screen:

(Screenshot: the names of the deleted collections printed to the screen)

Purview after running the code:

(Screenshot: Purview with the child collections of My-Collection deleted)

Delete One Collection Hierarchy Along with the Initial Collection

Given the following Purview hierarchy:

(Screenshot: the collection hierarchy under My-Collection before running the code)

To delete all of the collections under My-Collection and My-Collection itself, set the also_delete_first_collection parameter to True:

client.delete_collections_recursively(collection_names="My-Collection",
                                      also_delete_first_collection=True)

Purview after running the code, with My-Collection and its child collections deleted:

(Screenshot: Purview without My-Collection and its child collections)

Delete Multiple Collection Hierarchies

Given the following Purview hierarchies:

(Screenshot: the Another Collection Hierarchy and My-Collection hierarchies before running the code)

To delete all of the collections under Another Collection Hierarchy and under My-Collection:

collections = ["Another Collection Hierarchy", "My-Collection"]
client.delete_collections_recursively(collection_names=collections)
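Passing a list works because the method normalizes collection_names into a list (a lone string is wrapped) and then treats each name as the start of its own hierarchy. The standalone helper below, `normalize_collection_names`, is a hypothetical copy of that check for illustration; it is not a public function of the library.

```python
from typing import List, Union


def normalize_collection_names(collection_names: Union[str, List[str]]) -> List[str]:
    """Return a list of names whether one name or a list was passed."""
    if isinstance(collection_names, str):
        return [collection_names]
    if isinstance(collection_names, list):
        return collection_names
    raise ValueError("The collection_names parameter has to either be a string or a list.")


# Each name is then processed as a separate hierarchy.
for start in normalize_collection_names(["Another Collection Hierarchy", "My-Collection"]):
    print("Deleting the hierarchy under", start)
```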

Purview after running the code:

(Screenshot: Purview with the collections under both hierarchies deleted)

Rollback/Safe Delete

When deleting collections, passing the safe_delete parameter prints the commands needed to recreate the deleted collections. Think of this as a rollback option.

Given the following Purview hierarchy:

(Screenshot: the collection hierarchy under My-Collection before running the code)

This deletes all of the collections under My-Collection and prints the exact script to recreate the entire hierarchy with the same actual names and friendly names:

client.delete_collections_recursively(collection_names="My-Collection", 
                                      safe_delete="client")

Purview after running the code:

(Screenshot: Purview with the collections under My-Collection deleted)

The script is also printed to the screen. Copy and run it to recreate the entire hierarchy, or save it to a file to use later:

(Screenshot: the printed script for recreating the hierarchy)

Run the printed code:

client.create_collections(start_collection='wlryvp', collection_names='favguw', safe_delete_friendly_name='Sub Collection 2')
client.create_collections(start_collection='favguw', collection_names='rjzjxl', safe_delete_friendly_name='Sub Collection 3')
client.create_collections(start_collection='rjzjxl', collection_names='bxhfsh', safe_delete_friendly_name='Sub Collection 4')
client.create_collections(start_collection='bxhfsh', collection_names='kahqba', safe_delete_friendly_name='Sub Collection 5')
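The printed script recreates the hierarchy top-down (each parent before its child), the reverse of the deletion order, because a child collection can only be created under an existing parent. The sketch below generates lines in the same format from (parent actual name, child actual name, child friendly name) tuples; `rollback_commands` is a hypothetical helper, not the library's internal safe-delete routine.

```python
from typing import List, Tuple


def rollback_commands(client_name: str, edges: List[Tuple[str, str, str]]) -> List[str]:
    """Build create_collections calls, parents before children.

    Each edge is (parent actual name, child actual name, child friendly name).
    """
    # !r quotes each value, matching the single-quoted style of the printed script.
    return [
        f"{client_name}.create_collections(start_collection={parent!r}, "
        f"collection_names={child!r}, safe_delete_friendly_name={friendly!r})"
        for parent, child, friendly in edges
    ]


edges = [
    ("wlryvp", "favguw", "Sub Collection 2"),
    ("favguw", "rjzjxl", "Sub Collection 3"),
]
for line in rollback_commands("client", edges):
    print(line)
```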

Purview after the code runs, with the entire hierarchy recreated:

(Screenshot: Purview with the hierarchy under My-Collection recreated)

Delete Assets

To delete all of the assets in a hierarchy (every asset in every collection in the hierarchy), set the delete_assets parameter to True and, optionally, set delete_assets_timeout:

Important

The service principal or user that authenticated to Purview must have the Data Curator role on a collection to delete the assets in that collection. For more information, see: Purview Roles

Deleting assets in a collection is irreversible. To add the deleted assets back to the collections, re-scan them.

With delete_assets set to True, the code deletes all of the assets and the collection hierarchy. To delete only the assets and keep the collections, see: Delete Collection Assets

The root collection (top-level collection) can't be deleted. In the examples above, purview-test-2 is the root collection. To delete only its assets, see: Delete Collection Assets

Info

The default timeout when deleting assets is 30 minutes. If the collections contain a large number of assets, pass an integer (in minutes) to the delete_assets_timeout parameter to increase (or decrease) it.

For example, client.delete_collections_recursively(collection_names="My-Collection", delete_assets=True, delete_assets_timeout=60) allows the code to run for up to one hour (60 minutes).

The timeout is only an upper limit. If the assets are deleted sooner (for example, in one minute), the code stops waiting as soon as all of the assets are deleted.
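The library's asset-deletion internals aren't shown on this page. As a hedged sketch, assuming the timeout is enforced by polling until the collection is empty, an upper-bound timeout with early exit can look like this; `wait_until_assets_deleted` and its `count_remaining` callable are hypothetical stand-ins for whatever the library actually queries.

```python
import time
from typing import Callable


def wait_until_assets_deleted(
    count_remaining: Callable[[], int],
    timeout_minutes: float = 30,
    poll_seconds: float = 5.0,
) -> bool:
    """Poll until no assets remain or the timeout elapses.

    Returns True if all assets were gone before the deadline, else False.
    """
    deadline = time.monotonic() + timeout_minutes * 60
    while True:
        if count_remaining() == 0:
            return True  # finished early: stop waiting immediately
        if time.monotonic() >= deadline:
            return False  # gave up after timeout_minutes
        time.sleep(poll_seconds)


# Simulated asset counts that reach zero on the third check.
counts = iter([5, 3, 0])
print(wait_until_assets_deleted(lambda: next(counts), timeout_minutes=60, poll_seconds=0))
```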

In the following Purview, multiple collections under My-Collection contain assets (Sub Collection 2 has two assets and Sub Collection 3 has three assets):

(Screenshots: the assets in Sub Collection 2 and in Sub Collection 3)

To delete all of the assets and all of the collections under My-Collection:

client.delete_collections_recursively(collection_names="My-Collection",
                                      delete_assets=True)
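In the source above, when delete_assets is True, each collection (already ordered deepest-first) has its assets removed immediately before the collection itself is deleted. A minimal sketch of that per-collection sequence, with the two Purview calls replaced by hypothetical callables that just record what would happen:

```python
from typing import Callable, List


def delete_hierarchy(
    ordered_names: List[str],
    delete_assets: bool,
    delete_collection_assets: Callable[[str], None],
    delete_collection: Callable[[str], None],
) -> None:
    """Delete collections in the given (deepest-first) order, clearing assets first if asked."""
    for name in ordered_names:
        if delete_assets:
            # Empty the collection before removing it.
            delete_collection_assets(name)
        delete_collection(name)


log = []
delete_hierarchy(
    ["Sub Collection 3", "Sub Collection 2"],
    delete_assets=True,
    delete_collection_assets=lambda n: log.append(("assets", n)),
    delete_collection=lambda n: log.append(("collection", n)),
)
print(log)
```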

The resulting Purview:

(Screenshot: Purview after the assets and the collections under My-Collection are deleted)