Delete Collection Assets

Overview

Delete all assets in one or multiple collections.

Parameters:

Name | Type | Description | Default
collection_names | Union[str, List[str]] | Name or names of the collections whose assets will be deleted. | required
timeout | int | How long, in minutes, before the code times out. | 30
force_actual_name | bool | Edge case: applies when there are multiple duplicate friendly names and the name passed in is one of the actual names. | False
api_version | Optional[str] | Catalog API version. If None, the default is "2022-03-01-preview". | None

Returns:

Type | Description
None | Prints that the collection assets have been deleted.
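
The way collection_names accepts either a single string or a list can be sketched in isolation. This is a minimal illustration of the type check that appears in the source code below; the helper name normalize_collection_names is hypothetical and not part of purviewautomation.

```python
from typing import List, Union


def normalize_collection_names(collection_names: Union[str, List[str]]) -> List[str]:
    """Return the input as a list of names, wrapping a single string."""
    if isinstance(collection_names, str):
        # A single name becomes a one-item list so callers can always loop.
        return [collection_names]
    if isinstance(collection_names, list):
        return collection_names
    # Any other type (tuple, set, None, ...) is rejected.
    raise ValueError("The collection_names parameter has to be a string or list type.")
```

With this shape, "My Collection" and ["My Collection"] are treated the same way, while a tuple raises a ValueError.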

Source code in purviewautomation/collections.py
def delete_collection_assets(
    self,
    collection_names: Union[str, List[str]],
    timeout: int = 30,
    force_actual_name: bool = False,
    api_version: Optional[str] = None,
) -> None:
    """Delete all assets in one or multiple collections.

    Args:
        collection_names: Collection name or names to delete assets.
        timeout: How long in minutes before the code times out.
            Default is 30 minutes.
        force_actual_name: Edge Case. If multiple duplicate friendly names
            and one of the actual names is the name passed in.
        api_version: Catalog API version.
            If None, default is "2022-03-01-preview".

    Returns:
        Prints that the collection assets have been deleted.
    """
    if not api_version:
        api_version = self.catalog_api_version

    if not isinstance(collection_names, (str, list)):
        err = "The collection_names parameter has to be a string or list type."
        raise ValueError(err)
    elif isinstance(collection_names, str):
        collection_names = [collection_names]

    collections = self.list_collections(only_names=True)

    for name in collection_names:
        collection = self.get_real_collection_name(collection_name=name, force_actual_name=force_actual_name)

        future_timeout_time = datetime.now() + timedelta(minutes=timeout)
        final = False
        print(f"Attempting to delete assets in collection: '{collections[collection]['friendlyName']}'")
        print("Note: This could take time if there's a large number of assets in the collection")

        while not final and datetime.now() <= future_timeout_time:
            url = f"{self.catalog_endpoint}/api/search/query?api-version={api_version}"
            # max value is 1000
            data = f'{{"keywords": null, "limit": 1000, "filter": {{"collectionId": "{collection}"}}}}'
            asset_request = requests.post(url=url, data=data, headers=self.header)

            if asset_request.status_code == 403:
                err_msg = (
                    f"The Service Principal or user needs to be listed as a Data Curator on collection '{collections[collection]['friendlyName']}' "
                    "in order to delete assets on that collection."
                )
                raise ValueError(err_msg)

            results = asset_request.json()
            total = len(results["value"])
            if total == 0:
                final = True
                print(
                    f"All assets have been successfully deleted from collection: '{collections[collection]['friendlyName']}'"
                )
                print("\n")
            else:
                guids = [item["id"] for item in results["value"]]
                guid_str = "&guid=".join(guids)
                url = f"{self.catalog_endpoint}/api/atlas/v2/entity/bulk?guid={guid_str}"
                delete_request = requests.delete(url, headers=self.header)
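
The two requests made inside the loop above can be illustrated without calling Purview. The sketch below rebuilds the search body and the bulk delete URL from the same values as the source. The helper names build_search_payload and build_bulk_delete_url are hypothetical, and json.dumps is swapped in for the hand-written f-string on the assumption that it yields an equivalent JSON document.

```python
import json
from typing import List


def build_search_payload(collection_id: str, limit: int = 1000) -> str:
    """Return the JSON body that lists assets in one collection.

    The source comments that 1000 is the maximum limit per search request.
    """
    body = {
        "keywords": None,  # serialized as JSON null
        "limit": limit,
        "filter": {"collectionId": collection_id},
    }
    return json.dumps(body)


def build_bulk_delete_url(catalog_endpoint: str, guids: List[str]) -> str:
    """Return the bulk delete URL with one guid query parameter per asset."""
    guid_str = "&guid=".join(guids)
    return f"{catalog_endpoint}/api/atlas/v2/entity/bulk?guid={guid_str}"
```

Because each search returns at most 1000 assets, the loop repeats the search and delete pair until a search returns no results or the timeout passes.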

Important

The Service Principal or user that authenticated to Purview must be listed as a Data Curator on the collection in order to delete assets in that collection. For more info, see: Purview Roles
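
Since the source code above raises a ValueError when the search request returns a 403, a caller could catch it to report the missing role. This is a minimal sketch, assuming client is an already authenticated purviewautomation client. The helper name try_delete_assets is hypothetical, and any other failure that raises ValueError (such as a wrong argument type) is caught the same way.

```python
def try_delete_assets(client, collection_name: str) -> bool:
    """Return True if the call finished without raising, False on ValueError."""
    try:
        client.delete_collection_assets(collection_names=collection_name)
    except ValueError as err:
        # Includes the Data Curator message raised on a 403 response.
        print(f"Could not delete assets in '{collection_name}': {err}")
        return False
    return True
```

A True result only means that no exception was raised; the method itself returns None, so it does not confirm that every asset was removed before the timeout.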

The default timeout for delete_collection_assets is 30 minutes. If the collection has a large number of assets, pass an integer (in minutes) to the timeout parameter to increase or decrease the limit.

For example, client.delete_collection_assets(collection_names="My Collection", timeout=60) will allow the code to run up to one hour (60 minutes).

The timeout is only an upper bound. If the assets are deleted sooner (for example, within one minute), the code stops as soon as all of the assets are deleted rather than waiting for the full hour.
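
The timeout behavior described here follows a common deadline-loop pattern. Below is a minimal sketch of that pattern, modeled on the while loop in the source code above; the function name run_until_done and the step callback are hypothetical and only stand in for one search-and-delete pass.

```python
from datetime import datetime, timedelta
from typing import Callable


def run_until_done(step: Callable[[], bool], timeout_minutes: float = 30) -> bool:
    """Call step() until it returns True or the deadline passes.

    Returns True if step() reported completion, False if time ran out first.
    """
    deadline = datetime.now() + timedelta(minutes=timeout_minutes)
    while datetime.now() <= deadline:
        if step():
            # Finished early: stop immediately instead of waiting for the deadline.
            return True
    return False
```

The deadline is only checked between passes, so a pass that is already running when the deadline arrives is allowed to finish.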

Examples

Delete All Assets in One Collection

The Purview account below has a collection called Collection To Delete that contains 3 assets:

[Image: Delete Collection Assets]

To delete all of the assets in the collection:

client.delete_collection_assets(collection_names="Collection To Delete")

Refresh Purview to see that all of the assets have been deleted:

[Image: Delete Collection Assets]

Delete All Assets in Multiple Collections

Info

The collections don't have to be in the same hierarchy. They can be located in any hierarchy.

In the below Purview, the collections My-Company and Collection To Delete both have two assets:

[Images: Delete Collection Assets]

To delete all the assets in both collections:

collections = ["My-Company", "Collection To Delete"]
client.delete_collection_assets(collection_names=collections)

Output:

[Images: Delete Collection Assets]

Handling Duplicate Friendly Names

If there are multiple duplicate friendly names or other edge cases, see: Handling Multiple Duplicate Friendly Names.
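
As a rough sketch, force_actual_name is passed like any other keyword argument of delete_collection_assets. The helper name and the actual name "abc123" are hypothetical placeholders, and client is assumed to be an authenticated purviewautomation client. The linked page remains the authority on when the flag is needed.

```python
def delete_assets_by_actual_name(client, actual_name: str, timeout: int = 30) -> None:
    """Call delete_collection_assets with force_actual_name=True."""
    client.delete_collection_assets(
        collection_names=actual_name,
        force_actual_name=True,
        timeout=timeout,
    )


# Usage (hypothetical actual name):
# delete_assets_by_actual_name(client, "abc123")
```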