Garbage Collection
Artifactory uses checksum-based storage to ensure that each binary file is only stored once.
When a new file is deployed, Artifactory checks whether a binary with the same checksum already exists and, if so, links the repository path to this binary. Upon deletion of a repository path, Artifactory does not delete the binary, since it may be used by other paths. However, once all paths pointing to a binary are deleted, the file is no longer in use. To make sure your system does not become clogged with unused binaries, Artifactory periodically runs a "Garbage Collection" to identify unused ("deleted") binaries and dispose of them from the datastore. By default, this is set to run every 4 hours and is controlled by a cron expression.
For example, to run garbage collection every 12 hours you should specify the following expression:
0 0 /12 * * ?
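To see which hours the expression above selects, the step field can be expanded by hand. The following is a minimal sketch, not Artifactory's scheduler: it assumes a Quartz-style field order (seconds, minutes, hours, day of month, month, day of week) and assumes that a bare `/12` is read like `*/12`. The helper name `expand_step_field` is hypothetical.

```python
def expand_step_field(field: str, low: int, high: int) -> list[int]:
    """Expand a cron step field such as '/12', '*/12' or '0/12'.

    Assumption: a missing or '*' start means "start at the lowest value".
    """
    start_text, _, step_text = field.partition("/")
    if not step_text:
        raise ValueError(f"not a step field: {field!r}")
    step = int(step_text)
    if step <= 0:
        raise ValueError("step must be a positive integer")
    start = low if start_text in ("", "*") else int(start_text)
    return list(range(start, high + 1, step))


expression = "0 0 /12 * * ?"
# Assumed field order: seconds, minutes, hours, then the date fields.
seconds, minutes, hours, *date_fields = expression.split()
run_hours = expand_step_field(hours, 0, 23)
print(run_hours)
```

Under these assumptions the hours field expands to two values per day, which matches the "every 12 hours" description.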
| Field | Description |
| --- | --- |
| Cron Expression | Specifies how often garbage collection should run automatically |
| Next Run Time | Indicates the next automatic run of garbage collection according to the specified Cron Expression |
| Run Now | Manually invokes garbage collection immediately |
Garbage collection frequency
Garbage collection is a resource-intensive operation. Running it too frequently may compromise system performance.
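The relationship between repository paths, checksums, and garbage collection described in this section can be illustrated with a small in-memory model. This is only a sketch of the idea, not Artifactory's implementation: the class `ChecksumStore` and its methods are hypothetical, and SHA-1 is used here purely for illustration.

```python
import hashlib


class ChecksumStore:
    """Toy checksum-based store: many paths may point to one binary."""

    def __init__(self) -> None:
        self.paths: dict[str, str] = {}       # repository path -> checksum
        self.binaries: dict[str, bytes] = {}  # checksum -> stored content

    def deploy(self, path: str, content: bytes) -> str:
        checksum = hashlib.sha1(content).hexdigest()
        # Store the binary only if this checksum is not already present.
        self.binaries.setdefault(checksum, content)
        self.paths[path] = checksum
        return checksum

    def delete(self, path: str) -> None:
        # Only the path is removed; the binary may still be used elsewhere.
        self.paths.pop(path, None)

    def collect_garbage(self) -> list[str]:
        # Dispose of binaries that no remaining path refers to.
        referenced = set(self.paths.values())
        unused = [c for c in self.binaries if c not in referenced]
        for checksum in unused:
            del self.binaries[checksum]
        return unused


store = ChecksumStore()
first = store.deploy("libs-release/a/app.jar", b"same bytes")
second = store.deploy("libs-snapshot/b/app.jar", b"same bytes")

store.delete("libs-release/a/app.jar")
removed_while_referenced = store.collect_garbage()

store.delete("libs-snapshot/b/app.jar")
removed_after_last_delete = store.collect_garbage()
```

In this model, the binary survives the first deletion because the second path still refers to it, and it is only disposed of by the collection that runs after the last path is gone.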
Garbage Collection Improvements
The following improvements have been introduced to the Garbage Collection mechanism:
- From Artifactory 6.12.0, a faster Garbage Collection strategy was introduced; it runs automatically when the Trash Can settings are enabled. The new cleanup strategy fetches and undeploys the artifacts in the trashcan repository that are older than the configured trash retention period. The linked binary is deleted if no other artifact references the checksum in question. The cleanup also runs on multiple threads (configurable by setting artifactory.gc.numberOfWorkersThreads=3). Unreferenced binaries (including existing unreferenced binaries or artifacts that were manually deleted from the trashcan) are deleted by the previous Full GC strategy, which runs every 20 GC iterations (configurable by setting artifactory.gc.skipFullGcBetweenMinorIterations=20).
- From Artifactory 7.31.10, you can improve Garbage Collection performance by skipping the need to order the objects. To do so, add the artifactory.gc.skipOrderByFullGc=true parameter to the artifactory.system.properties file.
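The interplay between the faster strategy and the periodic Full GC can be pictured as a simple counter. The sketch below is an assumption about how "every 20 GC iterations" might be counted, not a description of Artifactory's internals; the function `gc_kind` is hypothetical, and its parameter merely mirrors the name of the artifactory.gc.skipFullGcBetweenMinorIterations property.

```python
def gc_kind(iteration: int, skip_full_gc_between_minor_iterations: int = 20) -> str:
    """Classify a GC iteration (counted from 1) as 'minor' or 'full'.

    Assumption: every N-th iteration is a full GC, all others are minor.
    """
    if iteration < 1:
        raise ValueError("iterations are counted from 1")
    if iteration % skip_full_gc_between_minor_iterations == 0:
        return "full"
    return "minor"


# List which of the first 40 iterations would be full collections.
full_iterations = [i for i in range(1, 41) if gc_kind(i) == "full"]
print(full_iterations)
```

With the default 4-hour schedule and this counting, a full collection would occur roughly every 80 hours; lowering the property value trades more frequent full scans for faster removal of unreferenced binaries.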
Storage Quota Limits
Artifactory lets you limit how much of your system's total disk space may be used for storage, so that your server's file system capacity is never used up. This helps to keep your system reliable and available.
Once the disk space used for storage reaches the specified limit, Artifactory rejects any attempt to deploy a binary with a status code of 413 Request Entity Too Large, and a "Datastore disk space is too high" error is displayed at the bottom of the Maintenance screen.
- When using filesystem storage, the partition checked is the one containing the $JFROG_HOME/artifactory/var/data/artifactory/filestore directory.
- When using database blob storage, the partition checked is the one containing the $JFROG_HOME/artifactory/var/data/artifactory/cache directory.
- When using the S3 template, the partition checked is the one containing the cache-fs, which by default is the $JFROG_HOME/artifactory/var/artifactory/data/cache directory.
To help you avoid reaching your disk space quota, Artifactory also allows you to specify a warning level. Once the specified percentage of disk space is used, Artifactory will log a warning in the $JFROG_HOME/artifactory/var/log/artifactory-service.log file and display a "Datastore disk space is too high" warning at the bottom of the Maintenance screen.
| Field | Description |
| --- | --- |
| Enable Quota Control | When set, Artifactory will monitor disk space usage and issue warnings and errors according to the quotas specified in Storage Space Limit and Storage Space Warning |
| Storage Space Limit | The percentage of available disk space that may be used for storage before Artifactory rejects deployments and issues errors |
| Storage Space Warning | The percentage of available disk space that may be used for storage before Artifactory issues warnings |
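The warning and limit thresholds amount to a two-level check on the used percentage of a partition. The following is a rough sketch under stated assumptions: the function names are hypothetical, the example values 85 and 95 are illustrative rather than Artifactory defaults, and whether a threshold is considered reached at exactly the configured percentage is also an assumption.

```python
import shutil


def used_percent_of(path: str) -> float:
    """Percentage of the partition containing `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def quota_status(used_percent: float, warning_percent: float, limit_percent: float) -> str:
    """Return 'reject' at or above the limit, 'warn' at or above the warning level."""
    if used_percent >= limit_percent:
        return "reject"  # deployments would be refused
    if used_percent >= warning_percent:
        return "warn"    # a warning would be logged and displayed
    return "ok"


# Check the partition holding the current directory (illustrative thresholds).
print(quota_status(used_percent_of("."), warning_percent=85, limit_percent=95))
```

Keeping the warning level well below the limit leaves time to free space or extend storage before deployments start being rejected.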
Cleanup Unused Cached Artifacts
When configuring a remote repository, the Keep Unused Artifacts setting lets you specify how long an unused cached artifact from that repository should be kept before it becomes a candidate for cleanup. This setting does not immediately clean up the unused cached artifact; it merely marks it for cleanup after the specified number of hours. The Cleanup Unused Cached Artifacts setting specifies when the cleanup operation should run, and only then are the unused cached artifacts marked for cleanup actually removed from the system.
The cleanup frequency is specified with a cron expression. For example, to run cleanup every 12 hours you should specify the following expression:
0 0 /12 * * ?
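The two settings work together: one decides which cached artifacts are old enough to be candidates, the other decides when the candidates are actually removed. A minimal sketch of the candidate selection, assuming each cached artifact carries a last-used timestamp (the function `cleanup_candidates` and the sample paths are hypothetical):

```python
from datetime import datetime, timedelta


def cleanup_candidates(
    last_used: dict[str, datetime], keep_unused_hours: int, now: datetime
) -> list[str]:
    """Return cached paths that have been unused for longer than the keep period."""
    cutoff = now - timedelta(hours=keep_unused_hours)
    return sorted(path for path, used_at in last_used.items() if used_at < cutoff)


now = datetime(2024, 1, 10, 12, 0)
cache = {
    "remote-cache/org/a/a-1.0.jar": datetime(2024, 1, 10, 9, 0),  # used 3 hours ago
    "remote-cache/org/b/b-2.0.jar": datetime(2024, 1, 7, 12, 0),  # used 72 hours ago
}
# The scheduled cleanup would remove only what this selection returns.
candidates = cleanup_candidates(cache, keep_unused_hours=24, now=now)
print(candidates)
```

Nothing is removed between scheduled runs, so an artifact can stay in the cache longer than the keep period if the cleanup job runs infrequently.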
Cleanup Virtual Repositories
Virtual repositories use an internal cache to store aggregated metadata such as POM files. The Cleanup Virtual Repositories operation deletes cached POM files that are older than 168 hours (one week).
The cleanup frequency is specified with a cron expression. For example, to run cleanup every 12 hours you should specify the following expression:
0 0 /12 * * ?
Storage
| Action | Description |
| --- | --- |
| Compress the Internal Database | Derby database only: this feature is only relevant when using the internal Derby database. A Derby database may typically contain unused allocated space when a large amount of data is deleted from a table or its indices are updated. By default, Derby does not return unused space to the operating system. For example, once a page has been allocated to a table or index, it is not automatically returned to the operating system until the table or index is destroyed. When you invoke this action, Artifactory reclaims unused and allocated space in a table and its indexes, thereby compressing the internal database. We recommend running this when Artifactory activity is low, since compression may not be able to complete when storage is busy (in which case the storage will not be affected). |
| Prune Unreferenced Data | Unreferenced binary files may occur due to running with wrong file system permissions on storage folders, or running out of storage space. When you invoke this action, Artifactory removes unreferenced binary files and empty folders present in the filestore or cache folders. Ensure complete shutdown: to avoid such errors, we recommend that you always allow Artifactory to shut down completely. Note: this action does not refer to the removal of empty directories in the repositories tree; these are automatically removed asynchronously when a folder is found to be empty after the removal of artifacts within it. |
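The idea behind pruning can be sketched as a bottom-up walk over a storage folder that removes files nobody references and then any folders left empty. This is an illustration only, not Artifactory's pruning logic: it assumes files are named by their checksum and that the set of referenced checksums is already known, and the function `prune_unreferenced` and the sample names are hypothetical.

```python
import os
import tempfile


def prune_unreferenced(root: str, referenced: set[str]) -> tuple[list[str], list[str]]:
    """Delete files whose name is not a referenced checksum, then empty folders."""
    removed_files: list[str] = []
    removed_dirs: list[str] = []
    # Walk bottom-up so folders are checked after their contents are pruned.
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            if name not in referenced:
                os.remove(os.path.join(dirpath, name))
                removed_files.append(name)
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed_dirs.append(os.path.relpath(dirpath, root))
    return sorted(removed_files), sorted(removed_dirs)


# Demonstration on a throwaway directory, never on a real filestore.
with tempfile.TemporaryDirectory() as root:
    for folder in ("ab", "cd", "ef"):
        os.makedirs(os.path.join(root, folder))
    open(os.path.join(root, "ab", "abc123"), "w").close()  # referenced
    open(os.path.join(root, "cd", "cdf456"), "w").close()  # unreferenced
    removed_files, removed_dirs = prune_unreferenced(root, {"abc123"})
    print(removed_files, removed_dirs)
```

In the demonstration, the unreferenced file and the two folders left empty are removed, while the referenced file and its folder are kept.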