
Filestore Sharding Overview

Artifactory's sharding binary provider lets you manage your binaries in a sharded filestore. A sharded filestore is one that is implemented on a number of physical mounts (M), which store binary objects with redundancy (R), where R <= M.

For example, the following diagram represents a sharded filestore where M=3 and R=2. In other words, the filestore consists of 3 physical mounts which store each binary in two copies.
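To make the M/R relationship concrete, the following is a minimal binarystore.xml sketch of such a layout, not a ready-to-use configuration; it assumes the sharding provider and state-aware sub-provider described later on this page, and the mount ids are illustrative:

<config version="2">
    <chain>
        <provider id="sharding" type="sharding">
            <!-- Keep two copies (R=2) of every binary across the three mounts (M=3) below -->
            <redundancy>2</redundancy>
            <sub-provider id="shard1" type="state-aware"/>
            <sub-provider id="shard2" type="state-aware"/>
            <sub-provider id="shard3" type="state-aware"/>
        </provider>
    </chain>
</config>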

Artifactory’s sharding binary provider presents several benefits.

Unmatched stability and reliability

Thanks to redundant storage of binaries, the system can withstand any mount going down as long as M >= R.

Unlimited scalability

If the underlying storage available approaches depletion, you only need to add another mount; a process that requires no downtime of the filestore. Once the mount is up and running, the system regenerates the filestore redundancy according to configuration parameters you control.

For a standalone Artifactory setup
A restart is required to apply the changes to your binarystore.xml settings.

For a High Availability setup
Restarting each cluster node separately to apply the changes to your binarystore.xml settings will result in no downtime.

Filestore performance optimization

Sharding Binary Provider offers several configuration parameters that allow you to optimize how binaries are read from or written to the filestore according to your specific system’s requirements.

S3 Sharding

Artifactory allows you to shard multiple S3 buckets. For more information, see S3 Sharding.

Filestore Fundamentals

This page provides information about a specific binary provider. For more information on filestores and the various filestores that you can use, see Configuring the Filestore.

JFrog Subscription Levels: SELF-HOSTED | ENTERPRISE | ENTERPRISE+
Sharding Binary Provider

Artifactory offers a Sharding Binary Provider that lets you manage your binaries in a sharded filestore. A sharded filestore is one that is implemented on a number of physical mounts (M), which store binary objects with redundancy (R), where R <= M.
This binary provider is not independent and will always be used as part of a more complex template chain of providers.

type
sharding
lenientLimit

Default: 1 (From version 5.4. Note that for filestores configured with a custom chain and not using the built-in templates, the default value of the lenientLimit parameter is 0 to maintain consistency with previous versions.)

The minimum number of successful writes that must be maintained for an upload to be successful. The next balance cycle (triggered with the GC mechanism) will eventually transfer the binary to enough nodes such that the redundancy commitment is preserved.
In other words, Leniency governs the minimal allowed redundancy in cases where the redundancy commitment was not kept temporarily.

For example, if lenientLimit is set to 3, your setup includes 4 filestores, and 1 of them goes down, writing will continue. If a 2nd filestore goes down, writing will stop.

The number of currently active nodes must always be greater than or equal to the configured lenientLimit. If set to 0, the redundancy value must be maintained.

readBehavior

This parameter dictates the strategy for reading binaries from the mounts that make up the sharded filestore.

Possible values are:

roundRobin (default): Binaries are read from each mount using a round robin strategy.

writeBehavior

This parameter dictates the strategy for writing binaries to the mounts that make up the sharded filestore. Possible values are:

roundRobin (default): Binaries are written to each mount using a round robin strategy.

freeSpace: Binaries are written to the mount with the greatest absolute volume of free space available.

percentageFreeSpace: Binaries are written to the mount with the highest percentage of free space available.

redundancy
Default: r = 1

The number of copies that should be stored for each binary in the filestore. Note that redundancy must be less than or equal to the number of mounts in your system for Artifactory to work with this configuration.

concurrentStreamWaitTimeout

Default: 30,000 ms

To support the specified redundancy, Artifactory accumulates the write stream in a buffer and uses "r" threads (according to the specified redundancy) to write each of the redundant copies of the binary. A binary can only be considered written once all redundant threads have completed their write operations. Since all threads are competing for the write stream buffer, each one will complete its write operation at a different time. This parameter specifies the amount of time (in ms) that any thread will wait for all the others to complete their write operations.

If a write operation fails, you can try increasing the value of this parameter.

concurrentStreamBufferKb

Default: 32 Kb
The size of the write buffer used to accumulate the write stream before being replicated for writing to the “r” redundant copies of the binary.

If a write operation fails, you can try increasing the value of this parameter.

maxBalancingRunTime

Default: 3,600,000 ms (1 hour)
Once a failed mount has been restored, this parameter specifies how long each balancing session may run before it lapses until the next Garbage Collection has completed. For more details about balancing, see Using Balancing to Recover from Mount Failure.

To restore your system to full redundancy more quickly after a mount failure, you may increase the value of this parameter. If you find this causes an unacceptable degradation of overall system performance, you can consider decreasing the value of this parameter, but this means that the overall time taken for Artifactory to restore full redundancy will be longer.
freeSpaceSampleInterval

Default: 3,600,000 ms (1 hour)

To implement its write behavior, Artifactory needs to periodically query the mounts in the sharded filestore to check for free space. Since this check may be a resource intensive operation, you may use this parameter to control the time interval between free space checks.

If you anticipate a period of intensive upload of large volumes of binaries, you can consider decreasing the value of this parameter in order to reduce the transient imbalance between mounts in your system.
minSpareUploaderExecutor

Default: 2

Artifactory maintains a pool of threads that execute writes to each redundant unit of storage. Depending on the intensity of write activity, eventually some of the threads may become idle and are then candidates for being killed. However, Artifactory does need to keep some threads alive for when write activities begin again. This parameter specifies the minimum number of threads that should be kept alive to supply the redundant storage units.

uploaderCleanupIdleTime

Default: 120,000 ms (2 min)

The maximum period of time a thread may remain idle before it becomes a candidate for being killed.
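As a hedged illustration of how these parameters fit together, the sketch below places them on a sharding provider with two state-aware mounts; the ids and values are examples only, not recommended settings:

<config version="2">
    <chain>
        <provider id="sharding" type="sharding">
            <!-- Read/write strategies and redundancy commitment -->
            <readBehavior>roundRobin</readBehavior>
            <writeBehavior>percentageFreeSpace</writeBehavior>
            <redundancy>2</redundancy>
            <lenientLimit>1</lenientLimit>
            <!-- Streaming and balancing tuning parameters described above -->
            <concurrentStreamWaitTimeout>30000</concurrentStreamWaitTimeout>
            <concurrentStreamBufferKb>32</concurrentStreamBufferKb>
            <maxBalancingRunTime>3600000</maxBalancingRunTime>
            <freeSpaceSampleInterval>3600000</freeSpaceSampleInterval>
            <sub-provider id="shard-fs-1" type="state-aware"/>
            <sub-provider id="shard-fs-2" type="state-aware"/>
        </provider>
    </chain>
</config>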


Double Shards Binary Provider

The double-shards template is used for a pure sharding configuration that uses 2 physical mounts with 1 copy (which means each artifact is saved only once).

double-shards template configuration

If you choose to use the double-shards template, your binarystore.xml configuration file should look like this:

<config version="2">
    <chain template="double-shards"/>
</config>

What's in the template?

While you don't need to configure anything else in your binarystore.xml, this is what the double-shards template looks like under the hood.
For details about the cache-fs provider, see Cached Filesystem Binary Provider.
For details about the sharding provider, see Sharding Binary Provider.
For details about the state-aware sub-provider, see State-Aware Binary Provider.

<config version="2">
    <chain> <!--template="double-shards"-->
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>1</redundancy>
                <lenientLimit>1</lenientLimit>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

Redundant Shards Binary Provider

The redundant-shards template is used for a pure sharding configuration that uses 2 physical mounts with 2 copies (which means each shard stores a copy of each artifact). To learn more about the different sharding capabilities, refer to Filestore Sharding.

redundant-shards template configuration

If you choose to use the redundant-shards template, your binarystore.xml configuration file should look like this:

<config version="2">
    <chain template="redundant-shards"/>
</config>

What's in the template?

While you don't need to configure anything else in your binarystore.xml, this is what the redundant-shards template looks like under the hood.
Details about the cache-fs provider can be found in the Cached Filesystem Binary Provider section.
Details about the sharding provider can be found in the Sharding Binary Provider section.
Details about the state-aware sub-provider can be found in the State-Aware Binary Provider section.

<config version="2">
    <chain> <!--template="redundant-shards"-->
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>2</redundancy>
                <lenientLimit>1</lenientLimit>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

State-Aware Binary Provider

This binary provider is not independent and will always be used in the sharding or sharding-cluster providers. The provider is aware of whether its underlying disk is functioning or not. It is identical to the basic filesystem provider; however, it can also recover from errors (the parent provider is responsible for recovery) with the addition of the checkPeriod field.


type
state-aware
checkPeriod

Default: 15000 ms

The minimum time to wait between trying to re-activate the provider if it had fatal errors at any point.

writeEnabled



Default: true

From Artifactory 6.18 and later, enables/disables write operations for the binary provider. If set to false, the state-aware provider will continue to serve read requests, so Artifactory can continue to read binaries from that provider. In addition, garbage collection can continue to clean the deleted binaries from the provider. (Only applicable under a sharding provider.)

zone
The name of the sharding zone the provider is part of (only applicable under a sharding provider).
fileStoreDir

Custom file store directory.

You can provide a custom directory for the file store so that the artifacts are stored in a directory of your choice.
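As a sketch only (the directory path and values are placeholders, and the nesting assumes the sharding provider described above), a state-aware sub-provider with these fields could look like this:

<config version="2">
    <chain>
        <provider id="sharding" type="sharding">
            <redundancy>1</redundancy>
            <sub-provider id="shard-fs-1" type="state-aware">
                <!-- Re-activation check interval after a fatal error -->
                <checkPeriod>15000</checkPeriod>
                <!-- Keep serving reads even if writes are disabled -->
                <writeEnabled>true</writeEnabled>
                <zone>local</zone>
                <!-- Custom file store directory (placeholder path) -->
                <fileStoreDir>/custom/filestore/path</fileStoreDir>
            </sub-provider>
        </provider>
    </chain>
</config>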


Sharding-Cluster Binary Provider

The sharding-cluster binary provider can be used together with other binary providers for both local and cloud-native storage.

It adds a crossNetworkStrategy parameter to be used as read and write behaviors for validation of the redundancy values and the balance mechanism. It must include a Remote Binary Provider in its dynamic-provider setting to allow synchronizing providers across the cluster.

The Sharding-Cluster provider listens to cluster topology events and creates or removes dynamic providers based on the current state of nodes in the cluster.

type
sharding-cluster
zones

The zones defined in the sharding mechanism. Read/write strategies take providers based on zones.

lenientLimit

Default: 1

The minimum number of successful writes that must be maintained for an upload to be successful. The next balance cycle (triggered with the GC mechanism) will eventually transfer the binary to enough nodes such that the redundancy commitment is preserved.
In other words, Leniency governs the minimal allowed redundancy in cases where the redundancy commitment was not kept temporarily.

For example, if lenientLimit is set to 3, your setup includes 4 filestores, and 1 of them goes down, writing will continue. If a 2nd filestore goes down, writing will stop.

The number of currently active nodes must always be greater than or equal to the configured lenientLimit. If set to 0, the redundancy value must be maintained.

dynamic-provider
The type of provider that can be added and removed dynamically based on cluster topology changes. Currently only the Remote Binary Provider is supported as a dynamic provider.

Sharding-Cluster Binary Provider Example

<config version="2">
    <chain>
        <provider id="sharding-cluster" type="sharding-cluster">
            <readBehavior>crossNetworkStrategy</readBehavior>
            <writeBehavior>crossNetworkStrategy</writeBehavior>
            <redundancy>2</redundancy>
            <lenientLimit>1</lenientLimit>
            <sub-provider id="state-aware" type="state-aware">
                <fileStoreDir>filestore1</fileStoreDir>
            </sub-provider>
            <dynamic-provider id="remote" type="remote">
                <checkPeriod>15000</checkPeriod>
                <connectionTimeout>5000</connectionTimeout>
                <socketTimeout>15000</socketTimeout>
                <maxConnections>200</maxConnections>
                <connectionRetry>2</connectionRetry>
                <zone>remote</zone>
            </dynamic-provider>
        </provider>
    </chain>
</config>

Configuring a Sharding Binary Provider

A sharding binary provider is a binary provider as described in Configuring the Filestore.

Basic Sharding Configuration

Basic sharding configuration is used to configure a sharding binary provider for an Artifactory instance.

The following parameters are available for a basic sharding configuration:

readBehavior

This parameter dictates the strategy for reading binaries from the mounts that make up the sharded filestore.

Possible values are:

roundRobin (default): Binaries are read from each mount using a round robin strategy.

writeBehavior

This parameter dictates the strategy for writing binaries to the mounts that make up the sharded filestore. Possible values are:

roundRobin (default): Binaries are written to each mount using a round robin strategy.

freeSpace: Binaries are written to the mount with the greatest absolute volume of free space available.

percentageFreeSpace: Binaries are written to the mount with the highest percentage of free space available.

redundancy
Default: r=1

The number of copies that should be stored for each binary in the filestore. Note that redundancy must be less than or equal to the number of mounts in your system for Artifactory to work with this configuration.

lenientLimit

Default: 1 (From version 5.4. Note that for filestores configured with a custom chain and not using the built-in templates, the default value of the lenientLimit parameter is 0 to maintain consistency with previous versions.)

The minimum number of successful writes that must be maintained for an upload to be successful. The next balance cycle (triggered with the GC mechanism) will eventually transfer the binary to enough nodes such that the redundancy commitment is preserved.
In other words, Leniency governs the minimal allowed redundancy in cases where the redundancy commitment was not kept temporarily.

For example, if lenientLimit is set to 3, your setup includes 4 filestores, and 1 of them goes down, writing will continue. If a 2nd filestore goes down, writing will stop.

The number of currently active nodes must always be greater than or equal to the configured lenientLimit. If set to 0, the redundancy value must be maintained.

concurrentStreamWaitTimeout

Default: 30,000 ms

To support the specified redundancy, Artifactory accumulates the write stream in a buffer and uses "r" threads (according to the specified redundancy) to write each of the redundant copies of the binary. A binary can only be considered written once all redundant threads have completed their write operations. Since all threads are competing for the write stream buffer, each one will complete its write operation at a different time. This parameter specifies the amount of time (in ms) that any thread will wait for all the others to complete their write operations.

If a write operation fails, you can try increasing the value of this parameter.

concurrentStreamBufferKb

Default: 32 Kb
The size of the write buffer used to accumulate the write stream before being replicated for writing to the “r” redundant copies of the binary.

If a write operation fails, you can try increasing the value of this parameter.

maxBalancingRunTime

Default: 3,600,000 ms (1 hour)
Once a failed mount has been restored, this parameter specifies how long each balancing session may run before it lapses until the next Garbage Collection has completed. For more details about balancing, refer to Using Balancing to Recover from Mount Failure.

To restore your system to full redundancy more quickly after a mount failure, you may increase the value of this parameter. If you find this causes an unacceptable degradation of overall system performance, you can consider decreasing the value of this parameter, but this means that the overall time taken for Artifactory to restore full redundancy will be longer.
freeSpaceSampleInterval

Default: 3,600,000 ms (1 hour)

To implement its write behavior, Artifactory needs to periodically query the mounts in the sharded filestore to check for free space. Since this check may be a resource intensive operation, you may use this parameter to control the time interval between free space checks.

If you anticipate a period of intensive upload of large volumes of binaries, you can consider decreasing the value of this parameter in order to reduce the transient imbalance between mounts in your system.
minSpareUploaderExecutor

Default: 2

Artifactory maintains a pool of threads that execute writes to each redundant unit of storage. Depending on the intensity of write activity, eventually some of the threads may become idle and are then candidates for being killed. However, Artifactory does need to keep some threads alive for when write activities begin again. This parameter specifies the minimum number of threads that should be kept alive to supply the redundant storage units.

uploaderCleanupIdleTime

Default: 120,000 ms (2 min)

The maximum period of time a thread may remain idle before it becomes a candidate for being killed.

Basic Sharding Example 1

The code snippet below is a sample configuration for the following setup:

  • A cached sharding binary provider with three mounts and redundancy of 2.
  • Each mount "X" writes to a directory called /filestoreX.
  • The read strategy for the provider is roundRobin.
  • The write strategy for the provider is percentageFreeSpace.
<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <!-- Specify the read and write strategy and redundancy for the sharding binary provider -->
            <provider id="sharding" type="sharding">
                <readBehavior>roundRobin</readBehavior>
                <writeBehavior>percentageFreeSpace</writeBehavior>
                <redundancy>2</redundancy>
                <!-- For each sub-provider (mount), specify the filestore location -->
                <sub-provider id="shard1" type="state-aware">
                    <fileStoreDir>filestore1</fileStoreDir>
                </sub-provider>
                <sub-provider id="shard2" type="state-aware">
                    <fileStoreDir>filestore2</fileStoreDir>
                </sub-provider>
                <sub-provider id="shard3" type="state-aware">
                    <fileStoreDir>filestore3</fileStoreDir>
                </sub-provider>
            </provider>
        </provider>
    </chain>
</config>

Basic Sharding Example 2

The following code snippet shows the "double-shards" template which can be used as is for your binary store configuration.

<config version="2">
    <chain template="double-shards"/>
    <provider id="shard-fs-1" type="state-aware">
        <fileStoreDir>shard-fs-1</fileStoreDir>
    </provider>
    <provider id="shard-fs-2" type="state-aware">
        <fileStoreDir>shard-fs-2</fileStoreDir>
    </provider>
</config>

The double-shards template uses a cached provider with two mounts and a redundancy of 1, i.e. only one copy of each artifact is stored.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>1</redundancy>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

To modify the parameters of the template, you can change the values of the elements in the template definition. For example, to increase the redundancy of the configuration to 2, you only need to modify the <redundancy> tag as shown below.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>2</redundancy>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

Cross-Zone Sharding Configuration

Sharding across multiple zones in an HA Artifactory cluster allows you to create zones or regions of sharded data to provide additional redundancy in case one of your zones becomes unavailable. You can determine the order in which the data is written between the zones and you can set the method for establishing the free space when writing to the mounts in the neighboring zones.

The following parameters are available for a cross-zone sharding configuration in the binarystore.xml file:

readBehavior

This parameter dictates the strategy for reading binaries from the mounts that make up the cross-zone sharded filestore.
Possible value is:

zone: Binaries are read from each mount according to zone settings.

writeBehavior

This parameter dictates the strategy for writing binaries to cross-zone sharding mounts:

Possible values are:

zonePercentageFreeSpace: Binaries are written to the mount in the relevant zone with the highest percentage of free space available.

zoneFreeSpace: Binaries are written to the mount in the zone with the greatest absolute volume of free space available.

Add to the Artifactory System YAML file

The following parameters are available for a cross-zone sharding configuration in the Artifactory System YAML file:

shared.node.id

Unique descriptive name of this server.

Uniqueness

Make sure that each node has an id that is unique on your whole network.

shared.node.crossZoneOrder
Sets the zone order in which the data is written to the mounts. In the following example,

crossZoneOrder: "us-east-1,us-east-2", the shards are written first to the US-EAST-1 zone and then to the US-EAST-2 zone.


You can dynamically add nodes to an existing sharding cluster using the Artifactory System YAML file. To do so, your cluster must already be configured with sharding; by adding the crossZoneOrder: "us-east-1,us-east-2" property, the new node can write to the existing cluster nodes without changing the binarystore.xml file.

Example:

This example displays a cross-zone sharding scenario in which the Artifactory cluster is configured with a redundancy of 2 and includes the following steps:

  1. The developer first deploys the package to the closest Artifactory node.
  2. The package is then automatically deployed to the "US-EAST-1" zone, to the shard with the highest percentage of free space, which is the "S1" shard (with 51% free space).
  3. The package is deployed using the same method to the "S3" shard, which has the highest percentage of free space in the "US-EAST-2" zone.

    The code snippet below is a sample configuration of our cross-zone setup:

    • 1 Artifactory cluster across 2 zones: "us-east-1" and "us-east-2" in this order.
    • 4 HA nodes, 2 nodes in each zone.
    • 4 mounts (shards), 2 mounts in each zone.
    • The write strategy for the provider is zonePercentageFreeSpace.

    Example: Cross-zone sharding configuration in Artifactory System YAML

shared:
  node:
    id: "west-node-1"
    crossZoneOrder: "us-east-1,us-east-2"

    Example: Cross-zone sharding configuration in the binarystore.xml

<config version="2">
    <chain>
        <provider id="sharding" type="sharding">
            <redundancy>2</redundancy>
            <readBehavior>zone</readBehavior>
            <writeBehavior>zonePercentageFreeSpace</writeBehavior>
            <sub-provider id="shard1" type="state-aware">
                <fileStoreDir>mount1</fileStoreDir>
                <zone>us-east-1</zone>
            </sub-provider>
            <sub-provider id="shard2" type="state-aware">
                <fileStoreDir>mount2</fileStoreDir>
                <zone>us-east-1</zone>
            </sub-provider>
            <sub-provider id="shard3" type="state-aware">
                <fileStoreDir>mount3</fileStoreDir>
                <zone>us-east-2</zone>
            </sub-provider>
            <sub-provider id="shard4" type="state-aware">
                <fileStoreDir>mount4</fileStoreDir>
                <zone>us-east-2</zone>
            </sub-provider>
        </provider>
    </chain>
</config>

Configuring Sharding for HA Cluster

For a High Availability cluster, Artifactory offers templates that support sharding-cluster for File-System, S3, and Google Storage.

When configuring your filestore on an HA cluster, you need to place the binarystore.xml under $JFROG_HOME/artifactory/var/etc/artifactory in the primary node, and it will be synced to the other members in the cluster.

File System Cluster Binary Provider

When using the cluster-file-system template, each node has its own local filestore (just like in the file-system binary provider) and is connected to all other cluster nodes via dynamically allocated Remote Binary Providers using the Sharding-Cluster Binary Provider.

Cluster-File-System Template Configuration


If you choose to use the cluster-file-system template, your binarystore.xml configuration file should look like this:


<config version="2">
    <chain template="cluster-file-system"/>
</config>

What's in the template?

While you don't need to configure anything else in your binarystore.xml, this is what the cluster-file-system template looks like under the hood.
Details about the cache-fs provider can be found in the Cached Filesystem Binary Provider section.
Details about the sharding-cluster provider can be found in the Sharding-Cluster Binary Provider section.
Details about the state-aware sub-provider can be found in the State-Aware Binary Provider section.

<config version="2">
    <chain> <!--template="cluster-file-system"-->
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding-cluster" type="sharding-cluster">
                <sub-provider id="state-aware" type="state-aware">
                    <zone>local</zone>
                </sub-provider>
                <dynamic-provider id="remote" type="remote">
                    <zone>remote</zone>
                </dynamic-provider>
                <readBehavior>crossNetworkStrategy</readBehavior>
                <writeBehavior>crossNetworkStrategy</writeBehavior>
                <redundancy>2</redundancy>
            </provider>
        </provider>
    </chain>
</config>


Cluster-File-System Example

The following example shows a file system cluster binary configuration with a custom file store directory.

<config version="2">
    <chain> <!--template="cluster-file-system"-->
        <provider id="cache-fs" type="cache-fs">
            <cacheProviderDir>/opt/jfrog/artifactory/var/data/artifactory/cache</cacheProviderDir>
            <maxCacheSize>500000000</maxCacheSize>
            <provider id="sharding-cluster" type="sharding-cluster">
                <sub-provider id="state-aware" type="state-aware">
                    <zone>local</zone>
                    <fileStoreDir>/opt/jfrog/artifactory/var/data/artifactory/filestore</fileStoreDir>
                </sub-provider>
                <dynamic-provider id="remote" type="remote">
                    <zone>remote</zone>
                </dynamic-provider>
                <readBehavior>crossNetworkStrategy</readBehavior>
                <writeBehavior>crossNetworkStrategy</writeBehavior>
                <redundancy>2</redundancy>
                <lenientLimit>1</lenientLimit>
            </provider>
        </provider>
    </chain>
</config>

S3 Sharding

Artifactory allows you to shard multiple S3 buckets. For more information, see S3 Sharding.


Using Balancing to Recover from Mount Failure

In case of a mount failure, the actual redundancy in your system will be reduced accordingly. In the meantime, binaries continue to be written to the remaining active mounts. Once the malfunctioning mount has been restored, the system needs to rebalance the binaries written to the remaining active mounts to fully restore (i.e., balance) the redundancy configured in the system. Depending on how long the failed mount was inactive, this may involve a significant volume of binaries that now need to be written to the restored mount, which may take a significant amount of time. Since restoring the full redundancy is a resource intensive operation, the balancing operation is run in a series of distinct sessions until complete. These are automatically invoked after a Garbage Collection process has been run in the system.


Restoring Balance in Unbalanced Redundant Storage Units

In the case of voluntary actions that cause an imbalance in the system redundancy, such as when doing a filestore migration, you may manually invoke rebalancing of redundancy using the Optimize System Storage REST API endpoint. Applying this endpoint raises a flag for Artifactory to run rebalancing following the next Garbage Collection. Note that, to expedite rebalancing, you can invoke garbage collection manually from the Artifactory UI.


Optimizing System Storage

After deployment, duplications of files beyond the configured redundancy are expected, and the garbage collector removes them after the optimization run. On each deploy operation to a shard, Artifactory checks whether the checksum exists more than once. If the checksum exists more than once and the count of such repetitions is greater than the configured redundancy, Artifactory triggers the optimization flag. When the next full garbage collection runs, the duplicates are removed. By default, the full garbage collection runs after 20 iterations, and each iteration runs every four hours, resulting in six iterations per day. For more information on garbage collection, see Garbage Collection.

The Artifactory REST API provides an endpoint that allows you to raise a flag to indicate that Artifactory should invoke balancing between redundant storage units of a sharded filestore after the next garbage collection. Therefore, you can set the optimization flag manually. For information on the API, see Optimize System Storage.


