
Filestore Sharding


Filestore Sharding Overview

Artifactory offers a Sharding Binary Provider that lets you manage your binaries in a sharded filestore. A sharded filestore is one that is implemented on a number of physical mounts (M), which store binary objects with redundancy (R), where R <= M.

For example, the following diagram represents a sharded filestore where M=3 and R=2. In other words, the filestore consists of 3 physical mounts which store each binary in two copies.

Artifactory's sharding binary provider presents several benefits.

Unmatched stability and reliability

Thanks to redundant storage of binaries, the system can withstand any mount going down as long as M >= R.

Unlimited scalability

If the underlying storage available approaches depletion, you only need to add another mount; a process that requires no downtime of the filestore. Once the mount is up and running, the system regenerates the filestore redundancy according to configuration parameters you control.

For a standalone Artifactory setup
A restart is required to apply the changes to your binarystore.xml settings.

For a High Availability setup
Restarting each cluster node separately to apply the changes to your binarystore.xml settings will result in no downtime.

Filestore performance optimization

The Sharding Binary Provider offers several configuration parameters that allow you to optimize how binaries are read from or written to the filestore according to your specific system's requirements.

S3 Sharding

Artifactory allows you to shard multiple S3 buckets. For more information, see S3 Sharding.

Filestore Fundamentals

This page provides information about a specific binary provider. For more information on filestores and the various filestores that you can use, see Configuring the Filestore.





Sharding Binary Provider

Artifactory offers a Sharding Binary Provider that lets you manage your binaries in a sharded filestore. A sharded filestore is one that is implemented on a number of physical mounts (M), which store binary objects with redundancy (R), where R <= M.
This binary provider is not independent and will always be used as part of a more complex template chain of providers.

type
sharding

lenientLimit

Default: 1 (From version 5.4. Note that for filestores configured with a custom chain and not using the built-in templates, the default value of the lenientLimit parameter is 0 to maintain consistency with previous versions.)

The minimum number of successful writes that must be maintained for an upload to be successful. The next balance cycle (triggered with the GC mechanism) will eventually transfer the binary to enough nodes such that the redundancy commitment is preserved.
In other words, leniency governs the minimal allowed redundancy in cases where the redundancy commitment was temporarily not kept.

For example, if lenientLimit is set to 3 and your setup includes 4 filestores, writing will continue if 1 of them goes down. If a 2nd filestore goes down, writing will stop.

The number of currently active nodes must always be greater than or equal to the configured lenientLimit. If set to 0, the redundancy value has to be kept.

readBehavior

This parameter dictates the strategy for reading binaries from the mounts that make up the sharded filestore.

Possible values are:

roundRobin (default): Binaries are read from each mount using a round robin strategy.

writeBehavior

This parameter dictates the strategy for writing binaries to the mounts that make up the sharded filestore. Possible values are:

roundRobin (default): Binaries are written to each mount using a round robin strategy.

freeSpace: Binaries are written to the mount with the greatest absolute volume of free space available.

percentageFreeSpace: Binaries are written to the mount with the highest percentage of free space available.

redundancy
Default: r = 1

The number of copies that should be stored for each binary in the filestore. Note that redundancy must be less than or equal to the number of mounts in your system for Artifactory to work with this configuration.

concurrentStreamWaitTimeout

Default: 30,000 ms

To support the specified redundancy, Artifactory accumulates the write stream in a buffer and uses "r" threads (according to the specified redundancy) to write to each of the redundant copies of the binary being written. A binary can only be considered written once all redundant threads have completed their write operation. Since all threads are competing for the write stream buffer, each one will complete the write operation at a different time. This parameter specifies the amount of time (ms) that any thread will wait for all the others to complete their write operation.

If a write operation fails, you can try increasing the value of this parameter.

concurrentStreamBufferKb

Default: 32 KB
The size of the write buffer used to accumulate the write stream before it is replicated for writing to the "r" redundant copies of the binary.

If a write operation fails, you can try increasing the value of this parameter.

maxBalancingRunTime

Default: 3,600,000 ms (1 hour)
Once a failed mount has been restored, this parameter specifies how long each balancing session may run before it lapses until the next Garbage Collection has completed. For more details about balancing, see Using Balancing to Recover from Mount Failure.

To restore your system to full redundancy more quickly after a mount failure, you may increase the value of this parameter. If you find this causes an unacceptable degradation of overall system performance, you can consider decreasing the value of this parameter, but this means that the overall time taken for Artifactory to restore full redundancy will be longer.
freeSpaceSampleInterval

Default: 3,600,000 ms (1 hour)

To implement its write behavior, Artifactory needs to periodically query the mounts in the sharded filestore to check for free space. Since this check may be a resource intensive operation, you may use this parameter to control the time interval between free space checks.

If you anticipate a period of intensive upload of large volumes of binaries, you can consider decreasing the value of this parameter in order to reduce the transient imbalance between mounts in your system.

minSpareUploaderExecutor

Default: 2

Artifactory maintains a pool of threads to execute writes to each redundant unit of storage. Depending on the intensity of write activity, eventually, some of the threads may become idle and are then candidates for being killed. However, Artifactory does need to maintain some threads alive for when write activities begin again. This parameter specifies the minimum number of threads that should be kept alive to supply redundant storage units.

uploaderCleanupIdleTime

Default: 120,000 ms (2 min)

The maximum period of time threads may remain idle before becoming candidates for being killed.
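
To show how these parameters fit together, here is a minimal sketch of a sharding chain in binarystore.xml. This is an illustrative example rather than a built-in template: the provider IDs (shard1, shard2, shard3) and the filestore directories are placeholders, and the parameter values simply echo the defaults described above. A fuller configuration appears in Basic Sharding Example 1 further down this page.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <!-- one sub-provider (mount) per physical mount -->
                <sub-provider id="shard1" type="state-aware"/>
                <sub-provider id="shard2" type="state-aware"/>
                <sub-provider id="shard3" type="state-aware"/>
            </provider>
        </provider>
    </chain>

    <!-- values shown are the documented defaults; tune them as described above -->
    <provider id="sharding" type="sharding">
        <readBehavior>roundRobin</readBehavior>
        <writeBehavior>roundRobin</writeBehavior>
        <redundancy>1</redundancy>
        <lenientLimit>1</lenientLimit>
    </provider>

    <!-- placeholder filestore locations for each mount -->
    <provider id="shard1" type="state-aware">
        <fileStoreDir>filestore1</fileStoreDir>
    </provider>
    <provider id="shard2" type="state-aware">
        <fileStoreDir>filestore2</fileStoreDir>
    </provider>
    <provider id="shard3" type="state-aware">
        <fileStoreDir>filestore3</fileStoreDir>
    </provider>
</config>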


Double Shards Binary Provider

The double-shards template is used for a pure sharding configuration that uses 2 physical mounts with 1 copy (which means each artifact is saved only once).

double-shards template configuration

If you choose to use the double-shards template, your binarystore.xml configuration file should look like this:

<config version="2">
    <chain template="double-shards"/>
</config>

What's in the template?

While you don't need to configure anything else in your binarystore.xml, this is what the double-shards template looks like under the hood.
For details about the cache-fs provider, see Cached Filesystem Binary Provider.
For details about the sharding provider, see Sharding Binary Provider.
For details about the state-aware sub-provider, see State-Aware Binary Provider.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>1</redundancy>
                <lenientLimit>1</lenientLimit>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

Redundant Shards Binary Provider

The redundant-shards template is used for a pure sharding configuration that uses 2 physical mounts with 2 copies (which means each shard stores a copy of each artifact). To learn more about the different sharding capabilities, refer to Filestore Sharding.

redundant-shards template configuration

If you choose to use the redundant-shards template, your binarystore.xml configuration file should look like this:

<config version="2">
    <chain template="redundant-shards"/>
</config>

What's in the template?

While you don't need to configure anything else in your binarystore.xml, this is what the redundant-shards template looks like under the hood.
Details about the cache-fs provider can be found in the Cached Filesystem Binary Provider section.
Details about the sharding provider can be found in the Sharding Binary Provider section.
Details about the state-aware sub-provider can be found in the State-Aware Binary Provider section.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>2</redundancy>
                <lenientLimit>1</lenientLimit>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

State-Aware Binary Provider

This binary provider is not independent and will always be used in the sharding or sharding-cluster providers. The provider is aware of whether its underlying disk is functioning or not. It is identical to the basic filesystem provider; however, it can also recover from errors (the parent provider is responsible for recovery) with the addition of the checkPeriod field.


type
state-aware

checkPeriod

Default: 15,000 ms

The minimum time to wait between attempts to re-activate the provider if it had fatal errors at any point.

writeEnabled

Default: true

From Artifactory 6.18 and later, enables/disables the write operations for the binary provider. If set to false, the state-aware provider will continue to serve read requests, so Artifactory can continue to read binaries from that provider. In addition, the garbage collection can continue to clean the deleted binaries from the provider. (Only applicable under a sharding provider.)

zone
The name of the sharding zone the provider is part of (only applicable under a sharding provider).

fileStoreDir

Custom filestore directory.

You can provide a custom directory for the filestore so that the artifacts are stored in a directory of your choice.
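
As a rough sketch of how these fields are declared, the snippet below places two state-aware sub-providers under a sharding provider in binarystore.xml. The provider IDs, the zone name, and the directory paths are illustrative placeholders, and checkPeriod and writeEnabled show their documented defaults; adapt them to your own chain.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <sub-provider id="shard1" type="state-aware"/>
                <sub-provider id="shard2" type="state-aware"/>
            </provider>
        </provider>
    </chain>

    <!-- per-mount settings for each state-aware sub-provider -->
    <provider id="shard1" type="state-aware">
        <checkPeriod>15000</checkPeriod>             <!-- documented default, in ms -->
        <writeEnabled>true</writeEnabled>            <!-- set to false to stop writes to this mount -->
        <zone>zone-a</zone>                          <!-- placeholder zone name -->
        <fileStoreDir>/storage/shard1</fileStoreDir> <!-- placeholder custom directory -->
    </provider>
    <provider id="shard2" type="state-aware">
        <zone>zone-a</zone>
        <fileStoreDir>/storage/shard2</fileStoreDir>
    </provider>
</config>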


Sharding-Cluster Binary Provider

The sharding-cluster binary provider can be used together with other binary providers for both local and cloud-native storage.

It adds a crossNetworkStrategy parameter to be used as read and write behaviors for validation of the redundancy values and the balance mechanism. It must include a Remote Binary Provider in its dynamic-provider setting to allow synchronizing providers across the cluster.

The Sharding-Cluster provider listens to cluster topology events and creates or removes dynamic providers based on the current state of nodes in the cluster.

type
sharding-cluster

zones

The zones defined in the sharding mechanism. Read/write strategies take providers based on zones.

lenientLimit

Default: 1

The minimum number of successful writes that must be maintained for an upload to be successful. The next balance cycle (triggered with the GC mechanism) will eventually transfer the binary to enough nodes such that the redundancy commitment is preserved.
In other words, leniency governs the minimal allowed redundancy in cases where the redundancy commitment was temporarily not kept.

For example, if lenientLimit is set to 3 and your setup includes 4 filestores, writing will continue if 1 of them goes down. If a 2nd filestore goes down, writing will stop.

The number of currently active nodes must always be greater than or equal to the configured lenientLimit. If set to 0, the redundancy value has to be kept.

dynamic-provider
The type of provider that can be added and removed dynamically based on cluster topology changes. Currently, only the Remote Binary Provider is supported as a dynamic provider.

Sharding-Cluster Binary Provider Example

<config version="v1">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding-cluster" type="sharding-cluster">
                <sub-provider id="state-aware" type="state-aware"/>
                <dynamic-provider id="remote" type="remote"/>
            </provider>
        </provider>
    </chain>

    <provider id="sharding-cluster" type="sharding-cluster">
        <readBehavior>crossNetworkStrategy</readBehavior>
        <writeBehavior>crossNetworkStrategy</writeBehavior>
        <redundancy>2</redundancy>
        <lenientLimit>1</lenientLimit>
    </provider>

    <provider id="state-aware" type="state-aware">
        <fileStoreDir>filestore1</fileStoreDir>
        <checkPeriod>15000</checkPeriod>
    </provider>

    <provider id="remote" type="remote">
        <connectionTimeout>5000</connectionTimeout>
        <socketTimeout>15000</socketTimeout>
        <maxConnections>200</maxConnections>
        <connectionRetry>2</connectionRetry>
        <zone>remote</zone>
    </provider>
</config>

Configuring a Sharding Binary Provider

A sharding binary provider is a binary provider as described in Configuring the Filestore.

Basic Sharding Configuration

Basic sharding configuration is used to configure a sharding binary provider for an Artifactory instance.

The following parameters are available for a basic sharding configuration:

readBehavior

This parameter dictates the strategy for reading binaries from the mounts that make up the sharded filestore.

Possible values are:

roundRobin (default): Binaries are read from each mount using a round robin strategy.

writeBehavior

This parameter dictates the strategy for writing binaries to the mounts that make up the sharded filestore. Possible values are:

roundRobin (default): Binaries are written to each mount using a round robin strategy.

freeSpace: Binaries are written to the mount with the greatest absolute volume of free space available.

percentageFreeSpace: Binaries are written to the mount with the highest percentage of free space available.

redundancy
Default: r = 1

The number of copies that should be stored for each binary in the filestore. Note that redundancy must be less than or equal to the number of mounts in your system for Artifactory to work with this configuration.

lenientLimit

Default: 1 (From version 5.4. Note that for filestores configured with a custom chain and not using the built-in templates, the default value of the lenientLimit parameter is 0 to maintain consistency with previous versions.)

The minimum number of successful writes that must be maintained for an upload to be successful. The next balance cycle (triggered with the GC mechanism) will eventually transfer the binary to enough nodes such that the redundancy commitment is preserved.
In other words, leniency governs the minimal allowed redundancy in cases where the redundancy commitment was temporarily not kept.

For example, if lenientLimit is set to 3 and your setup includes 4 filestores, writing will continue if 1 of them goes down. If a 2nd filestore goes down, writing will stop.

The number of currently active nodes must always be greater than or equal to the configured lenientLimit. If set to 0, the redundancy value has to be kept.

concurrentStreamWaitTimeout

Default: 30,000 ms

To support the specified redundancy, Artifactory accumulates the write stream in a buffer and uses "r" threads (according to the specified redundancy) to write to each of the redundant copies of the binary being written. A binary can only be considered written once all redundant threads have completed their write operation. Since all threads are competing for the write stream buffer, each one will complete the write operation at a different time. This parameter specifies the amount of time (ms) that any thread will wait for all the others to complete their write operation.

If a write operation fails, you can try increasing the value of this parameter.

concurrentStreamBufferKb

Default: 32 KB
The size of the write buffer used to accumulate the write stream before it is replicated for writing to the "r" redundant copies of the binary.

If a write operation fails, you can try increasing the value of this parameter.

maxBalancingRunTime

Default: 3,600,000 ms (1 hour)
Once a failed mount has been restored, this parameter specifies how long each balancing session may run before it lapses until the next Garbage Collection has completed. For more details about balancing, see Using Balancing to Recover from Mount Failure.
freeSpaceSampleInterval

Default: 3,600,000 ms (1 hour)

To implement its write behavior, Artifactory needs to periodically query the mounts in the sharded filestore to check for free space. Since this check may be a resource intensive operation, you may use this parameter to control the time interval between free space checks.

If you anticipate a period of intensive upload of large volumes of binaries, you can consider decreasing the value of this parameter in order to reduce the transient imbalance between mounts in your system.

minSpareUploaderExecutor

Default: 2

Artifactory maintains a pool of threads to execute writes to each redundant unit of storage. Depending on the intensity of write activity, eventually, some of the threads may become idle and are then candidates for being killed. However, Artifactory does need to maintain some threads alive for when write activities begin again. This parameter specifies the minimum number of threads that should be kept alive to supply redundant storage units.

uploaderCleanupIdleTime

Default: 120,000 ms (2 min)

The maximum period of time threads may remain idle before becoming candidates for being killed.

Basic Sharding Example 1

The code snippet below is a sample configuration for the following setup:

  • A cached sharding binary provider with three mounts and a redundancy of 2.
  • Each mount "X" writes to a directory called /filestoreX.
  • The read strategy for the provider is roundRobin.
  • The write strategy for the provider is percentageFreeSpace.
<config version="2">
    <chain>
        <!-- This is a cached filestore -->
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <sub-provider id="shard1" type="state-aware"/>
                <sub-provider id="shard2" type="state-aware"/>
                <sub-provider id="shard3" type="state-aware"/>
            </provider>
        </provider>
    </chain>

    <!-- Specify the read and write strategy and redundancy for the sharding binary provider -->
    <provider id="sharding" type="sharding">
        <readBehavior>roundRobin</readBehavior>
        <writeBehavior>percentageFreeSpace</writeBehavior>
        <redundancy>2</redundancy>
    </provider>

    <!-- For each sub-provider (mount), specify the filestore location -->
    <provider id="shard1" type="state-aware">
        <fileStoreDir>filestore1</fileStoreDir>
    </provider>
    <provider id="shard2" type="state-aware">
        <fileStoreDir>filestore2</fileStoreDir>
    </provider>
    <provider id="shard3" type="state-aware">
        <fileStoreDir>filestore3</fileStoreDir>
    </provider>
</config>

Basic Sharding Example 2

The following code snippet shows the "double-shards" template, which can be used as is for your binary store configuration.

<config version="2">
    <chain template="double-shards"/>
    <provider id="shard-fs-1" type="state-aware">
        <fileStoreDir>shard-fs-1</fileStoreDir>
    </provider>
    <provider id="shard-fs-2" type="state-aware">
        <fileStoreDir>shard-fs-2</fileStoreDir>
    </provider>
</config>

The double-shards template uses a cached provider with two mounts and a redundancy of 1, i.e. only one copy of each artifact is stored.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>1</redundancy>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

To modify the parameters of the template, you can change the values of the elements in the template definition. For example, to increase the redundancy of the configuration to 2, you only need to modify the <redundancy> tag as shown below.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <redundancy>2</redundancy>
                <sub-provider id="shard-fs-1" type="state-aware"/>
                <sub-provider id="shard-fs-2" type="state-aware"/>
            </provider>
        </provider>
    </chain>
</config>

Cross-Zone Sharding Configuration

Sharding across multiple zones in an HA Artifactory cluster allows you to create zones or regions of sharded data to provide additional redundancy in case one of your zones becomes unavailable. You can determine the order in which the data is written between the zones, and you can set the method for establishing the free space when writing to the mounts in the neighboring zones.

The following parameters are available for a cross-zone sharding configuration in the binarystore.xml file:

readBehavior

This parameter dictates the strategy for reading binaries from the mounts that make up the cross-zone sharded filestore.
Possible value is:

zone: Binaries are read from each mount according to the zone settings.

writeBehavior

This parameter dictates the strategy for writing binaries to cross-zone sharding mounts.

Possible values are:

zonePercentageFreeSpace: Binaries are written to the mount in the relevant zone with the highest percentage of free space available.

zoneFreeSpace: Binaries are written to the mount in the zone with the greatest absolute volume of free space available.

Add to the Artifactory System YAML file

The following parameters are available for a cross-zone sharding configuration in the Artifactory System YAML file:

shared.node.id

Unique descriptive name of this server.

Uniqueness

Make sure that each node has an ID that is unique on your whole network.

shared.node.crossZoneOrder
Sets the zone order in which the data is written to the mounts. In the following example,

crossZoneOrder: "us-east-1,us-east-2", the sharding will write to the US-EAST-1 zone and then to the US-EAST-2 zone.


You can dynamically add nodes to an existing sharding cluster using the Artifactory System YAML file. To do so, your cluster must already be configured with sharding; by adding the crossZoneOrder: "us-east-1,us-east-2" property, the new node can write to the existing cluster nodes without changing the binarystore.xml file.

Example:

This example shows a cross-zone sharding scenario in which the Artifactory cluster is configured with a redundancy of 2 and includes the following steps:

  1. The developer first deploys the package to the closest Artifactory node.
  2. The package is then automatically deployed to the "US-EAST-1" zone, to the "S1" shard, which has the highest percentage of free space (51% free space).
  3. The package is deployed using the same method to the "S3" shard, which has the highest percentage of free space in the "US-EAST-2" zone.

The code snippet below is a sample configuration of our cross-zone setup:

  • 1 Artifactory cluster across 2 zones: "us-east-1" and "us-east-2", in this order.
  • 4 HA nodes, 2 nodes in each zone.
  • 4 mounts (shards), 2 mounts in each zone.
  • The write strategy for the provider is zonePercentageFreeSpace.

Example: Cross-zone sharding configuration in Artifactory System YAML

shared:
  node:
    id: "west-node-1"
    crossZoneOrder: "us-east-1,us-east-2"

Example: Cross-zone sharding configuration in the binarystore.xml

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding" type="sharding">
                <sub-provider id="shard1" type="state-aware"/>
                <sub-provider id="shard2" type="state-aware"/>
                <sub-provider id="shard3" type="state-aware"/>
                <sub-provider id="shard4" type="state-aware"/>
            </provider>
        </provider>
    </chain>

    <provider id="sharding" type="sharding">
        <redundancy>2</redundancy>
        <readBehavior>zone</readBehavior>
        <writeBehavior>zonePercentageFreeSpace</writeBehavior>
    </provider>

    <provider id="shard1" type="state-aware">
        <fileStoreDir>mount1</fileStoreDir>
        <zone>us-east-1</zone>
    </provider>
    <provider id="shard2" type="state-aware">
        <fileStoreDir>mount2</fileStoreDir>
        <zone>us-east-1</zone>
    </provider>
    <provider id="shard3" type="state-aware">
        <fileStoreDir>mount3</fileStoreDir>
        <zone>us-east-2</zone>
    </provider>
    <provider id="shard4" type="state-aware">
        <fileStoreDir>mount4</fileStoreDir>
        <zone>us-east-2</zone>
    </provider>
</config>

Configuring Sharding for HA Cluster

For a High Availability cluster, Artifactory offers templates that support sharding-cluster for File-System, S3, and Google Storage.

When configuring your filestore on an HA cluster, you need to place the binarystore.xml under $JFROG_HOME/artifactory/var/etc/artifactory in the primary node, and it will be synced to the other members in the cluster.

File System Cluster Binary Provider

When using the cluster-file-system template, each node has its own local filestore (just like in the file-system binary provider) and is connected to all other cluster nodes via dynamically allocated Remote Binary Providers using the Sharding-Cluster Binary Provider.

Cluster-File-System Template Configuration

If you choose to use the cluster-file-system template, your binarystore.xml configuration file should look like this:


<config version="2">
    <chain template="cluster-file-system"/>
</config>

What's in the template?

While you don't need to configure anything else in your binarystore.xml, this is what the cluster-file-system template looks like under the hood.
Details about the cache-fs provider can be found in the Cached Filesystem Binary Provider section.
Details about the sharding-cluster provider can be found in the Sharding-Cluster Binary Provider section.
Details about the state-aware sub-provider can be found in the State-Aware Binary Provider section.

<config version="2">
    <chain>
        <provider id="cache-fs" type="cache-fs">
            <provider id="sharding-cluster" type="sharding-cluster">
                <sub-provider id="state-aware" type="state-aware"/>
                <dynamic-provider id="remote" type="remote"/>
            </provider>
        </provider>
    </chain>

    <provider id="state-aware" type="state-aware">
        <zone>local</zone>
    </provider>

    <provider id="remote" type="remote">
        <zone>remote</zone>
    </provider>

    <provider id="sharding-cluster" type="sharding-cluster">
        <readBehavior>crossNetworkStrategy</readBehavior>
        <writeBehavior>crossNetworkStrategy</writeBehavior>
        <redundancy>2</redundancy>
        <property name="zones" value="local,remote"/>
    </provider>
</config>


Cluster-File-System Example

The following example shows a file system cluster binary configuration with a custom filestore directory.

<config version="2">
    <chain template="cluster-file-system"/>

    <provider id="cache-fs" type="cache-fs">
        <cacheProviderDir>/opt/jfrog/artifactory/var/data/artifactory/cache</cacheProviderDir>
        <maxCacheSize>500000000</maxCacheSize>
    </provider>

    <provider id="state-aware" type="state-aware">
        <zone>local</zone>
        <fileStoreDir>/opt/jfrog/artifactory/var/data/artifactory/filestore</fileStoreDir>
    </provider>

    <provider id="remote" type="remote">
        <zone>remote</zone>
    </provider>

    <provider id="sharding-cluster" type="sharding-cluster">
        <readBehavior>crossNetworkStrategy</readBehavior>
        <writeBehavior>crossNetworkStrategy</writeBehavior>
        <redundancy>2</redundancy>
        <lenientLimit>1</lenientLimit>
    </provider>
</config>

S3 Sharding

Artifactory allows you to shard multiple S3 buckets. For more information, see S3 Sharding.


Using Balancing to Recover from Mount Failure

In case of a mount failure, the actual redundancy in your system will be reduced accordingly. In the meantime, binaries continue to be written to the remaining active mounts. Once the malfunctioning mount has been restored, the system needs to rebalance the binaries written to the remaining active mounts to fully restore (i.e. balance) the redundancy configured in the system. Depending on how long the failed mount was inactive, this may involve a significant volume of binaries that now need to be written to the restored mount, which may take a significant amount of time. Since restoring the full redundancy is a resource intensive operation, the balancing operation is run in a series of distinct sessions until complete. These are automatically invoked after a Garbage Collection process has been run in the system.


Restoring Balance in Unbalanced Redundant Storage Units

In the case of voluntary actions that cause an imbalance in the system redundancy, such as when doing a filestore migration, you may manually invoke rebalancing of redundancy using the Optimize System Storage REST API endpoint. Applying this endpoint raises a flag for Artifactory to run rebalancing following the next Garbage Collection. Note that, to expedite rebalancing, you can invoke garbage collection manually from the Artifactory UI.


Optimizing System Storage

After deployment, duplications of files beyond the redundancy are expected, and the garbage collector removes them after the optimization run. On each deploy operation to a shard, Artifactory checks if the checksum exists more than once. If the checksum exists more than once and the count of such repetitions is bigger than the redundancy number, Artifactory triggers the optimization flag. When the next full garbage collection runs, the duplicates are removed. By default, the full garbage collection runs after 20 iterations, and each iteration runs after four hours, resulting in six iterations per day. For more information on garbage collection, see Garbage Collection.

The Artifactory REST API provides an endpoint that allows you to raise a flag to indicate that Artifactory should invoke balancing between redundant storage units of a sharded filestore after the next garbage collection. Therefore, you can set the optimization flag manually. For information on the API, see Optimize System Storage.