* [k8s] clsi-cache: double the number of shards
* [monorepo] add missing clsi-cache env vars to dev-env
* [clsi] flip direction of clsi-cache shard migration
* [clsi] remove upper bound from clsi-cache shard migration
GitOrigin-RevId: a325a11c3ac9e22a12ad2d8ea802b91d2e175e24
* [clsi] initial implementation of compile from history
* [clsi] copy changes
* [saas-e2e] extend test case with nested folder
* [saas-e2e] add test case for tracked changes
* [web] fix accumulating changes from multiple chunks
* [web] optimize size check for compile request payload
* [clsi] deduplicate globalBlobs
* [clsi] add validation for request body details
* [clsi] add metrics for compile from history
* [clsi] download binary files concurrently
* [clsi] skip download of empty file blob
* [clsi] break down e2e compile time metric by compileFromHistory
GitOrigin-RevId: 0dadef93e89d8a172c35cb130a1042d9d1bec42a
* [clsi] tweak logging for clsi-cache
- Use `clsi-cache` identifier on log line
- Add shard to context
- Record nFiles on "too many entries for tar" error
* [clsi] do not trip clsi-cache circuit breaker on ENOENT errors
These can happen when an output/compile-dir is purged while we download
files.
GitOrigin-RevId: ffa73ef312bce5232ef72e3b81966bb6e14d2255
* [k8s] clsi-cache: migrate to StatefulSet
* clsi-cache: optimize ILB services for GKE subsetting
Update the new clsi-cache internal load balancer services
to use optimal settings for GKE subsetting (NEG backends):
- set allocateLoadBalancerNodePorts: false (not needed with NEGs)
- set externalTrafficPolicy: Local (preserve source IP, keep traffic in zone)
- add trafficDistribution: PreferClose (zone affinity)
These settings ensure traffic from CLSI VMs stays within the same zone
when possible, reducing latency and cross-zone network costs.
* [k8s] clsi-cache: add missing resource paths
* [clsi] exclude readOnly clsi-cache shards
---------
Co-authored-by: Daniel Kontsek <daniel.kontsek@overleaf.com>
GitOrigin-RevId: 34f18b319a0e859ff149a135131c95a44bc674d6
* [clsi] try harder at sending files off to a working clsi-cache shard
* [clsi] use a crc for generating a stable sequence of shards to try
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
* [clsi] gradually migrate to crc based shard assigment
* [clsi] tweak selecting clsi-cache shard from crc
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
* [clsi] bump rollout dates of new clsi-cache shard change
---------
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
GitOrigin-RevId: 9386e170503b405580e4d0a8641832f3fcb1fa83
* [clsi] fix circuit breaker for clsi-cache
* [clsi] enable ts-check for CLSICacheHandler
* [clsi] limit the number of .blg files in clsi-cache to 50
* [clsi-cache] limit the number of files per job to 100
* [clsi-cache] explain early registration of buildId
* [clsi-cache] lock down downloads via nginx to project folder
GitOrigin-RevId: 081d0c40b08db3a384c4d765b71a50b973f42151
Stage timeouts:
- frontend waits 5s
- web/clsi waits 4s
- clsi-cache waits 3s
This should ensure that the frontend can receive a valid response after
any of the backend requests failed.
The circuit breaker will remain closed for TIMEOUT + jitter of 0-3 times
the TIMEOUT of the respective service. This should avoid the bulk of
traffic to fail and occasionally issue retries without hammering the
instances while down.
Also do not try the next backend when the abort signal has expired.
GitOrigin-RevId: d612125616a9e416beff2f4c6d7f30066b5b9d6d
* [clsi] add stats and timings to compile response from clsi-cache
* [clsi] set downloadedFromCache when previously downloaded for synctex
Assumption: every compile will emit an output.log. When the output.log
is missing, but the output.synctex.gz exists, it must have been
downloaded from the cache.
GitOrigin-RevId: 41ea34880931e3c43dda3bc9eb26c0d02054894d
* [mics] fix "app" label in clsi-cache metrics in dev-env
* [clsi-cache] validate filePath when processing file
* [clsi-cache] meter ingress and egress bandwidth
Files are downloaded directly from nginx, hence we cannot meter egress
in clsi-cache easily.
GitOrigin-RevId: 24de8c41728f0e9c984113c1470dec6153e75f20
* Revert "[clsi-cache] only use sharding from updated project editor tabs (#25326)"
This reverts commit 1754276bed3186c0536055c983e32476cc90d416.
* [clsi-cache] remove non sharded instances
GitOrigin-RevId: aa3ac46140dfc1722a3350cf7071e5b11af61199
* [clsi-cache] shard per zone into three instances
Keep the old instance as read fallback. We can remove it in 4 days.
Disk size: 2Ti gives us the maximum write throughput of 240MiB/s on a
N2D instance with fewer than 8 vCPUs.
* [clsi] fix format
* [k8s] clsi-cache: bring back storage-classes
* [k8s] clsi-cache: fix reference to zonal storage-classes
* [k8s] clsi-cache: add logging configs
* [clsi] improve sharding
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
* [clsi] fix sharding
Index needs to be positive.
* [clsi] fix sharding
The random part is static per machine/process.
* [clsi] restrict clsi-cache to user projects
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
* [k8s] clsi-cache: align CLSI_CACHE_NGINX_HOST with service LB
---------
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
GitOrigin-RevId: 1efb1b3245c8194c305420b25e774ea735251fb3
* [web] provide an actual rootFolder from EditorProviders in tests
- Fixup SocketIOMock and ShareJS mocks to provide the complete interface
- Extend SocketIOMock interface to count event listeners
- Fixup test that did not expect to find a working rootDoc
* [web] expose imageName from ProjectContext
* [clsi-cache] check compiler settings before using compile from cache
* [web] avoid fetching initial compile from clsi-cache in PDF detach tab
GitOrigin-RevId: e3c754a7ceca55f03a317e1bc8ae45ed12cc2f02
* [clsi] drop support for docker-in-docker
* [clsi] run SyncTeX in specific output dir rather than compile dir
* [clsi] store output.synctex.gz outside of tar-ball in clsi-cache
* [clsi] add documentation for rewriting of docker bind-mounts
* [server-pro] update env vars for sandboxed compiles in sample config
GitOrigin-RevId: 8debd7102ac612544961f237aa4ff1c530aa3da3
* [clsi-cache] initial revision of the clsi-cache service
* [clsi] send output files to clsi-cache and import from clsi-cache
* [web] pass editorId to clsi
* [web] clear clsi-cache when clearing clsi cache
* [web] add split-tests for controlling clsi-cache rollout
* [web] populate clsi-cache when cloning/creating project from template
* [clsi-cache] produce less noise when populating cache hits 404
* [clsi-cache] push docker image to AR
* [clsi-cache] push docker image to AR
* [clsi-cache] allow compileGroup in job payload
* [clsi-cache] set X-Zone header from latest endpoint
* [clsi-cache] use method POST for /enqueue endpoint
* [web] populate clsi-cache in zone b with template data
* [clsi-cache] limit number of editors per project/user folder to 10
* [web] clone: populate the clsi-cache unless the TeXLive release changed
* [clsi-cache] keep user folder when clearing cache as anonymous user
* [clsi] download old output.tar.gz when synctex finds empty compile dir
* [web] fix lint
* [clsi-cache] multi-zonal lookup of single build output
* [clsi-cache] add more validation and limits
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
* [clsi] do not include clsi-cache tar-ball in output.zip
* [clsi-cache] fix reference after remaining constant
Co-authored-by: Alf Eaton <alf.eaton@overleaf.com>
* [web] consolidate validation of filename into ClsiCacheHandler
* [clsi-cache] extend metrics and event tracking
- break down most of the clsi metrics by label
- compile=initial - new compile dir without previous output files
- compile=recompile - recompile in existing compile dir
- compile=from-cache - compile using previous clsi-cache
- extend segmentation on compile-result-backend event
- isInitialCompile=true - found new compile dir at start of request
- restoredClsiCache=true - restored compile dir from clsi-cache
* [clsi] rename metrics labels for download of clsi-cache
This is in preparation for synctex changes.
* [clsi] use constant for limit of entries in output.tar.gz
Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>
* [clsi-cache] fix cloning of project cache
---------
Co-authored-by: Brian Gough <brian.gough@overleaf.com>
Co-authored-by: Alf Eaton <alf.eaton@overleaf.com>
Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>
GitOrigin-RevId: 4901a65497af13be1549af7f38ceee3188fcf881