293 Commits

Author SHA1 Message Date
Davinder Singh
be5a7b56c8 [WEB + CLSI] Download as docx file feature (#32851)
* using CLSI logic for fetching the project contents and skip the .zip export

* Use unique conversion directory for project-to-docx export to avoid corrupting the shared compile
  directory when a compile runs concurrently

* Remove X-Accel-Buffering header — not needed as CLSI does not run behind nginx

* moving log before sending the data

* Return CLSI stream directly instead of buffering to disk on web

  Previously convertProjectToDocx wrote the CLSI response to a temp file
  on disk, then the controller read it back to stream to the client.
  Now the stream is returned directly and piped to the response,
  avoiding unnecessary disk I/O on the web server.

* Use href redirect for docx export instead of fetching blob into memory

* making functions and files more generic so they can be used in future for other document exports as well

* adding export-docx split test

* adding unit tests

* adding cypress E2E test

* format:fix

* renaming the route to download from convert

* adding new icon for export docx button

* format:fix

* remove unused showExportDocumentErrorToast export and add guard against invalid Content-Length header from CLSI

* format:fix

* refactor(clsi): move promisify(parse) into RequestParser

* refactor: generic conversion endpoint with type as route
  param

* refactor: use type→extension map for validated conversion types

* refactor(clsi): remove --standalone flag and fix rejection test

* fixing the href in cypress test

* renaming function

* adding type to Metrics.inc

* fix: rename exportProjectDocument, add WithLock wrapper and metrics type label

* format:fix

* fix: hide docx export from anonymous users and add WithLock wrapper

* format fix

* remove redundant Content-Length validation from DocumentConversionManager

* format:fix

* removing trailing icon

GitOrigin-RevId: e9764fefac2c4b625d23be9e942ea4a8b283c70d
2026-04-24 08:06:10 +00:00
Jakob Ackermann
926d6bccd7 [clsi] reduce write traffic to clsi-cache from free users (#32958)
GitOrigin-RevId: c01cc21c3c82f361c7d82843e65bb087b21434ed
2026-04-22 08:05:59 +00:00
Jakob Ackermann
0544aded40 [clsi] handle draft mode and tikzexternalize as part of sync phase (#32516)
* [clsi] handle draft mode and tikzexternalize as part of sync phase

* [clsi] emit empty string from SafeReader on ENOENT

* [clsi] persist history state after clearing dirty state without changes

GitOrigin-RevId: d9dcd2e6887017f7935b5e95bdbdc6e11a3b18f5
2026-03-31 08:07:19 +00:00
Jakob Ackermann
d66a856baa [clsi] remove locking from docker actions (#32373)
* [clsi] remove locking from docker actions

Start:
- We have an in-memory lock on the compile request

Destroy:
- as part of run: see above
- as part of cleanup: we check the last access time now, so it cannot
  happen concurrently with compiling anymore.

Co-authored-by: Anna Claire Fields <anna.fields@overleaf.com>

* [clsi] update comment

---------

Co-authored-by: Anna Claire Fields <anna.fields@overleaf.com>
GitOrigin-RevId: a58df45416ae31c0b38d5efec7f9371d747303df
2026-03-27 09:06:28 +00:00
Mathias Jakobsen
9c97876268 [web+clsi] Allow docx import via pandoc (#32004)
Co-authored-by: Jakob Ackermann <jakob.ackermann@overleaf.com>
GitOrigin-RevId: 246b3290ec04867f71545b1a7c5d95d0f68379ff
2026-03-27 09:06:23 +00:00
Jakob Ackermann
07397bbdde [clsi] avoid server error when clearing cache while compiling (#32349)
* [clsi] avoid server error when clearing cache while compiling

* [clsi] tweak API around releasing locks

Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>

---------

Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>
GitOrigin-RevId: d3f171467d3bc26941758dd333f30049b37a05c8
2026-03-23 09:06:18 +00:00
Jakob Ackermann
3aa69c6ffa [k8s] clsi-cache: double the number of shards (#32323)
* [k8s] clsi-cache: double the number of shards

* [monorepo] add missing clsi-cache env vars to dev-env

* [clsi] flip direction of clsi-cache shard migration

* [clsi] remove upper bound from clsi-cache shard migration

GitOrigin-RevId: a325a11c3ac9e22a12ad2d8ea802b91d2e175e24
2026-03-20 09:07:11 +00:00
Jakob Ackermann
6377624d25 [clsi] ignore download errors for binary files in compile from history (#32263)
GitOrigin-RevId: 3c1940b2d56701ec4b07d1457ee1af2de317a047
2026-03-19 09:07:00 +00:00
Brian Gough
9f1e4d99e5 handle old versions of latexmk in run count extraction (#30597)
* handle old versions of latexmk in run count extraction

the log lines for the run number change from stderr to stdout in TL2022

* extend SimpleLatexFileTest to include TL2017

* reset metrics for each scenario in SimpleLatexFileTests

* fix buildscript merge conflict

GitOrigin-RevId: fb74f2025d21ddf43be6a3b90ac6f7df4d975db6
2026-03-19 09:06:55 +00:00
Jakob Ackermann
69a7927267 [clsi] shard clsi_compiles_total metric by syncType (#32255)
GitOrigin-RevId: 43111697323ec6697ef5f42cf17807ea564181a0
2026-03-19 09:05:55 +00:00
Jakob Ackermann
c2130dccb9 [clsi] use cheapest gzip compression level for history snapshot (#32251)
memoir manual with 1.5MiB snapshot:

level | load/decompress time | store/compress time | size    | ratio | total sync time
---   | ---                  | ---                 | ---     | ---   | ---
6     | 18ms                 | 57ms                | 412KiB  | 26%   | 88ms
1     | 17ms                 | 28ms                | 509KiB  | 32%   | 53ms
0     | 8ms                  | 10ms                | 1578KiB | 0%    | 33ms

total sync: read snapshot, walk dir, sync files to disk, save snapshot
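
A measurement like the one in the table can be reproduced with a few lines. This is a minimal sketch in Python rather than the service's own code; the `measure_gzip_levels` helper and the `SAMPLE` payload are hypothetical stand-ins, and real snapshots will give different sizes and timings.

```python
import gzip
import time


def measure_gzip_levels(data: bytes, levels=(6, 1, 0)) -> dict:
    """Return {level: (compressed_size, compress_seconds, decompress_seconds)}."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        blob = gzip.compress(data, compresslevel=level)
        compress_s = time.perf_counter() - start

        start = time.perf_counter()
        restored = gzip.decompress(blob)
        decompress_s = time.perf_counter() - start

        # Every level must round-trip losslessly; only size and time differ.
        assert restored == data
        results[level] = (len(blob), compress_s, decompress_s)
    return results


# Hypothetical stand-in for a history snapshot: repetitive JSON-like text.
SAMPLE = b'{"path": "chapters/intro.tex", "hash": "0123abcd"}\n' * 20000

if __name__ == "__main__":
    for level, (size, c_s, d_s) in measure_gzip_levels(SAMPLE).items():
        print(f"level {level}: {size} bytes, "
              f"compress {c_s * 1000:.1f}ms, decompress {d_s * 1000:.1f}ms")
```

Level 0 stores the data uncompressed inside the gzip container, so its output is slightly larger than the input. That matches the 1578KiB row above for a snapshot of roughly 1.5MiB.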

GitOrigin-RevId: a2b1ee063af5aa749014f942db5e08bb1e685848
2026-03-19 09:05:50 +00:00
Jakob Ackermann
f947b549e4 [clsi-perf] migrate to compile from history mode (#32234)
* [clsi] only download history snapshot from clsi-cache when enabled

* [clsi-perf] migrate to compile from history mode

GitOrigin-RevId: 2dd54e032bd85d6335488741c039a5a1bd60090d
2026-03-18 09:07:51 +00:00
Jakob Ackermann
2e389c5a41 [rails] migrate compiles of conversions/submissions to history mode (#32053)
* [saas-e2e] test gallery templates with binary file

* [rails] add make target for fixing rubocop errors

* [rails] migrate compiles of conversions/submissions to history mode

* [rails] forward version to clsi request

* [rails] trim down compile request

* [saas-e2e] source v1 secrets after make install

GitOrigin-RevId: 65269e1df1051c9f3b4f1813d2e9dcf32a01be50
2026-03-18 09:07:22 +00:00
Jakob Ackermann
d5b55b831d [clsi] make last access tracking more robust (#32192)
* [clsi] do not overwrite last access during initial scan

* [clsi] cleanup submission cache 5-10min after startup

* [clsi] address review comments

GitOrigin-RevId: e03beec1b3deaee50629ada72b0242a8a2b2ae66
2026-03-18 09:07:10 +00:00
Jakob Ackermann
a9c413857a [clsi] avoid destroying containers of recently accessed projects (#32186)
* [clsi] avoid destroying containers of recently accessed projects

Co-authored-by: Anna Claire Fields <anna.fields@overleaf.com>

* [clsi] gracefully handle missing access time during container cleanup

* [clsi] fix cyclic import

---------

Co-authored-by: Anna Claire Fields <anna.fields@overleaf.com>
GitOrigin-RevId: 8195b6fccbe26d2fd673d38356af5d44cf4042a3
2026-03-18 09:07:01 +00:00
Jakob Ackermann
81b7121408 [clsi] initial implementation of compile from history (#31883)
* [clsi] initial implementation of compile from history

* [clsi] copy changes

* [saas-e2e] extend test case with nested folder

* [saas-e2e] add test case for tracked changes

* [web] fix accumulating changes from multiple chunks

* [web] optimize size check for compile request payload

* [clsi] deduplicate globalBlobs

* [clsi] add validation for request body details

* [clsi] add metrics for compile from history

* [clsi] download binary files concurrently

* [clsi] skip download of empty file blob

* [clsi] break down e2e compile time metric by compileFromHistory

GitOrigin-RevId: 0dadef93e89d8a172c35cb130a1042d9d1bec42a
2026-03-06 09:12:07 +00:00
Jakob Ackermann
eca31afb4a [clsi] remove unused endpoints for downloading output files (#31692)
GitOrigin-RevId: a0cac10f3585414779b026f38c2af2773c80082f
2026-03-06 09:06:33 +00:00
Jakob Ackermann
6c6e8d9a97 [monorepo] switch all output file reads to clsi-nginx (#31691)
* [monorepo] switch all output file reads to clsi-nginx

* [clsi-lb] allow gallery download requests

* [terraform] clsi: use nginx.conf from clsi service

* [clsi] fix flakey tests

* [clsi] replace alias with rewrite and root in nginx config

* [k8s] clsi-lb: expose download port on internal service

* [web] add explicit endpoint for downloading all output files

Serve the output.zip endpoint from clsi.

* [clsi] fix regex for latexqc submission ids

Previously, we only handled template submission ids.

GitOrigin-RevId: 6c3b21b01ec41ae767530b14aac31fbe3d640dd5
2026-02-24 09:07:12 +00:00
Brian Gough
f3e8601cba fix caching of minted output files in TL2025 (#31455)
GitOrigin-RevId: b82df4d9c7898332b310fd956c5f002bf5b20e39
2026-02-11 09:06:14 +00:00
Jakob Ackermann
c5bc4a1259 [clsi] tweak logging for clsi-cache (#31452)
* [clsi] tweak logging for clsi-cache

- Use `clsi-cache` identifier on log line
- Add shard to context
- Record nFiles on "too many entries for tar" error

* [clsi] do not trip clsi-cache circuit breaker on ENOENT errors

These can happen when an output/compile-dir is purged while we download
files.

GitOrigin-RevId: ffa73ef312bce5232ef72e3b81966bb6e14d2255
2026-02-11 09:06:09 +00:00
Jakob Ackermann
8eba220693 [clsi] remove initial vs recompile flag from clsi-perf metric (#31052)
GitOrigin-RevId: 75d101b355b291206386b0e6838571894af17a48
2026-01-28 09:06:43 +00:00
Jakob Ackermann
0ee8b25298 [k8s] clsi-cache: migrate to StatefulSet (#30886)
* [k8s] clsi-cache: migrate to StatefulSet

* clsi-cache: optimize ILB services for GKE subsetting

Update the new clsi-cache internal load balancer services
to use optimal settings for GKE subsetting (NEG backends):

- set allocateLoadBalancerNodePorts: false (not needed with NEGs)
- set externalTrafficPolicy: Local (preserve source IP, keep traffic in zone)
- add trafficDistribution: PreferClose (zone affinity)

These settings ensure traffic from CLSI VMs stays within the same zone
when possible, reducing latency and cross-zone network costs.

* [k8s] clsi-cache: add missing resource paths

* [clsi] exclude readOnly clsi-cache shards

---------

Co-authored-by: Daniel Kontsek <daniel.kontsek@overleaf.com>
GitOrigin-RevId: 34f18b319a0e859ff149a135131c95a44bc674d6
2026-01-27 09:05:50 +00:00
Andrew Rumble
cd7da983d1 Merge pull request #30232 from overleaf/ar/convert-clsi-to-es-modules
[clsi] convert to ES modules

GitOrigin-RevId: fb7fa52cc8f678ee31be352e62a5dff95e88008b
2026-01-22 09:06:23 +00:00
Andrew Rumble
645ee30aa9 Merge pull request #30887 from overleaf/ar-give-engagement-modify-institution-manager-capability
[web] Allow engagement role to modify institution managers

GitOrigin-RevId: 3fca81ea1aaa1427da62102cb638f0b288e609b2
2026-01-22 09:05:51 +00:00
Jakob Ackermann
3f9a7cf463 [clsi] consolidate metrics for clsi-perf (#30746)
* [clsi] remove all clsi-perf/health-check metrics

* [clsi] always emit E2E compile time metric

* [clsi] do not collect metrics for clsi-cache-template compiles

* [clsi] fix unit tests: request.metricsOpts always exists

* [clsi] use a gauge for the e2e compile time metric of clsi-perf

Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>

* [clsi] remove metrics for binary file downloads from clsi-perf

---------

Co-authored-by: Eric Mc Sween <eric.mcsween@overleaf.com>
GitOrigin-RevId: 7995512e57c802086350e3d1a0ec5213ecdf0a05
2026-01-19 09:06:34 +00:00
Jakob Ackermann
023f39ded9 [clsi] try harder at sending files off to a working clsi-cache shard (#30673)
* [clsi] try harder at sending files off to a working clsi-cache shard

* [clsi] use a crc for generating a stable sequence of shards to try

Co-authored-by: Brian Gough <brian.gough@overleaf.com>

* [clsi] gradually migrate to crc based shard assignment

* [clsi] tweak selecting clsi-cache shard from crc

Co-authored-by: Brian Gough <brian.gough@overleaf.com>

* [clsi] bump rollout dates of new clsi-cache shard change
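
One way to derive a stable, per-project sequence of shards from a CRC is sketched below in Python. It is an illustration only: the scoring scheme, the `shard_sequence`, `send_with_fallback`, and `make_sender` names, and the shard identifiers are assumptions, not the code from this commit.

```python
import zlib


def shard_sequence(project_id: str, shards: list[str]) -> list[str]:
    """Order shards deterministically per project using CRC-32 scores."""
    def score(shard: str) -> int:
        return zlib.crc32(f"{project_id}:{shard}".encode("utf-8"))

    # Sort by (score, name) so that equal scores still resolve deterministically.
    return sorted(shards, key=lambda shard: (score(shard), shard))


def send_with_fallback(project_id, shards, send):
    """Try each shard in the stable order until `send` succeeds."""
    for shard in shard_sequence(project_id, shards):
        try:
            send(shard)
            return shard
        except OSError:
            continue  # assumed failure signal: move on to the next shard
    return None


def make_sender(down: set):
    """Hypothetical sender for the example: fails for shards listed in `down`."""
    def send(shard):
        if shard in down:
            raise ConnectionError(f"{shard} is unavailable")
    return send
```

Because the order depends only on the project id and the shard names, every caller tries the same first shard for a given project and falls back to the same second choice when the first one is down.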

---------

Co-authored-by: Brian Gough <brian.gough@overleaf.com>
GitOrigin-RevId: 9386e170503b405580e4d0a8641832f3fcb1fa83
2026-01-15 09:05:26 +00:00
Jakob Ackermann
32ad596e54 [clsi] minor fixes for clsi-cache (#30551)
* [clsi] fix circuit breaker for clsi-cache

* [clsi] enable ts-check for CLSICacheHandler

* [clsi] limit the number of .blg files in clsi-cache to 50

* [clsi-cache] limit the number of files per job to 100

* [clsi-cache] explain early registration of buildId

* [clsi-cache] lock down downloads via nginx to project folder

GitOrigin-RevId: 081d0c40b08db3a384c4d765b71a50b973f42151
2026-01-07 09:06:30 +00:00
Brian Gough
67aa42a57a Merge pull request #29650 from overleaf/bg-update-clsi-tests-to-2025
update clsi acceptance tests to use texlive 2025.1 by default

GitOrigin-RevId: d69e97132e87873a8b91c39494c545250298d935
2025-11-13 09:06:23 +00:00
Jakob Ackermann
5140fff347 [clsi] gracefully handle fast exit of synctex/wordcount containers (#29505)
* [clsi] gracefully handle fast exit of synctex/wordcount containers

* [clsi] do not change container options in-place for logging

GitOrigin-RevId: 0b685310a3c72f8f46125fefaa30c1ddb19e7b07
2025-11-05 09:06:40 +00:00
Jakob Ackermann
28c1c7db37 [clsi-cache] add circuit breaker to clsi-cache requests (#29339)
Stage timeouts:
- frontend waits 5s
- web/clsi waits 4s
- clsi-cache waits 3s
This should ensure that the frontend can receive a valid response after
any of the backend requests failed.

The circuit breaker will remain closed for TIMEOUT + jitter of 0-3 times
the TIMEOUT of the respective service. This should prevent the bulk of
traffic from failing while still issuing occasional retries, without
hammering the instances while they are down.

Also do not try the next backend when the abort signal has expired.
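
The hold-off rule described above (TIMEOUT plus a jitter of 0-3 times the TIMEOUT) can be sketched as follows. This is a minimal Python illustration, not the code from this commit; the `CircuitBreaker` class, its method names, and the injectable `clock` and `rng` parameters are assumptions made for the example.

```python
import random
import time

# Stage timeouts from the commit message, in seconds.
STAGE_TIMEOUTS_S = {"frontend": 5, "web/clsi": 4, "clsi-cache": 3}


class CircuitBreaker:
    """Reject requests for timeout + jitter after a recorded failure."""

    def __init__(self, timeout_s, clock=time.monotonic, rng=random.random):
        self.timeout_s = timeout_s
        self._clock = clock   # injectable so the behaviour can be tested
        self._rng = rng       # returns a float in [0.0, 1.0)
        self._blocked_until = 0.0

    def record_failure(self):
        # Jitter is uniform in [0, 3 * timeout), so callers that failed at
        # the same moment do not all retry at the same moment.
        jitter = self._rng() * 3 * self.timeout_s
        self._blocked_until = self._clock() + self.timeout_s + jitter

    def allows_request(self):
        return self._clock() >= self._blocked_until
```

With a 3s timeout and a jitter draw of 0.5, a failure blocks requests for 3 + 4.5 = 7.5 seconds. Each stage timeout is shorter than that of the stage calling it, which is what lets the frontend still receive a valid response after a backend request fails.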

GitOrigin-RevId: d612125616a9e416beff2f4c6d7f30066b5b9d6d
2025-10-29 09:05:34 +00:00
Eric Mc Sween
61d823f946 Merge pull request #29301 from overleaf/em-fast-png-copy-metrics
More precise metrics on fast PNG copy

GitOrigin-RevId: 8b3a65a8a70152f1743c45f701448dc97be7ffeb
2025-10-28 09:05:23 +00:00
Jakob Ackermann
02391c6c51 [clsi] prepare for clsi-cache survey (#29274)
* [clsi] add stats and timings to compile response from clsi-cache

* [clsi] set downloadedFromCache when previously downloaded for synctex

Assumption: every compile will emit an output.log. When the output.log
is missing, but the output.synctex.gz exists, it must have been
downloaded from the cache.
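
The assumption above amounts to a simple check on the files present in the output directory. A minimal Python sketch, where the `downloaded_from_cache` function name and its argument are hypothetical:

```python
def downloaded_from_cache(files_in_output_dir) -> bool:
    """Infer a cache download: synctex mapping present, compile log absent."""
    names = set(files_in_output_dir)
    # Every compile is assumed to emit output.log, so a synctex file
    # without a log cannot have come from a local compile.
    return "output.synctex.gz" in names and "output.log" not in names
```
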
GitOrigin-RevId: 41ea34880931e3c43dda3bc9eb26c0d02054894d
2025-10-23 08:05:43 +00:00
Eric Mc Sween
d66c73a29e Merge pull request #29176 from overleaf/em-clsi-image-timings
CLSI: Capture image processing timings
GitOrigin-RevId: 28c2f73f260f2e82a64751bb46655e7546a458ef
2025-10-20 08:05:42 +00:00
Eric Mc Sween
f09a494e56 Merge pull request #29106 from overleaf/bg-fix-capdrop-in-docker-runner
fix capdrop in docker runner

GitOrigin-RevId: 1e8c81723a9e152ec85a3a2776965891fbe07606
2025-10-16 08:06:47 +00:00
Eric Mc Sween
8af6fbc368 Merge pull request #29085 from overleaf/em-metrics-dependencies
Log noteworthy dependencies in the CLSI performance log

GitOrigin-RevId: 8412251e0cc77f305867d645ad5d9d3bbb9b0890
2025-10-16 08:06:42 +00:00
Eric Mc Sween
ee7ccd6be4 Merge pull request #29076 from overleaf/em-metrics-compile-passes
Add number of passes to compile metrics

GitOrigin-RevId: b9a6b6691f2feb7f376cd1bb94c81ecb7c3bc580
2025-10-16 08:06:37 +00:00
Eric Mc Sween
4d39899e7b Merge pull request #29117 from overleaf/jpa-clsi-logging
[clsi] log high level details for large clsi-cache requests

GitOrigin-RevId: e368d745554c925a665f8794514cc8bfed78b7b3
2025-10-16 08:06:32 +00:00
Eric Mc Sween
9813bc4b51 Merge pull request #28992 from overleaf/em-compile-metrics-runs
Add metric measuring the execution time of each latexmk rule

GitOrigin-RevId: fcb7215f7f53063e6fe046c01bbcc81e6441c064
2025-10-13 08:07:07 +00:00
Eric Mc Sween
74524db293 Merge pull request #28909 from overleaf/em-compile-metrics
Use histograms to track CLSI compile times

GitOrigin-RevId: cf25f1e6d2094186f419acc70748f0c71b6c3240
2025-10-13 08:07:02 +00:00
Brian Gough
58094ebcd6 Merge pull request #28988 from overleaf/bg-add-file-info-to-performance-logs
add latexmk fdb file info to performance logs

GitOrigin-RevId: 3cc5709cd10fd55c2cd8aff7754fb7868aacdf0c
2025-10-13 08:05:23 +00:00
Brian Gough
da3f366643 Merge pull request #28959 from overleaf/bg-exclude-health-checks-from-performance-logs
exclude health checks from performance logs

GitOrigin-RevId: 88db63e00b32b2b015ee25c7d555546ed7d9a95b
2025-10-13 08:05:18 +00:00
Brian Gough
d24f37d3a4 Merge pull request #28880 from overleaf/bg-add-time-option-to-clsi
add latexmk `-time` option to clsi and record performance logs

GitOrigin-RevId: 467473859359913da73f83e10b63b45603ea175c
2025-10-09 08:06:12 +00:00
Jimmy Domagala-Tang
ed8da26479 add persistent directory for rolling builds texlive location (#28563)
GitOrigin-RevId: ea131bc99f27be32055d40a92a967f524f29d02d
2025-09-23 08:07:59 +00:00
Miguel Serrano
b910cb47ef Merge pull request #28138 from overleaf/msm-remove-volumes-dockerode
[clsi] Remove `Volumes` from container options

GitOrigin-RevId: 53a60f69e9689ee777d9b300127885de7b88c1fb
2025-09-01 08:05:03 +00:00
Jakob Ackermann
8c39add865 [clsi-cache] meter ingress and egress bandwidth (#27143)
* [misc] fix "app" label in clsi-cache metrics in dev-env

* [clsi-cache] validate filePath when processing file

* [clsi-cache] meter ingress and egress bandwidth

Files are downloaded directly from nginx, hence we cannot meter egress
in clsi-cache easily.

GitOrigin-RevId: 24de8c41728f0e9c984113c1470dec6153e75f20
2025-07-16 08:05:59 +00:00
Brian Gough
2f2862ecd7 Merge pull request #26637 from overleaf/bg-clsi-fix-process-group-for-local-compiles
fix "stop compile" option for local command runner in CE/SP

GitOrigin-RevId: 7986b505362aaf33ac6e161b3b54458baba1e2e6
2025-07-11 08:07:11 +00:00
Jakob Ackermann
26a7a7d7b8 [clsi] mark VM as unhealthy when detecting out-of-disk condition (#25721)
* [clsi] shed load when detecting out-of-disk condition

* [clsi] mark VM as unhealthy when detecting out-of-disk condition

GitOrigin-RevId: 25cda6785c0d973f50ec6206bee389804f35917e
2025-05-21 08:05:34 +00:00
Jakob Ackermann
ec1bd69605 [clsi-cache] remove non sharded instances (#25645)
* Revert "[clsi-cache] only use sharding from updated project editor tabs (#25326)"

This reverts commit 1754276bed3186c0536055c983e32476cc90d416.

* [clsi-cache] remove non sharded instances

GitOrigin-RevId: aa3ac46140dfc1722a3350cf7071e5b11af61199
2025-05-16 08:05:02 +00:00
Jakob Ackermann
fba8f776a1 [web] avoid trying to fetch synctex.gz from clsi-cache in free projects (#25445)
* [web] avoid trying to fetch synctex.gz from clsi-cache in free projects

* [clsi] parse boolean query parameter

GitOrigin-RevId: 99c98aac8147a626b704e9a888b7fc660cc5ab17
2025-05-12 08:05:24 +00:00
Jakob Ackermann
d489e35782 [web] emit event when synctex mapping was downloaded from clsi-cache (#25424)
* [clsi] tell frontend when synctex mapping was downloaded from clsi-cache

* [web] emit event when synctex mapping was downloaded from clsi-cache

GitOrigin-RevId: 1f6b7e0faaa7dd76449aad566802da971a4cf9ed
2025-05-09 08:06:00 +00:00