Commit Graph

133 Commits

Author SHA1 Message Date
Arseny Sher
6100a02d0f Prefix WAL files in s3 with environment name.
It wasn't merged to prod yet, so safe to enable.
2022-07-01 19:21:28 +04:00
Arseny Sher
97fed38213 Fix cadaca010c for older ssh clients. 2022-07-01 19:20:59 +04:00
Arseny Sher
cadaca010c Make ansible work with storage nodes through teleport from local box. 2022-07-01 16:58:34 +03:00
Dmitry Ivanov
5ee19b0758 Fix bloated coverage uploads (#2005)
Move coverage data to a better directory, merge it better and don't publish it from CircleCI pipeline
2022-06-29 17:59:19 +03:00
Kirill Bulatov
8a714f1ebf Add coverage to GH actions and rework part of them (#1987) 2022-06-27 19:15:56 +03:00
Anastasia Lubennikova
6d7dc384a5 Add zenith-us-stage-ps-3 to deploy 2022-06-23 14:52:32 +03:00
chaitanya sharma
e1336f451d renamed .zenith data-dir to .neon. 2022-06-09 18:19:18 +02:00
Dmitry Rodionov
b155fe0e2f avoid perf test result context for pg regress 2022-06-02 17:41:34 +03:00
Anton Chaporgin
e5a2b0372d remove sk1 from inventory (#1845)
https://github.com/neondatabase/cloud/issues/1454
2022-06-01 15:40:45 +03:00
Alexey Kondratov
ff233cf4c2 Use :local compute-tools tag to build compute-node image 2022-05-31 23:12:30 +03:00
Kian-Meng Ang
f1c51a1267 Fix typos 2022-05-28 14:02:05 +03:00
Arseny Sher
54b75248ff s3 WAL offloading staging review.
- Uncomment the accidentally commented-out `self.keep_alive.abort()` line; because
  of it the task never finished, which blocked the launcher.
- Mess with initialization one more time, to fix the offloader trying to back up
  segment 0. Now we initialize all required LSNs in handle_elected,
  where we learn the start LSN for the first time.
- Fix blind attempt to provide safekeeper service file with remote storage
  params.
2022-05-27 14:02:52 +04:00
Arseny Sher
0e1bd57c53 Add WAL offloading to s3 on safekeepers.
A separate task is launched for each timeline and stopped when the timeline doesn't
need offloading. The decision of who offloads is made through etcd leader election;
currently there is no precondition for participating, that's a TODO.

neon_local and tests infrastructure for remote storage in safekeepers added,
along with the test itself.

ref #1009

Co-authored-by: Anton Shyrabokau <ahtoxa@Antons-MacBook-Pro.local>
2022-05-27 06:19:23 +04:00
Kirill Bulatov
06f5e017a1 Move rustfmt check to GH Action 2022-05-26 01:03:48 +03:00
Andrey Taranik
9ab52e2186 helm repository name fix for production proxy deploy (#1790) 2022-05-25 15:41:18 +03:00
Andrey Taranik
703f691df8 production inventory update (#1779) 2022-05-25 14:30:50 +03:00
Sergey Melnikov
d32b491a53 Add zenith-us-stage-sk-6 to deploy (#1728) 2022-05-25 10:31:10 +03:00
Kirill Bulatov
541ec25875 Properly shutdown test mock S3 server 2022-05-24 19:09:31 +03:00
Andrey Taranik
d97617ed3a updated proxy and proxy scram deployment for prod and stress environments (#1758) 2022-05-20 23:12:30 +03:00
Egor Suvorov
bd2979d02c CircleCI/check-codestyle-python: print versions 2022-05-19 00:09:13 +02:00
Andrey Taranik
b9f84f4a83 turn on storage deployment to neon-stress environment (#1729) 2022-05-17 23:04:04 +03:00
Arthur Petukhovsky
134eeeb096 Add more common storage metrics (#1722)
- Enabled process exporter for storage services
- Changed zenith_proxy prefix to just proxy
- Removed old `monitoring` directory
- Removed common prefix for metrics, now our common metrics have `libmetrics_` prefix, for example `libmetrics_serve_metrics_count`
- Added `test_metrics_normal_work`
2022-05-17 19:29:01 +03:00
Andrey Taranik
070c255522 Neon stress deploy (#1720)
* storage and proxy deployment for neon stress environment

* neon stress inventory fix
2022-05-17 18:03:01 +03:00
Kirill Bulatov
f2881bbd8a Start and stop single etcd and mock s3 servers globally in python tests 2022-05-17 01:17:44 +03:00
Kirill Bulatov
9a0fed0880 Enable at least 1 safekeeper in every test 2022-05-17 01:17:44 +03:00
Andrey Taranik
cded72a580 remove sk-2 from staging inventory list (#1699) 2022-05-13 20:41:54 +03:00
Anton Shyrabokau
20361395bb Add zenith-us-stage-sk-5 to circleci inventory (#1665)
Co-authored-by: Debian <admin@ip-10-0-5-32.us-west-2.compute.internal>
2022-05-11 21:36:53 +03:00
Arseny Sher
6cb14b4200 Optionally remove WAL on safekeepers without s3 offloading.
And do that on staging, until offloading is merged.
2022-05-10 22:41:02 +04:00
Sergey Melnikov
11a44eda0e Add TLS support in scram-proxy (#1643)
* Add TLS support in scram-proxy

* Fix authEndpoint
2022-05-05 23:48:16 +03:00
Andrey Taranik
4024bfe736 get_binaries script fix (#1638)
* get_binaries script fix

* minor improvement for get_binaries
2022-05-05 22:21:07 +03:00
Stas Kelvich
51a0f2683b fix scram-proxy addresses 2022-05-04 01:35:30 +03:00
Anastasia Lubennikova
2f9b17b9e5 Add simple test of pageserver recovery after crash. To cause a crash, use failpoints in checkpointer 2022-05-03 17:13:09 +03:00
Stas Kelvich
801b749e1d Set correct authEndpoint for the new proxy 2022-05-02 21:46:32 +03:00
Andrey Taranik
68ba6a58a0 authEndpoint fix 2022-05-02 17:55:13 +03:00
Andrey Taranik
8f479a712f minor fixes in proxy deployment 2022-05-02 17:55:13 +03:00
Stas Kelvich
2477d2f9e2 Deploy standalone SCRAM proxy on staging 2022-05-02 17:55:13 +03:00
Andrey Taranik
aa933d3961 proxy settings update for new domain (#1597) 2022-04-29 20:05:14 +03:00
Arseny Sher
3fd234da07 Enable etcd for safekeepers in deploy. 2022-04-26 18:13:50 +04:00
Kirill Bulatov
d060a97c54 Simplify clippy runs 2022-04-25 16:23:34 +03:00
Andrey Taranik
56f6269a8e rename docker images to neondatabase docker account (#1570)
* rename docker images to neondatabase docker account

* docker images build fix (permissions for Cargo.lock)
2022-04-25 11:34:51 +03:00
Heikki Linnakangas
a4700c9bbe Use pprof to get flamegraph of get_page and get_relsize requests.
This depends on a hacked version of the 'pprof-rs' crate. Because of
that, it's under an optional 'profiling' feature. It is disabled by
default, but enabled for release builds in CircleCI config. It doesn't
currently work on macOS.

The flamegraph is written to 'flamegraph.svg' in the pageserver
workdir when the 'pageserver' process exits.

Add a performance test that runs the perf_pgbench test, with profiling
enabled.
2022-04-21 20:32:48 +03:00
Kirill Bulatov
81879f8137 Restore missing cachepot env vars 2022-04-18 12:32:04 +03:00
Kirill Bulatov
9b7dcc2bae Use proper cachepot bucket 2022-04-17 16:35:40 +03:00
Kirill Bulatov
e97f94cc30 Bump rustc version 2022-04-14 23:01:06 +03:00
Dmitry Rodionov
1d36c5a39e reenable s3 on staging pageservers by default
After deadlock fix in https://github.com/neondatabase/neon/pull/1496 s3
seems to work normally. There is one more discovered issue but it is not
a blocker so can be fixed separately.
2022-04-13 20:10:39 +03:00
Arthur Petukhovsky
87020f8126 Fix CI staging deploy (#1499)
- Remove stopped safekeeper from inventory
- Fix github pages address after neon rename
2022-04-13 10:59:29 +03:00
Alexey Kondratov
0fbe657b2f Fix remote e2e tests after repository rename (#1434)
Also start them after release build instead of debug. It saves 3-5
minutes and we use release mode in Docker images anyway.
2022-04-13 00:02:06 +03:00
Arthur Petukhovsky
81ba23094e Fix scripts to deploy sk4 on staging (#1476)
Adjust ansible scripts and inventory for sk4 on staging
2022-04-07 20:38:26 +03:00
Dmitry Rodionov
9594362f74 change python cache version to 2 (fixes python cache in CircleCI) 2022-03-29 10:42:30 +03:00
Anton Shyrabokau
be6a6958e2 CI: rebuild postgres when Makefile changes (#1429) 2022-03-28 18:19:20 -07:00