Commit Graph

1894 Commits

Author SHA1 Message Date
Konstantin Knizhnik
28243d68e6 Yet another apporach of copying logical timeline size during branch creation (#2139)
* Yet another apporach of copying logical timeline size during branch creation

* Fix unit tests

* Update pageserver/src/layered_repository.rs

Co-authored-by: Thang Pham <thang@neon.tech>

* Update pageserver/src/layered_repository.rs

Co-authored-by: Thang Pham <thang@neon.tech>

* Update pageserver/src/layered_repository.rs

Co-authored-by: Thang Pham <thang@neon.tech>

Co-authored-by: Thang Pham <thang@neon.tech>
2022-07-26 09:11:10 +03:00
Kirill Bulatov
45680f9a2d Drop CircleCI runs (#2082) 2022-07-25 18:30:30 +03:00
Dmitry Ivanov
5f4ccae5c5 [proxy] Add the password hack authentication flow (#2095)
[proxy] Add the `password hack` authentication flow

This lets us authenticate users which can use neither
SNI (due to old libpq) nor connection string `options`
(due to restrictions in other client libraries).

Note: `PasswordHack` will accept passwords which are not
encoded in base64 via the "password" field. The assumption
is that most user passwords will be valid utf-8 strings,
and the rest may still be passed via "password_".
2022-07-25 17:23:10 +03:00
Thang Pham
39c59b8df5 Fix flaky test_branch_creation_before_gc test (#2142) 2022-07-22 12:44:20 +01:00
Alexander Bayandin
9dcb9ca3da test/performance: ensure we don't have tables that we're creating (#2135) 2022-07-22 11:00:05 +01:00
Dmitry Rodionov
e308265e42 register tenants task thread pool threads in thread_mgr
needed to avoid this warning: is_shutdown_requested() called in an unexpected thread
2022-07-22 11:43:38 +03:00
Thang Pham
ed102f44d9 Reduce memory allocations for page server (#2010)
## Overview

This patch reduces the number of memory allocations when running the page server under a heavy write workload. This mostly helps improve the speed of WAL record ingestion. 

## Changes
- modified `DatadirModification` to allow reuse the struct's allocated memory after each modification
- modified `decode_wal_record` to allow passing a `DecodedWALRecord` reference. This helps reuse the struct in each `decode_wal_record` call
- added a reusable buffer for serializing object inside the `InMemoryLayer::put_value` function
- added a performance test simulating a heavy write workload for testing the changes in this patch

### Semi-related changes
- remove redundant serializations when calling `DeltaLayer::put_value` during `InMemoryLayer::write_to_disk` function call [1]
- removed the info span `info_span!("processing record", lsn = %lsn)` during each WAL ingestion [2]

## Notes
- [1]: in `InMemoryLayer::write_to_disk`, a deserialization is called
  ```
  let val = Value::des(&buf)?;
  delta_layer_writer.put_value(key, *lsn, val)?;
  ``` 
  `DeltaLayer::put_value` then creates a serialization based on the previous deserialization
  ```
  let off = self.blob_writer.write_blob(&Value::ser(&val)?)?;
  ```
- [2]: related: https://github.com/neondatabase/neon/issues/733
2022-07-21 12:08:26 -04:00
Konstantin Knizhnik
572ae74388 More precisely control size of inmem layer (#1927)
* More precisely control size of inmem layer

* Force recompaction of L0 layers if them contains large non-wallogged BLOBs to avoid too large layers

* Add modified version of test_hot_update test (test_dup_key.py) which should generate large layers without large number of tables

* Change test name in test_dup_key

* Add Layer::get_max_key_range function

* Add layer::key_iter method and implement new approach of splitting layers during compaction based on total size of all key values

* Add test_large_schema test for checking layer file size after compaction

* Make clippy happy

* Restore checking LSN distance threshold for checkpoint in-memory layer

* Optimize stoage keys iterator

* Update pageserver/src/layered_repository.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/layered_repository.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/layered_repository.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/layered_repository.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Update pageserver/src/layered_repository.rs

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>

* Fix code style

* Reduce number of tables in test_large_schema to make it fit in timeout with debug build

* Fix style of test_large_schema.py

* Fix handlng of duplicates layers

Co-authored-by: Heikki Linnakangas <heikki@zenith.tech>
2022-07-21 07:45:11 +03:00
Arthur Petukhovsky
b445cf7665 Refactor test_unavailability (#2134)
Now test_unavailability uses async instead of Process. The test is refactored to fix a possible race condition.
2022-07-20 22:13:05 +03:00
Kirill Bulatov
cc680dd81c Explicitly enable cachepot in Docker builds only 2022-07-20 17:09:36 +03:00
Heikki Linnakangas
f4233fde39 Silence "Module already imported" warning in python tests
We were getting a warning like this from the pg_regress tests:

    =================== warnings summary ===================
    /usr/lib/python3/dist-packages/_pytest/config/__init__.py:663
      /usr/lib/python3/dist-packages/_pytest/config/__init__.py:663: PytestAssertRewriteWarning: Module already imported so cannot be rewritten: fixtures.pg_stats
        self.import_plugin(import_spec)

    -- Docs: https://docs.pytest.org/en/stable/warnings.html
    ------------------ Benchmark results -------------------

To fix, reorder the imports in conftest.py. I'm not sure what exactly
the problem was or why the order matters, but the warning is gone and
that's good enough for me.
2022-07-20 16:55:41 +03:00
Heikki Linnakangas
b4c74c0ecd Clean up unnecessary dependencies.
Just to be tidy.
2022-07-20 16:31:25 +03:00
Heikki Linnakangas
abff15dd7c Fix test to be more robust with slow pageserver.
If the WAL arrives at the pageserver slowly, it's possible that the
branch is created before all the data on the parent branch have
arrived. That results in a failure:

    test_runner/batch_others/test_tenant_relocation.py:259: in test_tenant_relocation
        timeline_id_second, current_lsn_second = populate_branch(pg_second, create_table=False, expected_sum=1001000)
    test_runner/batch_others/test_tenant_relocation.py:133: in populate_branch
        assert cur.fetchone() == (expected_sum, )
    E   assert (500500,) == (1001000,)
    E     At index 0 diff: 500500 != 1001000
    E     Full diff:
    E     - (1001000,)
    E     + (500500,)

To fix, specify the LSN to branch at, so that the pageserver will wait
for it arrive.

See https://github.com/neondatabase/neon/issues/2063
2022-07-20 15:59:46 +03:00
Thang Pham
160e52ec7e Optimize branch creation (#2101)
Resolves #2054

**Context**: branch creation needs to wait for GC to acquire `gc_cs` lock, which prevents creating new timelines during GC. However, because individual timeline GC iteration also requires `compaction_cs` lock, branch creation may also need to wait for compactions of multiple timelines. This results in large latency when creating a new branch, which we advertised as *"instantly"*.

This PR optimizes the latency of branch creation by separating GC into two phases:
1. Collect GC data (branching points, cutoff LSNs, etc)
2. Perform GC for each timeline

The GC bottleneck comes from step 2, which must wait for compaction of multiple timelines. This PR modifies the branch creation and GC functions to allow GC to hold the GC lock only in step 1. As a result, branch creation doesn't need to wait for compaction to finish but only needs to wait for GC data collection step, which is fast.
2022-07-19 14:56:25 -04:00
Heikki Linnakangas
98dd2e4f52 Use zstd and multiple threads to compress artifact tarball.
For faster and better compression.
2022-07-19 21:31:34 +03:00
Heikki Linnakangas
71753dd947 Remove github CI 'build_postgres' job, merging it with 'build_neon'
Simplifies the workflow. Makes the overall build a little faster, as
the build_postgres step doesn't need to upload the pg.tgz artifact,
and the build_neon step doesn't need to download it again.

This effectively reverts commit a490f64a68. That commit changed the
workflow so that the Postgres binaries were not included in the
neon.tgz artifact. With this commit, the pg.tgz artifact is gone, and
the Postgres binaries are part of neon.tgz again.
2022-07-19 21:31:22 +03:00
Alexander Bayandin
4446791397 github/workflows: pause stress env deployment (#2122) 2022-07-19 17:40:58 +01:00
Alexander Bayandin
5ff7a7dd8b github/workflows: run periodic benchmarks earlier (#2121) 2022-07-19 16:33:33 +01:00
Heikki Linnakangas
3dce394197 Use the same cargo options for every cargo call.
The "cargo metadata" and "cargo test --no-run" are used in the workflow
to just list names of the final binaries, but unless the same cargo
options like --release or --debug are used in those calls, they will in
fact recompile everything.
2022-07-19 16:36:59 +03:00
Heikki Linnakangas
df7f644822 Move things around in github yml file, for clarity.
Also, this avoids building the list of test binaries in release mode.
They are not included in the neon.tgz tarball in release mode.
2022-07-19 16:36:59 +03:00
Arthur Petukhovsky
bf5333544f Fix missing quotes in GitHub Actions (#2116) 2022-07-19 10:57:24 +03:00
Heikki Linnakangas
0b8049c283 Update core_changes.md, describing Postgres changes.
I went through "git diff REL_14_2" and updated the doc to list all the
changes, categorized into what I think could form a logical set of
patches.
2022-07-19 09:53:12 +03:00
Heikki Linnakangas
f384e20d78 Minor cleanup in layer_repository.rs. 2022-07-19 07:50:55 +03:00
Heikki Linnakangas
0b14fdb078 Reorganize, expand, improve internal documentation
Reorganize existing READMEs and other documentation files into mdbook
format. The resulting Table of Contents is a mix of placeholders for
docs that we should write, and documentation files that we already had,
dropped into the most appropriate place.

Update the Pageserver overview diagram. Add sections on thread
management and WAL redo processes.

Add all the RFCs to the mdbook Table of Content too.

Per github issue #1979
2022-07-18 17:39:12 +03:00
Arseny Sher
a69fdb0e8e Fix commit_lsn monotonicity violation.
On ProposerElected message receival WAL is truncated at streaming point; this
code expected that, once vote is given for the proposer / term switch happened,
flush_lsn can be advanced only by this proposer (or higher one). However, that
didn't take into account possibility of accumulating written WAL and flushing it
after vote is given -- flushing goes without term checks. Which eventually led
to the violation in question.

ref #2048
2022-07-18 15:15:51 +03:00
Arseny Sher
eeff56aeb7 Make get_dir_size robust to concurrent deletions.
ref #2055
2022-07-18 15:13:10 +03:00
Dmitry Rodionov
7987889cb3 keep successfully downloaded index parts 2022-07-18 12:27:04 +03:00
Dmitry Rodionov
912a08317b do not ignore errors during downloading of tenant index parts 2022-07-18 12:27:04 +03:00
Kirill Bulatov
c4b2347e21 Use less restricrtive lock guard during storage sync 2022-07-17 12:49:18 +03:00
dependabot[bot]
373bc59ebe Bump pywin32 from 227 to 301 (#2102) 2022-07-16 16:05:12 +01:00
Egor Suvorov
94003e1ebc postgres_ffi: test restoring from intermediate LSNs by wal_craft 2022-07-15 19:06:50 +03:00
Egor Suvorov
19ea486cde postgres_ffi/xlog_utils: refactor find_end_of_wal test
* Deduce `last_segment` automatically
* Get rid of local `wal_dir`/`wal_seg_size` variables
* Prepare to test parsing of WAL from multiple specific points, not just the start;
  extract `check_end_of_wal` function to check both partial and non-partial WAL segments.
2022-07-15 19:06:50 +03:00
Alexander Bayandin
95c40334b8 github/workflows: post periodic benchmark failures to slack (#2105) 2022-07-15 15:39:49 +01:00
Sergey Melnikov
a68d5a0173 Run workflow on release branch (#2085) 2022-07-15 13:18:55 +02:00
Alexey Kondratov
c690522870 [compute_tools] Change owner of the schema public only once (#2058)
Otherwise, we will change it back to the db owner on each restart. Even
if user already changed schema owner to some other user.
2022-07-15 12:25:07 +02:00
Heikki Linnakangas
eaa550afcc Reduce size of cargo deps cache, by excluding ~/.cargo/registry/src. 2022-07-15 13:18:48 +03:00
Heikki Linnakangas
a490f64a68 Don't include Postgres binaries in neon.tgz
neon.tgz artifact in the github workflow included the contents of
'tmp_install', but that seems pointless, because the same files are
included earlier already in the pg.tgz artifact.
2022-07-15 12:33:13 +03:00
Thang Pham
fe65d1df74 reduce concurrent tasks in test_branching_with_pgbench.py
- add thread limit
- run `pgbench` with 1 client
2022-07-15 12:30:09 +03:00
Heikki Linnakangas
c68336a246 Strip debug symbols from test binaries, to make the artifact smaller.
Uploading large artifacts is slow in github actions. To speed that up,
make the artifact smaller.

The code coverage tool doesn't require debug symbols, so remove them.

We've discussed doing the same for *all* binaries, but it's nice to
have debugging symbols for debugging purposes, and so that you get
more complete stack traces. The discussion is ongoing, but let's at
least do this for the test symbols now.
2022-07-14 23:08:57 +03:00
Heikki Linnakangas
0886aced86 Update dependencies.
- Updated dependencies with "cargo update"
- Updated workspace_hack with "cargo hakari generate"

There's no particular reason to do this now, just a periodic refresh.
2022-07-14 22:13:51 +03:00
Heikki Linnakangas
a342957aee Use ok_or_else() instead of ok_or(), to silence clippy warnings.
"cargo clippy" started to complain about these, after running "cargo
update". Not sure why it didn't complain before, but seems reasonable to
fix these. (The "cargo update" is not included in this commit)
2022-07-14 22:13:51 +03:00
Heikki Linnakangas
79f5685d00 Enable basic optimizations even in 'dev' builds.
Change the build options to enable basic optimizations even in debug
mode, and always build dependencies with more optimizations. That
makes the debug-mode binaries somewhat faster, without messing up
stack traces and line-by-line debugging too much.
2022-07-14 20:46:35 +03:00
Egor Suvorov
c004a6d62f Do not cancel in-progress checks on the main branch
See https://docs.github.com/en/actions/using-workflows/workflow-syntax-for-github-actions#concurrency

* Previously there was a single concurrency group per each branch.
  As the `main` branch got pushed into frequently, very few commits got
  tested to the end. It resulted in "broken" `main` branch as there were
  no fully successful workflow runs.
  Now the `main` branch gets a separate concurrency group for each commit.
* As GitHub Actions syntax does not have the conditional operator, it is
  emulated via logical and/or operations. Although undocumented, they
  return one of their operands instead of plain true/false.
* Replace 3-space indentation with 2-space indentation while we are here
  to be consistent with the rest of the file.
2022-07-14 17:20:00 +03:00
Egor Suvorov
1b6a80a38f Fix flaky test_concurrent_computes
* Wait for all computes (except one) to complete before proceeding with
  the single compute.
* It previously waited for too few seconds. As the test is randomized, it was
  not failing all the time, but only in specific unlucky cases.
  E.g. when there were no successfuly queries by concurrent computes,
  and the single node had big timeouts and spent lots of time making the
  transaction.
  See https://github.com/neondatabase/neon/runs/7234456482?check_suite_focus=true
  (around line 980).
* Wait for exactly one extra transaction by the single compute.
2022-07-14 16:23:39 +03:00
Alexey Kondratov
12bac9c12b Wait for compute image before deploy in GitHub Action
We need both storage **and** compute images for deploy, because control plane
picks the compute version based on the storage version. If it notices a fresh
storage it may bump the compute version. And if compute image failed to build
it may break things badly.
2022-07-14 11:27:16 +02:00
Kirill Bulatov
9a7427c203 Fill build-args for Docker builds via GH Actions context 2022-07-14 10:28:15 +03:00
Arthur Petukhovsky
968c20ca5f Add zenith-1-ps-3 to prod inventory (#2084) 2022-07-13 21:22:44 +03:00
Alexey Kondratov
f8a64512df [compute_tools] Set public schema owner to db owner (#2058)
Otherwise, it does not have a control on it, which is reasonable thing
to have and some users already hit it.
2022-07-13 15:38:22 +02:00
Alexander Bayandin
07acd6ddde Fix clippy warnings in postgres_ffi/build.rs (#2081) 2022-07-13 14:12:11 +01:00
Sergey Melnikov
2b21d7b5bc Migrate from CircleCI to Github Actions: docker build and deploy (#1986) 2022-07-13 12:51:20 +03:00