Checkout
 

kotkov in subversion

Make the dump stream parser more resilient to malformed dump streams that

do not contain \n characters at all.

Previously, we'd attempt to load the whole input into memory due to how

svn_stream_readline() is currently implemented. Doing so could potentially

choke for large files. The corresponding real-world case is where a user

(accidentally) attempts to load a huge binary file that does not contain \n

characters as the repository dump.

This is the potential cause of the OOM reported in

https://lists.apache.org/thread.html/c96eb5618ac0bf6e083345e0fdcdcf834e30913f26eabe6ada7bab62@%3Cusers.subversion.apache.org%3E

* subversion/libsvn_repos/load.c

(parse_format_version): Read the dump version string directly from

stream, with an upper limit of 80 bytes. Comment on why we don't use

svn_stream_readline() for this particular case.

(svn_repos_parse_dumpstream3): Update the call to parse_format_version().

Fix an issue with the readline implementation for file streams that could

cause excessive memory usage for inputs containing one or multiple \0 bytes.

This is the likely cause of the OOM reported in

https://lists.apache.org/thread.html/c96eb5618ac0bf6e083345e0fdcdcf834e30913f26eabe6ada7bab62@%3Cusers.subversion.apache.org%3E

(I think that the problem itself is a regression introduced by me in 1.10.)

Note: one thing I noticed while preparing the fix is that our `readline_fn`

functions for different streams have inconsistent behavior if the input data

contains \0 bytes. More specifically, they may return different `line` values,

that may either be truncated at \0 or actually contain the whole data between

EOLs, including \0 bytes. For now, this patch only fixes the excessive memory

usage problem, and I noted this related problem in the test and left it for

future work.

* subversion/libsvn_subr/stream.c

(readline_apr_lf, readline_apr_generic): Reallocate the buffer based on its

current size, instead of calculating the new size based on the already

prealloc'd size. There are no actual benefits in reallocating based on

`blocksize`, and in the described case with \0 bytes doing so also backfires

and may cause excessive allocations due to the actual size of the string

being less than we expect it to. A degenerate case of the erroneous

behavior is ...

* subversion/tests/libsvn_subr/stream-test.c

(test_stream_readline_file_nul): ...exploited in this new test.

(test_funcs): Run new test.

* subversion/tests/libsvn_subr

(): Adjust svn:ignore.

When compiling SQLite, enable the SQLITE_OMIT_WAL compile-time option.

We don't use WAL (write-ahead logging) feature of SQLite, but just keeping it

enabled has a visible I/O performance penalty, because SQLite has to check if

the write-ahead log file is present on disk. In a couple of my experiments,

disabling this feature resulted in a ~10% faster `svn st` on a large working

copy.

* subversion/libsvn_subr/sqlite3wrapper.c

(): Define SQLITE_OMIT_WAL.

When compiling SQLite, set the SQLITE_DEFAULT_MEMSTATUS=0 compile-time option.

This is the recommended option that is not enabled by default. Setting it to

zero avoids using a mutex (and thus suffering a performance penalty) during

every sqlite3_malloc() call, where this mutex is used to properly update the

allocations stats. We don't use sqlite3_status(), so set this option to zero.

See https://sqlite.org/compile.html#recommended_compile_time_options

* subversion/libsvn_subr/sqlite3wrapper.c

(): Define SQLITE_DEFAULT_MEMSTATUS=0.

Win32: fix an incorrect error status being propagated to the caller in case

where we fail to stat the destination path while being in the middle of a

failed rename.

So, if we fail to stat the destination in the middle of such rename, propagate

the *original* error. Because overriding the original error with the new one

loses information about the actual cause of failure and may even confuse some

of the callers, who for instance attempt to recover in case of an EACCES,

since they may not be receiving that error at all.

(This is the behavior we had for a long time, even before r1865518, but now

seems to be an appropriate moment to fix that)

* subversion/libsvn_subr/io.c

(win32_file_rename): Use a separate `stat_err` for the case of the failed

GetFileAttributes() call. If we failed to stat the file, return the

original `err` status to the caller.

* subversion/libsvn_subr/io.c

(win32_file_rename): Rename `status` to `err`. This lays the groundwork for

fixing the incorrect error status being propagated to the caller in case

where we fail to stat the destination path while being in the middle of

a failed rename.

Win32: prevent svn_io_file_rename2() from spinning in the retry loop in a

racy case where the rename function clears the read-only attribute from the

destination file, but another thread or process is quick enough to set it

back before the original rename function has a chance to proceed.

Despite sounding like an uncommon case, this may happen when the API is

being used to atomically update the file contents while also altering its

read-only attribute (and where such updates can occur in parallel).

Fix this by including the call that clears the read-only attribute on the

destination into the retry loop instead of calling it only once before

looping.

* subversion/libsvn_subr/io.c

(win32_file_rename): Handle EACCES/EEXIST and attempt to clear the

destination read-only attribute if it's there.

(svn_io_file_rename2): Don't call svn_io_set_file_read_write() in the

Win32-specific portion of this function, since clearing the read-only

attribute is now a part of the win32_file_rename() function.

Win32: tweak the SSL certificate validation override to avoid hitting the wire

for URL based objects and revocation checks.

The primary purpose of this callback is to resolve SVN_AUTH_SSL_UNKNOWNCA

failures using CryptoAPI and Windows local certificate stores. To do so, we

should be fine with just using the immediately available data on the local

machine.

Doing the opposite of that appears to be troublesome, as always connecting

to remote CRL and OCSP endpoints can result in spurious errors or significant

(user-reported) network stalls caused by timeouts if the endpoints are

inaccessible or unreliable.

The new approach should also be in par with the default basic behavior of

several major browsers, for example:

https://chromium.googlesource.com/chromium/src/net/+/3d1dad1c17ae3ff59e7c35841af94b66f4bca1ba/cert/cert_verify_proc_win.cc#919

* subversion/libsvn_subr/win32_crypto.c

(windows_validate_certificate): Use the CERT_CHAIN_CACHE_ONLY_URL_RETRIEVAL

and CERT_CHAIN_REVOCATION_CHECK_CACHE_ONLY flags when preparing the

certificate chain. Ignore errors in obtaining valid revocation information

when verifying the chain, as they could be induced by the new cache-only

behavior.

* STATUS: Propose r1859732, r1859734 for backport to 1.12.x (

io: Implement the functions that check for node existence using

the native Win32 APIs on Windows).

* subversion/libsvn_subr/io.c

(io_win_check_path): Tweak comment, no functional changes.

io: Implement the functions that check for node existence using the native

Win32 APIs on Windows:

svn_io_check_path()

svn_io_check_resolved_path()

svn_io_check_special_path()

This changeset aims for two distinct things:

1) First of all, starting from r1833621, these functions were patched to stop

checking symlinks on Windows. The root cause for this change has been an

incorrect implementation of stat in APR that didn't properly distinguish

between various types of reparse points, some of which are not symlinks.

Providing the custom implementation allows us to remove the hack and

properly handle such reparse points irrespectively of the APR version

in use.

Additional details on the core of the issue can be found here:

- https://github.com/golang/go/issues/23684

- https://github.com/dotnet/corefx/issues/24250

2) These APIs are used in various performance-critical code paths such as

in the core part of `svn st`.

On Win32 the necessary answers for these particular functions can be

obtained with a single call to GetFileAttributes(), which is much faster

than using the generic stat implementations from APR 1.6.x and 1.7.x

(I believe that it would be even slower in the latter case).

A couple of quick tests show about 20%-25% improvement in the speed of

`svn st` for a large working copy. The improvement may be more significant

with indexers or virus scanners, as just asking for file attributes may

completely avoid opening a file (by retrieving the result through the

fast I/O QueryOpen).

* subversion/libsvn_subr/io.c

(FILE_ATTRIBUTE_TAG_INFO, FileAttributeTagInfo): Add these definitions

for old versions of Windows SDK.

(typedef GetFileInformationByHandleEx_t,

get_file_information_by_handle_ex_proc): New.

(win_init_dynamic_imports): Import `GetFileInformationByHandleEx()`.

(win32_get_file_information_by_handle): New helper function.

(io_win_check_path): New helper with the Win32 implementation required

for the "check path" functions.

(svn_io_check_path,

svn_io_check_resolved_path,

svn_io_check_special_path): Invoke the new helper.

(io_check_path): Undo the workaround from r1833621 that stopped passing

APR_FINFO_LINK when performing a stat.

* publish/download.html

(#pre-releases): Fix broken markup that was causing comments to

be rendered on the page.

Reimplement fsfs private operations required by `svnfsfs` (stats, dump index,

load index) as "ioctls".

Technically we achieve this by introducing the new svn_fs_ioctl() API that

adds a generic way of performing backend-specific I/O operations.

This change serves two purposes:

- It allows us to properly expose FS-specific details and invoke FS-specific

operations everywhere without necessarily promoting them into a proper

public API (the ioctl code itself may be made either public or private,

depending on the requirements).

- It solves a potential dependency/linking problem where tools like `svnfsfs`

work through the libsvn_fs's loader, but also have to load and call private

APIs from libsvn_fs_fs thus ignoring the loader. The latter part may

potentially cause issues with the global shared state, etc. With the

patch, all such operations always go through the FS loader.

* subversion/include/svn_fs.h

(svn_fs_ioctl, SVN_FS_DECLARE_IOCTL_CODE, svn_fs_ioctl_code_t): New.

* subversion/include/svn_error_codes.h

(SVN_ERR_FS_UNRECOGNIZED_IOCTL_CODE): New error code.

* subversion/include/private/svn_fs_fs_private.h

(svn_fs_fs__get_stats, svn_fs_fs__dump_index, svn_fs_fs__load_index):

These functions are now implemented as...

(SVN_FS_FS__IOCTL_GET_STATS, SVN_FS_FS__IOCTL_DUMP_INDEX,

SVN_FS_FS__IOCTL_LOAD_INDEX): ...these new ioctls, which ...

(svn_fs_fs__ioctl_get_stats_input_t, svn_fs_fs__ioctl_get_stats_output_t,

svn_fs_fs__ioctl_dump_index_input_t, svn_fs_fs__ioctl_load_index_input_t):

...use these new structures.

* subversion/libsvn_fs/fs-loader.h

(fs_library_vtable_t.ioctl, fs_vtable_t.ioctl): New vtable members.

* subversion/libsvn_fs/fs-loader.c

(svn_fs_ioctl): Implement the new API by forwarding it to an appropriate

vtable member.

* subversion/libsvn_fs_fs/fs_fs.h

(svn_fs_fs__get_stats, svn_fs_fs__dump_index, svn_fs_fs__load_index):

These functions are now declared here.

* subversion/libsvn_fs_fs/fs.c

(): Include `svn_fs_fs_private.h`.

(fs_ioctl): Implement the ioctl dispatcher for three current fsfs-specific

operations.

(fs_vtable): Initialize the `ioctl` field.

(library_vtable): Initialize the `ioctl` field to NULL.

* subversion/libsvn_fs_fs/dump-index.c,

subversion/libsvn_fs_fs/load-index.c,

subversion/libsvn_fs_fs/stats.c

(): Tweak includes.

* subversion/libsvn_fs_base/fs.c

(library_vtable, fs_vtable): Initialize the `ioctl` field to NULL.

* subversion/libsvn_fs_x/fs.c

(library_vtable, fs_vtable): Initialize the `ioctl` field to NULL.

* subversion/svnfsfs/dump-index-cmd.c

(dump_index): Invoke an appropriate svn_fs_ioctl().

* subversion/svnfsfs/load-index-cmd.c

(load_index): Invoke an appropriate svn_fs_ioctl().

* subversion/svnfsfs/stats-cmd.c

(subcommand__stats): Invoke an appropriate svn_fs_ioctl().

* subversion/tests/libsvn_fs/fs-test.c

(test_unrecognized_ioctl): New test.

(test_funcs): Run the new test.

* subversion/tests/libsvn_fs_fs/fs-fs-private-test.c

(get_repo_stats, dump_index, load_index): Switch to svn_fs_ioctl().

* build.conf

(svnfsfs, fs-fs-private-test): Don't link to libsvn_fs_fs.

  1. … 3 more files in changeset.
Following up on r1847572, trace the errors constructed with the new helper.

* subversion/libsvn_fs_fs/tree.c

(open_path): Wrap the err_not_directory() return values with

svn_error_trace().

Suggested by: brane

fsfs: Fix SVN-4791, an issue with the DAG open_path() that was causing

unexpected SVN_ERR_FS_NOT_DIRECTORY errors when attempting to open a path

with `open_path_node_only | open_path_allow_null` flags.

The implication of this is that certain svn_fs_closest_copy() calls could be

returing unexpected errors as well. For the end user, this means that such

errors were possible, for example, during certain `svn update`s.

The root cause of this behavior is the implementation of the optimization

within the open_path() routine. The optimization attempts to do a cache

lookup of the prospective parent directory of the path to be opened.

If the cache lookup is successful, the parent node is assumed — but not

checked — to be a directory. The absense of the check was causing

unexpected errors instead of returning `NULL` in a case where:

- open_path() is called with `open_path_node_only | open_path_allow_null`

- the path to be opened points to a non-existing path, but its parent path

is a file

- the DAG node of the "parent" file is cached

See https://lists.apache.org/thread.html/693a95b0da834387e78a7f08df2392b634397d32f37428c81c02f8c5@%3Cdev.subversion.apache.org%3E

* subversion/libsvn_fs_fs/tree.c

(err_not_directory): New helper function factored out from...

(open_path): ...here. Check the kind of the DAG node for the prospective

parent returned from cache in the `open_path_node_only` optimization.

Handle the case where it is not a directory; utilize the new helper to

construct the appropriate error.

* subversion/tests/libsvn_fs/fs-test.c

(test_closest_copy_file_replaced_with_dir): New regression test.

(test_funcs): Run new test.

Following up on r1836306, provide an alternative fix for allowing things

like "svn commit -F <(echo foo)" that doesn't require a more expensive

apr_file_info_get(APR_FINFO_SIZE | APR_FINFO_TYPE) syscall in the

low-level and commonly used function.

Instead of additionally blacklisting APR_PIPE handles, rework the code to

work under the conditions where `finfo.size` may be wrong. That is, try to

read one more byte than necessary, expect to see an EOF, but if we don't

(e.g., for pipes), fall through to reading the remaining part of the file

chunk-by-chunk.

This approach should work correctly under all possible cases where `finfo.

size` is incorrect, and also allows to just ask for APR_FINFO_SIZE, which

is generally cheaper. At least on Windows, this uses GetFileSizeEx() and

results in a single syscall.

* subversion/libsvn_subr/io.c

(stringbuf_from_aprfile): Replace the additional blacklist condition for

APR_PIPEs with the approach described above.

Add failing test for issue #4739, "Accept incoming deletion" option doing

nothing for a locally deleted file in the tree conflict resolver.

* subversion/tests/libsvn_client/conflicts-test.c

(test_update_incoming_delete_locally_deleted_file): New.

(test_funcs): Run new test.

* publish/docs/release-notes/1.10.html:

Merge the outstanding changes from 'staging'. The shortlog of the merged

changes is:

r1825983: Tweak the "FSFS f8 is new" bits in the intro.

r1825981: (#lz4-over-the-wire): Document current behaviour over ra_svn.

r1825980: Fix syntax error.

r1825871: (LZ4 compression over the wire in http:// and svn:// connections):

Try a more compact variant of the table that doesn't have the "compression"

word in every cell.

* publish/docs/release-notes/1.10.html

(Client-Server Protocol Changes): Comment this (currently) incomplete

section out, to avoid showing a work-in-progress documentation to the

users.

* publish/docs/release-notes/1.10.html

(Compatibility Concerns, LZ4 compression): Remove pre-release compatibility

warnings, given that Subversion 1.10 is now GA.

* STATUS: Nominate r1826747.

Fix an issue in the svn_txdelta_to_svndiff_stream() API that could cause

unexpected short reads (EOFs) on the stream.

This API is used when performing PUT requests to the server. Consequently,

the bug could result in truncated payload being sent to the server and

failing commits over http://.

* subversion/libsvn_delta/svndiff.c

(svndiff_stream_read_fn): Handle a case where we have received the final

window from the txdelta stream, but the remaining part of the buffer

cannot fully accommodate it during this call to read_fn. Instead of

exiting and triggering an unexpected short read, allow the remaining

part to be read during subsequent calls to read_fn.

* subversion/tests/libsvn_delta

(): Add 'svndiff-stream-test' to svn:ignore.

* subversion/tests/libsvn_delta/svndiff-stream-test.c: New file with a

regression test for this issue.

* build.conf

(svndiff-stream-test): New.

(__ALL_TESTS__): Run svndiff-stream-test.

* docs/release-notes/1.10.html

(LZ4 compression over the wire in http:// and svn:// connections):

Try a more compact variant of the table that doesn't have the "compression"

word in every cell. In other words, turn "No compression", "LZ4 compression"

and "zlib compression" into "None", "LZ4" and "zlib" respectively.

* docs/release-notes/1.10.html

(Configuring the repository to use LZ4 compression): Fix section link.

Extend the 1.10 release notes with a subsection about using LZ4 compression

over the wire. For now, only document the negotiation and choice of the

compression algorithm for http://, and leave a TODO about svn://.

* docs/release-notes/1.10.html

(LZ4 compression): Extend the opener with the information that LZ4

compression is supported for both http:// and svn://.

(LZ4 compression over the wire in http:// and svn:// connections):

New subsection.

(New Feature Compatibility Table): Link to the new subsection.

* site/staging/docs/release-notes/1.10.html

(normalize-props): Add section describing the new `--normalize-props`

option for `svnadmin load`.

Following up on r1816967, attempt to fix the Python 3 buildbot failure.

See https://ci.apache.org/builders/svn-x64-macosx-local-python3/builds/822

* subversion/tests/cmdline/svntest/verify.py

(): Only import the `svntest` module.

(compare_dump_files): Access the fs_has_sha1() check function as

`svntest.main.fs_has_sha1()`.

Use https:// links when downloading or suggesting to download SQLite

amalgamation.

* configure.ac

(SQLITE_URL): Construct https:// link.

* get-deps.sh

(get_sqlite): Use https:// link.

fsfs: Use the `WITHOUT ROWID` optimization for rep-cache.db in format 8.

This optimization, introduced in SQLite 3.8.2, works well for tables that

have non-integer primary keys, such as

hash TEXT NOT NULL PRIMARY KEY

in the rep-cache.db. (See the https://sqlite.org/withoutrowid.html article

for additional details.)

A quick experiment showed a reduction of the on-disk size of the database

by ~1.75x. The lookups should also be faster, both due to the reduced

database size and due to the lesser amount of internal bsearches. This

should improve the times of new commits and `svnadmin load`, especially

for large repositories that also have large rep-cache.db files.

In order to maintain compatibility, since SQLite versions prior to 3.8.2

do not support this statement, we only start using it for fsfs format 8

repositories and simultaneously bump the minimal required SQLite version

from 3.7.12 (May 2012) to 3.8.2 (December 2013). The last step ensures that

all binaries compiled to support format 8 can work with the tables with

this optimization. Also, as the various scripts have both the minimal

and recommended (3.7.15.1) SQLite versions, we bump the recommended

version to the last 3.8.x patch version, which is 3.8.11.1.

* subversion/libsvn_fs_fs/rep-cache-db.sql

(STMT_CREATE_SCHEMA): Rename this ...

(STMT_CREATE_SCHEMA_V1): ...to this.

(STMT_CREATE_SCHEMA_V2): New, enables `WITHOUT ROWID` optimization.

(STMT_GET_REP, STMT_SET_REP, STMT_GET_REPS_FOR_RANGE,

STMT_GET_MAX_REV, STMT_DEL_REPS_YOUNGER_THAN_REV,

STMT_LOCK_REP, STMT_UNLOCK_REP):

Note that these statements work for both V1 and V2 schemas.

* subversion/libsvn_fs_fs/fs.h

(SVN_FS_FS__MIN_REP_CACHE_SCHEMA_V2_FORMAT): New.

* subversion/libsvn_fs_fs/rep-cache.c

(REP_CACHE_SCHEMA_FORMAT): Remove.

(open_rep_cache): Select between creating a V1 or V2 schemas based

on the format of the filesystem.

* subversion/libsvn_subr/sqlite.c

(): Bump minimum required SQLite version to 3.8.2.

* subversion/tests/cmdline/svnadmin_tests.py

(check_hotcopy_fsfs_fsx): Check if the Python's built-in SQLite version

is enough to interpret the schema of rep-cache.db, and skip the check

if it's not.

* build/generator/gen_win_dependencies.py

(_find_sqlite): Bump minimum required SQLite version to 3.8.2.

* configure.ac

(SQLITE_MINIMUM_VER): Bump to 3.8.2.

(SQLITE_RECOMMENDED_VER): Bump to 3.8.11.1.

(SQLITE_RECOMMENDED_VER_REL_YEAR): New, required to construct the

download URL which includes the release year for the newer SQLite

amalgamation versions.

(SQLITE_URL): Update the download URL.

* get-deps.sh

(SQLITE_VERSION): Bump to 3.8.11.1.

(SQLITE_VERSION_REL_YEAR): New.

(get_sqlite): Update the download URL that includes the release year

for the newer SQLite amalgamation versions.

* INSTALL

(C.12.SQLite): Bump minimum required SQLite version to 3.8.2.

(E.1.Prerequisites): Bump the minimum and recommended SQLite versions.

* subversion/tests/cmdline/svntest/__init__.py

(): Remove a fallback import of the external sqlite module. It was

required for compatibility with Python 2.4, where sqlite3 was only

available as an external module. Python 2.5+ includes a standard

sqlite3 module and the minimal required Python version is 2.7.