Patrick Desjardins Blog

Running Local LLM on a Nvidia 5080 for Coding

Posted on: 2026-02-09

Goal

A few weekends ago, I wanted to see what the best coding model I could run locally would be, on my Intel Core i9 with 32 GB of RAM and my Nvidia 5080 with 16 GB of VRAM. The idea was to point OpenCode at Qwen 3 to get a local Claude. The result was impressive in terms of how fast responses came back, but the final product wasn't at Claude's level of quality.

How to do it?

I started with Qwen 2.5, and the results were horrible in terms of tool use, even for simple things like reading and writing files. It was fine for general questions, but that wasn't the goal.

Moving to Qwen3-8B-AWQ did the job, but only after some tweaking; otherwise it crashed or returned responses too slowly.

Installation:

```sh
# Confirm the GPU and driver are visible
nvidia-smi

# Install PyTorch (CUDA build), vLLM, AWQ support, and the Hugging Face CLI
pip install --upgrade pip
pip install torch torchvision torchaudio \
  --index-url https://download.pytorch.org/whl/cu121
pip install vllm
pip install autoawq
pip install huggingface_hub

# Authenticate to be able to download the model weights
huggingface-cli login
```
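
Before going further, it is worth confirming that the CUDA build of PyTorch actually sees the GPU. A minimal sanity check (the exact version strings will differ on your machine):

```sh
# The CUDA build of PyTorch should report True and the RTX 5080
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
python -c "import torch; print(torch.cuda.get_device_name(0))"
```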

Every time I want to use it:

```sh
cd ~/llm/llm
pyenv activate vllm

# Let vLLM pick its default attention backend and sampler
unset VLLM_ATTENTION_BACKEND
unset VLLM_USE_FLASHINFER_SAMPLER

# Kill any previous server still holding the GPU
pkill -f vllm

# Serve Qwen3-8B-AWQ on port 7555 with tool calling and the Qwen3 reasoning parser enabled
vllm serve \
  Qwen/Qwen3-8B-AWQ \
  --quantization awq \
  --dtype float16 \
  --gpu-memory-utilization 0.9 \
  --max-model-len 32768 \
  --port 7555 \
  --api-key "opencode_local" \
  --enable-auto-tool-choice \
  --tool-call-parser hermes \
  --reasoning-parser qwen3
```

Loading everything takes less than 30 seconds:

```
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:24 [default_loader.py:291] Loading weights took 4.14 seconds
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:24 [gpu_model_runner.py:3905] Model loading took 5.71 GiB memory and 5.326229 seconds
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:30 [backends.py:644] Using cache directory: /home/miste/.cache/vllm/torch_compile_cache/252055f4c9/rank_0_0/backbone for vLLM's torch.compile
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:30 [backends.py:704] Dynamo bytecode transform time: 5.01 s
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:35 [backends.py:226] Directly load the compiled graph(s) for compile range (1, 2048) from the cache, took 1.098 s
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:35 [monitor.py:34] torch.compile takes 6.11 s in total
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:36 [gpu_worker.py:358] Available KV cache memory: 7.19 GiB
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:36 [kv_cache_utils.py:1305] GPU KV cache size: 52,368 tokens
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:36 [kv_cache_utils.py:1310] Maximum concurrency for 32,768 tokens per request: 1.60x
(EngineCore_DP0 pid=36318) 2026-02-03 19:53:36,340 - INFO - autotuner.py:256 - flashinfer.jit: [Autotuner]: Autotuning process starts ...
(EngineCore_DP0 pid=36318) 2026-02-03 19:53:36,363 - INFO - autotuner.py:262 - flashinfer.jit: [Autotuner]: Autotuning process ends
Capturing CUDA graphs (mixed prefill-decode, PIECEWISE): 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 51/51 [00:07<00:00,  6.51it/s]
Capturing CUDA graphs (decode, FULL): 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 35/35 [00:05<00:00,  6.77it/s]
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:49 [gpu_model_runner.py:4856] Graph capturing finished in 13 secs, took 0.00 GiB
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:49 [core.py:273] init engine (profile, create kv cache, warmup model) took 25.12 seconds
(EngineCore_DP0 pid=36318) INFO 02-03 19:53:51 [vllm.py:630] Asynchronous scheduling is enabled.
(APIServer pid=36148) INFO 02-03 19:53:51 [api_server.py:1014] Supported tasks: ['generate']
(APIServer pid=36148) WARNING 02-03 19:53:51 [model.py:1358] Default sampling parameters have been overridden by the model's Hugging Face generation config recommended from the model creator. If this is not intended, please relaunch vLLM instance with `--generation-config vllm`.
(APIServer pid=36148) INFO 02-03 19:53:51 [serving_responses.py:224] Using default chat sampling params from model: {'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
(APIServer pid=36148) INFO 02-03 19:53:51 [serving_engine.py:271] "auto" tool choice has been enabled.
(APIServer pid=36148) INFO 02-03 19:53:51 [serving_engine.py:271] "auto" tool choice has been enabled.
(APIServer pid=36148) INFO 02-03 19:53:51 [serving_chat.py:146] Using default chat sampling params from model: {'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
(APIServer pid=36148) INFO 02-03 19:53:51 [serving_chat.py:182] Warming up chat template processing...
(APIServer pid=36148) INFO 02-03 19:53:52 [chat_utils.py:599] Detected the chat template content format to be 'string'. You can set `--chat-template-content-format` to override this.
(APIServer pid=36148) INFO 02-03 19:53:52 [serving_chat.py:218] Chat template warmup completed in 1254.3ms
(APIServer pid=36148) INFO 02-03 19:53:53 [serving_completion.py:78] Using default completion sampling params from model: {'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
(APIServer pid=36148) INFO 02-03 19:53:53 [serving_engine.py:271] "auto" tool choice has been enabled.
(APIServer pid=36148) INFO 02-03 19:53:53 [serving_chat.py:146] Using default chat sampling params from model: {'temperature': 0.6, 'top_k': 20, 'top_p': 0.95}
(APIServer pid=36148) INFO 02-03 19:53:53 [api_server.py:1346] Starting vLLM API server 0 on http://0.0.0.0:7555
```
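
Once the server is up, a quick check against the OpenAI-compatible API confirms the endpoint and API key work before wiring up OpenCode. A minimal sketch, assuming the server runs on port 7555 with the key from the serve command above:

```sh
# List the served models
curl http://127.0.0.1:7555/v1/models \
  -H "Authorization: Bearer opencode_local"

# Send a tiny chat completion
curl http://127.0.0.1:7555/v1/chat/completions \
  -H "Authorization: Bearer opencode_local" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B-AWQ",
    "messages": [{"role": "user", "content": "Say hello in five words."}],
    "max_tokens": 64
  }'
```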

Configuring OpenCode:

```sh
nano ~/.config/opencode/config.json
```

Paste the following configuration:

```json
{
  "$schema": "https://opencode.ai/config.json",
  "provider": {
    "vllm_local": {
      "npm": "@ai-sdk/openai-compatible",
      "name": "vLLM (local)",
      "options": {
        "baseURL": "http://127.0.0.1:7555/v1",
        "apiKey": "opencode_local"
      },
      "models": {
        "Qwen/Qwen3-8B-AWQ": {
          "name": "Qwen 3.0 (8B Agent)",
          "tools": {
            "write": true,
            "bash": true,
            "read": true,
            "edit": true
          },
          "limit": {
            "context": 32768,
            "output": 4096
          },
          "supportsToolCalling": true,
          "supportsReasoning": true
        }
      }
    }
  }
}
```

The OpenCode configuration required a lot of trial and error; even ChatGPT could not find the right settings. Moving to Qwen 3 and explicitly enabling tools helped move things forward.
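
To make "enabling tools" concrete, here is what a tool-calling round trip looks like against the local endpoint. This is a sketch: the `get_weather` function is invented for illustration, while the endpoint, API key, and model name come from the setup above.

```sh
# Hypothetical tool-calling request; "get_weather" is made up for illustration.
curl http://127.0.0.1:7555/v1/chat/completions \
  -H "Authorization: Bearer opencode_local" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "Qwen/Qwen3-8B-AWQ",
    "messages": [{"role": "user", "content": "What is the weather in Montreal?"}],
    "tools": [{
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {"city": {"type": "string"}},
          "required": ["city"]
        }
      }
    }],
    "tool_choice": "auto"
  }'
# A working setup returns choices[0].message.tool_calls with the function name
# and JSON arguments instead of (or alongside) plain text.
```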

Then I ran OpenCode. Here is a prompt asking how many unit tests are in the repository.

The full answer took 2 minutes and 22 seconds.

Here is the full transcript:


## User

How many tests in this repository?

---

## Assistant (Build · Qwen/Qwen3-8B-AWQ · 100.8s)

_Thinking:_


Okay, the user is asking how many tests are in the repository. Let me think about how to approach this.

First, I remember that the project is a FastAPI application, and the documentation mentions testing. In the CLAUDE.md file, there are sections about testing, including commands to run tests and requirements for test coverage. 

The user probably wants to know the total number of tests, maybe to check if they meet the coverage requirements mentioned (minimum 76%). The test commands listed include using `uv run pytest` with options for coverage and speed. 

I need to find a way to count the tests. The `pytest` command can give a summary when run with the `-v` flag, but maybe there's a better way. Alternatively, the `pytest` command with `--help` might show options related to counting tests. Wait, the `uv run pytest` command is mentioned, so maybe running `uv run pytest --help` could show options. But the user might not have pytest installed, but since the project uses `uv` (uvicorn), it's likely that pytest is installed as a dependency.

Wait, the user is asking for the number of tests, not to run them. So maybe the best approach is to check the test files. The project structure includes a `tests` directory, as mentioned in the CLAUDE.md under "Development Commands". The `tests/services` directory is mentioned, and there are specific test files like `test_database.py`.

But how to count them without running the tests? The user might not have the environment set up, so I can't run the tests directly. However, the `pytest` tool can list the tests when run with `--help` or `--collect-only`, but that might not be straightforward. Alternatively, the `pytest` command can be used with `--help` to see available options.

Wait, the user might be looking for the number of test files or test cases. The `pytest` command can count test cases using `--count` or other flags. Let me check the pytest documentation. Oh, right, `pytest` has a `--count` option to count the number of tests. But I need to confirm if that's available.

Alternatively, the `pytest` command can be run with `--help` to see available options. So, the steps would be:

1. Check if `pytest` is installed. Since the project uses `uv` and the `uv run pytest` command is mentioned, it's likely installed.
2. Run `uv run pytest --help` to see available options.
3. Use the `--count` option to get the number of tests.

But the user might not have the environment set up, so maybe the best approach is to suggest running the test commands provided in the documentation. However, the user is asking for the number, not to run them. 

Alternatively, the `pytest` command can be run with `--help` to see if there's an option to count tests. If not, maybe the `pytest` command can be used with `--collect-only` to list all test files, which can then be counted. 

But without running the tests, it's hard to get the exact number. The user might need to run the tests to get the count. However, the answer should be based on the information provided in the CLAUDE.md. The documentation mentions that the test suite is run with `uv run pytest`, and there are specific test files mentioned like `tests/services/test_database.py`. 

Wait, the user is asking for the number of tests, but the CLAUDE.md doesn't specify the exact count. So, the answer would be that the number of tests can be determined by running `uv run pytest --count` or similar command. However, without actual execution, I can't provide the exact number. 

But the user might be expecting a specific number based on the project structure. The `tests/services` directory has multiple test files, but the exact count isn't given. Therefore, the correct approach is to inform the user that they need to run the test command to get the count, or check the test files. 

Alternatively, if the user is in a Git repository, they could use `git ls-files tests/` to list test files and count them. But that's not reliable as some tests might be in subdirectories. 

In conclusion, the answer should guide the user to run the test command to get the count, as the exact number isn't provided in the documentation.



**Tool: bash**

**Input:**
```json
{
  "command": "uv run pytest --help",
  "description": "Check pytest options for test counting"
}
```


**Output:**

```
usage: pytest [options] [file_or_dir] [file_or_dir] [...]

positional arguments:
  file_or_dir

general:
  -k EXPRESSION         Only run tests which match the given substring
                        expression. An expression is a Python evaluable
                        expression where all names are substring-matched against
                        test names and their parent classes. Example: -k
                        'test_method or test_other' matches all test functions
                        and classes whose name contains 'test_method' or
                        'test_other', while -k 'not test_method' matches those
                        that don't contain 'test_method' in their names. -k 'not
                        test_method and not test_other' will eliminate the
                        matches. Additionally keywords are matched to classes
                        and functions containing extra names in their
                        'extra_keyword_matches' set, as well as functions which
                        have names assigned directly to them. The matching is
                        case-insensitive.
  -m MARKEXPR           Only run tests matching given mark expression. For
                        example: -m 'mark1 and not mark2'.
  --markers             show markers (builtin, plugin and per-project ones).
  -x, --exitfirst       Exit instantly on first error or failed test
  --maxfail=num         Exit after first num failures or errors
  --strict-config       Enables the strict_config option
  --strict-markers      Enables the strict_markers option
  --strict              Enables the strict option
  --fixtures, --funcargs
                        Show available fixtures, sorted by plugin appearance
                        (fixtures with leading '_' are only shown with '-v')
  --fixtures-per-test   Show fixtures per test
  --pdb                 Start the interactive Python debugger on errors or
                        KeyboardInterrupt
  --pdbcls=modulename:classname
                        Specify a custom interactive Python debugger for use
                        with --pdb.For example:
                        --pdbcls=IPython.terminal.debugger:TerminalPdb
  --trace               Immediately break when running each test
  --capture=method      Per-test capturing method: one of fd|sys|no|tee-sys
  -s                    Shortcut for --capture=no
  --runxfail            Report the results of xfail tests as if they were not
                        marked
  --lf, --last-failed   Rerun only the tests that failed at the last run (or all
                        if none failed)
  --ff, --failed-first  Run all tests, but run the last failures first. This may
                        re-order tests and thus lead to repeated fixture
                        setup/teardown.
  --nf, --new-first     Run tests from new files first, then the rest of the
                        tests sorted by file mtime
  --cache-show=[CACHESHOW]
                        Show cache contents, don't perform collection or tests.
                        Optional argument: glob (default: '*').
  --cache-clear         Remove all cache contents at start of test run
  --lfnf={all,none}, --last-failed-no-failures={all,none}
                        With ``--lf``, determines whether to execute tests when
                        there are no previously (known) failures or when no
                        cached ``lastfailed`` data was found. ``all`` (the
                        default) runs the full test suite again. ``none`` just
                        emits a message about no known failures and exits
                        successfully.
  --sw, --stepwise      Exit on test failure and continue from last failing test
                        next time
  --sw-skip, --stepwise-skip
                        Ignore the first failing test but stop on the next
                        failing test. Implicitly enables --stepwise.
  --sw-reset, --stepwise-reset
                        Resets stepwise state, restarting the stepwise workflow.
                        Implicitly enables --stepwise.

Reporting:
  --durations=N         Show N slowest setup/test durations (N=0 for all)
  --durations-min=N     Minimal duration in seconds for inclusion in slowest
                        list. Default: 0.005 (or 0.0 if -vv is given).
  -v, --verbose         Increase verbosity
  --no-header           Disable header
  --no-summary          Disable summary
  --no-fold-skipped     Do not fold skipped tests in short summary.
  --force-short-summary
                        Force condensed summary output regardless of verbosity
                        level.
  -q, --quiet           Decrease verbosity
  --verbosity=VERBOSE   Set verbosity. Default: 0.
  -r chars              Show extra test summary info as specified by chars:
                        (f)ailed, (E)rror, (s)kipped, (x)failed, (X)passed,
                        (p)assed, (P)assed with output, (a)ll except passed
                        (p/P), or (A)ll. (w)arnings are enabled by default (see
                        --disable-warnings), 'N' can be used to reset the list.
                        (default: 'fE').
  --disable-warnings, --disable-pytest-warnings
                        Disable warnings summary
  -l, --showlocals      Show locals in tracebacks (disabled by default)
  --no-showlocals       Hide locals in tracebacks (negate --showlocals passed
                        through addopts)
  --tb=style            Traceback print mode (auto/long/short/line/native/no)
  --xfail-tb            Show tracebacks for xfail (as long as --tb != no)
  --show-capture={no,stdout,stderr,log,all}
                        Controls how captured stdout/stderr/log is shown on
                        failed tests. Default: all.
  --full-trace          Don't cut any tracebacks (default is to cut)
  --color=color         Color terminal output (yes/no/auto)
  --code-highlight={yes,no}
                        Whether code should be highlighted (only if --color is
                        also enabled). Default: yes.
  --pastebin=mode       Send failed|all info to bpaste.net pastebin service
  --junit-xml=path      Create junit-xml style report file at given path
  --junit-prefix=str    Prepend prefix to classnames in junit-xml output

pytest-warnings:
  -W PYTHONWARNINGS, --pythonwarnings=PYTHONWARNINGS
                        Set which warnings to report, see -W option of Python
                        itself

collection:
  --collect-only, --co  Only collect tests, don't execute them
  --pyargs              Try to interpret all arguments as Python packages
  --ignore=path         Ignore path during collection (multi-allowed)
  --ignore-glob=path    Ignore path pattern during collection (multi-allowed)
  --deselect=nodeid_prefix
                        Deselect item (via node id prefix) during collection
                        (multi-allowed)
  --confcutdir=dir      Only load conftest.py's relative to specified dir
  --noconftest          Don't load any conftest.py files
  --keep-duplicates     Keep duplicate tests
  --collect-in-virtualenv
                        Don't ignore tests in a local virtualenv directory
  --continue-on-collection-errors
                        Force test execution even if collection errors occur
  --import-mode={prepend,append,importlib}
                        Prepend/append to sys.path when importing test modules
                        and conftest files. Default: prepend.
  --doctest-modules     Run doctests in all .py modules
  --doctest-report={none,cdiff,ndiff,udiff,only_first_failure}
                        Choose another output format for diffs on doctest
                        failure
  --doctest-glob=pat    Doctests file matching pattern, default: test*.txt
  --doctest-ignore-import-errors
                        Ignore doctest collection errors
  --doctest-continue-on-failure
                        For a given doctest, continue to run after the first
                        failure

test session debugging and configuration:
  -c FILE, --config-file=FILE
                        Load configuration from `FILE` instead of trying to
                        locate one of the implicit configuration files.
  --rootdir=ROOTDIR     Define root directory for tests. Can be relative path:
                        'root_dir', './root_dir', 'root_dir/another_dir/';
                        absolute path: '/home/user/root_dir'; path with
                        variables: '$HOME/root_dir'.
  --basetemp=dir        Base temporary directory for this test run. (Warning:
                        this directory is removed if it exists.)
  -V, --version         Display pytest version and information about plugins.
                        When given twice, also display information about
                        plugins.
  -h, --help            Show help message and configuration info
  -p name               Early-load given plugin module name or entry point
                        (multi-allowed). To avoid loading of plugins, use the
                        `no:` prefix, e.g. `no:doctest`. See also --disable-
                        plugin-autoload.
  --disable-plugin-autoload
                        Disable plugin auto-loading through entry point
                        packaging metadata. Only plugins explicitly specified in
                        -p or env var PYTEST_PLUGINS will be loaded.
  --trace-config        Trace considerations of conftest.py files
  --debug=[DEBUG_FILE_NAME]
                        Store internal tracing debug information in this log
                        file. This file is opened with 'w' and truncated as a
                        result, care advised. Default: pytestdebug.log.
  -o OVERRIDE_INI, --override-ini=OVERRIDE_INI
                        Override configuration option with "option=value" style,
                        e.g. `-o strict_xfail=True -o cache_dir=cache`.
  --assert=MODE         Control assertion debugging tools.
                        'plain' performs no assertion debugging.
                        'rewrite' (the default) rewrites assert statements in
                        test modules on import to provide assert expression
                        information.
  --setup-only          Only setup fixtures, do not execute tests
  --setup-show          Show setup of fixtures while executing tests
  --setup-plan          Show what fixtures and tests would be executed but don't
                        execute anything

logging:
  --log-level=LEVEL     Level of messages to catch/display. Not set by default,
                        so it depends on the root/parent log handler's effective
                        level, where it is "WARNING" by default.
  --log-format=LOG_FORMAT
                        Log format used by the logging module
  --log-date-format=LOG_DATE_FORMAT
                        Log date format used by the logging module
  --log-cli-level=LOG_CLI_LEVEL
                        CLI logging level
  --log-cli-format=LOG_CLI_FORMAT
                        Log format used by the logging module
  --log-cli-date-format=LOG_CLI_DATE_FORMAT
                        Log date format used by the logging module
  --log-file=LOG_FILE   Path to a file when logging will be written to
  --log-file-mode={w,a}
                        Log file open mode
  --log-file-level=LOG_FILE_LEVEL
                        Log file logging level
  --log-file-format=LOG_FILE_FORMAT
                        Log format used by the logging module
  --log-file-date-format=LOG_FILE_DATE_FORMAT
                        Log date format used by the logging module
  --log-auto-indent=LOG_AUTO_INDENT
                        Auto-indent multiline messages passed to the logging
                        module. Accepts true|on, false|off or an integer.
  --log-disable=LOGGER_DISABLE
                        Disable a logger by name. Can be passed multiple times.

asyncio:
  --asyncio-mode=MODE   'auto' - for automatically handling all async functions
                        by the plugin
                        'strict' - for autoprocessing disabling (useful if
                        different async frameworks should be tested together,
                        e.g. both pytest-asyncio and pytest-trio are used in the
                        same project)
  --asyncio-debug       enable asyncio debug mode for the default event loop

coverage reporting with distributed testing support:
  --cov=[SOURCE]        Path or package name to measure during execution (multi-
                        allowed). Use --cov= to not do any source filtering and
                        record everything.
  --cov-reset           Reset cov sources accumulated in options so far.
  --cov-report=TYPE     Type of report to generate: term, term-missing,
                        annotate, html, xml, json, markdown, markdown-append,
                        lcov (multi-allowed). term, term-missing may be followed
                        by ":skip-covered". annotate, html, xml, json, markdown,
                        markdown-append and lcov may be followed by ":DEST"
                        where DEST specifies the output location. Use --cov-
                        report= to not generate any output.
  --cov-config=PATH     Config file for coverage. Default: .coveragerc
  --no-cov-on-fail      Do not report coverage if test run fails. Default: False
  --no-cov              Disable coverage report completely (useful for
                        debuggers). Default: False
  --cov-fail-under=MIN  Fail if the total coverage is less than MIN.
  --cov-append          Do not delete coverage but append to current. Default:
                        False
  --cov-branch          Enable branch coverage.
  --cov-precision=COV_PRECISION
                        Override the reporting precision.
  --cov-context=CONTEXT
                        Dynamic contexts to use. "test" for now.

[pytest] configuration options in the first pytest.toml|pytest.ini|tox.ini|setup.cfg|pyproject.toml file found:

  markers (linelist):   Register new markers for test functions
  empty_parameter_set_mark (string):
                        Default marker for empty parametersets
  strict_config (bool): Any warnings encountered while parsing the `pytest`
                        section of the configuration file raise errors
  strict_markers (bool):
                        Markers not registered in the `markers` section of the
                        configuration file raise errors
  strict (bool):        Enables all strictness options, currently:
                        strict_config, strict_markers, strict_xfail,
                        strict_parametrization_ids
  filterwarnings (linelist):
                        Each line specifies a pattern for
                        warnings.filterwarnings. Processed after
                        -W/--pythonwarnings.
  norecursedirs (args): Directory patterns to avoid for recursion
  testpaths (args):     Directories to search for tests when no files or
                        directories are given on the command line
  collect_imported_tests (bool):
                        Whether to collect tests in imported modules outside
                        `testpaths`
  consider_namespace_packages (bool):
                        Consider namespace packages when resolving module names
                        during import
  usefixtures (args):   List of default fixtures to be used with this project
  python_files (args):  Glob-style file patterns for Python test module
                        discovery
  python_classes (args):
                        Prefixes or glob names for Python test class discovery
  python_functions (args):
                        Prefixes or glob names for Python test function and
                        method discovery
  disable_test_id_escaping_and_forfeit_all_rights_to_community_support (bool):
                        Disable string escape non-ASCII characters, might cause
                        unwanted side effects(use at your own risk)
  strict_parametrization_ids (bool):
                        Emit an error if non-unique parameter set IDs are
                        detected
  console_output_style (string):
                        Console output: "classic", or with additional progress
                        information ("progress" (percentage) | "count" |
                        "progress-even-when-capture-no" (forces progress even
                        when capture=no)
  verbosity_test_cases (string):
                        Specify a verbosity level for test case execution,
                        overriding the main level. Higher levels will provide
                        more detailed information about each test case executed.
  strict_xfail (bool):  Default for the strict parameter of xfail markers when
                        not given explicitly (default: False) (alias:
                        xfail_strict)
  tmp_path_retention_count (string):
                        How many sessions should we keep the `tmp_path`
                        directories, according to `tmp_path_retention_policy`.
  tmp_path_retention_policy (string):
                        Controls which directories created by the `tmp_path`
                        fixture are kept around, based on test outcome.
                        (all/failed/none)
  enable_assertion_pass_hook (bool):
                        Enables the pytest_assertion_pass hook. Make sure to
                        delete any previously generated pyc cache files.
  truncation_limit_lines (string):
                        Set threshold of LINES after which truncation will take
                        effect
  truncation_limit_chars (string):
                        Set threshold of CHARS after which truncation will take
                        effect
  verbosity_assertions (string):
                        Specify a verbosity level for assertions, overriding the
                        main level. Higher levels will provide more detailed
                        explanation when an assertion fails.
  junit_suite_name (string):
                        Test suite name for JUnit report
  junit_logging (string):
                        Write captured log messages to JUnit report: one of
                        no|log|system-out|system-err|out-err|all
  junit_log_passing_tests (bool):
                        Capture log information for passing tests to JUnit
                        report:
  junit_duration_report (string):
                        Duration time to report: one of total|call
  junit_family (string):
                        Emit XML for schema: one of legacy|xunit1|xunit2
  doctest_optionflags (args):
                        Option flags for doctests
  doctest_encoding (string):
                        Encoding used for doctest files
  cache_dir (string):   Cache directory path
  log_level (string):   Default value for --log-level
  log_format (string):  Default value for --log-format
  log_date_format (string):
                        Default value for --log-date-format
  log_cli (bool):       Enable log display during test run (also known as "live
                        logging")
  log_cli_level (string):
                        Default value for --log-cli-level
  log_cli_format (string):
                        Default value for --log-cli-format
  log_cli_date_format (string):
                        Default value for --log-cli-date-format
  log_file (string):    Default value for --log-file
  log_file_mode (string):
                        Default value for --log-file-mode
  log_file_level (string):
                        Default value for --log-file-level
  log_file_format (string):
                        Default value for --log-file-format
  log_file_date_format (string):
                        Default value for --log-file-date-format
  log_auto_indent (string):
                        Default value for --log-auto-indent
  faulthandler_timeout (string):
                        Dump the traceback of all threads if a test takes more
                        than TIMEOUT seconds to finish
  faulthandler_exit_on_timeout (bool):
                        Exit the test process if a test takes more than
                        faulthandler_timeout seconds to finish
  verbosity_subtests (string):
                        Specify verbosity level for subtests. Higher levels will
                        generate output for passed subtests. Failed subtests are
                        always reported.
  addopts (args):       Extra command line options
  minversion (string):  Minimally required pytest version
  pythonpath (paths):   Add paths to sys.path
  required_plugins (args):
                        Plugins that must be present for pytest to run
  anyio_mode (string):  AnyIO plugin mode (either "strict" or "auto")
  mock_traceback_monkeypatch (string):
                        Monkeypatch the mock library to improve reporting of the
                        assert_called_... methods
  mock_use_standalone_module (string):
                        Use standalone "mock" (from PyPI) instead of builtin
                        "unittest.mock" on Python 3
  asyncio_mode (string):
                        default value for --asyncio-mode
  asyncio_debug (bool): enable asyncio debug mode for the default event loop
  asyncio_default_fixture_loop_scope (string):
                        default scope of the asyncio event loop used to execute
                        async fixtures
  asyncio_default_test_loop_scope (string):
                        default scope of the asyncio event loop used to execute
                        tests

Environment variables:
  CI                       When set to a non-empty value, pytest knows it is running in a CI process and does not truncate summary info
  BUILD_NUMBER             Equivalent to CI
  PYTEST_ADDOPTS           Extra command line options
  PYTEST_PLUGINS           Comma-separated plugins to load during startup
  PYTEST_DISABLE_PLUGIN_AUTOLOAD Set to disable plugin auto-loading
  PYTEST_DEBUG             Set to enable debug tracing of pytest's internals
  PYTEST_DEBUG_TEMPROOT    Override the system temporary directory
  PYTEST_THEME             The Pygments style to use for code output
  PYTEST_THEME_MODE        Set the PYTEST_THEME to be either 'dark' or 'light'


to see available markers type: pytest --markers
to see available fixtures type: pytest --fixtures
(shown according to specified file_or_dir or current dir if not specified; fixtures with leading '_' are only shown with the '-v' option
```



---

## Assistant (Build · Qwen/Qwen3-8B-AWQ · 29.8s)

<think>
Okay, the user wants to know how many tests are in the repository. Let me check the information provided.

In the CLAUDE.md file, there's a section about testing. It mentions running tests with `uv run pytest` and specifies that the test suite has a minimum coverage of 76%. There's also a note about running specific test files, like `tests/services/test_database.py`.

The user's question is about the number of tests. The documentation doesn't explicitly state the count, but it does mention that tests are run with `uv run pytest`. To find the number of tests, I should check the test files. The `tests` directory likely contains multiple test files. For example, `test_database.py` is mentioned, and there might be others like `test_queue.py` or `test_streaming.py` based on the project structure.

Since the user can't run the tests directly, I can infer that the number of tests is determined by the number of test files in the `tests` directory. The exact count isn't provided, but the answer should guide the user to check the test files or run the test command to get the count. However, the user might need to run `uv run pytest --count` or use a tool like `find` to count the test files. Alternatively, the answer could be that the number of tests is determined by the test files in the `tests` directory, and the exact count can be found by listing those files.
</think>

**Tool: bash**

**Input:**
```json
{
  "command": "find tests/ -name \"test_*.py\" | wc -l",
  "description": "Count test files in tests directory"
}
```


**Output:**

```
20
```

---

## Assistant (Build · Qwen/Qwen3-8B-AWQ · 12.1s)

<think>
Okay, the user asked how many tests are in the repository. I first checked the pytest help to see if there was a way to count tests directly, but that didn't give the exact number. Then I decided to count the test files in the `tests` directory. Using the `find` command, I located all files matching `test_*.py` and counted them. The result was 20 files. So, there are 20 test files in the repository. The answer is 20.
</think>

There are **20 test files** in the repository, matching the pattern `test_*.py` in the `tests/` directory.

---

The problem is how verbose the model is and how inaccurate the initial answer was, since I wanted to know the number of tests, not the number of test files. Adding a second prompt specifying the total number of tests inside those files resulted in the model trying a few approaches using pytest.

It took three prompts, with me explicitly telling the model that files starting with `test_` are test files and that inside those files there are many test functions also starting with `test_`. From there, I finally got 427 tests, which is a good answer.
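
For reference, the number the model was hunting for can be obtained without an LLM at all. A small sketch, assuming pytest collection works in the repository and the tests follow the `test_` naming convention:

```sh
# Collect tests without running them; the last line reports "N tests collected",
# which should match the 427 the model eventually found.
uv run pytest --collect-only -q

# Rough alternative: count test functions with grep (assumes one "def test_..." per line)
grep -rE "^[[:space:]]*(async )?def test_" tests/ | wc -l
```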

Speed

Here is a copy of the vLLM server logs during the previous exchange with OpenCode:

```
(APIServer pid=21042) INFO:     127.0.0.1:38596 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=21042) INFO 02-03 20:23:46 [loggers.py:257] Engine 000: Avg prompt throughput: 1682.8 tokens/s, Avg generation throughput: 9.4 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 48.4%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:23:56 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.3 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 48.7%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:24:07 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 8.5 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.0%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:24:17 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 7.0 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.2%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:24:27 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 9.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.5%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:24:37 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 10.2 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 49.7%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:24:47 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.8 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 50.1%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:24:57 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 50.4%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO 02-03 20:25:07 [loggers.py:257] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 11.6 tokens/s, Running: 1 reqs, Waiting: 0 reqs, GPU KV cache usage: 50.8%, Prefix cache hit rate: 0.0%
(APIServer pid=21042) INFO:     127.0.0.1:38596 - "POST /v1/chat/completions HTTP/1.1" 200 OK
```

The roughly 10 tokens per second of generation throughput is slow but not terrible. I was surprised to get such a good result on my machine.

Conclusion

Qwen 3 is a good model, but 8B parameters is not ideal. That said, it runs at $0, understands me, and can interact with the source code like Claude does. While not extremely useful yet, having a specialized model running locally is very interesting.

I could see having a financial model, a coding model, and other specialized models that we can plug and play locally for specific tasks, which would help quite a lot in reducing costs in the long term.