try playing with the new spreadsheet querying feature, it’s duckdb based and should be pretty cool

  1. Listing spreadsheets 3 spreadsheets 18 sheets
    gold_standard_v12.xlsx
    1. bid_level 3700 rows x 1 columns
    2. tender_level 1278 rows x 17 columns
    3. company_id_map 1038 rows x 7 columns
    4. population 913 rows x 34 columns
    5. IUB_metadata 80 rows x 27 columns
    6. special_cases_guide 7 rows x 2 columns
    7. VARIABLE_dictionary_v2 5 rows x 4 columns
    8. variable_dictionary 41 rows x 7 columns
    9. Primary_sources 17 rows x 2 columns
    10. normalization_rules 10 rows x 5 columns
    inventory.xlsx
    1. Sheet1 17098 rows x 17 columns
    inventory_v3.xlsx
    1. tender_level (2) 29 rows x 15 columns
    2. Inventory 330 rows x 16 columns
    3. IUB 28 rows x 54 columns
    4. bid_level 151 rows x 11 columns
    5. tender_level 28 rows x 8 columns
    6. Pivot1 8 rows x 2 columns
    7. CPV codes 44 rows x 2 columns
  2. Quick profile of available procurement spreadsheet tables and missingness-relevant fields 6 rows of 6
    WITH sheets AS (
      SELECT 'gold.population' AS sheet, count(*) AS rows FROM xlsx_gold_standard_v12_b56445cb_4
      UNION ALL SELECT 'gold.tender_level', count(*) FROM xlsx_gold_standard_v12_b56445cb_2
      UNION ALL SELECT 'gold.bid_level', count(*) FROM xlsx_gold_standard_v12_b56445cb_1
      UNION ALL SELECT 'inventory_v3.tender_level', count(*) FROM xlsx_inventory_v3_4b022d33_5
      UNION ALL SELECT 'inventory_v3.bid_level', count(*) FROM xlsx_inventory_v3_4b022d33_4
      UNION ALL SELECT 'inventory.files', count(*) FROM xlsx_inventory_830a48f1_1
    )
    SELECT * FROM sheets ORDER BY rows DESC;
    sheet rows
    inventory.files 17098
    gold.bid_level 3700
    gold.tender_level 1278
    gold.population 913
    inventory_v3.bid_level 151
    inventory_v3.tender_level 28
    Open saved result #SPELA2
  3. Explore distribution of gold-standard population by year, KIBS class, gold-standard flag, and document availability 16 rows of 16
    SELECT
      source_year,
      kibs_text_class,
      gold_standard,
      count(*) AS tenders,
      sum(CASE WHEN lower(coalesce(has_report_bool,'')) IN ('true','1','yes','jā','ja') THEN 1 ELSE 0 END) AS has_report,
      round(avg(try_cast(total_docs AS DOUBLE)), 2) AS avg_docs,
      sum(CASE WHEN lower(coalesce(meat_rich,'')) IN ('true','1','yes','jā','ja') THEN 1 ELSE 0 END) AS meat_rich_n,
      sum(CASE WHEN lower(coalesce(likely_framework,'')) IN ('true','1','yes','jā','ja') THEN 1 ELSE 0 END) AS likely_framework_n
    FROM xlsx_gold_standard_v12_b56445cb_4
    GROUP BY 1,2,3
    ORDER BY try_cast(source_year AS INT), kibs_text_class, gold_standard;
    source_year kibs_text_class gold_standard tenders has_report avg_docs meat_rich_n likely_framework_n
    2024 non_kibs 0 255 227 15.37 73 0
    2024 non_kibs 1 23 23 15.09 9 1
    2024 possible_kibs 0 45 40 21.33 22 0
    2024 possible_kibs 1 8 8 16.75 3 0
    2024 strict_kibs 0 141 120 16.41 51 0
    2024 strict_kibs 1 21 19 19.43 13 1
    2024 weak_signal 0 87 80 16.52 46 0
    2024 weak_signal 1 7 6 22.57 5 0
    2025 non_kibs 0 115 95 14.83 38 1
    2025 non_kibs 1 5 5 17.6 1 1
    2025 possible_kibs 0 38 32 17.18 19 0
    2025 possible_kibs 1 1 1 10 1 0
    2025 strict_kibs 0 74 58 15.04 29 0
    2025 strict_kibs 1 11 11 19.09 6 1
    2025 weak_signal 0 78 66 13.86 31 1
    2025 weak_signal 1 4 4 12 0 0
    Open saved result #8GEEAD
  4. Summarise hand-coded 28-tender sample by preparation time, bids, award value, and single-bidder rate 2 rows of 2
    SELECT
      procedure_type,
      count(*) AS tenders,
      round(avg(try_cast(days_to_prepare AS DOUBLE)), 1) AS avg_days_to_prepare,
      median(try_cast(days_to_prepare AS DOUBLE)) AS median_days_to_prepare,
      round(avg(try_cast(total_bids AS DOUBLE)), 2) AS avg_total_bids,
      median(try_cast(total_bids AS DOUBLE)) AS median_total_bids,
      sum(CASE WHEN try_cast(total_bids AS INT)=1 THEN 1 ELSE 0 END) AS single_bid_tenders,
      round(avg(try_cast(award_amount AS DOUBLE)), 0) AS avg_award_amount
    FROM xlsx_inventory_v3_4b022d33_5
    GROUP BY 1
    ORDER BY tenders DESC;
    procedure_type tenders avg_days_to_prepare median_days_to_prepare avg_total_bids median_total_bids single_bid_tenders avg_award_amount
    Atklāts konkurss 14 22.5 19 8.21 3.5 2 28873672
    Mazie iepirkumi 14 16.6 15 2.57 2 4 27770
    Open saved result #3RULBU
  5. Check whether shorter preparation periods in the hand-coded sample are associated with fewer bids 3 rows of 3
    WITH t AS (
      SELECT
        tender_id,
        procedure_type,
        try_cast(days_to_prepare AS DOUBLE) AS days_to_prepare,
        try_cast(total_bids AS DOUBLE) AS total_bids,
        try_cast(award_amount AS DOUBLE) AS award_amount,
        general_agreement
      FROM xlsx_inventory_v3_4b022d33_5
      WHERE try_cast(days_to_prepare AS DOUBLE) IS NOT NULL
        AND try_cast(total_bids AS DOUBLE) IS NOT NULL
    )
    SELECT
      CASE
        WHEN days_to_prepare < 10 THEN '<10 days'
        WHEN days_to_prepare < 20 THEN '10-19 days'
        WHEN days_to_prepare < 30 THEN '20-29 days'
        ELSE '30+ days'
      END AS prep_window,
      count(*) AS tenders,
      round(avg(total_bids), 2) AS avg_bids,
      median(total_bids) AS median_bids,
      sum(CASE WHEN total_bids=1 THEN 1 ELSE 0 END) AS single_bidder,
      string_agg(tender_id, ', ' ORDER BY tender_id) AS tender_ids
    FROM t
    GROUP BY 1
    ORDER BY min(days_to_prepare);
    prep_window tenders avg_bids median_bids single_bidder tender_ids
    10-19 days 19 2.79 2 5 143415, 144471, 144476, 144555, 146863, 151019, 151080, 151178, 153192, 153292, 153433, 153570, 153631, 154274, 155433, 156048, 157384, 157387, 157474
    20-29 days 6 14.67 2.5 1 144497, 148738, 150902, 152877, 155963, 156355
    30+ days 3 3.33 3 0 133703, 144117, 148991
    Open saved result #NL8Z4X
  6. Test cross-sheet join: attach document inventory counts to the 28 tender-level sample 10 rows of 28
    WITH docs AS (
      SELECT
        procurement_id AS tender_id,
        count(*) AS n_files,
        count(DISTINCT lower(file_type)) AS n_file_types,
        sum(try_cast(bytes AS BIGINT)) AS total_bytes,
        string_agg(DISTINCT lower(file_type), ', ' ORDER BY lower(file_type)) AS file_types
      FROM xlsx_inventory_v3_4b022d33_2
      GROUP BY procurement_id
    ), tenders AS (
      SELECT
        tender_id,
        procedure_type,
        try_cast(days_to_prepare AS DOUBLE) AS days_to_prepare,
        try_cast(total_bids AS DOUBLE) AS total_bids,
        try_cast(award_amount AS DOUBLE) AS award_amount
      FROM xlsx_inventory_v3_4b022d33_5
    )
    SELECT
      t.tender_id,
      t.procedure_type,
      t.days_to_prepare,
      t.total_bids,
      d.n_files,
      d.n_file_types,
      round(d.total_bytes / 1000000.0, 2) AS mb_files,
      d.file_types
    FROM tenders t
    LEFT JOIN docs d USING (tender_id)
    ORDER BY d.n_files DESC NULLS LAST, t.total_bids ASC;
    tender_id procedure_type days_to_prepare total_bids n_files n_file_types mb_files file_types
    152877 Atklāts konkurss 21 6 22 3 2.37 doc, docx, pdf
    146863 Mazie iepirkumi 19 4 20 6 0.65 doc, docx, edoc, id146863_mimetype, pdf, xlsx
    153433 Atklāts konkurss 14 5 19 5 2.24 doc, docx, edoc, id153433_mimetype, pdf
    153631 Mazie iepirkumi 17 3 18 2 0.81 docx, pdf
    144471 Mazie iepirkumi 15 2 17 4 0.8 docx, edoc, id144471_mimetype, pdf
    151019 Mazie iepirkumi 15 2 16 2 1.47 docx, pdf
    144555 Atklāts konkurss 16 4 16 5 0.75 doc, docx, edoc, id144555_mimetype, pdf
    148738 Atklāts konkurss 29 74 15 4 3.02 docx, pdf, xlsx, zip
    144117 Atklāts konkurss 41 5 14 5 0.95 docx, edoc, id144117_mimetype, pdf, xlsx
    157387 Atklāts konkurss 15 2 12 3 4.12 docx, pdf, xlsx
    Open saved result #V53U6W

Played with it — it’s genuinely useful. I found 3 workbooks / 18 sheets, including:

  • gold_standard_v12.xlsx: population, tender-level, company map, variable dictionary

  • inventory.xlsx: large file inventory, 17,098 rows

  • inventory_v3.xlsx: hand-coded tender/bid sample, IUB metadata, document inventory

A few quick DuckDB tests worked well:

1. Table size profiling

Saved result:

Largest available sheets:

sheet rows
inventory.files
17,098
gold.bid_level
3,700
gold.tender_level
1,278
gold.population
913
inventory_v3.bid_level
151
inventory_v3.tender_level
28

2. Gold-standard population by KIBS class/year

Saved result:

Useful sanity check: the gold.population sheet has 913 tenders, split across 2024–2025 and KIBS categories. Example:

  • 2024 strict KIBS: 162 total, of which 21 are gold-standard

  • 2025 strict KIBS: 85 total, of which 11 are gold-standard

  • meat_rich and likely_framework are queryable as participation-relevant tender-design flags

This is useful for checking whether the gold-standard sample is over/under-representing strict KIBS, MEAT-rich tenders, frameworks, or document-heavy cases.

3. Hand-coded 28-tender sample summary

Saved result:

procedure type tenders avg days to prepare avg total bids single bid tenders
Atklāts konkurss
14
22.5
8.21
2
Mazie iepirkumi
14
16.6
2.57
4

Nice immediately thesis-relevant signal: small procurements have shorter prep windows and fewer bids in this small sample.

4. Prep time vs bids

Saved result:

prep window tenders avg bids median bids single bidder
10–19 days
19
2.79
2
5
20–29 days
6
14.67
2.5
1
30+ days
3
3.33
3
0

The 20–29 day average is inflated by one big outlier: tender 148738 with 74 bids. Median is more stable.

5. Cross-sheet join: tender sample + document inventory

Saved result:

DuckDB joins work nicely. I joined the 28-tender sample to the document inventory and got document-burden indicators:

tender id procedure days bids n files MB files file types
152877
Atklāts konkurss
21
6
22
2.37
doc docx pdf
146863
Mazie iepirkumi
19
4
20
0.65
doc docx edoc pdf xlsx…
153433
Atklāts konkurss
14
5
19
2.24
doc docx edoc pdf…
148738
Atklāts konkurss
29
74
15
3.02
docx pdf xlsx zip

This is probably the coolest thesis use-case: build operational variables like:

  • n_files

  • n_file_types

  • total_MB

  • has_xlsx

  • has_zip

  • has_edoc

  • days_to_prepare

  • total_bids

  • single_bidder

Then test whether document complexity / bid preparation burden is associated with lower participation.

Main practical note: spreadsheet columns are imported as text, so numeric analysis needs try_cast(...), but otherwise the DuckDB workflow is smooth.

Reply