try playing with the new spreadsheet querying feature, it’s duckdb based and should be pretty cool
-
Listing spreadsheets 3 spreadsheets 18 sheets
gold_standard_v12.xlsx- bid_level 3700 rows x 1 columns
- tender_level 1278 rows x 17 columns
- company_id_map 1038 rows x 7 columns
- population 913 rows x 34 columns
- IUB_metadata 80 rows x 27 columns
- special_cases_guide 7 rows x 2 columns
- VARIABLE_dictionary_v2 5 rows x 4 columns
- variable_dictionary 41 rows x 7 columns
- Primary_sources 17 rows x 2 columns
- normalization_rules 10 rows x 5 columns
inventory.xlsx- Sheet1 17098 rows x 17 columns
inventory_v3.xlsx- tender_level (2) 29 rows x 15 columns
- Inventory 330 rows x 16 columns
- IUB 28 rows x 54 columns
- bid_level 151 rows x 11 columns
- tender_level 28 rows x 8 columns
- Pivot1 8 rows x 2 columns
- CPV codes 44 rows x 2 columns
-
Quick profile of available procurement spreadsheet tables and missingness-relevant fields 6 rows of 6
WITH sheets AS ( SELECT 'gold.population' AS sheet, count(*) AS rows FROM xlsx_gold_standard_v12_b56445cb_4 UNION ALL SELECT 'gold.tender_level', count(*) FROM xlsx_gold_standard_v12_b56445cb_2 UNION ALL SELECT 'gold.bid_level', count(*) FROM xlsx_gold_standard_v12_b56445cb_1 UNION ALL SELECT 'inventory_v3.tender_level', count(*) FROM xlsx_inventory_v3_4b022d33_5 UNION ALL SELECT 'inventory_v3.bid_level', count(*) FROM xlsx_inventory_v3_4b022d33_4 UNION ALL SELECT 'inventory.files', count(*) FROM xlsx_inventory_830a48f1_1 ) SELECT * FROM sheets ORDER BY rows DESC;Open saved result #SPELA2sheet rows inventory.files 17098 gold.bid_level 3700 gold.tender_level 1278 gold.population 913 inventory_v3.bid_level 151 inventory_v3.tender_level 28 -
Explore distribution of gold-standard population by year, KIBS class, gold-standard flag, and document availability 16 rows of 16
SELECT source_year, kibs_text_class, gold_standard, count(*) AS tenders, sum(CASE WHEN lower(coalesce(has_report_bool,'')) IN ('true','1','yes','jā','ja') THEN 1 ELSE 0 END) AS has_report, round(avg(try_cast(total_docs AS DOUBLE)), 2) AS avg_docs, sum(CASE WHEN lower(coalesce(meat_rich,'')) IN ('true','1','yes','jā','ja') THEN 1 ELSE 0 END) AS meat_rich_n, sum(CASE WHEN lower(coalesce(likely_framework,'')) IN ('true','1','yes','jā','ja') THEN 1 ELSE 0 END) AS likely_framework_n FROM xlsx_gold_standard_v12_b56445cb_4 GROUP BY 1,2,3 ORDER BY try_cast(source_year AS INT), kibs_text_class, gold_standard;Open saved result #8GEEADsource_year kibs_text_class gold_standard tenders has_report avg_docs meat_rich_n likely_framework_n 2024 non_kibs 0 255 227 15.37 73 0 2024 non_kibs 1 23 23 15.09 9 1 2024 possible_kibs 0 45 40 21.33 22 0 2024 possible_kibs 1 8 8 16.75 3 0 2024 strict_kibs 0 141 120 16.41 51 0 2024 strict_kibs 1 21 19 19.43 13 1 2024 weak_signal 0 87 80 16.52 46 0 2024 weak_signal 1 7 6 22.57 5 0 2025 non_kibs 0 115 95 14.83 38 1 2025 non_kibs 1 5 5 17.6 1 1 2025 possible_kibs 0 38 32 17.18 19 0 2025 possible_kibs 1 1 1 10 1 0 2025 strict_kibs 0 74 58 15.04 29 0 2025 strict_kibs 1 11 11 19.09 6 1 2025 weak_signal 0 78 66 13.86 31 1 2025 weak_signal 1 4 4 12 0 0 -
Summarise hand-coded 28-tender sample by preparation time, bids, award value, and single-bidder rate 2 rows of 2
SELECT procedure_type, count(*) AS tenders, round(avg(try_cast(days_to_prepare AS DOUBLE)), 1) AS avg_days_to_prepare, median(try_cast(days_to_prepare AS DOUBLE)) AS median_days_to_prepare, round(avg(try_cast(total_bids AS DOUBLE)), 2) AS avg_total_bids, median(try_cast(total_bids AS DOUBLE)) AS median_total_bids, sum(CASE WHEN try_cast(total_bids AS INT)=1 THEN 1 ELSE 0 END) AS single_bid_tenders, round(avg(try_cast(award_amount AS DOUBLE)), 0) AS avg_award_amount FROM xlsx_inventory_v3_4b022d33_5 GROUP BY 1 ORDER BY tenders DESC;Open saved result #3RULBUprocedure_type tenders avg_days_to_prepare median_days_to_prepare avg_total_bids median_total_bids single_bid_tenders avg_award_amount Atklāts konkurss 14 22.5 19 8.21 3.5 2 28873672 Mazie iepirkumi 14 16.6 15 2.57 2 4 27770 -
Check whether shorter preparation periods in the hand-coded sample are associated with fewer bids 3 rows of 3
WITH t AS ( SELECT tender_id, procedure_type, try_cast(days_to_prepare AS DOUBLE) AS days_to_prepare, try_cast(total_bids AS DOUBLE) AS total_bids, try_cast(award_amount AS DOUBLE) AS award_amount, general_agreement FROM xlsx_inventory_v3_4b022d33_5 WHERE try_cast(days_to_prepare AS DOUBLE) IS NOT NULL AND try_cast(total_bids AS DOUBLE) IS NOT NULL ) SELECT CASE WHEN days_to_prepare < 10 THEN '<10 days' WHEN days_to_prepare < 20 THEN '10-19 days' WHEN days_to_prepare < 30 THEN '20-29 days' ELSE '30+ days' END AS prep_window, count(*) AS tenders, round(avg(total_bids), 2) AS avg_bids, median(total_bids) AS median_bids, sum(CASE WHEN total_bids=1 THEN 1 ELSE 0 END) AS single_bidder, string_agg(tender_id, ', ' ORDER BY tender_id) AS tender_ids FROM t GROUP BY 1 ORDER BY min(days_to_prepare);Open saved result #NL8Z4Xprep_window tenders avg_bids median_bids single_bidder tender_ids 10-19 days 19 2.79 2 5 143415, 144471, 144476, 144555, 146863, 151019, 151080, 151178, 153192, 153292, 153433, 153570, 153631, 154274, 155433, 156048, 157384, 157387, 157474 20-29 days 6 14.67 2.5 1 144497, 148738, 150902, 152877, 155963, 156355 30+ days 3 3.33 3 0 133703, 144117, 148991 -
Test cross-sheet join: attach document inventory counts to the 28 tender-level sample 10 rows of 28
WITH docs AS ( SELECT procurement_id AS tender_id, count(*) AS n_files, count(DISTINCT lower(file_type)) AS n_file_types, sum(try_cast(bytes AS BIGINT)) AS total_bytes, string_agg(DISTINCT lower(file_type), ', ' ORDER BY lower(file_type)) AS file_types FROM xlsx_inventory_v3_4b022d33_2 GROUP BY procurement_id ), tenders AS ( SELECT tender_id, procedure_type, try_cast(days_to_prepare AS DOUBLE) AS days_to_prepare, try_cast(total_bids AS DOUBLE) AS total_bids, try_cast(award_amount AS DOUBLE) AS award_amount FROM xlsx_inventory_v3_4b022d33_5 ) SELECT t.tender_id, t.procedure_type, t.days_to_prepare, t.total_bids, d.n_files, d.n_file_types, round(d.total_bytes / 1000000.0, 2) AS mb_files, d.file_types FROM tenders t LEFT JOIN docs d USING (tender_id) ORDER BY d.n_files DESC NULLS LAST, t.total_bids ASC;Open saved result #V53U6Wtender_id procedure_type days_to_prepare total_bids n_files n_file_types mb_files file_types 152877 Atklāts konkurss 21 6 22 3 2.37 doc, docx, pdf 146863 Mazie iepirkumi 19 4 20 6 0.65 doc, docx, edoc, id146863_mimetype, pdf, xlsx 153433 Atklāts konkurss 14 5 19 5 2.24 doc, docx, edoc, id153433_mimetype, pdf 153631 Mazie iepirkumi 17 3 18 2 0.81 docx, pdf 144471 Mazie iepirkumi 15 2 17 4 0.8 docx, edoc, id144471_mimetype, pdf 151019 Mazie iepirkumi 15 2 16 2 1.47 docx, pdf 144555 Atklāts konkurss 16 4 16 5 0.75 doc, docx, edoc, id144555_mimetype, pdf 148738 Atklāts konkurss 29 74 15 4 3.02 docx, pdf, xlsx, zip 144117 Atklāts konkurss 41 5 14 5 0.95 docx, edoc, id144117_mimetype, pdf, xlsx 157387 Atklāts konkurss 15 2 12 3 4.12 docx, pdf, xlsx
Played with it — it’s genuinely useful. I found 3 workbooks / 18 sheets, including:
gold_standard_v12.xlsx: population, tender-level, company map, variable dictionaryinventory.xlsx: large file inventory, 17,098 rowsinventory_v3.xlsx: hand-coded tender/bid sample, IUB metadata, document inventory
A few quick DuckDB tests worked well:
1. Table size profiling
Saved result:
Largest available sheets:
| sheet | rows |
|---|---|
|
inventory.files
|
17,098
|
|
gold.bid_level
|
3,700
|
|
gold.tender_level
|
1,278
|
|
gold.population
|
913
|
|
inventory_v3.bid_level
|
151
|
|
inventory_v3.tender_level
|
28
|
2. Gold-standard population by KIBS class/year
Saved result:
Useful sanity check: the gold.population sheet has 913 tenders, split across 2024–2025 and KIBS categories. Example:
2024 strict KIBS: 162 total, of which 21 are gold-standard
2025 strict KIBS: 85 total, of which 11 are gold-standard
meat_richandlikely_frameworkare queryable as participation-relevant tender-design flags
This is useful for checking whether the gold-standard sample is over/under-representing strict KIBS, MEAT-rich tenders, frameworks, or document-heavy cases.
3. Hand-coded 28-tender sample summary
Saved result:
| procedure type | tenders | avg days to prepare | avg total bids | single bid tenders |
|---|---|---|---|---|
|
Atklāts konkurss
|
14
|
22.5
|
8.21
|
2
|
|
Mazie iepirkumi
|
14
|
16.6
|
2.57
|
4
|
Nice immediately thesis-relevant signal: small procurements have shorter prep windows and fewer bids in this small sample.
4. Prep time vs bids
Saved result:
| prep window | tenders | avg bids | median bids | single bidder |
|---|---|---|---|---|
|
10–19 days
|
19
|
2.79
|
2
|
5
|
|
20–29 days
|
6
|
14.67
|
2.5
|
1
|
|
30+ days
|
3
|
3.33
|
3
|
0
|
The 20–29 day average is inflated by one big outlier: tender 148738 with 74 bids. Median is more stable.
5. Cross-sheet join: tender sample + document inventory
Saved result:
DuckDB joins work nicely. I joined the 28-tender sample to the document inventory and got document-burden indicators:
| tender id | procedure | days | bids | n files | MB files | file types |
|---|---|---|---|---|---|---|
|
152877
|
Atklāts konkurss
|
21
|
6
|
22
|
2.37
|
doc
docx
pdf
|
|
146863
|
Mazie iepirkumi
|
19
|
4
|
20
|
0.65
|
doc
docx
edoc
pdf
xlsx…
|
|
153433
|
Atklāts konkurss
|
14
|
5
|
19
|
2.24
|
doc
docx
edoc
pdf…
|
|
148738
|
Atklāts konkurss
|
29
|
74
|
15
|
3.02
|
docx
pdf
xlsx
zip
|
This is probably the coolest thesis use-case: build operational variables like:
n_filesn_file_typestotal_MBhas_xlsxhas_ziphas_edocdays_to_preparetotal_bidssingle_bidder
Then test whether document complexity / bid preparation burden is associated with lower participation.
Main practical note: spreadsheet columns are imported as text, so numeric analysis needs try_cast(...), but otherwise the DuckDB workflow is smooth.