Cookbook
Real recipes for real data problems. No artificial ingredients.
Log Analysis
How many processes is each user running?
The classic "who's hogging the system" query:
recs fromps \
| recs collate --key uid -a count \
| recs sort --key count=-n \
| recs totable
Before recs: ps aux | awk '{print $1}' | sort | uniq -c | sort -rn (and good luck if your usernames have spaces).
After recs: Clean, readable, and the output is a proper table.
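To see where the awk version goes wrong, here is a small sketch that runs it on canned input. The `sample_ps` data and both function names are hypothetical, and real `ps aux` output has more columns. Note that the header row gets counted as if it were a user called USER.

```shell
# Hypothetical ps-style output; real `ps aux` has more columns.
sample_ps() {
  cat <<'EOF'
USER       PID COMMAND
root         1 /sbin/init
alice      210 bash
alice      211 vim
alice      212 less
root       300 sshd
EOF
}

# The "before recs" pipeline; the final awk only normalizes
# the variable leading whitespace that uniq -c emits.
count_by_first_field() {
  awk '{print $1}' | sort | uniq -c | sort -rn | awk '{print $1, $2}'
}

sample_ps | count_by_first_field
```

The last line of the output is `1 USER`, which is the header leaking into the counts. You can patch that with another awk clause, but the patches pile up quickly.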
Process count by user and priority
Want a pivot table? Coming right up:
recs fromps \
| recs collate --key uid,priority -a count \
| recs toptable --x priority --y uid --v count
This gives you a 2D table with priority across the top, users down the side, and counts in each cell. Try doing that with awk.
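If you do take the dare, a rough awk pivot looks something like the sketch below. It assumes the counts are already aggregated into hypothetical `uid priority count` rows, and it prints tab-separated cells with `-` for empty ones. It only illustrates the reshaping step and is not a reproduction of what `toptable` prints.

```shell
# Hypothetical pre-aggregated rows: uid, priority, count.
sample_counts() {
  cat <<'EOF'
root 0 12
root 19 1
alice 0 4
alice 10 2
EOF
}

# Pivot: one row per uid, one column per priority (first-seen order).
pivot() {
  awk '
    {
      if (!($1 in seen_row)) { seen_row[$1] = 1; rows[++nr] = $1 }
      if (!($2 in seen_col)) { seen_col[$2] = 1; cols[++nc] = $2 }
      cell[$1, $2] = $3
    }
    END {
      line = "uid"
      for (c = 1; c <= nc; c++) line = line "\t" cols[c]
      print line
      for (r = 1; r <= nr; r++) {
        line = rows[r]
        for (c = 1; c <= nc; c++) {
          key = rows[r] SUBSEP cols[c]
          line = line "\t" ((key in cell) ? cell[key] : "-")
        }
        print line
      }
    }
  '
}

sample_counts | pivot
```

That is about twenty lines of bookkeeping for the reshaping that the single `toptable` call above handles.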
Find the slowest API endpoints
recs frommultire \
--re 'latency=TIME: (\d*)' \
--re 'method,url="([^" ]*) ([^" ?]*)' access.log \
| recs collate -k url --perfect -a 'avg,latency' -a count \
| recs sort -k 'avg_latency=-n' \
| head -10 \
| recs totable
Parse the access log, group by URL, compute average latency and request count, sort by slowest, take the top 10. The ancient sea wyrm of Seattle trembles before your data-fu.
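The two patterns above quietly assume a particular log layout. The sketch below checks them against one hypothetical log line using `sed -E` stand-ins: `[0-9]` replaces `\d`, which `sed -E` lacks, and the function names are made up for illustration. Your real log format may well differ.

```shell
# Hypothetical access-log line; adjust to your real format.
sample_line='127.0.0.1 - - [10/Oct/2024:13:55:36 +0000] "GET /api/users?page=2 HTTP/1.1" 200 512 TIME: 183'

# Stand-in for the latency pattern: digits after "TIME: ".
extract_latency() {
  sed -n -E 's/.*TIME: ([0-9]*).*/\1/p'
}

# Stand-in for the method,url pattern: the URL capture stops at
# a space, a quote, or "?", so query strings are dropped.
extract_method_url() {
  sed -n -E 's/^[^"]*"([^" ]*) ([^" ?]*).*/\1 \2/p'
}

printf '%s\n' "$sample_line" | extract_latency
printf '%s\n' "$sample_line" | extract_method_url
```

Because the URL capture excludes `?`, requests such as `/api/users?page=2` and `/api/users?page=3` group under the same `/api/users` key, which is usually what you want for per-endpoint averages.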
Xorg module log analysis
Which modules are logging to Xorg, and at what level?
recs frommultire \
--re 'type,module=\((\S*)\) ([^:]+):' /var/log/Xorg.0.log \
| recs collate --key type,module -a ct \
| recs sort --key ct=-n \
| recs tocsv --header
Output goes straight to CSV for your spreadsheet-loving colleagues.
CSV Wrangling
Filter and transform a CSV
recs fromcsv --header employees.csv \
| recs grep '{{department}} === "Engineering"' \
| recs xform '{{annual}} = {{salary}} * 12' \
| recs sort --key annual=-n \
| recs tocsv --header -k name,department,annual
Deduplicate by email
recs fromcsv --header users.csv \
| recs xform '{{email}} = {{email}}.toLowerCase().trim()' \
| recs sort --key email \
| recs collate --key email -a 'firstrec' \
| recs tocsv --header
Join two CSVs
recs fromcsv --header orders.csv \
| recs join --key customer_id \
<(recs fromcsv --header customers.csv) \
| recs totable -k customer_name,order_id,amount
JSON Processing
Flatten a JSON API response
curl -s https://api.example.com/users \
| recs fromjsonarray \
| recs xform '{{city}} = {{address/city}}' \
| recs totable -k name,email,city
Aggregate nested data
cat events.json \
| recs fromjsonarray \
| recs collate --key type \
-a count \
-a 'avg,duration_ms' \
-a 'perc,95,duration_ms' \
| recs sort --key count=-n \
| recs totable
Convert JSON to CSV for spreadsheet users
cat data.json | recs fromjsonarray | recs tocsv --header > data.csv
Your PM just asked for the data in a spreadsheet. You didn't even have to open Python.
Data Exploration
Quick look at the shape of your data
recs fromcsv --header mystery-data.csv \
| head -5 \
| recs toprettyprint
Count unique values per field
recs fromcsv --header data.csv \
| recs collate -a 'dct,status' -a 'dct,region' -a count \
| recs toprettyprint
Find outliers
recs fromcsv --header metrics.csv \
| recs collate -a 'avg,latency' -a 'sd,latency' \
| recs eval '"Outlier threshold: " + ({{avg_latency}} + 3 * {{sd_latency}}).toFixed(2) + "ms"'
Then filter for them:
THRESHOLD=$(recs fromcsv --header metrics.csv \
| recs collate -a 'avg,latency' -a 'sd,latency' \
| recs eval '{{avg_latency}} + 3 * {{sd_latency}}')
recs fromcsv --header metrics.csv \
| recs grep "{{latency}} > $THRESHOLD" \
| recs totable
XML and Structured Data
Parse XML configuration
recs fromxml config.xml \
| recs grep '{{enabled}} === "true"' \
| recs totable -k name,value
Extract data from key-value logs
recs fromkv --delim '=' < app.properties \
| recs grep '{{key}}.startsWith("db.")' \
| recs totable
Pipeline Patterns
The "Top N" pattern
recs fromcsv --header data.csv \
| recs sort --key score=-n \
| recs topn -n 10 \
| recs totable
The "Group and Rank" pattern
recs fromcsv --header sales.csv \
| recs collate --key region -a 'sum,revenue' -a count \
| recs sort --key sum_revenue=-n \
| recs xform '{{rank}} = $line' \
| recs totable
The "Delta" pattern
See how values change between records:
recs fromcsv --header timeseries.csv \
| recs sort --key timestamp \
| recs delta --key value \
| recs totable -k timestamp,value,value_delta
The "Annotate" pattern
Add running statistics to each record:
recs fromcsv --header data.csv \
| recs annotate -k category -a 'count' -a 'avg,score' \
| recs totable
Each record gets the current running count and average for its category, with no separate aggregation step needed.
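If "running statistics" is unclear, the awk sketch below spells out the arithmetic on a hypothetical `category,score` CSV. The output column names `count` and `avg_score` are assumptions for illustration; the field names recs actually emits may differ.

```shell
# Hypothetical CSV with a header row: category,score.
sample_scores() {
  cat <<'EOF'
category,score
a,10
b,4
a,20
b,8
a,30
EOF
}

# Append the running count and running average for each row's category.
running_stats() {
  awk -F, -v OFS=, '
    NR == 1 { print $0, "count", "avg_score"; next }
    {
      n[$1]++
      sum[$1] += $2
      print $0, n[$1], sprintf("%.2f", sum[$1] / n[$1])
    }
  '
}

sample_scores | running_stats
```

Each row reflects only the records seen so far in its category, so the third `a` row reports a count of 3 and an average of 20.00, while the first `a` row reports 1 and 10.00.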
