- when: 2020
what: edits healthcheck as a modular-stack monorepo (before it was cool) that does visibility + remediation
how: []
why:
- pride-in-craft
- when: 2020
what: BoQ hosting 2020..2023
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2020
what: GOBS from p2p to stateless (via redis, killing leveldb)
how: []
why:
- scalability
- when: 2020
what: nexpose-remediations and blackduck sme
how: []
why:
- scalability
- when: 2020
what: scale test rems vs re
how: []
why:
- failfast
- when: 2020
what: mongo sme designed and released rems-mongo
how: []
why:
- scalability
- when: 2020
what: re, terminator to nomad
how: []
why:
- scalability
- when: 2020
what: design GOBS off of Couchbase
how: []
why:
- scalability
- when: 2020
what: REMS migration implementation
how: []
why:
- pride-in-craft
- when: 2020
what: DRS for REMS
how: []
why:
- scalability
- when: 2020
what: isolation for REMS
how: []
why:
- scalability
- when: 2020
what: FSDef,FSIndex to GOBS from Couchbase
how: []
why:
- scalability
- when: 2020
what: GOBS on Mongo with Xongo w/ Ryan intern
how: []
why:
- pride-in-craft
- when: 2020
what: Mongo on TLS SME/pilot
how: []
why:
- scalability
- when: 2020
what: Mongosback V2 for cron > rundeck
how: []
why:
- pride-in-craft
- when: 2020
what: REMS+DSCat Mongo SME
how: []
why:
- pride-in-craft
- when: 2021
what: systems review planning
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2021
what: mentored new hire until he left the team for his starter project (S3)
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2021
what: mentored sr engineer on bash, rundeck, in-house metrics and alerting, ssh...
how: []
why:
- pride-in-craft
- when: 2021
what: on-call training with chaos testing, hands-on log perusing
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2021
what: s2se; scripted Galera ops with safety gates for multi-team use (sketch below)
how: []
why:
- scalability
- pride-in-craft
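# A minimal Go sketch (not the actual script) of the kind of safety gate the
# Galera scripting above implies: refuse to touch a node unless wsrep reports
# it Synced. The DSN and driver import are illustrative assumptions.
#
#   package main
#
#   import (
#       "database/sql"
#       "fmt"
#       "log"
#
#       _ "github.com/go-sql-driver/mysql"
#   )
#
#   func main() {
#       db, err := sql.Open("mysql", "ops:secret@tcp(db1:3306)/") // illustrative DSN
#       if err != nil {
#           log.Fatal(err)
#       }
#       defer db.Close()
#
#       var name, state string
#       // "Synced" is the only wsrep state in which the node is safe to act on.
#       row := db.QueryRow("SHOW STATUS LIKE 'wsrep_local_state_comment'")
#       if err := row.Scan(&name, &state); err != nil {
#           log.Fatal(err)
#       }
#       if state != "Synced" {
#           log.Fatalf("refusing: node state is %q, not Synced", state)
#       }
#       fmt.Println("node Synced; safe to proceed")
#   }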
- when: 2021
what: dr backup check to monitor s3 compliance; 19 teams onboarded, eventually handed off to dbteam (sketch below)
how: []
why:
- scalability
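# A sketch of the shape of that compliance check, assuming aws-sdk-go-v2;
# the bucket, prefix, and 24h freshness window are illustrative, not the
# real policy.
#
#   package main
#
#   import (
#       "context"
#       "fmt"
#       "log"
#       "time"
#
#       "github.com/aws/aws-sdk-go-v2/aws"
#       "github.com/aws/aws-sdk-go-v2/config"
#       "github.com/aws/aws-sdk-go-v2/service/s3"
#   )
#
#   func main() {
#       ctx := context.Background()
#       cfg, err := config.LoadDefaultConfig(ctx)
#       if err != nil {
#           log.Fatal(err)
#       }
#       client := s3.NewFromConfig(cfg)
#
#       var newest time.Time
#       p := s3.NewListObjectsV2Paginator(client, &s3.ListObjectsV2Input{
#           Bucket: aws.String("dr-backups"),      // illustrative bucket
#           Prefix: aws.String("team-foo/mongo/"), // illustrative prefix
#       })
#       for p.HasMorePages() {
#           page, err := p.NextPage(ctx)
#           if err != nil {
#               log.Fatal(err)
#           }
#           for _, obj := range page.Contents {
#               if obj.LastModified.After(newest) {
#                   newest = *obj.LastModified
#               }
#           }
#       }
#       if time.Since(newest) > 24*time.Hour {
#           fmt.Printf("NONCOMPLIANT: newest backup is %s old\n", time.Since(newest))
#       }
#   }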
- when: 2021
what: Mongosback V2.1 autorelease after bake time, indexes
how: []
why:
- pride-in-craft
- when: 2021
what: REMS on SSDs analysis, budget proposal, approval, deploy, mock traffic
how: []
why:
- scalability
- when: 2021
what: Took REMS migration implementation back from handoff and reduced ETA from inf to 3w at max speed w/ visibility and parallelism
how: []
why:
- pride-in-craft
- when: 2021
what: found Go prefers lots of small RAM nodes over a few big ones
how: []
why:
- pride-in-craft
- when: 2021
what: |
REMS quality of life (heartbeating sketched below)
* idempotency test
* brand/user/issuer/byte limiting
* squishing
* lazy JSON parsing
* resumable jobs
* heartbeating jobs
how: []
why:
- scalability
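# For the "heartbeating jobs" item above, a minimal Go sketch: the worker
# stamps its job row on an interval so a reaper (not shown) can requeue jobs
# whose worker died mid-run. The Store interface and all names here are
# hypothetical, not REMS's real API.
#
#   package remsjobs
#
#   import (
#       "context"
#       "time"
#   )
#
#   // Store is a hypothetical persistence interface.
#   type Store interface {
#       Heartbeat(ctx context.Context, jobID string, at time.Time) error
#   }
#
#   func RunWithHeartbeat(ctx context.Context, s Store, jobID string,
#       interval time.Duration, work func(context.Context) error) error {
#       ctx, cancel := context.WithCancel(ctx)
#       defer cancel() // stops the heartbeat as soon as work returns
#
#       go func() {
#           t := time.NewTicker(interval)
#           defer t.Stop()
#           for {
#               select {
#               case <-ctx.Done():
#                   return
#               case now := <-t.C:
#                   _ = s.Heartbeat(ctx, jobID, now) // best effort; reaper tolerates gaps
#               }
#           }
#       }()
#       return work(ctx)
#   }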
- when: 2021
what: DSCat mongo query analysis and optimization
how: []
why:
- pride-in-craft
- when: 2021
what: cross-team Mongo incident remediation, support, guidance, SME
how: []
why:
- pride-in-craft
- when: 2021
what: couchsback to V2, rebuilt on rclone
how: []
why:
- scalability
- when: 2021
what: pushed against proposed optimizations (rems cleaning of old edit fields on stale edits) and proved the gains (.3% data saved on wire to TS) wouldn't pay off, but complied when commanded
how: []
why:
- scalability
- when: 2021
what: Mongo multi-phase, multi-timezone interactive training with offline reading and video + online chaos testing + forum for anonymous feedback
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2021
what: LegacyPublicAPI; they wanted to hand it to us, so I executed what it'd take to shift ownership to us and documented the gotchas, and it was so bad that they reverted my completed code and revisited the process so this handoff wouldn't repeat with other teams
how: []
why:
- pride-in-craft
- when: 2021
what: healthcheck platform design approved but implementation priority rejected
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2022
what: champion of quality: suspected and saw symptoms of data incorrectness in REMS snapshots, insisted and provided more and more evidence despite willful holiday ignorance, eventually recognized as p1
how: []
why:
- pride-in-craft
- when: 2022
what: became team lead
how: []
why:
- scalability
- when: 2022
what: cost-benefit of geni on ddb: 10x the cost but reduces hardlyAnyOperationalBurdenQuantified
how: []
why:
- pride-in-craft
- when: 2022
what: geni iops -> i insist and tune docker when team wants to ignore call to action
how: []
why:
- pride-in-craft
- when: 2022
what: response-files OOMs; image-resizing sidecar proposed + open source used
how: []
why:
- scalability
- pride-in-craft
- when: 2022
what: generic aws migration scripts w/ mentee leveraged by tens of teams for s3, ddb, lambda, sns, sqs
how: []
why:
- pride-in-craft
- pride-for-others
- scalability
- when: 2022
what: cicd for team; onboarding + converting + creating continuous testing framework
how: []
why:
- scalability
- when: 2022
what: sahithig + itony mentorships; spead asks what's wrong with onboarding? what onboarding!
how: []
why:
- pride-in-craft
- pride-for-others
- scalability
- when: 2022
what: monorepo and parallelizing and caching packages = Jenkins from 10m to 2m
how: []
why:
- pride-in-craft
- scalability
- when: 2022
what: autopatching for vuln remediation via scheduled builds for team w/ stable, quiet cicd
how: []
why:
- scalability
- when: 2022
what: |
The REMS Data Loss Incident
* mongo bug around leaked oplog lock = no disk persistence = total loss
* upstream replayed jobs or shared their mongodb oplog so i could rebuild
* forward-facing communication; instead of sorry, this is our root cause and future prevention
how: []
why:
- pride-in-craft
- when: 2022
what: miss; jfe needs faster virus scanning so I give 'em 10%. They want 10x because they retry all N files of their batch of M every time. Losers.
how: []
why:
- scalability
- when: 2022
what: every aws migration solo or nearly
how: []
why:
- scalability
- when: 2022
what: ajw initial release; e2e test coverage from 25% to 75%
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2022
what: became team lead :sparkles: and promoted to l5
how: []
why:
- role-model-dad
- when: 2022
what: coda doc for planning splits owners from contributors w/ weights
how: []
why:
- scalability
- when: 2022
what: miss; davidc exported to orcs despite wishes to stay
how: []
why:
- pride-in-craft
- role-model-dad
- when: 2022
what: swimlanes for rems; byte write/read rate limits, terminator-specific pool
how: []
why:
- scalability
- pride-in-craft
- when: 2022
what: |
tested REMS no-ops when carter ignored me asking him to
* "please write 1 test before i get back from vacation"
* 0 forever
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2022
what: generic nomad cost analysis grafana
how: []
why:
- scalability
- when: 2023
what: learning the performance feedback game; my perception is no one else's reality; make a rubric and define specific examples against it
how: []
why:
- pride-in-craft
- pride-for-others
- role-model-dad
- when: 2023
what: miss; horizons doc review wasn't generalized/brief enough
how: []
why:
- pride-in-craft
- pride-for-others
- customer-obsession
- when: 2023
what: 2nd highest contributor to runbook blitz
how: []
why:
- pride-in-craft
- pride-for-others
- when: 2023
what: when overloaded with ops, told team and offloaded + handed off threads
how: []
why:
- pride-in-craft
- when: 2023
what: fairness for rems; if a caller attempts to use N threads per box, defer the overflow to the low-prio queue (sketch below)
how: []
why:
- pride-in-craft
- scalability
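# A Go sketch of that per-box fairness idea: a buffered channel acts as a
# non-blocking thread budget, and anything over budget parks on a
# low-priority queue instead of stealing threads. Names and the draining of
# lowPri are illustrative assumptions, not the rems implementation.
#
#   package remsfair
#
#   type Job func()
#
#   type Limiter struct {
#       slots  chan struct{} // one token per allowed thread on this box
#       lowPri chan Job      // overflow waits here for a background drainer
#   }
#
#   func NewLimiter(n, backlog int) *Limiter {
#       return &Limiter{
#           slots:  make(chan struct{}, n),
#           lowPri: make(chan Job, backlog),
#       }
#   }
#
#   func (l *Limiter) TrySubmit(j Job) {
#       select {
#       case l.slots <- struct{}{}: // a slot is free: run at normal priority
#           go func() {
#               defer func() { <-l.slots }()
#               j()
#           }()
#       default: // over budget: defer rather than hog the box
#           l.lowPri <- j // blocks if the backlog is full; fine for a sketch
#       }
#   }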
- when: 2023
what: |
interactive cicd tutorial with dpie so they could execute side-by-side
* not my fault they didn't
how: []
why:
- pride-for-others
- scalability
- when: 2023
what: chaos test gameday to train new teammate oncall
how: []
why:
- pride-in-craft
- when: 2023
what: |
Couchbase-aggedon
* i told 'em how to patch that shit; motherfuckers ignored it, as usual
* i go to office because that team insists
* i stop 'em from terminating early many times
* "a hash means we don't need to check, right?"
* "i've got a script, it's good enough, i wrote it"
* "i've got v2 of my script, it's good enough, i wrote it"
* "this is a lotta pain, we should give up"
* taught 8 teammates how to sed/grep/script/bash
* delegating threads; spiking accesslogs, spiking redis dumps, spiking couchbase backup/restore
* discovered bugs that meant some threads were not viable
* reduced the problem to the safest state for round 1, next safest for round 2, ...
how: []
why:
- pride-in-craft
- when: 2023
what: BoQ final
how: []
why:
- scalability
- when: 2023
what: generic datastore customers could opt into us doing stateramp for them in GENI if they set JWTs
how: []
why:
- pride-in-craft
- scalability
- when: 2023
what: REMS /partitions and /entrypoints so TS could parallel-load data via index scan + hash, live, vs. keithc who INSISTED on not-live :eye_roll: (sketch below)
how: []
why:
- pride-in-craft
- scalability
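# A Go sketch of the stable hash split that /partitions implies: every caller
# maps a key to the same partition, so N workers can each scan only their own
# slice in parallel. FNV-1a and the names are assumptions; the real
# endpoint's scheme isn't documented here.
#
#   package remsparts
#
#   import "hash/fnv"
#
#   // PartitionOf maps a key to [0, n) deterministically, e.g.
#   // PartitionOf("issuer-123", 16) is the same value on every box.
#   func PartitionOf(key string, n uint32) uint32 {
#       h := fnv.New32a()
#       h.Write([]byte(key))
#       return h.Sum32() % n
#   }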
- when: 2023
what: proposed AtlasQMP options (bugfixed singleton, parallelized nomad, or lambda) compared on cost, speed, dev cost, and delivery time
how: []
why:
- pride-in-craft
- scalability
- when: 2023
what: response-files split from library-files so we can move to our own database without side effects
how: []
why:
- scalability
- pride-in-craft
- when: 2023
what: |
challenge; q2/q3 planning without knowing what medical-leave teammate would do
* 1. offboard what mattered that he was doing
* 2. ask him repeatedly to offboard early and ask for updates on how it's going
* 3. guess the things he really wants and assume he won't be here for the foreseeable future even if he does return
* coordinate with mathis on expectations upon his return
how: []
why:
- pride-for-others
- when: 2023
what: |
REMS vs Translations
* Translations gets 500 rows from AE without translations and translates those
* prone to eventual consistency, blocks, brand deprioritizing, random bugs
* REMS got a backlog, so we told them first
* and we bumped it over and over for them to look
* and it escalated to a snafu
* root cause was squishing taking 90% of our cpu on this backlog of repeat work, so sync squishing was expedited to full release
* a REMS bug emitted missed edits to TS, so Translations kept re-translating what REMS perceived to be no-ops
how: []
why:
- pride-in-craft
- when: 2023
what: insist on on-call covers during stressful weeks, high-effort windows, and okr release time
how: []
why:
- role-model-dad
- pride-in-craft
- when: 2023
what: still SME on mongo backups and use-case-specific performance optimization
how: []
why:
- pride-in-craft
- scalability
- when: 2023
what: more E2E tests on previously E2E-test-free repos because old mentee sahithig didn't feel comfortable joining the team, fearing she'd break stuff
how: []
why:
- pride-in-craft
- scalability
- pride-for-others
- when: 2023
what: navigated a teammate getting exported to a team he didn't want to join AND later getting exported from that team, and almost someone else getting exported too
how: []
why:
- pride-in-craft
- role-model-dad
- when: 2023
what: |
CSchmalzle
* burnt out in q1 from too many projects in-flight
* bi-weekly "are you closing stuff?"
* daily "yaknow that 2-day thing? is it done? when will it be done? what do we need to do to ship it?" for 2 months
* insisted he pick things to hand off, and we got 2 from him
* released a lotta stuff untested and broken, and i doubled back to fix it
* entire team adds quality as a key result
* terrible mr of copy-pasting + 2k lines of code
* "learn2git"
* multi-mr
* refactors separate
* wants to release his mass changes that include customer-facing system behavior changes because "we'll be more correct"
* and lots of support to remediate kanban
* and i say NO u fok
how: []
why:
- pride-for-others
- scalability
- when: 2023
what: i get team approval on a design to stop not-deleting customer data, they say we should fix it at the system level, so I spike and prove system level works, just for the other team to nope.avi out (REMS MoM delete)
how: []
why:
- pride-in-craft
- when: 2023
what: |
XMD Contact Consolidation Consumer
* "read our kafka topic like this and call your apis with it"
* ezpz BUT i don't wanna own your business logic by proxy
* "but our manager said you would, and something about throttling"
* handle our 429s and you'll be k
* "but our manager..."
* ...3 weeks later...
* listen here m8, u guys own your own 1-write-per-second shit, ya hear?
* "we didn't even want that, y'all just took 2 years to get back to us >:("
* o
how: []
why:
- pride-in-craft
- when: 2023
what: test everything; atlas qmp canary from ignored to release-blocking via librdkafka configs, sleep deletions
how: []
why:
- pride-in-craft
- when: 2023
what: test everything; atlas data loader first e2e test
how: []
why:
- pride-in-craft
- when: 2023
what: test everything; block dev if merging release isn't a noop (sketch below)
how: []
why:
- scalability
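# One way that gate could be checked, sketched in Go: fail CI when release
# has commits dev lacks, i.e. when merging release into dev would not be a
# no-op. Branch names are illustrative, not the team's actual setup.
#
#   package main
#
#   import (
#       "log"
#       "os/exec"
#       "strings"
#   )
#
#   func main() {
#       // Count commits reachable from origin/release but not origin/dev.
#       out, err := exec.Command("git", "rev-list", "--count",
#           "origin/dev..origin/release").Output()
#       if err != nil {
#           log.Fatal(err)
#       }
#       if n := strings.TrimSpace(string(out)); n != "0" {
#           log.Fatalf("blocking dev: release is %s commit(s) ahead; merge it down first", n)
#       }
#   }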
- when: 2023
what: test everything; legacy responses first e2e test
how: []
why:
- scalability
- when: 2023
what: test everything; except don't; response-files keepalives cross-dc would expire and permanently break response-files allocs, so tests couldn't pass to release the fix
how: []
why:
- pride-in-craft
- when: 2023
what: test everything; our tests found FSCS outages, and then the FSCS team found they had no visibility
how: []
why:
- pride-in-craft
- when: 2023
what: test everything; janus cruddy e2e tests
how: []
why:
- pride-for-others
- when: 2023
what: high availability; 2 instances of a singleton with a distributed lock as the cheap, good-enough path forward (sketch below)
how: []
why:
- pride-in-craft
- scalability
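# A Go sketch of that cheap two-instance singleton using a Redis TTL lease
# (go-redis v9). The key, addr, and instance id are illustrative, and real
# code would renew the lease it owns (SET XX + ownership check) instead of
# just re-contending after expiry as this sketch does.
#
#   package main
#
#   import (
#       "context"
#       "log"
#       "time"
#
#       "github.com/redis/go-redis/v9"
#   )
#
#   func main() {
#       ctx := context.Background()
#       rdb := redis.NewClient(&redis.Options{Addr: "redis:6379"}) // illustrative addr
#       const key, ttl = "singleton:lease", 30 * time.Second
#
#       for range time.Tick(ttl / 3) {
#           // SET NX grabs the lease only if no one holds it; the TTL means
#           // a crashed holder is replaced automatically after expiry.
#           ok, err := rdb.SetNX(ctx, key, "instance-a", ttl).Result()
#           if err != nil {
#               log.Print(err)
#               continue
#           }
#           if ok {
#               log.Print("lease acquired; doing the singleton work")
#               // doWork(ctx) // hypothetical
#           }
#       }
#   }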
- when: 2023
what: designed rems mom deletion in parallel with atlas, proposed the drs team fix it given the volume of impacted teams, got deferred indefinitely, and solved the same problem yet again but for rems mom
how: []
why:
- pride-in-craft
- scalability
- when: 2022
what: feedback; told sean that implying we should spend QED time on ops work is against the spirit of QED time, but he is an authority figure and makes it uncomfortable not to comply
- when: 2022
what: feedback; when i needed to ask michaelp for a remote exception, i had to share i was hesitant because he made engineers who might leave sound ostracized and ejected immediately