SharePoint — What We Found & What We Built

CDT’s SharePoint had grown unchecked for nearly a decade — 67,760 files across 22 sites, almost half duplicated, and heritage media locked behind Microsoft authentication. Over seven weeks (February–March 2026) we audited everything, extracted intelligence that was previously invisible, and built systems to make it accessible. The cleanup was the enabling step, not the goal.

Timeline

9 Feb: First SharePoint discussion during M365 email setup
17–26 Feb: Three weeks of authentication struggles: m365 CLI, Graph API, device registration
3 Mar: Breakthrough: Azure AD app registered, first full audit (67,760 files mapped)
3–4 Mar: Duplication discovered (57.3 GB). Heritage Conservation Framework found (3.8 GB)
4–6 Mar: R2 migration begins. Compact JSON format solves 78 MB deployment problem
9–11 Mar: Lease, policy, and M365 audits. Timestamp dedup executed (392 files, 419 MB)
12 Mar: Funding folder consolidation. 265 docs analysed → 55 grants, £4.1M tracked
17 Mar: Board Portal created. Solar dashboard built. Inbox cleaned. Calendar set up
18 Mar: Camp layout images extracted for physical signboard design
19 Mar: Blair OneDrive migrated (1,414 items). Scripts reorganised. Portal /sharepoint page built
20 Mar: Repository restructured. 5 commits deployed to Vercel
23 Mar: SharePoint page rewritten as before/after narrative. QA found data errors, corrected

What We Found

67,760 files catalogued (across all folders and sites)
123.9 GB total storage (CDT site alone)
57.3 GB duplicates (46% of all storage)
22 SharePoint sites (most empty or abandoned)

Three weeks of authentication work (device registration, Azure AD app setup, Graph API permissions) were needed before any audit could begin. Once access was established on 3 March, the first full scan revealed the scale of the problem.
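For readers who want to see what a scan of this kind involves, here is a minimal sketch of walking one document library through Microsoft Graph. It is an illustration, not the audit script itself. The function name `walk_drive`, the record fields, and the injected `get_json` callable are our assumptions; the endpoints and the `@odata.nextLink`, `folder`, `file.hashes.quickXorHash`, and `parentReference.path` properties are standard Graph driveItem features.

```python
from typing import Callable, Iterator

GRAPH = "https://graph.microsoft.com/v1.0"


def walk_drive(drive_id: str, get_json: Callable[[str], dict]) -> Iterator[dict]:
    """Yield one record per file in a drive, following folders and result pages.

    get_json is a caller-supplied function (hypothetical here) that performs an
    authenticated GET against the given URL and returns the decoded JSON body.
    """
    pending = [f"{GRAPH}/drives/{drive_id}/root/children"]
    while pending:
        url = pending.pop()
        while url:
            page = get_json(url)
            for item in page.get("value", []):
                if "folder" in item:
                    # Queue the folder's children for a later pass.
                    pending.append(
                        f"{GRAPH}/drives/{drive_id}/items/{item['id']}/children"
                    )
                elif "file" in item:
                    yield {
                        "id": item["id"],
                        "name": item["name"],
                        "size": item.get("size", 0),
                        "path": item.get("parentReference", {}).get("path", ""),
                        # Not every file carries a hash, so this may be None.
                        "hash": item["file"].get("hashes", {}).get("quickXorHash"),
                    }
            # Graph returns large folders in pages; follow the link if present.
            url = page.get("@odata.nextLink")
```

In practice `get_json` would wrap an HTTP client that sends the bearer token obtained through the Azure AD app registration. Passing it in as a parameter keeps the traversal logic testable without network access.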

Duplication patterns

Pattern | Wasted | What happened
Triple finance copies | 31.1 GB | Three officers (SD, SDawe, AH) each had full or partial copies of the same finance records in separate folders
Heritage Group mirror | 23.0 GB | Entire VISITOR STREAM heritage collection duplicated wholesale into WORKING GROUPS AREA
OneDrive sync artifacts | 1.0 GB | Timestamped copies created automatically during staff account migrations
Cross-folder copies | 2.2 GB | Same documents appearing in PROJECTS, OTHER, and VISITOR STREAM folders
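The four pattern totals can be cross-checked against the headline figures with a few lines of arithmetic. The snippet below uses only numbers stated on this page; the variable names are ours.

```python
# Wasted storage per duplication pattern, in GB, as listed above.
patterns_gb = {
    "Triple finance copies": 31.1,
    "Heritage Group mirror": 23.0,
    "OneDrive sync artifacts": 1.0,
    "Cross-folder copies": 2.2,
}
total_storage_gb = 123.9  # CDT site total from the first full audit

duplicate_gb = round(sum(patterns_gb.values()), 1)
duplicate_share = round(100 * duplicate_gb / total_storage_gb)

print(duplicate_gb, duplicate_share)  # prints: 57.3 46
```

The patterns sum to the 57.3 GB duplicate total, which is 46% of the 123.9 GB audited.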

What We Unlocked

The real value wasn’t in the cleanup — it was in the intelligence buried across thousands of documents. Five areas of operational data were extracted, structured, and surfaced on the portal for the first time.

Leases & occupancy

PROPERTY/Leases/

65 active leased units, 16 already expired

~£65k/yr unrealised rent identified from vacant or expired units

Policies & governance

ADMINISTRATION/Policies & Procedures/

26 policies (POL-001–026) and 37 procedures found, with a well-structured numbering scheme

No Safeguarding or Whistleblowing policy — high-risk gaps flagged

Funding history

FINANCE/Funding - Grants/

55 grants across 31 funders, 265 documents analysed

~£4.1M awarded from ~£6.5M applied — full track record now visible

Heritage assessments

PROJECTS/Heritage Conservation Framework 2025/

89 building surveys + 1,148 evidence photos extracted from 3.8 GB of ZIPs

Condition data, repair priorities, and costs now structured and searchable

Solar project

PROJECTS/SolarPV&BESS/

£254K cost breakdown, heritage constraints, contractor details

Full project dashboard built — previously scattered across 20+ documents

What We Built

New systems were created to make the extracted data accessible and to replace SharePoint as the presentation layer for heritage and operational content.

Board Portal
SharePoint Communication Site for trustees — governance docs, meeting papers, action tracker, register of interests. External access via Azure AD B2B guest invitations
Heritage CDN
31 videos (23.4 GB) and 4,700+ heritage files (photos, assessments, archive items) migrated to Cloudflare R2. Publicly accessible and fast loading, with hosting at £0.35/month and without the SharePoint authentication barrier
Portal pages
8 new pages surfacing structured data: funding browser, solar dashboard, camp map, gallery, board pack, tasks, mailbox audit, facilities overview
Email governance
Admin inbox cleaned (4,847 → 81 actionable emails), 6 mail rules to prevent re-accumulation, shared team calendar created
Automation
75 scripts organised into 9 domains (sharepoint, migration, m365, board-portal, funding, portal-data, scraping, util, archive) for repeatable operations

Cleanup — The Enabling Work

Action | Detail | Freed
Duplicate identification | 27,650 duplicate files (57.3 GB) mapped by hash. 392 timestamp artifacts (0.42 GB) removed. Remaining duplicates flagged for manual review | 0.42 GB
Site consolidation | 21 satellite sites deleted (10 empty, 11 with minimal content). Energy site archived first | 74 MB
Funding folders | 19 COF files merged from 3 locations, 8 empty folders deleted, pre-2023 grants archived | -
Blair OneDrive | 1,414 items (photos + videos) migrated from personal OneDrive to R2 gallery | -
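Mapping duplicates by hash comes down to grouping file records that share a content hash and counting every copy beyond the first as reclaimable. The sketch below shows the idea under stated assumptions: the record shape (`path`, `size`, `hash`) and both function names are hypothetical, and the real audit scripts may differ.

```python
from collections import defaultdict


def duplicate_groups(files):
    """Group file records by (hash, size); return groups with 2+ members."""
    by_key = defaultdict(list)
    for f in files:
        if f.get("hash"):  # records without a hash cannot be compared
            by_key[(f["hash"], f["size"])].append(f)
    return [group for group in by_key.values() if len(group) > 1]


def wasted_bytes(groups):
    """Bytes that could be reclaimed if one copy per group were kept."""
    return sum(group[0]["size"] * (len(group) - 1) for group in groups)


if __name__ == "__main__":
    sample = [
        {"path": "/FINANCE/SD/ledger.xlsx", "size": 10, "hash": "h1"},
        {"path": "/FINANCE/AH/ledger.xlsx", "size": 10, "hash": "h1"},
        {"path": "/FINANCE/notes.txt", "size": 4, "hash": "h2"},
    ]
    groups = duplicate_groups(sample)
    print(len(groups), wasted_bytes(groups))
```

Grouping only identifies candidates. Deciding which copy is the one to keep is still a judgement call, which is why most duplicates were flagged for manual review rather than deleted automatically.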

Where We Are Now (March 2026)

4 active sites (CDT, Team Site, Board Portal, Meter Reading)
137 GB CDT site storage (heritage originals not yet deleted)
~33 GB on R2 CDN (publicly accessible heritage media)
8+ portal pages (surfacing structured data)

SharePoint remains the operational filing system for finance, administration, and HR. Heritage media and project data have moved to purpose-built systems (R2 CDN and this portal) that are faster, public where needed, and structured for search and reporting.

Still To Do

Item | Detail | Impact
Delete heritage originals from SharePoint | Photos and videos have been copied to R2 but originals remain on SharePoint | ~33 GB reclaimable
Clear recycle bin | Deleted files remain recoverable for 93 days; recycle bin currently holds ~56 GB | Free quota back to tenant
Investigate storage growth | CDT site grew from 123.9 GB (3 March audit) to 137 GB (admin centre, later in March) despite cleanup; cause unknown | ~13 GB unexplained
Re-run full audit | Current file count and folder structure not verified since the 3 March audit; fresh numbers needed | Accurate baseline for ongoing management

Data sourced from Graph API audit (3 March 2026), conversation history analysis (March 2026), and SharePoint admin centre · Full audit logs retained