Tag: wqa-api

  • POST Mortem: How Azure Application Gateway’s Missing 308 Killed Our Linked Data API

    In the Linked Data world, cool URLs don’t change. In the RDF world, that means you’re coining URIs that should stay resolvable forever, so you pick the simplest form you can. Most people in the linked data world use http:// when coining URIs, even though today’s internet lives on https://, with the upgrade handled by the service’s web stack.

    The Water Quality service launching on the environment.data.gov.uk portal has a RESTful Hydra API that supports a combination of GET and POST methods to retrieve data. The most useful endpoints, living at /data, are POST: they can receive GeoJSON bounding boxes to query both geographic and observation data, though some uses don’t require a body.

    In our testing we discovered that Python clients break when navigating the pagination of our service, but JavaScript works. WTF?

    The HTTP Redirect Status Code Landscape

    The 300 series of HTTP status codes, defined in RFC 7231 (with 308 added in RFC 7538), lets clients navigate the internet automatically when resources move or protocols change. They’re all quite useful:

    • 301 (Moved Permanently): The old guard; allows method changes
    • 302 (Found): Temporary and method-flexible
    • 303 (See Other): Forces GET (useful for the POST-Redirect-GET pattern)
    • 307 (Temporary Redirect): Preserves the method, but with temporary semantics
    • 308 (Permanent Redirect): The hero we need; permanent + method preservation
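    Those method-preservation rules are easy to get wrong, so here is a minimal sketch of them in Python. `redirected_method` is a hypothetical helper, not any particular library’s implementation, and the 301/302 downgrade reflects common client behaviour rather than a strict requirement of the RFC:

```python
# Sketch of the method-preservation semantics of the 3xx codes above.
# 303 always forces GET; 307 and 308 always preserve the method;
# for 301 and 302, most clients rewrite POST to GET in practice.
def redirected_method(status: int, method: str) -> str:
    if status == 303:
        return "GET"
    if status in (307, 308):
        return method
    if status in (301, 302) and method == "POST":
        return "GET"  # the downgrade that breaks POST-only endpoints
    return method

assert redirected_method(301, "POST") == "GET"   # the problem case
assert redirected_method(308, "POST") == "POST"  # what we actually want
```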

    The issue: it’s not a resource move (different URI), it’s a protocol upgrade (same resource, different scheme).

    Linked Data APIs need 308

    Our canonical URIs use the http:// scheme as protocol-agnostic identifiers; transport security, however, requires HTTPS. Content negotiation and RDF payloads reference the http:// URIs, and we don’t rewrite the scheme on the fly in our responses. Both the Link headers and the Hydra pagination links in our endpoint use the same URIs to help people navigate our pagination setup.

    So you can see how this is going to go: POST getting redirected to GET causes things to fall over when we erroneously get a 301 from Microsoft’s Application Gateway.

    The Azure Application Gateway Gap

    The responses currently available for an HTTP-to-HTTPS upgrade in Azure’s Application Gateway service are 301, 302, 303, and 307. It’s missing the semantically accurate and method-preserving 308. Not only that, we can’t target specific paths or entry points in the service. We are forced to choose between wrong semantics (i.e. temporary redirects) or broken clients (POST gets converted to GET).

    Real-World Impact: Client Behaviour Broken

    Let’s be honest, the problem here is that Python is full of pedants (see: Pydantic), and the authors of its requests library have correctly implemented RFC 7231’s redirect behaviour in their post() method. When a 301 redirect is encountered, requests converts the POST to a GET, which our /data endpoint doesn’t support, returning a 405 Method Not Allowed error.

    What should be a simple loop navigating the link headers to collect a paginated dataset now requires custom redirect handling. What should be the simple contents of the while next_url: loop:

    # What breaks with 301:
    response = requests.post(next_url, headers=headers, data="")
    # requests converts POST → GET on 301 redirect
    # Server responds: 405 Method Not Allowed
    # Pagination fails immediately

    Becomes the more convoluted:

    # Manual redirect handling to preserve POST method:
    response = requests.post(
        next_url, 
        headers=headers, 
        auth=auth, 
        data="", 
        allow_redirects=False  # Disable automatic redirect
    )
    
    # Handle redirect manually to keep POST
    if response.status_code in (301, 302, 307, 308):
        next_url = response.headers['Location']
        continue  # Re-POST to new URL
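    Putting it together, the pagination loop ends up looking something like the sketch below. The post argument is injected (e.g. a wrapper around requests.post with allow_redirects=False) so the redirect policy stays testable; collect_pages and the compacted Hydra key names are illustrative, not our production code:

```python
from typing import Any, Callable

def collect_pages(first_url: str, post: Callable[[str], Any]) -> list:
    """Follow Hydra pagination, re-POSTing on redirects instead of
    letting the client silently downgrade the method to GET."""
    items, next_url = [], first_url
    while next_url:
        response = post(next_url)
        if response.status_code in (301, 302, 307, 308):
            # Handle the redirect manually, preserving the POST method
            next_url = response.headers["Location"]
            continue
        body = response.json()
        items.extend(body.get("member", []))
        next_url = body.get("view", {}).get("next")  # None on the last page
    return items
```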

    Now I have to build my own redirect handling in Python because Microsoft has the semantics of their response codes wrong. I’m fine with it, but I want people to be able to use our endpoint easily.

    Our front-end developers didn’t experience the same problem, which means JavaScript’s fetch doesn’t behave the same way. This gives us an inconsistent API experience, and even with documentation that is clear about what’s going wrong with their code, I’m still going to get support tickets saying the thing is broken.

    Microsoft: Fix Your Shit

    Your Application Gateway redirect options aren’t complete. Give us a 308 code; allow us to be the pedants I want us to be. It would make a massive impact for the semantic web, improve our RESTful APIs, and follow modern HTTP patterns without breaking things for everyone else.

    Standards exist for a reason, and this isn’t a niche concern: as LLMs and agentic AI usage become more and more common, having modern ways of accessing knowledge graphs and FAIR data requires getting the semantics right everywhere, including in our HTTP response codes.

    @Azure: gimme the response code 308.


    Note: I have a support request asking for this behaviour. I expect Microsoft to change nothing.

  • The Art of Semantic Procrastination: Why I Use Blank Nodes for Concepts That Aren’t Mine

    In the linked data world, there is always a temptation to boil the ocean. When building out a new API, or even just a new dataset, there are so many undefined and uncoined concepts (skos:Concept and otherwise) that provide human context, and you feel the pressure to define them in your RDF – at the risk of taking on too much and straying outside of your authority. I faced that in the past while building out a linked data service at the Office for National Statistics, and having been burnt by the numerous kettles we had on the boil trying to define everything semantically, I’ve been determined not to make that mistake again.

    The new API I’ve been developing for DEFRA is a RESTful, content-negotiated API for observational water quality data in England, based on the Hydra and SOSA vocabularies. The architecture of the service is FastAPI + PostGIS with a Next.JS frontend: the API doesn’t know anything about RDF; however, it responds with JSON-LD by default, with the JSON written in a way that people unfamiliar with RDF would appreciate.

    The main payload of the API is sampling points (sosa:FeatureOfInterest), which have samples and samplings (sosa:Sample / sosa:Sampling), which in turn have observations (sosa:Observation). Each of these levels has domain-specific types, classifications, and annotations which are necessary for the interpretation and discovery of these data; however, no authoritative, public resource for these concepts currently exists.

    As someone who lives FAIR, linked data, but knows most consumers of data neither understand nor care about it, what should I do? The answer isn’t to avoid these concepts – it’s to represent them responsibly until someone with actual authority shows up.

    Procrastination by way of blank nodes

    My solution is deterministic blank nodes. Instead of coining URIs for concepts I don’t own, I generate consistent blank nodes that can be reconciled later when authoritative sources emerge. This keeps my API stable while avoiding coining URIs I may eventually regret. Let me explain.

    Previously I would have attempted to coin URIs for all my concepts, either at the dataset or a higher-level scope – for example, capturing the concept of running surface water from a river. In the source data for the API I have a table with a key and a label; the key acts as a notation.

    // You have no authority here, Jackie Weaver
    {
      "@id": "http://environment.data.gov.uk/id/sample-material/2AZZ",
      "@type": ["skos:Concept", "sosa:FeatureOfInterest"],
      "skos:prefLabel": "RIVER / RUNNING SURFACE WATER",
      "skos:notation": "2AZZ"
    }

    The issue is that I don’t currently have responsibility for the concept scheme for sample materials, and it’s also not online. I know all the values, and I have a copy of it to make the service work, but it’s not within the scope of delivery for the water quality API. So instead of speaking with authority, I’ve shifted to getting it down in code first and serving it via the API. How about as a blank node?

    // Procrastinating via blank nodes
    {
      "@id": "_:sampleMaterial-2AZZ",
      "@type": ["skos:Concept", "sosa:FeatureOfInterest"],
      "skos:prefLabel": "RIVER / RUNNING SURFACE WATER",
      "skos:notation": "2AZZ"
    }

    The key here isn’t just using any blank node – it’s using a deterministic blank node identifier. By concatenating the concept scheme name with the notation (_:sampleMaterial-2AZZ), I ensure that every time this concept appears in my API responses, it gets the same blank node identifier.

    Note: This isn’t standard RDF blank node syntax – it’s my deterministic generation pattern from my source data. When serialized to actual RDF formats, these become proper blank nodes, but the consistent string ensures they all resolve to the same node across serializations. This isn’t just semantic pedantry – it has real practical benefits.

    When someone downloads multiple API responses and converts them to Turtle or N-Triples, all instances of _:sampleMaterial-2AZZ will be recognized as the same entity. Without this deterministic approach, you’d end up with multiple disconnected blank nodes for what should be the same concept, creating an unforgivable mess.
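    The generation step itself is tiny. A minimal sketch, assuming the concept scheme name and notation are both known at serialization time (bnode_id is a hypothetical helper; my real code differs):

```python
def bnode_id(scheme: str, notation: str) -> str:
    """Deterministic blank node identifier: concept scheme name + notation.
    The same (scheme, notation) pair always yields the same string, so the
    concept resolves to one node across responses and serializations."""
    return f"_:{scheme}-{notation}"

# Same inputs, same node - no matter which API response it appears in
assert bnode_id("sampleMaterial", "2AZZ") == "_:sampleMaterial-2AZZ"
assert bnode_id("samplingPurpose", "CA") == "_:samplingPurpose-CA"
```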

    Here’s what this looks like in practice – a real API response converted to Turtle:

    curl -sSL --fail 'http://localhost:8000/sampling-point/53130070/sample?skip=0&limit=3&sampleMaterialType=2AZZ&complianceOnly=false' | rdfpipe -i json-ld -o ttl -
    @prefix dcterms: <http://purl.org/dc/terms/> .
    @prefix hydra: <http://www.w3.org/ns/hydra/core#> .
    @prefix skos: <http://www.w3.org/2004/02/skos/core#> .
    @prefix sosa1: <http://www.w3.org/ns/sosa#> .
    @prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
    
    <http://localhost:8000/sampling-point/53130070/sampling/1506412> a sosa1:Sampling ;
        dcterms:type _:samplingPurpose-CA ;
        sosa1:hasFeatureOfInterest <http://localhost:8000/sampling-point/53130070> ;
        sosa1:hasResult <http://localhost:8000/sampling-point/53130070/sample/1506412> ;
        sosa1:resultTime "2001-08-08"^^xsd:date ;
        sosa1:startTime "2000-08-18T12:20:00"^^xsd:dateTime .
    
    <http://localhost:8000/sampling-point/53130070/sampling/1510110> a sosa1:Sampling ;
        dcterms:type _:samplingPurpose-CA ;
        sosa1:hasFeatureOfInterest <http://localhost:8000/sampling-point/53130070> ;
        sosa1:hasResult <http://localhost:8000/sampling-point/53130070/sample/1510110> ;
        sosa1:resultTime "2000-10-05"^^xsd:date ;
        sosa1:startTime "2000-09-20T12:00:00"^^xsd:dateTime .
    
    <http://localhost:8000/sampling-point/53130070/sampling/2303318> a sosa1:Sampling ;
        dcterms:type _:samplingPurpose-CA ;
        sosa1:hasFeatureOfInterest <http://localhost:8000/sampling-point/53130070> ;
        sosa1:hasResult <http://localhost:8000/sampling-point/53130070/sample/2303318> ;
        sosa1:resultTime "2001-06-07"^^xsd:date ;
        sosa1:startTime "2000-11-29T00:01:00"^^xsd:dateTime .
    
    <http://localhost:8000/sampling-point/53130070/sample/1506412> a sosa1:Sample ;
        sosa1:isResultOf <http://localhost:8000/sampling-point/53130070/sampling/1506412> ;
        sosa1:isSampleOf _:sampleMaterial-2AZZ,
            <http://localhost:8000/sampling-point/53130070> .
    
    <http://localhost:8000/sampling-point/53130070/sample/1510110> a sosa1:Sample ;
        sosa1:isResultOf <http://localhost:8000/sampling-point/53130070/sampling/1510110> ;
        sosa1:isSampleOf _:sampleMaterial-2AZZ,
            <http://localhost:8000/sampling-point/53130070> .
    
    <http://localhost:8000/sampling-point/53130070/sample/2303318> a sosa1:Sample ;
        sosa1:isResultOf <http://localhost:8000/sampling-point/53130070/sampling/2303318> ;
        sosa1:isSampleOf _:sampleMaterial-2AZZ,
            <http://localhost:8000/sampling-point/53130070> .
    
    [] a hydra:Collection ;
        hydra:member <http://localhost:8000/sampling-point/53130070/sample/1506412>,
            <http://localhost:8000/sampling-point/53130070/sample/1510110>,
            <http://localhost:8000/sampling-point/53130070/sample/2303318> ;
        hydra:totalItems 129 ;
        hydra:view [ hydra:first <http://localhost:8000/sampling-point/53130070/sample?skip=0&limit=3&sampleMaterialType=2AZZ&complianceOnly=false> ;
                hydra:last <http://localhost:8000/sampling-point/53130070/sample?skip=126&limit=3&sampleMaterialType=2AZZ&complianceOnly=false> ;
                hydra:next <http://localhost:8000/sampling-point/53130070/sample?skip=3&limit=3&sampleMaterialType=2AZZ&complianceOnly=false> ] .
    
    _:sampleMaterial-2AZZ a skos:Concept,
            sosa1:FeatureOfInterest ;
        skos:notation "2AZZ" ;
        skos:prefLabel "RIVER / RUNNING SURFACE WATER" .
    
    _:samplingPurpose-CA a skos:Concept ;
        skos:notation "CA" ;
        skos:prefLabel "COMPLIANCE AUDIT (PERMIT)" .

    Notice how _:sampleMaterial-2AZZ appears once in the graph but is referenced by multiple samples – exactly what we want.

    When the kettles come out: reconciliation without regret

    The beauty of this approach is that when the authoritative concept scheme eventually goes online (and it will, because I’m also building that service), I can simply add reconciliation triples without breaking anything. This is where semantic versioning becomes your friend – adding triples is a patch-level change at most. It neither changes the shape of the API’s JSON, nor previously coined URIs.

    // Future state - same identifier, now with authority
    {
      "@id": "_:sampleMaterial-2AZZ",
      "@type": ["skos:Concept", "sosa:FeatureOfInterest"],
      "skos:prefLabel": "RIVER / RUNNING SURFACE WATER",
      "skos:notation": "2AZZ",
      "skos:exactMatch": "http://environment.data.gov.uk/def/sample-material/2AZZ",
      "rdfs:definedBy": "http://environment.data.gov.uk/def/sample-material/"
    }

    Now I can fire up those kettles I avoided earlier. The blank node stays the same, existing API consumers continue to work, but new consumers can follow the skos:exactMatch to the authoritative source. Cool URIs don’t change, and neither will these deterministic blank nodes.

    This approach scales beautifully across different concept schemes. Whether it’s determinands that eventually align with QUDT vocabularies, geographic regions that get proper Ordnance Survey URIs, or measurement units that find their way into authoritative registries – the pattern remains the same. Add the reconciliation triples when you have them, leave the blank nodes as stable anchors within the service.

    // And it even supports multiple reconciliation targets
    {
      "@id": "_:sampleMaterial-2AZZ",
      "@type": ["skos:Concept", "sosa:FeatureOfInterest"],
      "skos:prefLabel": "RIVER / RUNNING SURFACE WATER",
      "skos:notation": "2AZZ",
      "skos:exactMatch": "http://environment.data.gov.uk/def/sample-material/2AZZ",
      "rdfs:definedBy": "http://environment.data.gov.uk/def/sample-material/",
      "skos:closeMatch": "http://purl.obolibrary.org/obo/ENVO_00000022"
    }
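    The reconciliation step itself can be a purely additive merge over the concept’s JSON-LD. A sketch, assuming a lookup table of authoritative matches keyed by notation (RECONCILIATIONS and reconcile are illustrative names, not my production code):

```python
# Hypothetical lookup of authoritative matches, keyed by skos:notation
RECONCILIATIONS = {
    "2AZZ": {
        "skos:exactMatch": "http://environment.data.gov.uk/def/sample-material/2AZZ",
    },
}

def reconcile(concept: dict) -> dict:
    """Add reconciliation triples when an authoritative match exists.
    Purely additive: the blank node @id and all existing keys are left
    untouched, so it's a patch-level change for API consumers."""
    extra = RECONCILIATIONS.get(concept.get("skos:notation"), {})
    return {**concept, **extra}
```

    Because the merge never touches existing keys, concepts without an authoritative match pass through unchanged, which is exactly the semantic-versioning guarantee described above.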

    In a perfect world, every concept would have an authoritative URI from day one. In the real world, sometimes the most responsible thing you can do is admit you’re not the authority – yet. Deterministic blank nodes let you build useful services today while keeping the door open for proper reconciliation tomorrow. It’s procrastination with a purpose.