Schematization

The IJF collects procurement data from sources around Canada, each of which structures its data differently. To present a federal fighter jet contract next to a Vancouver contract for a beachfront concession worker, we have to transform both records into the same shape, called a schema.

Our schema is made of general attributes, like the opening date of bidding or the address of the winning bidder. When processing a data source, we find where its records store the opening date or the address and map those source fields to the corresponding fields in the IJF’s schema. This mapping of fields from source to destination, called schematization, is the bedrock of all the IJF’s databases, including procurement.
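As a rough illustration of what such a mapping can look like, here is a minimal sketch in Python. Every name in it is hypothetical: the schema fields, the source labels and the source column names are invented for the example and are not the IJF’s actual schema or pipeline.

```python
# Hypothetical destination schema; the real IJF schema has its own field names.
SCHEMA_FIELDS = ("open_date", "buyer_name", "supplier_name", "supplier_address")

# One mapping per source: schema field -> column name used by that source.
# "source_a" and "source_b" and their column names are made up for illustration.
SOURCE_MAPPINGS = {
    "source_a": {
        "open_date": "bid_open",
        "buyer_name": "department",
        "supplier_name": "vendor",
        "supplier_address": "vendor_addr",
    },
    # A source that publishes no award details maps fewer fields.
    "source_b": {
        "open_date": "posting_date",
        "buyer_name": "issuing_office",
    },
}


def schematize(record: dict, source: str) -> dict:
    """Return the record reshaped into the common schema.

    Fields the source does not provide are left as None, so every
    output record has the same keys regardless of where it came from.
    """
    mapping = SOURCE_MAPPINGS[source]
    return {
        field: record.get(mapping[field]) if field in mapping else None
        for field in SCHEMA_FIELDS
    }
```

In this sketch, a record from a source without award data still comes out with the same keys as any other record; its supplier fields are simply empty.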

In most cases these mappings are obvious to anyone comparing the source to our product. And the most important details about a contract (its bidding dates, the buyer name, or any awarded amounts) are rarely missing. But there are gaps. Some sources lack whole facets of procurement data (see Summary of Sources): Vancouver records, for example, have no supplier name or address because we can’t collect its awards at all.

The fields available in each source will be catalogued in future additions to this document.

Fields that a source does report will often be empty, or even be filled incorrectly. TBS data, like other data sources, sometimes reports Standing Offers or Supply Arrangements alongside its awards. These are not awarded contracts but a preliminary step to future awards. In these cases the reporting department can specify “SOSA” in the “instrument_type” field (#35) of the dataset, indicating it has not yet decided to award any money to a supplier. But we found that the Department of National Defence and Transport Canada use this field incorrectly in all cases, labelling real contracts and their award amounts as only theoretical. A National Defence spokesperson acknowledged this discrepancy by email, saying that “report entries will continue to be monitored and adjusted to comply with standards.”

Amendments

TBS

Within TBS award data, the buyer name and contract ID can be used to group contracts and their amendments. The data does not number the amendments or include their dates, so we order them by the first day of the quarter in which the amendment was published. Quarters follow the federal fiscal year, which begins on April 1. For example, if an amendment was published in Q1 of 2023-2024, the date used will be April 1, 2023.
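The date substitution can be sketched as a small Python function. The input format (a fiscal year string such as "2023-2024" plus a quarter number) and the function name are assumptions made for the example; the calculation relies only on the federal fiscal year running from April 1 to March 31.

```python
from datetime import date


def quarter_start(fiscal_year: str, quarter: int) -> date:
    """First day of a federal fiscal quarter.

    fiscal_year is assumed to look like "2023-2024"; quarter is 1 to 4.
    Q1 starts April 1, Q2 July 1, Q3 October 1, and Q4 starts January 1
    of the following calendar year.
    """
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    start_year = int(fiscal_year.split("-")[0])
    if quarter == 4:
        return date(start_year + 1, 1, 1)
    # Quarters 1 to 3 begin in months 4, 7 and 10 of the starting year.
    return date(start_year, 1 + 3 * quarter, 1)
```

With these assumptions, quarter_start("2023-2024", 1) gives April 1, 2023, matching the example above.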

When two amendments are published in the same quarter, we sort them in ascending order of cumulative value. As a result, if an amendment with a negative value is published in the same quarter as another amendment, the two may be ordered incorrectly: a decrease that actually came last will be sorted ahead of the larger value that preceded it.
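A minimal Python sketch of this ordering rule, and of the case where it goes wrong, might look like the following. The record layout, field names and dollar figures are invented for illustration and are not drawn from the TBS data.

```python
from datetime import date

# Hypothetical amendments to one contract. In reality the increase to
# $150,000 came first and the decrease to $120,000 came later, but both
# were published in the same quarter.
amendments = [
    {"label": "increase", "quarter_start": date(2023, 4, 1), "cumulative_value": 150_000},
    {"label": "later decrease", "quarter_start": date(2023, 4, 1), "cumulative_value": 120_000},
    {"label": "earlier quarter", "quarter_start": date(2023, 1, 1), "cumulative_value": 100_000},
]


def order_amendments(rows: list) -> list:
    """Sort by quarter start date, then by ascending cumulative value."""
    return sorted(rows, key=lambda r: (r["quarter_start"], r["cumulative_value"]))


ordered_labels = [row["label"] for row in order_amendments(amendments)]
```

Here the sort places "later decrease" before "increase", because the tie within the quarter is broken by value alone. That is the misordering described above.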

If a department does not use a consistent ID to refer to a contract and its amendments, the amendments will appear as separate contracts. In these cases, the value of the contract will be double-counted in the calculation of aggregate totals.

The data only includes amendments that alter the value of the contract by more than $10,000.

PSPC

The PSPC awards dataset reflects only the most recent amendment to an award. The IJF’s database records new amendments as they are published but does not contain amendments that predate the start of our data gathering.

Standardization

Buyers

The IJF standardized nearly all names of contracting governments and their departments and agencies, referred to as “buyers” in the procurement database. For example, the values “Department of National Defense,” “DND,” and “National Defence” are all standardized to a single item, “National Defence.” 

In the federal data, 99.5 per cent of tenders and more than 99.9 per cent of awards have standardized buyers. In British Columbia, 87.6 per cent of tenders and 95.4 per cent of awards have standardized buyers. The only exceptions are records where the buyer’s name was unavailable or could not be easily resolved to a single entity.

Suppliers

The IJF also standardized a subset of the names of companies that received awards, referred to as “suppliers.” The source data contained 235,000 unique supplier name strings, which were cleaned and consolidated in a multi-step process:

  • First, names were cleaned by stripping out variations in punctuation, capitalization and spacing, as well as removing words like “Inc” and “LLC” from company names. This reduced the pool of unique values to 175,000.
  • Then, IJF reporters manually grouped together cleaned names into standardized company names.
    • Obvious variations and typos were consolidated. (For example, “EY,” “Ernest and Young” and “Ernst & Young” were grouped as a single company.)
    • In most cases, subsidiaries and acquisitions were grouped with their present-day owner. Occasionally, subsidiaries were given their own entries because they are well-known or significant companies in their own right.
    • When a contract was awarded to a joint venture, it was grouped with the first company named in the joint venture.
  • Finally, each grouping was reviewed for correctness and completeness.
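The cleaning and grouping steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the IJF’s actual code: the suffix list contains only the two examples named above, the lookup table stands in for the reporters’ manual groupings, and the choice of “Ernst & Young” as the standardized label is hypothetical.

```python
import re

# Assumed suffix list; the text names only "Inc" and "LLC" as examples.
LEGAL_SUFFIXES = {"inc", "llc"}

# Hypothetical stand-in for the manual grouping step:
# cleaned name -> standardized company name.
MANUAL_GROUPS = {
    "ey": "Ernst & Young",
    "ernest and young": "Ernst & Young",
    "ernst young": "Ernst & Young",
}


def clean_name(raw: str) -> str:
    """Normalize capitalization, punctuation and spacing, then drop suffixes."""
    lowered = raw.lower()
    # Replace anything that is not a letter, digit, underscore or space.
    no_punctuation = re.sub(r"[^\w\s]", " ", lowered)
    # split() with no arguments also collapses runs of whitespace.
    words = [w for w in no_punctuation.split() if w not in LEGAL_SUFFIXES]
    return " ".join(words)


def standardize(raw: str):
    """Return the standardized name, or None if the supplier was not grouped."""
    return MANUAL_GROUPS.get(clean_name(raw))
```

In this sketch, “Acme Widgets, Inc.” and “ACME  widgets LLC” clean to the same string, which is how automated cleaning alone shrinks the pool of unique names before any manual grouping happens.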

Companies were selected for standardization if they met any of four criteria:

  1. Being among the top suppliers by the dollar value of contracts
  2. Being among the top suppliers by number of contracts
  3. Being a key player in a significant industry (for example, the Big Four accounting firms)
  4. Being otherwise significant, influential or newsworthy (for example, Facebook)

Standardized supplier names are used to aggregate records together and calculate totals. Within individual records, the original, unstandardized supplier name is shown.
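The split between the name used for totals and the name shown on a record can be illustrated with a short Python sketch. The field names and figures are hypothetical, and falling back to the original name when no standardized name exists is an assumption of this example rather than a documented IJF rule.

```python
from collections import defaultdict

# Hypothetical award records; each keeps its original supplier string.
awards = [
    {"supplier_original": "Ernst & Young LLP", "supplier_standardized": "Ernst & Young", "value": 50_000},
    {"supplier_original": "EY", "supplier_standardized": "Ernst & Young", "value": 25_000},
    {"supplier_original": "Small Local Co", "supplier_standardized": None, "value": 10_000},
]


def totals_by_supplier(rows: list) -> dict:
    """Sum award values under the standardized name where one exists."""
    totals = defaultdict(int)
    for row in rows:
        # Assumption: unstandardized suppliers are totalled under their original name.
        key = row["supplier_standardized"] or row["supplier_original"]
        totals[key] += row["value"]
    return dict(totals)


totals = totals_by_supplier(awards)
```

The two Ernst & Young records are totalled together, while each record still carries the supplier string exactly as the source reported it.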

The list of companies standardized by the IJF at launch time accounts for 29.5 per cent of awards, which make up 74.2 per cent of the total value of awards in the database. Supplier names will continue to be standardized on an ongoing basis.