Commit 568d4ce

feat(gatsby-source-drupal): Use the collection count from JSON:API extras to enable parallel API requests for cold builds (#32883)
* feat(gatsby-source-drupal): Use the collection count from JSON:API extras to construct URLs. Otherwise, we have to wait to start querying each page until the previous one finishes. This change lets us query all pages in parallel, so instead of fetching one collection page at a time, we can fetch up to the maximum concurrency (default 20). For a test site with ~3200 entities, this PR dropped sourcing time from ~14s to 4s.
* Use the new browser-based URL parser
* Comment the code more
* Use the page size the site has set instead of assuming 50
* Use the original type that's set, as that's always there
* Log updates while sourcing
* Encourage people to enable this setting in the README
* Update gatsby-node.js
1 parent 41f5337 · commit 568d4ce

2 files changed: +65 −1


packages/gatsby-source-drupal/README.md

Lines changed: 6 additions & 0 deletions

````diff
@@ -34,6 +34,12 @@ module.exports = {
   }
 }
 ```
 
+On the Drupal side, we highly recommend installing [JSON:API
+Extras](https://www.drupal.org/project/jsonapi_extras) and enabling "Include
+count in collection queries" at `/admin/config/services/jsonapi/extras`, as that
+[speeds up fetching data from Drupal by around
+4x](https://github.com/gatsbyjs/gatsby/pull/32883).
+
 ### Filters
 
 You can use the `filters` option to limit the data that is retrieved from Drupal. Filters are applied per JSON API collection. You can use any [valid JSON API filter query](https://www.drupal.org/docs/8/modules/jsonapi/filtering). For large data sets this can reduce the build time of your application by allowing Gatsby to skip content you'll never use.
````
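
For context, once "Include count in collection queries" is enabled, every JSON:API collection response carries a total count alongside the usual pagination links. Here is a minimal sketch of the relevant response fields as this commit reads them (`d.body.meta.count` and `d.body.links.next.href` in the diff below); the host, path, and values are illustrative:

```js
// Hypothetical JSON:API collection response with JSON:API Extras'
// "Include count in collection queries" enabled — values are illustrative.
const exampleCollectionResponse = {
  data: [
    // …up to page[limit] resource objects…
  ],
  meta: {
    // Total number of entities in the collection. This is what lets the
    // plugin compute every page URL up front instead of walking links.next.
    count: 3210,
  },
  links: {
    next: {
      href: `https://drupal.example.com/jsonapi/node/article?page%5Blimit%5D=50&page%5Boffset%5D=50`,
    },
  },
}
```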

packages/gatsby-source-drupal/src/gatsby-node.js

Lines changed: 59 additions & 1 deletion

```diff
@@ -22,7 +22,28 @@ const agent = {
   // http2: new http2wrapper.Agent(),
 }
 
+let start
+let apiRequestCount = 0
+let initialSourcing = true
+let globalReporter
 async function worker([url, options]) {
+  // Log out progress during the initial sourcing.
+  if (initialSourcing) {
+    apiRequestCount += 1
+    if (!start) {
+      start = Date.now()
+    }
+    const queueLength = requestQueue.length()
+    if (apiRequestCount % 50 === 0) {
+      globalReporter.verbose(
+        `gatsby-source-drupal has ${queueLength} API requests queued and the current request rate is ${(
+          apiRequestCount /
+          ((Date.now() - start) / 1000)
+        ).toFixed(2)} requests / second`
+      )
+    }
+  }
+
   return got(url, {
     agent,
     cache: false,
```
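
With verbose logging enabled (e.g. `gatsby build --verbose`), the block above prints a progress line after every 50th API request during the initial sourcing. The format comes straight from the template string; the numbers here are illustrative:

```text
gatsby-source-drupal has 143 API requests queued and the current request rate is 18.52 requests / second
```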
```diff
@@ -72,6 +93,7 @@ exports.sourceNodes = async (
   },
   pluginOptions
 ) => {
+  globalReporter = reporter
   const {
     baseUrl,
     apiBase = `jsonapi`,
```

```diff
@@ -293,6 +315,7 @@ exports.sourceNodes = async (
   drupalFetchActivity.start()
 
   let allData
+  const typeRequestsQueued = new Set()
   try {
     const res = await requestQueue.push([
       urlJoin(baseUrl, apiBase),
```

```diff
@@ -370,7 +393,39 @@ exports.sourceNodes = async (
       if (d.body.included) {
         dataArray.push(...d.body.included)
       }
-      if (d.body.links && d.body.links.next) {
+
+      // If JSON:API extras is configured to add the resource count, we can queue
+      // all API requests immediately instead of waiting for each request to return
+      // the next URL. This lets us request resources in parallel vs. sequentially,
+      // which is much faster.
+      if (d.body.meta?.count) {
+        // Only queue the page URLs once per type.
+        if (d.body.links.next?.href && !typeRequestsQueued.has(type)) {
+          typeRequestsQueued.add(type)
+
+          // Get the number of API requests needed.
+          // We round down, as we've already gotten the first page at this point.
+          const pageSize = new URL(d.body.links.next.href).searchParams.get(
+            `page[limit]`
+          )
+          const requestsCount = Math.floor(d.body.meta.count / pageSize)
+
+          reporter.verbose(
+            `queueing ${requestsCount} API requests for type ${type} which has ${d.body.meta.count} entities.`
+          )
+
+          const newUrl = new URL(d.body.links.next.href)
+          await Promise.all(
+            _.range(requestsCount).map(pageOffset => {
+              // We're starting one page ahead, as page 0 is already fetched.
+              pageOffset += 1
+              // Construct the URL for this page's offset.
+              newUrl.searchParams.set(`page[offset]`, pageOffset * pageSize)
+              return getNext(newUrl.toString())
+            })
+          )
+        }
+      } else if (d.body.links?.next) {
         await getNext(d.body.links.next)
       }
     }
```
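
To make the offset arithmetic concrete, here is a standalone sketch of the same page math (not plugin code; the entity count, host, and page size are made up). With 3,210 entities and a page size of 50, the first page is already fetched, so 64 more requests are queued at offsets 50 through 3,200:

```js
// Standalone illustration of the page math above — numbers are made up.
const count = 3210 // would come from d.body.meta.count
const next = new URL(
  `https://drupal.example.com/jsonapi/node/article?page%5Blimit%5D=50&page%5Boffset%5D=50`
)

// The page size the site has set, read from the `next` link.
// Note: searchParams.get returns a string; the arithmetic below coerces it.
const pageSize = next.searchParams.get(`page[limit]`) // "50"

// Round down because page 0 (offsets 0–49) came back with the first request.
const requestsCount = Math.floor(count / pageSize) // 64

// Offsets for the remaining pages: 50, 100, …, 3200.
const offsets = Array.from(
  { length: requestsCount },
  (_, i) => (i + 1) * pageSize
)

console.log(offsets[0], offsets[offsets.length - 1]) // 50 3200
```

If `count` happens to be an exact multiple of the page size, the last queued offset equals `count` and should come back as an empty page, so the over-fetch is at most one request.
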
```diff
@@ -480,6 +535,9 @@ exports.sourceNodes = async (
       createNode(node)
     }
 
+    // We're now done with the initial sourcing.
+    initialSourcing = false
+
     return
   }
 
```