Parser: Incremental HTML parser with script-pause + document.write #13

Closed
opened 2026-06-17 13:37:44 +00:00 by Artur · 1 comment
Owner

Goal

Implement an incremental HTML parser with script-pause support and document.write() injection.

What to Build

src/dom/parser.ts

export interface ParserConfig {
  baseUrl: string;
  onScriptTag: (script: HTMLScriptElement) => Promise<void>;
  onDocumentWrite: (html: string) => void;
}

// API surface only: "declare" keeps the body-less signatures valid TypeScript
export declare class IncrementalParser {
  constructor(config: ParserConfig);

  // Start parsing HTML from the beginning
  start(html: string): Promise<void>;

  // Feed additional chunks (for streaming)
  feedChunk(chunk: string): void;

  // Check if parser is still active
  isActive(): boolean;

  // Called by ScriptLoader when blocking script finishes -> resume parser
  resume(): void;

  // Handle document.write() during parsing
  handleDocumentWrite(html: string): void;

  // Get final Document
  getDocument(): Document;
}

Parsing Behavior

1. start(html) is called with full HTML string
2. Parser builds DOM incrementally (in chunks if possible, or full parse)
3. When parser encounters a <script> tag without async/defer:
   -> Pause parsing
   -> Call config.onScriptTag(scriptElement)
   -> ScriptLoader handles fetch + execute
   -> When ScriptLoader finishes, calls parser.resume()
   -> Parser resumes, processing remaining HTML
4. When parser encounters <script async> or <script defer>:
   -> Do NOT pause
   -> Call config.onScriptTag(scriptElement) (fire and forget)
   -> ScriptLoader starts fetching in background
   -> Parser continues
5. When parser encounters <script type="module">:
   -> Do NOT pause
   -> Call config.onScriptTag(scriptElement)
   -> ScriptLoader handles module resolution
   -> Parser continues
6. document.write() during parsing:
   -> Inject the written HTML into the parser's input stream
   -> Parser processes it as if it were in the original HTML
   -> Position is at the injection point
7. After all HTML parsed:
   -> Parser finishes
   -> ScriptLoader executes defer scripts
   -> DOMContentLoaded fires
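
The pause/resume rules in steps 3-5 can be sketched as a small token loop. This is only an illustration under stated assumptions: `ScriptAttrs`, `Token`, `classifyScript`, and `PausableLoop` are hypothetical names that are not part of the planned API, and the input is a pre-tokenized list rather than real HTML.

```typescript
// Hypothetical, simplified types: real code would work on HTMLScriptElement.
type ScriptAttrs = { async?: boolean; defer?: boolean; type?: string };
type ScriptMode = "blocking" | "async" | "defer" | "module";
type Token =
  | { kind: "text"; value: string }
  | { kind: "script"; attrs: ScriptAttrs };

// Mirrors steps 3-5: only a classic script without async/defer blocks.
function classifyScript(attrs: ScriptAttrs): ScriptMode {
  if (attrs.type === "module") return "module";
  if (attrs.async) return "async";
  if (attrs.defer) return "defer";
  return "blocking";
}

class PausableLoop {
  readonly log: string[] = [];
  private index = 0;
  private paused = false;
  private readonly tokens: Token[];
  private readonly onScript: (mode: ScriptMode) => void;

  constructor(tokens: Token[], onScript: (mode: ScriptMode) => void) {
    this.tokens = tokens;
    this.onScript = onScript;
  }

  // Consume tokens until the input ends or a blocking script pauses the loop.
  run(): void {
    while (!this.paused && this.index < this.tokens.length) {
      const token = this.tokens[this.index++];
      if (token.kind === "text") {
        this.log.push(`text:${token.value}`);
        continue;
      }
      const mode = classifyScript(token.attrs);
      // Set the flag before the callback so a synchronous resume() still works.
      if (mode === "blocking") this.paused = true;
      this.log.push(`script:${mode}`);
      this.onScript(mode);
    }
  }

  // Called once the blocking script has finished executing.
  resume(): void {
    if (!this.paused) return;
    this.paused = false;
    this.run();
  }

  isActive(): boolean {
    return this.paused || this.index < this.tokens.length;
  }
}
```

Setting `paused` before invoking the callback is a deliberate choice in this sketch: if the loader finishes synchronously and calls `resume()` from inside the callback, the nested `run()` drains the remaining tokens and the outer loop simply exits.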

Document.write Handling

According to spec:
- If document.write() is called during page load (parser is active):
  -> The written HTML is parsed as if it were at the current position
  -> The parser continues from the injection point
- If document.write() is called after page load:
  -> It overwrites the document (document.open() -> write -> close())
- Implementation:
  -> During parsing: buffer the written HTML, inject into parse stream
  -> After parsing: call document.open(), write(), close()
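
One possible shape for the "inject into parse stream" step is a buffer with a separate insertion point, so that consecutive writes keep their order. This is a minimal sketch, not the implementation: `InputStream`, `DocumentLike`, and `routeDocumentWrite` are hypothetical names, and the after-load branch just mirrors the open/write/close sequence listed above.

```typescript
class InputStream {
  private buffer = "";
  private position = 0; // next character the parser will read
  private insertionPoint = 0; // where the next document.write() lands

  append(chunk: string): void {
    this.buffer += chunk;
  }

  // Consume up to `count` characters and move the insertion point along.
  read(count: number): string {
    const out = this.buffer.slice(this.position, this.position + count);
    this.position += out.length;
    this.insertionPoint = this.position;
    return out;
  }

  // Splice written HTML in front of the unread input; advancing the
  // insertion point keeps write("A"); write("B") in source order.
  write(html: string): void {
    const at = this.insertionPoint;
    this.buffer = this.buffer.slice(0, at) + html + this.buffer.slice(at);
    this.insertionPoint += html.length;
  }

  remaining(): string {
    return this.buffer.slice(this.position);
  }
}

// Hypothetical stand-in for the post-load document operations.
interface DocumentLike {
  open(): void;
  write(html: string): void;
  close(): void;
}

function routeDocumentWrite(
  html: string,
  parserActive: boolean,
  stream: InputStream,
  doc: DocumentLike,
): void {
  if (parserActive) {
    stream.write(html); // during parsing: inject at the current position
  } else {
    doc.open(); // after load: replace the document
    doc.write(html);
    doc.close();
  }
}
```

With `"<p>1</p><p>4</p>"` as input, reading the first paragraph and then writing `"<p>2</p>"` and `"<p>3</p>"` leaves `"<p>2</p><p>3</p><p>4</p>"` as the unread input.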

Tests

Unit Tests

| Test | Verifies |
|------|----------|
| parser.basic.test.ts | Simple HTML with div/span parses to DOM |
| parser.script-pause.test.ts | Classic script pauses parser |
| parser.script-resume.test.ts | Parser resumes after script execution |
| parser.async-script.test.ts | Async script does NOT pause parser |
| parser.defer-script.test.ts | Defer script does NOT pause parser |
| parser.module-script.test.ts | Module script does NOT pause parser |
| parser.document-write.test.ts | document.write injects at current position |
| parser.document-write-after-load.test.ts | document.write after load overwrites document |
| parser.multiple-scripts.test.ts | Multiple scripts pause/resume correctly |
| parser.nested-elements.test.ts | Deeply nested HTML parses correctly |
| parser.attributes.test.ts | All attribute types (quoted, unquoted, boolean) |
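
For the attribute cases (quoted, unquoted, boolean), a regex-based sketch like the following could serve as a starting point or as a reference oracle in tests. `parseAttributes` is a hypothetical helper, not part of the planned API. It assumes it receives only the attribute portion of a start tag and does not decode character references.

```typescript
// Parse the attribute portion of a start tag into a name -> value map.
// Boolean attributes map to the empty string.
function parseAttributes(source: string): Map<string, string> {
  // name, then optionally = followed by "double", 'single', or unquoted value
  const pattern =
    /([^\s"'<>\/=]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g;
  const attrs = new Map<string, string>();
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(source)) !== null) {
    const name = match[1].toLowerCase();
    const value = match[2] ?? match[3] ?? match[4] ?? "";
    // Keep the first occurrence when a name is repeated.
    if (!attrs.has(name)) attrs.set(name, value);
  }
  return attrs;
}
```

For example, `parseAttributes("id=\"main\" class='a b' data-n=1 disabled")` yields four entries, with `disabled` mapped to the empty string.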

Edge Cases

| Test | Verifies |
|------|----------|
| parser.empty.test.ts | Empty string produces empty document |
| parser.comments.test.ts | HTML comments are ignored |
| parser.script-content.test.ts | Script content with </script> inside escaped |
| parser.cdata.test.ts | CDATA sections handled |
| parser.doctype.test.ts | DOCTYPE parsed (stored as document.doctype) |
| parser.encoding.test.ts | charset meta tag detected |
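
The script-content case hinges on where the raw script text ends. A simplified sketch, assuming the parser already knows it is inside a script element: scan for `</script` followed by whitespace, `/`, or `>`, case-insensitively. `findScriptEnd` is a hypothetical helper, and the sketch deliberately ignores the `<!--` escaped-script states of the full HTML tokenizer.

```typescript
// Return the index of the end tag that closes the current script element,
// or -1 if the input seen so far does not contain one.
function findScriptEnd(input: string, from: number): number {
  const pattern = /<\/script[\t\n\f\r \/>]/gi;
  pattern.lastIndex = from; // honoured by exec() because of the g flag
  const match = pattern.exec(input);
  return match ? match.index : -1;
}

// An escaped "<\/script>" inside a JS string does not terminate the element,
// because "<" is followed by a backslash rather than "/".
const sample = 'var s = "<\\/script>"; done();</script><p>x</p>';
const scriptText = sample.slice(0, findScriptEnd(sample, 0));
```

A return value of -1 is the natural point for a streaming parser to wait for the next `feedChunk()` call instead of treating the rest of the input as script text.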

Definition of Done

  • src/dom/parser.ts with IncrementalParser class
  • Script-pause + resume mechanism
  • document.write() support
  • Integration with ScriptLoader via config.onScriptTag
  • All tests pass
  • 100% line + branch coverage
Author
Owner

Parser: Incremental HTML parser with script-pause + document.write. ✅ Implemented in src/dom/parser.ts. Tests: parser.test.ts.
Artur closed this issue 2026-06-18 06:28:04 +00:00
Reference
glow-all/true-headless-browser#13