Data Ingestion Layer

The Data Ingestion Layer is the first filtering stage of ORN's pipeline. It is responsible for validating that every submission entering the system is authentic, compliant with technical requirements, and properly logged for downstream processing. At this stage, the focus is not on labeling or understanding videos, but on ensuring that only valid, authentic, user-generated content, not AI-generated material, is admitted into the ecosystem.

Every uploaded video is immediately standardized into a consistent format, regardless of the recording device. Baseline checks ensure that submissions meet minimum technical thresholds such as frame rate, resolution, and duration. For instance, smart glasses typically record video between 29 and 31 frames per second (fps) at a minimum resolution of 720p. Videos that fall outside these parameters, or that are abnormally short or excessively long, are automatically rejected before consuming downstream processing resources.
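
As a rough illustration, a baseline check might look like the sketch below. The fps window and 720p floor come from the paragraph above; the duration bounds, field names, and function names are illustrative placeholders rather than ORN's actual parameters, and in practice each device class would have its own threshold profile.

```python
from dataclasses import dataclass

# Illustrative thresholds only. The fps window and 720p floor reflect the
# smart-glass profile described above; duration bounds are placeholder
# assumptions, not published ORN limits.
MIN_FPS, MAX_FPS = 29.0, 31.0
MIN_HEIGHT = 720              # 720p minimum resolution
MIN_DURATION_S = 5            # placeholder lower bound (assumption)
MAX_DURATION_S = 30 * 60      # placeholder upper bound (assumption)


@dataclass
class VideoProbe:
    """Technical properties read from the standardized video."""
    fps: float
    height: int
    duration_s: float


def passes_baseline_checks(probe: VideoProbe) -> bool:
    """Reject submissions outside minimum technical thresholds before
    they consume downstream processing resources."""
    if not (MIN_FPS <= probe.fps <= MAX_FPS):
        return False
    if probe.height < MIN_HEIGHT:
        return False
    if not (MIN_DURATION_S <= probe.duration_s <= MAX_DURATION_S):
        return False
    return True
```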

Eligible submissions are not limited to smart-glass devices; smartphone recordings are also accepted, provided they are captured from an egocentric (first-person) perspective using the appropriate supporting tools and setup.

Alongside these technical validations, the ingestion layer performs metadata inspection to ensure content authenticity. Device identifiers, frame-rate logs, and encoding profiles are analyzed to detect patterns consistent with AI-generated or non-authentic videos, such as those downloaded from social media or other external sources. This process also helps determine whether a video was recorded using smart glasses or a smartphone, ensuring transparency and accurate device classification. Any anomalies or signs of tampering, such as artificially modified metadata intended to bypass platform requirements, are automatically flagged and the submission is rejected.
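
A minimal sketch of this kind of metadata inspection is shown below, assuming the container and device metadata have already been extracted upstream. The field names, the suspicious-encoder list, the tolerance value, and the device-model check are hypothetical stand-ins; ORN's actual detection rules are not specified here.

```python
from typing import Any

# Hypothetical examples of encoder tags treated as suspicious; the real
# detection rules are not published and would be far more extensive.
SUSPICIOUS_ENCODER_TAGS = {"unknown", "reencoded-upload"}


def inspect_metadata(meta: dict[str, Any]) -> tuple[bool, str]:
    """Return (accepted, reason) for a metadata record extracted upstream
    from container headers and device logs."""
    if not meta.get("device_id"):
        return False, "missing device identifier"

    if meta.get("encoder", "").lower() in SUSPICIOUS_ENCODER_TAGS:
        return False, "encoding profile inconsistent with on-device capture"

    # Frame-rate logs from the capture device should match the container's
    # declared frame rate; a large mismatch suggests re-encoding or tampering.
    declared = meta.get("container_fps")
    logged = meta.get("device_fps_log_avg")
    if declared and logged and abs(declared - logged) > 1.0:
        return False, "frame-rate log does not match container metadata"

    # Classify the capture device for transparency downstream
    # (model-name prefix used here purely as an illustrative heuristic).
    is_glasses = meta.get("device_model", "").lower().startswith("glasses")
    meta["device_type"] = "smart_glasses" if is_glasses else "smartphone"
    return True, "ok"
```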

To maintain integrity across the dataset, the ingestion system also incorporates light deduplication at this early stage. Basic hashing and frame-level fingerprinting prevent users from uploading identical copies of the same video in an attempt to double-claim rewards. More advanced duplicate detection occurs later in the pipeline, but this first pass acts as a fast filter against obvious abuse.
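
The sketch below illustrates this first-pass deduplication under simple assumptions: an exact SHA-256 file hash catches byte-identical re-uploads, while a coarse perceptual fingerprint over a few sampled frames (decoded here with OpenCV, purely as an example) catches trivially re-encoded copies. Sampling strategy and hash sizes are illustrative, not ORN's production parameters; the more tolerant near-duplicate matching happens later in the pipeline.

```python
import hashlib
import cv2  # assumed available for frame decoding; any decoder would do


def file_hash(path: str) -> str:
    """Exact-duplicate check: SHA-256 over the raw file bytes."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def frame_fingerprint(path: str, samples: int = 8) -> str:
    """Coarse near-duplicate check: concatenate 8x8 average-hashes of
    evenly sampled frames. Only bit-identical fingerprints collide here;
    a Hamming-distance comparison would be more tolerant."""
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) or 1
    bits = []
    for i in range(samples):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * total // samples)
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        small = cv2.resize(gray, (8, 8))
        mean = small.mean()
        bits.append("".join("1" if px > mean else "0" for px in small.flatten()))
    cap.release()
    return hashlib.sha1("".join(bits).encode()).hexdigest()


def is_duplicate(path: str, seen_hashes: set[str], seen_fingerprints: set[str]) -> bool:
    """Fast first-pass filter against obvious double-claim attempts."""
    return file_hash(path) in seen_hashes or frame_fingerprint(path) in seen_fingerprints
```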

Finally, each accepted video is temporarily buffered for validation and paired with a structured metadata record (device type, resolution, frame rate, anonymized user ID, and video identifier). Permanent storage occurs only after pre-processing, once all privacy filters and anonymization have been applied, ensuring no raw or sensitive footage is retained. This ensures that every submission entering ORN is properly indexed and traceable from the outset, without compromising contributor privacy.
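
For illustration, the structured metadata record paired with each buffered video could resemble the following; the field names and the use of a UUID as the video identifier are assumptions rather than ORN's actual schema.

```python
from dataclasses import dataclass, asdict
import json
import uuid


@dataclass
class IngestionRecord:
    """Metadata paired with each accepted video while it sits in the
    temporary validation buffer. Field names are illustrative."""
    video_id: str        # platform-assigned video identifier
    user_id_anon: str    # anonymized contributor ID (no raw identity)
    device_type: str     # "smart_glasses" or "smartphone"
    resolution: str      # e.g. "1280x720"
    fps: float


def new_record(user_id_anon: str, device_type: str, resolution: str, fps: float) -> str:
    """Serialize a record for indexing; permanent storage happens only
    after pre-processing and anonymization."""
    record = IngestionRecord(
        video_id=str(uuid.uuid4()),
        user_id_anon=user_id_anon,
        device_type=device_type,
        resolution=resolution,
        fps=fps,
    )
    return json.dumps(asdict(record))
```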

The Data Ingestion Layer is deliberately narrow in scope: its purpose is to act as the gatekeeper. Videos that clear this stage are guaranteed to be authentic, device-compliant, and technically valid, setting the stage for deeper refinement and transformation in the subsequent Pre-Processing, Evaluation, and Post-Processing layers.

All thresholds, parameters, and detection methods described are subject to continuous refinement as technology advances and as the requirements of the ecosystem evolve.