Initial commit

Commit 89e60e48 authored Mar 13, 2025 by wanglch
20 changed files
--- a/olmocr/bench/sample_data/olmocr_base_temp0_1/openstax_caculus_pg_273_pg1_repeat3.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_1/openstax_caculus_pg_273_pg1_repeat3.md
+3.4 EXERCISES
+
+For the following exercises, the given functions represent the position of a particle traveling along a horizontal line.
+
+a. Find the velocity and acceleration functions.
+
+b. Determine the time intervals when the object is slowing down or speeding up.
+
+150. \( s(t) = 2t^3 - 3t^2 - 12t + 8 \)
+
+151. \( s(t) = 2t^3 - 15t^2 + 36t - 10 \)
+
+152. \( s(t) = \frac{t}{1 + t^2} \)
+
+153. A rocket is fired vertically upward from the ground. The distance \( s \) in feet that the rocket travels from the ground after \( t \) seconds is given by \( s(t) = -16t^2 + 560t \).
+
+a. Find the velocity of the rocket 3 seconds after being fired.
+
+b. Find the acceleration of the rocket 3 seconds after being fired.
+
+154. A ball is thrown downward with a speed of 8 ft/s from the top of a 64-foot-tall building. After \( t \) seconds, its height above the ground is given by \( s(t) = -16t^2 - 8t + 64 \).
+
+a. Determine how long it takes for the ball to hit the ground.
+
+b. Determine the velocity of the ball when it hits the ground.
+
+155. The position function \( s(t) = t^2 - 3t - 4 \) represents the position of the back of a car backing out of a driveway and then driving in a straight line, where \( s \) is in feet and \( t \) is in seconds. In this case, \( s(t) = 0 \) represents the time at which the back of the car is at the garage door, so \( s(0) = -4 \) is the starting position of the car, 4 feet inside the garage.
+
+a. Determine the velocity of the car when \( s(t) = 0 \).
+
+b. Determine the velocity of the car when \( s(t) = 14 \).
+
+156. The position of a hummingbird flying along a straight line in \( t \) seconds is given by \( s(t) = 3t^3 - 7t \) meters.
+
+a. Determine the velocity of the bird at \( t = 1 \) sec.
+
+b. Determine the acceleration of the bird at \( t = 1 \) sec.
+
+c. Determine the acceleration of the bird when the velocity equals 0.
+
+157. A potato is launched vertically upward with an initial velocity of 100 ft/s from a potato gun at the top of an 85-foot-tall building. The distance in feet that the potato travels from the ground after \( t \) seconds is given by \( s(t) = -16t^2 + 100t + 85 \).
+
+a. Find the velocity of the potato after 0.5 s and 5.75 s.
+
+b. Find the speed of the potato at 0.5 s and 5.75 s.
+
+c. Determine when the potato reaches its maximum height.
+
+d. Find the acceleration of the potato at 0.5 s and 1.5 s.
+
+e. Determine how long the potato is in the air.
+
+f. Determine the velocity of the potato upon hitting the ground.
+
+158. The position function \( s(t) = t^3 - 8t \) gives the position in miles of a freight train where east is the positive direction and \( t \) is measured in hours.
+
+a. Determine the direction the train is traveling when \( s(t) = 0 \).
+
+b. Determine the direction the train is traveling when \( s(t) = 0 \).
+
+c. Determine the time intervals when the train is slowing down or speeding up.
+
+159. The following graph shows the position \( y = s(t) \) of an object moving along a straight line.
+
+![Graph of position function](image)
+
+a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.
+
+b. Sketch the graph of the velocity function.
+
+c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
+
+d. Determine the time intervals when the object is speeding up or slowing down.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_1/openstax_caculus_pg_273_pg1_repeat4.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_1/openstax_caculus_pg_273_pg1_repeat4.md
+3.4 EXERCISES
+
+For the following exercises, the given functions represent the position of a particle traveling along a horizontal line.
+
+a. Find the velocity and acceleration functions.
+
+b. Determine the time intervals when the object is slowing down or speeding up.
+
+150. \( s(t) = 2t^3 - 3t^2 - 12t + 8 \)
+
+151. \( s(t) = 2t^3 - 15t^2 + 36t - 10 \)
+
+152. \( s(t) = \frac{t}{1 + t^2} \)
+
+153. A rocket is fired vertically upward from the ground. The distance \( s \) in feet that the rocket travels from the ground after \( t \) seconds is given by \( s(t) = -16t^2 + 560t \).
+
+a. Find the velocity of the rocket 3 seconds after being fired.
+
+b. Find the acceleration of the rocket 3 seconds after being fired.
+
+154. A ball is thrown downward with a speed of 8 ft/s from the top of a 64-foot-tall building. After \( t \) seconds, its height above the ground is given by \( s(t) = -16t^2 - 8t + 64 \).
+
+a. Determine how long it takes for the ball to hit the ground.
+
+b. Determine the velocity of the ball when it hits the ground.
+
+155. The position function \( s(t) = t^2 - 3t - 4 \) represents the position of the back of a car backing out of a driveway and then driving in a straight line, where \( s \) is in feet and \( t \) is in seconds. In this case, \( s(t) = 0 \) represents the time at which the back of the car is at the garage door, so \( s(0) = -4 \) is the starting position of the car, 4 feet inside the garage.
+
+a. Determine the velocity of the car when \( s(t) = 0 \).
+
+b. Determine the velocity of the car when \( s(t) = 14 \).
+
+156. The position of a hummingbird flying along a straight line in \( t \) seconds is given by \( s(t) = 3t^3 - 7t \) meters.
+
+a. Determine the velocity of the bird at \( t = 1 \) sec.
+
+b. Determine the acceleration of the bird at \( t = 1 \) sec.
+
+c. Determine the acceleration of the bird when the velocity equals 0.
+
+157. A potato is launched vertically upward with an initial velocity of 100 ft/s from a potato gun at the top of an 85-foot-tall building. The distance in feet that the potato travels from the ground after \( t \) seconds is given by \( s(t) = -16t^2 + 100t + 85 \).
+
+a. Find the velocity of the potato after 0.5 s and 5.75 s.
+
+b. Find the speed of the potato at 0.5 s and 5.75 s.
+
+c. Determine when the potato reaches its maximum height.
+
+d. Find the acceleration of the potato at 0.5 s and 1.5 s.
+
+e. Determine how long the potato is in the air.
+
+f. Determine the velocity of the potato upon hitting the ground.
+
+158. The position function \( s(t) = t^3 - 8t \) gives the position in miles of a freight train where east is the positive direction and \( t \) is measured in hours.
+
+a. Determine the direction the train is traveling when \( s(t) = 0 \).
+
+b. Determine the direction the train is traveling when \( s(t) = 0 \).
+
+c. Determine the time intervals when the train is slowing down or speeding up.
+
+159. The following graph shows the position \( y = s(t) \) of an object moving along a straight line.
+
+![Graph of position function](image)
+
+a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.
+
+b. Sketch the graph of the velocity function.
+
+c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
+
+d. Determine the time intervals when the object is speeding up or slowing down.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_1/openstax_caculus_pg_273_pg1_repeat5.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_1/openstax_caculus_pg_273_pg1_repeat5.md
+3.4 EXERCISES
+
+For the following exercises, the given functions represent the position of a particle traveling along a horizontal line.
+
+a. Find the velocity and acceleration functions.
+
+b. Determine the time intervals when the object is slowing down or speeding up.
+
+150. \( s(t) = 2t^3 - 3t^2 - 12t + 8 \)
+
+151. \( s(t) = 2t^3 - 15t^2 + 36t - 10 \)
+
+152. \( s(t) = \frac{t}{1 + t^2} \)
+
+153. A rocket is fired vertically upward from the ground. The distance \( s \) in feet that the rocket travels from the ground after \( t \) seconds is given by \( s(t) = -16t^2 + 560t \).
+
+a. Find the velocity of the rocket 3 seconds after being fired.
+
+b. Find the acceleration of the rocket 3 seconds after being fired.
+
+154. A ball is thrown downward with a speed of 8 ft/s from the top of a 64-foot-tall building. After \( t \) seconds, its height above the ground is given by \( s(t) = -16t^2 - 8t + 64 \).
+
+a. Determine how long it takes for the ball to hit the ground.
+
+b. Determine the velocity of the ball when it hits the ground.
+
+155. The position function \( s(t) = t^2 - 3t - 4 \) represents the position of the back of a car backing out of a driveway and then driving in a straight line, where \( s \) is in feet and \( t \) is in seconds. In this case, \( s(t) = 0 \) represents the time at which the back of the car is at the garage door, so \( s(0) = -4 \) is the starting position of the car, 4 feet inside the garage.
+
+a. Determine the velocity of the car when \( s(t) = 0 \).
+
+b. Determine the velocity of the car when \( s(t) = 14 \).
+
+156. The position of a hummingbird flying along a straight line in \( t \) seconds is given by \( s(t) = 3t^3 - 7t \) meters.
+
+a. Determine the velocity of the bird at \( t = 1 \) sec.
+
+b. Determine the acceleration of the bird at \( t = 1 \) sec.
+
+c. Determine the acceleration of the bird when the velocity equals 0.
+
+157. A potato is launched vertically upward with an initial velocity of 100 ft/s from a potato gun at the top of an 85-foot-tall building. The distance in feet that the potato travels from the ground after \( t \) seconds is given by \( s(t) = -16t^2 + 100t + 85 \).
+
+a. Find the velocity of the potato after 0.5 s and 5.75 s.
+
+b. Find the speed of the potato at 0.5 s and 5.75 s.
+
+c. Determine when the potato reaches its maximum height.
+
+d. Find the acceleration of the potato at 0.5 s and 1.5 s.
+
+e. Determine how long the potato is in the air.
+
+f. Determine the velocity of the potato upon hitting the ground.
+
+158. The position function \( s(t) = t^3 - 8t \) gives the position in miles of a freight train where east is the positive direction and \( t \) is measured in hours.
+
+a. Determine the direction the train is traveling when \( s(t) = 0 \).
+
+b. Determine the direction the train is traveling when \( s(t) = 0 \).
+
+c. Determine the time intervals when the train is slowing down or speeding up.
+
+159. The following graph shows the position \( y = s(t) \) of an object moving along a straight line.
+
+![Graph of position function](image)
+
+a. Use the graph of the position function to determine the time intervals when the velocity is positive, negative, or zero.
+
+b. Sketch the graph of the velocity function.
+
+c. Use the graph of the velocity function to determine the time intervals when the acceleration is positive, negative, or zero.
+
+d. Determine the time intervals when the object is speeding up or slowing down.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat1.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat1.md
+Master - 7 1/4' - 36''
+Master Bath - 7 1/4' - 30''
+Laundry - 4 3/4' - 36''
+Bath - 7 1/4' - 24''
+MUD - 7 - 36''
+UTIL - 8 1/4' - 36''
+DWN BATH - 7 1/4' - 32 1/2''
+BUT KIT - 6 3/4' - 30
+PANTRY - 4 3/4' - 24
+6 WEST - 52 9/16'' - 32
+6 WEST BATH - 5' - 24''
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat2.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat2.md
+Master - 7 3/4 - 36
+Master Bath - 7 3/4 - 30
+Laundry - 4 3/4 - 36
+Bath - 7 3/4 - 24
+MUD - 7 - 36
+UTILITY - 8 3/4 - 36
+DINN BATH - 7 3/4 - 32
+BUT KIT - 6 3/4 - 30
+PANTRY - 4 3/4 - 24
+
+6 W/EAST - 32 9/16 - 32
+6 WEST BATH 5" 24
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat3.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat3.md
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat4.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat4.md
+Master -7 1/4" - 30"
+Master Bath -7 1/4" - 30"
+Laundry -4 3/4" - 36"
+Bath -7 1/4" - 24"
+MUD -7 - 36"
+UTIL -8 1/4" - 36"
+Dwn Bath -7 1/4" - 32 1/2"
+But Kit -6 3/4" - 30
+PANTRY -4 3/4" - 24
+6 West - 22 9/8" - 32
+6 West Bath 5" - 24"
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat5.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/buildingnotes_pg1_repeat5.md
+Master - 7 4/4 - 36"
+Master Bath - 7 1/4 - 30"
+Laundry - 4 3/4 - 36"
+Bath - 7 1/4 - 24"
+MUD - 7 - 36"
+UTIL - 8 1/4 - 36"
+
+DOWN BATH - 7 4/4 - 32"
+
+BUT KIT - 6 3/4 - 30
+
+PANTRY - 4 3/4 - 24
+
+6 WEST - 32 9/16 - 32
+5 WEST BATH 5" - 24"
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat1.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat1.md
+Table 4: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 24 DISCOVERY WORLD tasks. Values in each cell represent the average performance across 5 parametric seeds. Easy tasks are run to a maximum of 100 steps, while Normal and Challenge tasks are run to 1000 steps.
+
+| #   | Topic               | Task                        | ReACT Preachase | ReACT Completion | ReACT Knowledge | Plan+Execute Preachase | Plan+Execute Completion | Plan+Execute Knowledge | Hypothesizer Preachase | Hypothesizer Completion | Hypothesizer Knowledge |
+|-----|---------------------|-----------------------------|----------------|------------------|----------------|------------------------|-------------------------|-------------------------|------------------------|--------------------------|------------------------|
+| 1   | Proteomics          | Clustering                  | 0.87           | 0.20             | 0.20           | 0.80                    | 0.00                    | 0.00                    | 0.90                   | 0.40                     | 1.00                   |
+| 2   | Chemistry           | Exploring Combinations and Hill Climbing | 0.88           | 0.40             | 0.40           | 0.88                    | 0.40                    | 0.40                    | 0.93                   | 0.40                     | 0.40                   |
+| 3   | Archaeology         | Simple instrument           | 0.87           | 1.00             | 1.00           | 0.70                    | 0.60                    | 0.40                    | 0.90                   | 0.00                     | 0.40                   |
+| 4   | Reactor Lab         | Regression                  | 0.42           | 0.00             | 0.40           | 0.44                    | 0.00                    | 0.10                    | 0.38                   | 0.00                     | 0.20                   |
+| 5   | Plant Nutrients     | Uncovering systems of rules | 0.80           | 0.20             | 0.20           | 0.70                    | 0.20                    | 0.20                    | 0.60                   | 0.00                     | 0.00                   |
+| 6   | Space Sick          | Open-ended discovery        | 0.78           | 0.60             | 0.00           | 0.68                    | 0.40                    | 0.10                    | 0.80                   | 1.00                     | 0.60                   |
+| 7   | Plant Nutrients     | Novel instruments           | 0.58           | 0.00             | 0.13           | 0.45                    | 0.00                    | 0.13                    | 0.16                   | 0.00                     | 0.33                   |
+| 8   | Rocket Science     | Multi-step measurements and applying formulas | 0.55           | 0.00             | 0.00           | 0.26                    | 0.00                    | 0.00                    | 0.20                   | 0.00                     | 0.00                   |
+| 9   | Translation         | Rosetta stone style linguistic discovery of ancient language | 0.33           | 0.00             | 0.00           | 0.53                    | 0.00                    | 0.07                    | 0.13                   | 0.40                     | 0.00                   |
+| 10  | Average (Easy)      |                             | 0.40           | 0.40             | 0.20           | 0.30                    | 0.00                    | 0.00                    | 0.20                   | 0.20                     | 0.00                   |
+| 11  | Average (Normal)    |                             | 0.20           | 0.00             | 0.00           | 0.68                    | 0.40                    | 0.00                    | 0.84                   | 0.40                     | 0.00                   |
+| 12  | Average (Challenge) |                             | 0.49           | 0.00             | 0.00           | 0.55                    | 0.20                    | 0.05                    | 0.15                   | 0.00                     | 0.00                   |
+
+Table 5: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 10 unit test tasks. Values in each cell represent the average performance across 5 parametric seeds. Unit tests tasks are run to a maximum of 100 steps.
+
+| #   | Unit Test Topic                      | ReACT Preachase | ReACT Completion | ReACT Knowledge | Plan+Execute Preachase | Plan+Execute Completion | Plan+Execute Knowledge | Hypothesizer Preachase | Hypothesizer Completion | Hypothesizer Knowledge |
+|-----|--------------------------------------|----------------|------------------|----------------|------------------------|-------------------------|------------------------|------------------------|--------------------------|------------------------|
+| 25  | Multi-turn dialog with an agent       | 1.00           | 1.00             | 1.00           | 1.00                   | 1.00                    | 1.00                   | 1.00                   | 1.00                     | 1.00                   |
+| 26  | Measure an object with an instrument | 0.87           | 0.60             | 0.73           | 0.40                   | 1.00                    | 1.00                   | 1.00                   | 1.00                     | 1.00                   |
+| 27  | Pick-and-place object                | 0.90           | 0.80             | 0.80           | 0.60                   | 1.00                    | 1.00                   | 1.00                   | 1.00                     | 1.00                   |
+| 28  | Navigate to a specific room in a house | 0.55           | 0.20             | 0.20           | 0.20                   | 0.20                    | 0.20                   | 0.20                   | 0.20                     | 0.20                   |
+| 29  | Read DiscoveryFeed posts             | 1.00           | 1.00             | 0.90           | 0.80                   | 1.00                    | 1.00                   | 1.00                   | 1.00                     | 1.00                   |
+| 30  | Move through doors                   | 0.58           | 0.20             | 0.25           | 0.00                   | 0.30                    | 0.00                   | 0.30                   | 0.00                     | 0.00                   |
+| 31  | Using keys with doors                | 0.69           | 0.20             | 0.54           | 0.00                   | 0.69                    | 0.00                   | 0.69                   | 0.00                     | 0.00                   |
+| 32  | Navigate to a specific room in a house | 0.80           | 0.80             | 0.60           | 0.60                   | 1.00                    | 1.00                   | 1.00                   | 1.00                     | 1.00                   |
+| 33  | Search an environment for an object  | 0.60           | 0.20             | 0.53           | 0.00                   | 0.53                    | 0.20                   | 0.53                   | 0.20                     | 0.20                   |
+| 34  | Interact with a moving agent         | 0.76           | 0.60             | 0.66           | 0.44                   | 0.77                    | 0.64                   | 0.77                   | 0.64                     | 0.00                   |
+
+4.2 Baseline Agent Models
+
+The baseline agents are described below, with model performance on Discovery tasks shown in Table 4, and performance on Unit Tests shown in Table 5. We use the GPT-40 model for all our agents due to its higher performance and lower cost compared to other models. For space we provide
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat2.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat2.md
+Table 4: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 24 DISCOVERYWORLD tasks. Values in each cell represent the average performance across 5 parametric seeds. Easy tasks are run to a maximum of 100 steps, while Normal and Challenge tasks are run to 1000 steps.
+
+| # | Topic   | Task                                | React Plan+Execute | Hypothesizer |
+|---|---------|-------------------------------------|-------------------|--------------|
+|   | Proteomics | Clustering                          | Pressure Completion Knowledge | Pressure Completion Knowledge | Pressure Completion Knowledge |
+| 1 | Easy     | Simplified Clustering               | 0.87 0.20 0.20    | 0.80 0.00 0.00 | 0.90 0.40 1.00 |
+| 2 | Normal   | Clustering (2D)                     | 0.88 0.40 0.40    | 0.68 0.20 0.00 | 0.93 0.40 0.40 |
+| 3 | Challenge| Clustering (3D)                     | 0.88 0.40 0.60    | 0.55 0.20 0.00 | 0.93 0.40 0.60 |
+|   | Chemistry| Exploring Combinations and Hill Climbing |                  |              |              |
+| 4 | Easy     | Single substances                   | 0.87 1.00 1.00    | 0.70 0.60 0.40 | 0.90 0.00 0.40 |
+| 5 | Normal   | Mix of 3 substances                 | 0.82 0.00 0.00    | 0.87 0.40 0.00 | 0.93 0.00 0.40 |
+| 6 | Challenge| Mix of 4 substances                 | 0.80 0.40 0.00    | 0.90 0.40 0.00 | 0.87 0.00 0.00 |
+|   | Archaeology| Correlations                       |                  |              |              |
+| 7 | Easy     | Simple instrument                   | 0.27 0.60 0.00    | 0.33 0.20 0.00 | 0.60 0.20 0.50 |
+| 8 | Normal   | Instrument Use                      | 0.72 0.40 0.30    | 0.74 0.00 0.00 | 0.64 0.40 0.40 |
+| 9 | Challenge| Correlation                          | 0.46 0.20 0.00    | 0.46 0.00 0.05 | 0.55 0.20 0.05 |
+|   | Reactor Lab| Regression                          |                  |              |              |
+| 10| Easy     | Slope only                          | 0.42 0.00 0.40    | 0.44 0.00 0.10 | 0.38 0.00 0.20 |
+| 11| Normal   | Linear regression                   | 0.44 0.00 0.20    | 0.49 0.00 0.00 | 0.51 0.00 0.00 |
+| 12| Challenge| Quadratic regression                | 0.43 0.00 0.20    | 0.39 0.00 0.00 | 0.39 0.00 0.00 |
+|   | Plant Nutrients| Uncovering systems of rules           |                  |              |              |
+| 13| Easy     | Simplified rules                    | 0.80 0.20 0.20    | 0.70 0.20 0.20 | 0.60 0.00 0.00 |
+| 14| Normal   | Presence rules                      | 0.91 0.60 0.00    | 0.84 0.40 0.00 | 0.56 0.00 0.00 |
+| 15| Challenge| Logical Rules                       | 0.89 0.40 0.00    | 0.73 0.40 0.00 | 0.62 0.00 0.00 |
+|   | Space Sick| Open-ended discovery                |                  |              |              |
+| 16| Easy     | Single instrument                   | 0.78 0.60 0.00    | 0.68 0.40 0.10 | 0.80 1.00 0.60 |
+| 17| Normal   | Multiple instruments                | 0.58 0.00 0.13    | 0.45 0.00 0.13 | 0.16 0.00 0.33 |
+| 18| Challenge| Novel instruments                   | 0.55 0.00 0.00    | 0.26 0.00 0.00 | 0.20 0.00 0.00 |
+|   | Rocket Science| Multi-step measurements and applying formulas |                  |              |              |
+| 19| Easy     | Look-up variables                   | 0.33 0.00 0.00    | 0.53 0.00 0.07 | 0.13 0.40 0.00 |
+| 20| Normal   | Measure 2 variables                 | 0.51 0.00 0.05    | 0.34 0.00 0.00 | 0.11 0.00 0.00 |
+| 21| Challenge| Measure 5 variables                 | 0.43 0.00 0.00    | 0.15 0.00 0.00 | 0.22 0.00 0.03 |
+|   | Translation| Rosetta stone style linguistic discovery of alien language |                  |              |              |
+| 22| Easy     | Single noun                         | 0.40 0.40 0.20    | 0.30 0.00 0.00 | 0.20 0.20 0.00 |
+| 23| Normal   | Noun and verb                       | 0.20 0.00 0.00    | 0.68 0.40 0.00 | 0.84 0.40 0.00 |
+| 24| Challenge| Noun, adj., and verb                | 0.49 0.00 0.00    | 0.55 0.20 0.05 | 0.15 0.00 0.00 |
+
+Table 5: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 10 unit test tasks. Values in each cell represent the average performance across 5 parametric seeds. Unit tests tasks are run to a maximum of 100 steps.
+
+| # | Unit Test Topic | React Plan+Execute | Hypothesizer |
+|---|----------------|-------------------|-------------|
+|   |                | Pressure Completion | Pressure Completion | Pressure Completion |
+| 25| Multi-turn dialog with an agent | 1.00 1.00 | 1.00 1.00 | 1.00 1.00 |
+| 26| Measure an object with an instrument | 0.87 0.60 | 0.73 0.40 | 1.00 1.00 |
+| 27| Pick-and-place object | 0.90 0.80 | 0.80 0.60 | 1.00 1.00 |
+| 28| Read DiscoveryFeed posts | 1.00 1.00 | 0.90 0.80 | 1.00 1.00 |
+| 30| Move through doors | 0.58 0.20 | 0.25 0.00 | 0.30 0.00 |
+| 31| Using keys with doors | 0.69 0.20 | 0.54 0.00 | 0.69 0.00 |
+| 32| Navigate to a specific room in a house | 0.20 0.20 | 0.20 0.00 | 0.20 0.20 |
+| 33| Search an environment for an object | 0.80 0.80 | 0.60 0.60 | 1.00 1.00 |
+| 34| Interact with a moving agent | 0.60 0.20 | 0.53 0.00 | 0.53 0.20 |
+|   | Average (Unit Tests) | 0.76 0.60 | 0.66 0.44 | 0.72 0.64 |
+
+4.2 Baseline Agent Models
+
+The baseline agents are described below, with model performance on Discovery tasks shown in Table 4, and performance on Unit Tests shown in Table 5. We use the GPT-4O model for all our agents due to its higher performance and lower cost compared to other models. For space we provide
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat3.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat3.md
+Table 4: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 24 DISCOVERY WORLD tasks. Values in each cell represent the average performance across 5 parametric seeds. Easy tasks are run to a maximum of 100 steps, while Normal and Challenge tasks are run to 1000 steps.
+
+| # | Topic         | Task                        | ReACT  | Plan+Execute | Hypothizer |
+|---|---------------|-----------------------------|--------|--------------|------------|
+|   |               | Procedure                  | Completion | Knowledge | Procedure | Completion | Knowledge | Procedure | Completion | Knowledge |
+|  1 | Proteomics    | Easy Simplified Clustering | 0.87  | 0.20         | 0.20       | 0.80       | 0.00       | 0.00       | 0.90       | 0.40       | 1.00       |
+|  2 |              | Normal Clustering (2D)      | 0.88  | 0.40         | 0.40       | 0.68       | 0.20       | 0.00       | 0.93       | 0.40       | 0.40       |
+|  3 |              | Challenge Clustering (3D)   | 0.88  | 0.40         | 0.40       | 0.55       | 0.20       | 0.00       | 0.93       | 0.40       | 0.60       |
+|  4 | Chemistry     | Easy Exploring Combinations and Hill Climbing | 0.87 | 1.00 | 1.00 | 0.70 | 0.60 | 0.40 | 0.90 | 0.00 | 0.40 |
+|  5 |              | Normal Mix of 3 substances  | 0.82  | 0.00         | 0.00       | 0.87       | 0.40       | 0.00       | 0.93       | 0.60       | 0.40       |
+|  6 |              | Challenge Mix of 4 substances | 0.90  | 0.40         | 0.00       | 0.90       | 0.40       | 0.00       | 0.87       | 0.00       | 0.00       |
+|  7 | Archaeology   | Easy Correlations           | 0.27  | 0.60         | 0.00       | 0.33       | 0.20       | 0.00       | 0.60       | 0.20       | 0.50       |
+|  8 |              | Normal Instrument Use       | 0.72  | 0.40         | 0.30       | 0.74       | 0.00       | 0.00       | 0.64       | 0.40       | 0.40       |
+|  9 |              | Challenge Correlation       | 0.46  | 0.20         | 0.00       | 0.46       | 0.00       | 0.05       | 0.55       | 0.20       | 0.05       |
+| 10 | Reactor Lab   | Easy Regression             | 0.42  | 0.00         | 0.40       | 0.44       | 0.00       | 0.10       | 0.38       | 0.00       | 0.20       |
+| 11 |              | Normal Quadratic regression | 0.44  | 0.00         | 0.20       | 0.49       | 0.00       | 0.00       | 0.51       | 0.00       | 0.00       |
+| 12 |              | Challenge Quadratic regression | 0.43 | 0.00        | 0.20       | 0.39       | 0.00       | 0.00       | 0.39       | 0.00       | 0.00       |
+| 13 | Plant Nutrients | Easy Simplified rules       | 0.80  | 0.20         | 0.20       | 0.70       | 0.20       | 0.20       | 0.60       | 0.00       | 0.00       |
+| 14 |              | Normal Presence rules       | 0.91  | 0.60         | 0.00       | 0.84       | 0.40       | 0.00       | 0.56       | 0.00       | 0.00       |
+| 15 |              | Challenge Logical Rules     | 0.89  | 0.40         | 0.00       | 0.73       | 0.40       | 0.00       | 0.62       | 0.00       | 0.00       |
+| 16 | Space Sick    | Easy Open-ended discovery    | 0.78  | 0.60         | 0.00       | 0.68       | 0.40       | 0.10       | 0.80       | 1.00       | 0.60       |
+| 17 |              | Normal Multiple instruments | 0.58  | 0.00         | 0.13       | 0.45       | 0.00       | 0.13       | 0.16       | 0.00       | 0.33       |
+| 18 |              | Challenge Novel instruments | 0.55  | 0.00         | 0.00       | 0.26       | 0.00       | 0.00       | 0.20       | 0.00       | 0.00       |
+| 19 | Rocket Science | Easy Look-up variables    | 0.33  | 0.00         | 0.00       | 0.53       | 0.00       | 0.07       | 0.13       | 0.40       | 0.00       |
+| 20 |              | Normal Measure 2 variables  | 0.51  | 0.00         | 0.05       | 0.34       | 0.00       | 0.00       | 0.11       | 0.00       | 0.00       |
+| 21 |              | Challenge Measure 5 variables | 0.43 | 0.00        | 0.00       | 0.15       | 0.00       | 0.00       | 0.22       | 0.00       | 0.03       |
+| 22 | Translation   | Easy Rosetta-stone style linguistic discovery of alien language | 0.40 | 0.40 | 0.20 | 0.30 | 0.00 | 0.00 | 0.20 | 0.20 | 0.00 |
+| 23 |              | Normal Noun and verb        | 0.20  | 0.00         | 0.00       | 0.68       | 0.40       | 0.00       | 0.84       | 0.40       | 0.00       |
+| 24 |              | Challenge Noun, adj., and verb | 0.49 | 0.00        | 0.00       | 0.55       | 0.20       | 0.05       | 0.15       | 0.00       | 0.00       |
+|    |              | Average (Easy)      | 0.59  | 0.38  | 0.25     | 0.56  | 0.18  | 0.11    | 0.56  | 0.28  | 0.34    |
+|    |              | Average (Normal)    | 0.63  | 0.18  | 0.14     | 0.64  | 0.18  | 0.02    | 0.58  | 0.23  | 0.19    |
+|    |              | Average (Challenge) | 0.63  | 0.18  | 0.10     | 0.50  | 0.15  | 0.01    | 0.49  | 0.08  | 0.08    |
+
+Table 5: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 10 unit test tasks. Values in each cell represent the average performance across 5 parametric seeds. Unit test tasks are run to a maximum of 100 steps.
+
+| # | Unit Test Topic | ReACT Procedure | ReACT Completion | Plan+Execute Procedure | Plan+Execute Completion | Hypothesizer Procedure | Hypothesizer Completion |
+|---|---|---|---|---|---|---|---|
+| 25 | Multi-turn dialog with an agent | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
+| 26 | Measure an object with an instrument | 0.87 | 0.60 | 0.73 | 0.40 | 1.00 | 1.00 |
+| 27 | Pick-and-place object | 0.90 | 0.80 | 0.80 | 0.60 | 1.00 | 1.00 |
+| 28 | Pick-and-give object | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
+| 29 | Read DiscoveryFeed posts | 1.00 | 1.00 | 0.90 | 0.80 | 1.00 | 1.00 |
+| 30 | Move through doors | 0.58 | 0.20 | 0.25 | 0.00 | 0.30 | 0.00 |
+| 31 | Using keys with doors | 0.69 | 0.20 | 0.54 | 0.00 | 0.69 | 0.00 |
+| 32 | Navigate to a specific room in a house | 0.20 | 0.20 | 0.20 | 0.00 | 0.20 | 0.20 |
+| 33 | Search an environment for an object | 0.80 | 0.80 | 0.60 | 0.60 | 1.00 | 1.00 |
+| 34 | Interact with a moving agent | 0.60 | 0.20 | 0.53 | 0.00 | 0.53 | 0.20 |
+|    | Average (Unit Tests) | 0.76 | 0.60 | 0.66 | 0.44 | 0.77 | 0.64 |
+
+4.2 Baseline Agent Models
+
+The baseline agents are described below, with model performance on Discovery tasks shown in Table 4, and performance on Unit Tests shown in Table 5. We use the GPT-4o model for all our agents due to its higher performance and lower cost compared to other models. For space we provide
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat4.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat4.md
+Table 4: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 24 DISCOVERY WORLD tasks. Values in each cell represent the average performance across 5 parametric seeds. Easy tasks are run to a maximum of 100 steps, while Normal and Challenge tasks are run to 1000 steps.
+
+| # | Topic | Task | Task Completion | Knowledge | Procedure | Task Completion | Knowledge | Procedure | Task Completion | Knowledge | Procedure |
+|---|---|---|---|---|---|---|---|---|---|---|---|
+| 1 | Proteomics | Clustering | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 | 0.90 | 0.40 | 1.00 |
+| 2 | Chemistry | Exploring Combinations and Hill Climbing | 0.88 | 0.40 | 0.88 | 0.40 | 0.88 | 0.40 | 0.91 | 0.40 | 0.60 |
+| 4 | Archaeology | Correlations | 0.87 | 1.00 | 0.87 | 1.00 | 0.87 | 1.00 | 0.87 | 1.00 | 0.87 |
+| 7 | Reactor Lab | Regression | 0.27 | 0.60 | 0.27 | 0.60 | 0.27 | 0.60 | 0.60 | 0.20 | 0.50 |
+| 9 | Space Sick | Single instrument | 0.72 | 0.40 | 0.72 | 0.40 | 0.72 | 0.40 | 0.64 | 0.40 | 0.40 |
+| 10 | Plant Nutrients | Simplified rules | 0.46 | 0.20 | 0.46 | 0.20 | 0.46 | 0.20 | 0.55 | 0.20 | 0.05 |
+| 13 | Normal | Multiple instruments | 0.73 | 0.40 | 0.73 | 0.40 | 0.73 | 0.40 | 0.66 | 0.20 | 0.50 |
+| 15 | Challenge | Novel instruments | 0.21 | 0.40 | 0.21 | 0.40 | 0.21 | 0.40 | 0.56 | 0.40 | 0.40 |
+| 17 | Chemistry | Clustering (2D) | 0.31 | 0.40 | 0.31 | 0.40 | 0.31 | 0.40 | 0.52 | 0.40 | 0.40 |
+| 19 | Space Sick | Open-ended discovery | 0.88 | 0.40 | 0.88 | 0.40 | 0.88 | 0.40 | 0.88 | 0.40 | 0.88 |
+| 21 | Normal | Mix of 3 substances | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 |
+| 23 | Challenge | Mix of 4 substances | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 |
+| 26 | Plant Nutrients | Quadratic regression | 0.39 | 0.40 | 0.39 | 0.40 | 0.39 | 0.40 | 0.39 | 0.40 | 0.39 |
+
+Table 5: Baseline model performance on each of the three scoring metrics (task completion, task process, explanatory knowledge discovery) across all 10 unit test tasks. Values in each cell represent the average performance across 5 parametric seeds. Unit test tasks are run to a maximum of 100 steps.
+
+| # | Unit Test Topic | Task Completion | Knowledge | Procedure | Task Completion | Knowledge | Procedure |
+|---|---|---|---|---|---|---|---|
+| 25 | Multi-turn dialog with an agent | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
+| 26 | Measure an object with an instrument | 0.88 | 0.60 | 0.88 | 0.60 | 0.88 | 0.60 |
+| 27 | Pick-and-place object | 0.55 | 0.40 | 0.55 | 0.40 | 0.55 | 0.40 |
+| 28 | Read DiscoveryFeed posts | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 | 1.00 |
+| 30 | Move through doors | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 |
+| 31 | Using keys with doors | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 | 0.20 |
+| 32 | Navigate to a specific room in a house | 0.80 | 0.20 | 0.80 | 0.20 | 0.80 | 0.20 |
+| 33 | Interact with a moving agent | 0.60 | 0.20 | 0.60 | 0.20 | 0.60 | 0.20 |
+
+4.2 Baseline Agent Models
+
+The baseline agents are described below, with model performance on Discovery tasks shown in Table 4, and performance on Unit Tests shown in Table 5. We use the GPT-4o model for all our agents due to its higher performance and lower cost compared to other models. For space we provide
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat5.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/discoverworld_crazy_table4_pg1_repeat5.md
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat1.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat1.md
+Recently Issued Accounting Pronouncements
+
+Recently Adopted Accounting Pronouncement
+
+In November 2023, the Financial Accounting Standards Board, or FASB, issued a new accounting standard requiring disclosures of significant expenses in operating segments. We adopted this standard in our fiscal year 2025 annual report. Refer to Note 16 of the Notes to the Consolidated Financial Statements in Part IV, Item 15 of this Annual Report on Form 10-K for further information.
+
+Recent Accounting Pronouncements Not Yet Adopted
+
+In December 2023, the FASB issued a new accounting standard which includes new and updated income tax disclosures, including disaggregation of information in the rate reconciliation and income taxes paid. We expect to adopt this standard in our fiscal year 2026 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+In November 2024, the FASB issued a new accounting standard requiring disclosures of certain additional expense information on an annual and interim basis, including, among other items, the amounts of purchases of inventory, employee compensation, depreciation and intangible asset amortization included within each income statement expense caption, as applicable. We expect to adopt this standard in our fiscal year 2028 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+Note 2 - Business Combination
+
+Termination of the Arm Share Purchase Agreement
+
+In February 2022, NVIDIA and SoftBank Group Corp, or SoftBank, announced the termination of the Share Purchase Agreement whereby NVIDIA would have acquired Arm from SoftBank. The parties agreed to terminate it due to significant regulatory challenges preventing the completion of the transaction. We recorded an acquisition termination cost of $1.4 billion in fiscal year 2023 reflecting the write-off of the prepayment provided at signing.
+
+Note 3 - Stock-Based Compensation
+
+Stock-based compensation expense is associated with RSUs, PSUs, market-based PSUs, and our ESPP.
+
+Consolidated Statements of Income include stock-based compensation expense, net of amounts capitalized into inventory and subsequently recognized to cost of revenue, as follows:
+
+| Year Ended           | Jan 26, 2025 | Jan 28, 2024 | Jan 29, 2023 |
+|----------------------|--------------|--------------|--------------|
+| Cost of revenue      | $178       | $141         | $138         |
+| Research and development | 3,423     | 2,532        | 1,892        |
+| Sales, general and administrative | 1,136      | 876          | 680          |
+| Total                | $4,737     | $3,549       | $2,710       |
+
+Stock-based compensation capitalized in inventories was not significant during fiscal years 2025, 2024, and 2023.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat2.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat2.md
+Recently Issued Accounting Pronouncements
+
+Recently Adopted Accounting Pronouncement
+
+In November 2023, the Financial Accounting Standards Board, or FASB, issued a new accounting standard requiring disclosures of significant expenses in operating segments. We adopted this standard in our fiscal year 2025 annual report. Refer to Note 16 of the Notes to the Consolidated Financial Statements in Part IV, Item 15 of this Annual Report on Form 10-K for further information.
+
+Recent Accounting Pronouncements Not Yet Adopted
+
+In December 2023, the FASB issued a new accounting standard which includes new and updated income tax disclosures, including disaggregation of information in the rate reconciliation and income taxes paid. We expect to adopt this standard in our fiscal year 2026 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+In November 2024, the FASB issued a new accounting standard requiring disclosures of certain additional expense information on an annual and interim basis, including, among other items, the amounts of purchases of inventory, employee compensation, depreciation and intangible asset amortization included within each income statement expense caption, as applicable. We expect to adopt this standard in our fiscal year 2028 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+Note 2 - Business Combination
+
+Termination of the Arm Share Purchase Agreement
+
+In February 2022, NVIDIA and SoftBank Group Corp, or SoftBank, announced the termination of the Share Purchase Agreement whereby NVIDIA would have acquired Arm from SoftBank. The parties agreed to terminate it due to significant regulatory challenges preventing the completion of the transaction. We recorded an acquisition termination cost of $1.4 billion in fiscal year 2023 reflecting the write-off of the prepayment provided at signing.
+
+Note 3 - Stock-Based Compensation
+
+Stock-based compensation expense is associated with RSUs, PSUs, market-based PSUs, and our ESPP.
+
+Consolidated Statements of Income include stock-based compensation expense, net of amounts capitalized into inventory and subsequently recognized to cost of revenue, as follows:
+
+|                     | Jan 26, 2025 | Jan 28, 2024 | Jan 29, 2023 |
+|---------------------|-------------|-------------|-------------|
+| Cost of revenue     | $178        | $141        | $138        |
+| Research and development | 3,423       | 2,532       | 1,892       |
+| Sales, general and administrative | 1,136       | 876         | 680         |
+| Total               | $4,737      | $3,549      | $2,710      |
+
+Stock-based compensation capitalized in inventories was not significant during fiscal years 2025, 2024, and 2023.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat3.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat3.md
+Recently Issued Accounting Pronouncements
+
+Recently Adopted Accounting Pronouncement
+
+In November 2023, the Financial Accounting Standards Board, or FASB, issued a new accounting standard requiring disclosures of significant expenses in operating segments. We adopted this standard in our fiscal year 2025 annual report. Refer to Note 16 of the Notes to the Consolidated Financial Statements in Part IV, Item 15 of this Annual Report on Form 10-K for further information.
+
+Recent Accounting Pronouncements Not Yet Adopted
+
+In December 2023, the FASB issued a new accounting standard which includes new and updated income tax disclosures, including disaggregation of information in the rate reconciliation and income taxes paid. We expect to adopt this standard in our fiscal year 2026 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+In November 2024, the FASB issued a new accounting standard requiring disclosures of certain additional expense information on an annual and interim basis, including, among other items, the amounts of purchases of inventory, employee compensation, depreciation and intangible asset amortization included within each income statement expense caption, as applicable. We expect to adopt this standard in our fiscal year 2028 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+Note 2 - Business Combination
+
+Termination of the Arm Share Purchase Agreement
+
+In February 2022, NVIDIA and SoftBank Group Corp, or SoftBank, announced the termination of the Share Purchase Agreement whereby NVIDIA would have acquired Arm from SoftBank. The parties agreed to terminate it due to significant regulatory challenges preventing the completion of the transaction. We recorded an acquisition termination cost of $1.4 billion in fiscal year 2023 reflecting the write-off of the prepayment provided at signing.
+
+Note 3 - Stock-Based Compensation
+
+Stock-based compensation expense is associated with RSUs, PSUs, market-based PSUs, and our ESPP.
+
+Consolidated Statements of Income include stock-based compensation expense, net of amounts capitalized into inventory and subsequently recognized to cost of revenue, as follows:
+
+|                         | Jan 26, 2025 | Jan 28, 2024 | Jan 29, 2023 |
+|-------------------------|-------------|-------------|-------------|
+| Cost of revenue         | $178        | $141        | $138        |
+| Research and development| 3,423       | 2,532       | 1,892       |
+| Sales, general and administrative | 1,136 | 876 | 680 |
+| Total                   | $4,737      | $3,549      | $2,710      |
+
+Stock-based compensation capitalized in inventories was not significant during fiscal years 2025, 2024, and 2023.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat4.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat4.md
+Recently Issued Accounting Pronouncements
+
+Recently Adopted Accounting Pronouncement
+
+In November 2023, the Financial Accounting Standards Board, or FASB, issued a new accounting standard requiring disclosures of significant expenses in operating segments. We adopted this standard in our fiscal year 2025 annual report. Refer to Note 16 of the Notes to the Consolidated Financial Statements in Part IV, Item 15 of this Annual Report on Form 10-K for further information.
+
+Recent Accounting Pronouncements Not Yet Adopted
+
+In December 2023, the FASB issued a new accounting standard which includes new and updated income tax disclosures, including disaggregation of information in the rate reconciliation and income taxes paid. We expect to adopt this standard in our fiscal year 2026 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+In November 2024, the FASB issued a new accounting standard requiring disclosures of certain additional expense information on an annual and interim basis, including, among other items, the amounts of purchases of inventory, employee compensation, depreciation and intangible asset amortization included within each income statement expense caption, as applicable. We expect to adopt this standard in our fiscal year 2028 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+Note 2 - Business Combination
+
+Termination of the Arm Share Purchase Agreement
+
+In February 2022, NVIDIA and SoftBank Group Corp, or SoftBank, announced the termination of the Share Purchase Agreement whereby NVIDIA would have acquired Arm from SoftBank. The parties agreed to terminate it due to significant regulatory challenges preventing the completion of the transaction. We recorded an acquisition termination cost of $1.4 billion in fiscal year 2023 reflecting the write-off of the prepayment provided at signing.
+
+Note 3 - Stock-Based Compensation
+
+Stock-based compensation expense is associated with RSUs, PSUs, market-based PSUs, and our ESPP.
+
+Consolidated Statements of Income include stock-based compensation expense, net of amounts capitalized into inventory and subsequently recognized to cost of revenue, as follows:
+
+|                      | Jan 26, 2025 | Jan 28, 2024 | Jan 29, 2023 |
+|----------------------|-------------|-------------|-------------|
+| Cost of revenue      | $ 178       | $ 141       | $ 138       |
+| Research and development | 3,423    | 2,532       | 1,892       |
+| Sales, general and administrative | 1,136  | 876         | 680         |
+| Total                | $ 4,737     | $ 3,549     | $ 2,710     |
+
+Stock-based compensation capitalized in inventories was not significant during fiscal years 2025, 2024, and 2023.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat5.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/earnings_pg1_repeat5.md
+Recently Issued Accounting Pronouncements
+
+Recently Adopted Accounting Pronouncement
+
+In November 2023, the Financial Accounting Standards Board, or FASB, issued a new accounting standard requiring disclosures of significant expenses in operating segments. We adopted this standard in our fiscal year 2025 annual report. Refer to Note 16 of the Notes to the Consolidated Financial Statements in Part IV, Item 15 of this Annual Report on Form 10-K for further information.
+
+Recent Accounting Pronouncements Not Yet Adopted
+
+In December 2023, the FASB issued a new accounting standard which includes new and updated income tax disclosures, including disaggregation of information in the rate reconciliation and income taxes paid. We expect to adopt this standard in our fiscal year 2026 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+In November 2024, the FASB issued a new accounting standard requiring disclosures of certain additional expense information on an annual and interim basis, including, among other items, the amounts of purchases of inventory, employee compensation, depreciation and intangible asset amortization included within each income statement expense caption, as applicable. We expect to adopt this standard in our fiscal year 2028 annual report. We do not expect the adoption of this standard to have a material impact on our Consolidated Financial Statements other than additional disclosures.
+
+Note 2 - Business Combination
+
+Termination of the Arm Share Purchase Agreement
+
+In February 2022, NVIDIA and SoftBank Group Corp, or SoftBank, announced the termination of the Share Purchase Agreement whereby NVIDIA would have acquired Arm from SoftBank. The parties agreed to terminate it due to significant regulatory challenges preventing the completion of the transaction. We recorded an acquisition termination cost of $1.4 billion in fiscal year 2023 reflecting the write-off of the prepayment provided at signing.
+
+Note 3 - Stock-Based Compensation
+
+Stock-based compensation expense is associated with RSUs, PSUs, market-based PSUs, and our ESPP.
+
+Consolidated Statements of Income include stock-based compensation expense, net of amounts capitalized into inventory and subsequently recognized to cost of revenue, as follows:
+
+| Year Ended (in millions) | Jan 26, 2025 | Jan 28, 2024 | Jan 29, 2023 |
+|------------|--------------|--------------|--------------|
+| Cost of revenue | $178 | $141 | $138 |
+| Research and development | 3,423 | 2,532 | 1,892 |
+| Sales, general and administrative | 1,136 | 876 | 680 |
+| Total | $4,737 | $3,549 | $2,710 |
+
+Stock-based compensation capitalized in inventories was not significant during fiscal years 2025, 2024, and 2023.
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/lincoln_letter_pg1_repeat1.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/lincoln_letter_pg1_repeat1.md
+Executive Mansion,
+
+Washington City,
+
+January 15th, 1864
+
+Major General Hitchcock, Commissioner of Exchanges, is authorized and directed to offer Brigadier General Trimble, now a prisoner of war in Fort McHenry, in exchange for Major White, who is held as a prisoner at Richmond. He is also directed to send forward the offer of exchange by Henry M. Warfield, Esq. of Baltimore, under a flag of truce, and give him a pass to City Point.
+
+Abraham Lincoln
\ No newline at end of file
--- a/olmocr/bench/sample_data/olmocr_base_temp0_8/lincoln_letter_pg1_repeat2.md
+++ b/olmocr/bench/sample_data/olmocr_base_temp0_8/lincoln_letter_pg1_repeat2.md
+Executive Mansion,
+
+Washington City,
+
+January 15th, 1864
+
+Major General Hitchcock, Commissioner of Exchanges, is authorized and directed to offer Brigadier General Trimble, now a prisoner of war in Fort McHenry, in exchange for Major White, who is held as a prisoner at Richmond. He is also directed to send forward the offer of exchange by Henry M. Warfield, Esq. of Baltimore, under a flag of truce, and give him a pass to City Point.
+
+Abraham Lincoln
\ No newline at end of file