Pricing

from $0.10 / financial statements fetched

Taiwan Listed Companies — Quarterly Financials

Fetch quarterly income statements, balance sheets, and cash flow statements for every TWSE/OTC-listed Taiwanese company — translated into English with typed numeric fields. Data sourced directly from Taiwan's official financial disclosure system (MOPS), updated each quarter.

Developer

Steven C

Last modified

18 days ago

SEC Financials Scraper

xtech/sec-growth-metrics

Extract comprehensive financial statements (balance sheets, income statements, cash flow) for US public companies. Ideal for financial analysis, investment research, and competitive intelligence.

Xtech

Financial Statements Scraper

automation-lab/financial-statements-scraper

Extract clean financial data from SEC EDGAR XBRL API. Enter stock tickers, get structured income statements, balance sheets, and cash flow data. Supports annual and quarterly periods with automatic concept fallback resolution.

Stas Persiianenko

Financial Datasets MCP Server

agentify/financial-datasets-mcp-server

This MCP server provides access to stock market data from Financial Datasets. It allows Claude and other AI assistants to retrieve income statements, balance sheets, cash flow statements, stock prices, and market news directly through the MCP interface.

agentify

Taiwan Government Data API

lentic_clockss/taiwan-data-search

Search Taiwan market, health, environment, nonprofit, and public-data sources in one run. Get structured Taiwan records fast.

kane liu

Poland KRS Financial Statements Scraper

regdata/poland-krs-financial-scraper

Extract official financial statements - balance sheets, income statements, assets, equity, revenue, net profit - from Poland's National Court Register (KRS). Requires residential proxy - portal protected by Incapsula WAF.

regdata

Taiwan Stock Institutional Trades (TWSE) - 三大法人買賣超

chamarix/twse-institutional-trades

Daily foreign/trust/dealer net buy-sell data for all 1,300+ Taiwan Stock Exchange (TWSE) listed stocks. Official T86 data feed, structured JSON, since 2012.

chris

Stockanalysis Scraper

fortuitous_pirate/stockanalysis-scraper

Scrape financial data from StockAnalysis.com — income statements, balance sheets, cash flow, ratios, statistics, forecasts, and more. Supports any US stock ticker. No API key required.

Fortuitous Pirate

SEC 13F Investment Quarterly Report Scraper

kenshinsee/sec-13f-investment-quarterly-report-scraper

Scrapes the quarterly report for a given investment's name and quarter year from http://13f.info. Download your data as JSON, CSV, Excel, RSS, and more.

Hong Hu

SEC 13F Manager Quarterly Report Scraper

kenshinsee/sec-13f-manager-quarterly-report-scraper

Scrapes the quarterly report for a given manager's name and quarter year from http://13f.info. Download your data as JSON, CSV, Excel, RSS, and more.

Hong Hu

Taiwan ADR Premium Tracker

quiet_dictionary/taiwan-adr-premium-tracker

Track real-time premium and discount between Taiwan ADRs (NYSE/NASDAQ) and their underlying TWSE shares with FX normalization.

Steven C

{ "actorSpecification": 1, "name": "taiwan-mops-financial-statements", "title": "Taiwan MOPS Financial Statements (English)", "description": "Fetches quarterly income statements, balance sheets, and cash flow statements for all TWSE/OTC-listed Taiwanese companies from MOPS, translated into English with typed numeric fields.", "version": "1.0", "buildTag": "latest", "environmentVariables": {}, "defaultRunOptions": { "build": "latest", "memoryMbytes": 256, "timeoutSecs": 120 }, "storages": { "dataset": { "actorSpecification": 1, "fields": {}, "views": { "overview": { "title": "Financial Statements", "transformation": { "fields": [ "period", "statement_type", "stock_id", "company_name", "exchange", "revenue", "net_income", "eps_basic", "total_assets", "total_equity", "operating_cash_flow" ] }, "display": { "component": "table", "properties": { "stock_id": { "label": "Stock ID", "format": "text" }, "company_name": { "label": "Company", "format": "text" }, "period": { "label": "Period", "format": "text" }, "statement_type": { "label": "Statement", "format": "text" }, "exchange": { "label": "Exchange", "format": "text" }, "revenue": { "label": "Revenue (TWD k)", "format": "number" }, "net_income": { "label": "Net Income (TWD k)", "format": "number" }, "eps_basic": { "label": "EPS (TWD)", "format": "number" }, "total_assets": { "label": "Total Assets", "format": "number" }, "total_equity": { "label": "Total Equity", "format": "number" }, "operating_cash_flow": { "label": "Operating CF", "format": "number" } } } } } } } }
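The overview view above places income, balance sheet, and cash flow fields side by side, while each parsed row carries a single `statement_type` and leaves the fields of the other statements empty. A minimal sketch of how a consumer might fold those rows into one record per company follows. The helper name `merge_by_company` is hypothetical and not part of the actor, and the numbers in the sample rows are invented placeholders, not real filings.

```python
from __future__ import annotations


def merge_by_company(rows: list[dict]) -> dict[str, dict]:
    """Fold per-statement rows into one dict per stock_id.

    Hypothetical helper (not part of the actor). The first non-None value
    seen for a field wins; the statement_type tag is dropped because the
    merged record spans several statements.
    """
    merged: dict[str, dict] = {}
    for row in rows:
        company = merged.setdefault(row["stock_id"], {})
        for key, value in row.items():
            if key == "statement_type":
                continue
            if value is not None and company.get(key) is None:
                company[key] = value
    return merged


# Invented placeholder rows, shaped like the parser's output.
sample_rows = [
    {"stock_id": "2330", "company_name": "TSMC", "statement_type": "income",
     "revenue": 100.0, "total_assets": None},
    {"stock_id": "2330", "company_name": "TSMC", "statement_type": "balance",
     "revenue": None, "total_assets": 500.0},
]

merged = merge_by_company(sample_rows)
print(merged["2330"])
```

Keeping the first non-None value is one possible policy; a consumer who needs to detect conflicting values between statements would have to compare them instead of silently skipping.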

{ "title": "Taiwan MOPS Financial Statements Input", "description": "Fetches quarterly financial statements (income, balance sheet, cash flow) for all TWSE/OTC listed companies from Taiwan's MOPS system.", "type": "object", "schemaVersion": 1, "properties": { "statement_types": { "title": "Statement types to fetch", "type": "array", "description": "Which financial statements to download. Options: income (income statement), balance (balance sheet), cashflow (cash flow statement). Leave blank to fetch all three.", "editor": "json", "default": ["income", "balance", "cashflow"], "prefill": ["income", "balance", "cashflow"] }, "year": { "title": "Year (Gregorian)", "type": "integer", "description": "Gregorian year of the report (e.g. 2024). Defaults to the most recent completed quarter's year.", "editor": "number", "minimum": 1912, "maximum": 2100 }, "season": { "title": "Quarter (1–4)", "type": "integer", "description": "Quarter number: 1 = Q1 (Jan–Mar), 2 = Q2, 3 = Q3, 4 = Q4. Defaults to the most recent completed quarter.", "editor": "number", "minimum": 1, "maximum": 4 }, "exchange": { "title": "Exchange", "type": "string", "description": "Which exchange to fetch: 'listed' = TWSE main board, 'otc' = OTC/TPEX.", "editor": "select", "default": "listed", "prefill": "listed", "enum": ["listed", "otc"] } } }
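The limits stated in the input schema can be checked locally before starting a run. The sketch below is an illustration only: `check_run_input` and the two constants are hypothetical names, and the rules simply restate the schema (statement types `income`, `balance`, `cashflow`; year 1912 to 2100; season 1 to 4; exchange `listed` or `otc`).

```python
from __future__ import annotations

# Hypothetical constants restating the input schema; not imported from the actor.
VALID_TYPES = ("income", "balance", "cashflow")
VALID_EXCHANGES = ("listed", "otc")


def check_run_input(run_input: dict) -> list[str]:
    """Return a list of problems; an empty list means the input looks valid."""
    problems: list[str] = []

    # Omitting statement_types means "fetch all three".
    types = run_input.get("statement_types", list(VALID_TYPES))
    if not isinstance(types, list) or not types:
        problems.append("statement_types must be a non-empty list")
    else:
        unknown = [t for t in types if t not in VALID_TYPES]
        if unknown:
            problems.append(f"unknown statement types: {unknown}")

    # year and season are optional; the actor falls back to defaults.
    year = run_input.get("year")
    if year is not None and not (isinstance(year, int) and 1912 <= year <= 2100):
        problems.append("year must be an integer between 1912 and 2100")

    season = run_input.get("season")
    if season is not None and season not in (1, 2, 3, 4):
        problems.append("season must be 1, 2, 3 or 4")

    if run_input.get("exchange", "listed") not in VALID_EXCHANGES:
        problems.append("exchange must be 'listed' or 'otc'")

    return problems


example_input = {
    "statement_types": ["income", "balance"],
    "year": 2024,
    "season": 4,
    "exchange": "otc",
}
print(check_run_input(example_input))
```

Returning a list of messages rather than raising on the first problem lets a caller report every issue at once.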

"""
MOPS data fetcher.

2-step flow:
  1. POST to /mops/api/redirectToOld with JSON body → receive signed redirect URL
  2. GET the signed URL → returns HTML with the financial data table

No CAPTCHA, no CSRF, no session cookies needed.
The Origin header is required to bypass basic referrer checks.
"""

from __future__ import annotations

import logging

import httpx

from parser import API_NAME_MAP, TYPEK_MAP, gregorian_to_roc

logger = logging.getLogger(__name__)

MOPS_BASE = "https://mops.twse.com.tw"
REDIRECT_ENDPOINT = f"{MOPS_BASE}/mops/api/redirectToOld"

# Timeout for each HTTP request (seconds)
REQUEST_TIMEOUT = 30

HEADERS = {
    "Origin": MOPS_BASE,
    "Referer": f"{MOPS_BASE}/mops/web/index",
    "User-Agent": (
        "Mozilla/5.0 (compatible; ProjectPI-MOPS-Fetcher/1.0; "
        "+https://apify.com/quiet_dictionary)"
    ),
}


def _build_payload(
    statement_type: str,
    year: int,
    season: int,
    exchange: str,
) -> dict:
    """Build the POST body for the MOPS redirectToOld endpoint."""
    api_name = API_NAME_MAP[statement_type]
    typek = TYPEK_MAP[exchange]
    roc_year = gregorian_to_roc(year)

    return {
        "apiName": api_name,
        "parameters": {
            "encodeURIComponent": 1,
            "firstin": 1,
            "off": 1,
            "step": 1,
            "isQuery": "Y",
            "TYPEK": typek,
            "year": str(roc_year),
            "season": f"{season:02d}",
        },
    }


def fetch_statement_html(
    statement_type: str,
    year: int,
    season: int,
    exchange: str,
    client: httpx.Client | None = None,
) -> str:
    """
    Fetch the raw HTML for a MOPS financial statement table.

    Args:
        statement_type: "income", "balance", or "cashflow".
        year: Gregorian year (e.g. 2024).
        season: Quarter 1–4.
        exchange: "listed" or "otc".
        client: Optional httpx.Client for connection reuse / testing.

    Returns:
        Raw HTML string of the data page.

    Raises:
        httpx.HTTPStatusError: On non-2xx response.
        ValueError: If redirect URL not found in response.
    """
    payload = _build_payload(statement_type, year, season, exchange)
    own_client = client is None
    if own_client:
        # MOPS uses a non-standard SSL certificate (missing Subject Key Identifier)
        # that Python's ssl module rejects. verify=False is intentional here.
        client = httpx.Client(headers=HEADERS, timeout=REQUEST_TIMEOUT, follow_redirects=True, verify=False)

    try:
        # Step 1 — POST to get signed redirect URL
        logger.debug("MOPS step 1: POST %s api=%s", REDIRECT_ENDPOINT, payload["apiName"])
        r1 = client.post(REDIRECT_ENDPOINT, json=payload)
        r1.raise_for_status()

        data = r1.json()
        redirect_url = (
            data.get("url")
            or data.get("redirectUrl")
            or data.get("data", {}).get("url")
            or data.get("result", {}).get("url")
        )
        if not redirect_url:
            raise ValueError(
                f"MOPS did not return a redirect URL. "
                f"Response keys: {list(data.keys())}. Body: {str(data)[:300]}"
            )

        # Ensure absolute URL
        if redirect_url.startswith("/"):
            redirect_url = f"{MOPS_BASE}{redirect_url}"

        # Step 2 — GET the signed URL → HTML table
        logger.debug("MOPS step 2: GET %s", redirect_url)
        r2 = client.get(redirect_url)
        r2.raise_for_status()

        return r2.text

    finally:
        if own_client:
            client.close()

"""
Actor 04 — MOPS Financial Statements (English)

Fetches quarterly financial statements for all TWSE/OTC listed companies
from Taiwan's MOPS system and returns them in English with typed fields.

PPE pricing: $0.10 per statement type fetched (one charge per successful
statement download, regardless of how many companies are in the result).
"""

from __future__ import annotations

import datetime
import logging

from apify import Actor

from fetcher import fetch_statement_html
from parser import (
    TYPEK_MAP,
    VALID_STATEMENT_TYPES,
    gregorian_to_roc,
    parse_table,
)

logger = logging.getLogger(__name__)

# Default: most recent completed quarter
_NOW = datetime.datetime.utcnow()
_DEFAULT_YEAR = _NOW.year if _NOW.month > 3 else _NOW.year - 1
_DEFAULT_SEASON = ((_NOW.month - 1) // 3) or 4  # 1-4; lag one quarter


def _resolve_defaults(year: int | None, season: int | None) -> tuple[int, int]:
    """Return (year, season) falling back to the most recent completed quarter."""
    if year is None:
        year = _DEFAULT_YEAR
    if season is None:
        season = _DEFAULT_SEASON
    return year, season


async def main() -> None:
    async with Actor:
        raw = await Actor.get_input() or {}

        # ------------------------------------------------------------------
        # Input validation
        # ------------------------------------------------------------------
        statement_types = raw.get("statement_types", VALID_STATEMENT_TYPES)
        if not isinstance(statement_types, list) or not statement_types:
            await Actor.fail(
                status_message=(
                    "Input 'statement_types' must be a non-empty list. "
                    f"Valid options: {VALID_STATEMENT_TYPES}"
                )
            )
            return

        statement_types = [s for s in statement_types if isinstance(s, str)]
        invalid = [s for s in statement_types if s not in VALID_STATEMENT_TYPES]
        if invalid:
            await Actor.fail(
                status_message=(
                    f"Unknown statement type(s): {invalid}. "
                    f"Valid options: {VALID_STATEMENT_TYPES}"
                )
            )
            return

        exchange = raw.get("exchange", "listed")
        if exchange not in TYPEK_MAP:
            await Actor.fail(
                status_message=(
                    f"Invalid exchange '{exchange}'. "
                    f"Valid options: {list(TYPEK_MAP.keys())}"
                )
            )
            return

        year_raw = raw.get("year")
        season_raw = raw.get("season")

        try:
            year_input = int(year_raw) if year_raw is not None else None
            season_input = int(season_raw) if season_raw is not None else None
        except (TypeError, ValueError):
            await Actor.fail(status_message="'year' and 'season' must be integers.")
            return

        year, season = _resolve_defaults(year_input, season_input)

        if not (1912 <= year <= 2100):
            await Actor.fail(status_message=f"'year' must be between 1912 and 2100, got {year}.")
            return

        if season not in (1, 2, 3, 4):
            await Actor.fail(status_message=f"'season' must be 1–4, got {season}.")
            return

        roc_year = gregorian_to_roc(year)
        Actor.log.info(
            "Fetching %s for %s listed companies — ROC %d Q%d (%d Q%d)",
            statement_types, exchange, roc_year, season, year, season,
        )

        # ------------------------------------------------------------------
        # Fetch and parse each requested statement type
        # ------------------------------------------------------------------
        all_rows: list[dict] = []
        successful_types: list[str] = []
        errors: list[str] = []

        for stype in statement_types:
            try:
                Actor.log.info("Fetching %s statement...", stype)
                html = fetch_statement_html(stype, year, season, exchange)
                rows = parse_table(html, stype, year, season, exchange)

                if not rows:
                    errors.append(f"{stype}: parsed 0 rows (MOPS may not have released this data yet)")
                    Actor.log.warning("No rows parsed for %s", stype)
                    continue

                # Tag each row with statement_type for multi-type runs
                for row in rows:
                    row["statement_type"] = stype

                all_rows.extend(rows)
                successful_types.append(stype)
                Actor.log.info(" → %d companies parsed for %s", len(rows), stype)

            except Exception as exc:  # noqa: BLE001
                msg = f"{stype}: {exc}"
                errors.append(msg)
                Actor.log.error("Error fetching %s: %s", stype, exc)

        # ------------------------------------------------------------------
        # Build output envelope
        # ------------------------------------------------------------------
        output = {
            "year": year,
            "season": season,
            "roc_year": roc_year,
            "exchange": exchange,
            "statement_types_requested": statement_types,
            "statement_types_successful": successful_types,
            "total_rows": len(all_rows),
            "errors": errors if errors else None,
            "results": all_rows,
        }

        # ------------------------------------------------------------------
        # PPE charge — one event per successfully downloaded statement type
        # Charge only fires when we actually produced data.
        # ------------------------------------------------------------------
        if successful_types:
            await Actor.push_data(output, "data-fetched")
            Actor.log.info(
                "PPE charge fired: %d statement type(s) successful (%s)",
                len(successful_types),
                successful_types,
            )
        else:
            Actor.log.warning(
                "No statement types succeeded — pushing output without PPE charge. "
                "Errors: %s",
                errors,
            )
            await Actor.push_data(output)


if __name__ == "__main__":
    import asyncio
    asyncio.run(main())
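The default year and season above are computed once at import time from the current clock, which makes the rule hard to exercise in a test. A minimal sketch of the same "lag one quarter" rule as a pure function of a date is shown below; the name `most_recent_completed_quarter` is hypothetical and does not appear in the actor.

```python
from __future__ import annotations

import datetime


def most_recent_completed_quarter(today: datetime.date) -> tuple[int, int]:
    """Return (year, quarter) of the last quarter that has fully ended.

    Hypothetical helper mirroring the module-level default logic:
    during Q1 the answer is Q4 of the previous year, otherwise it is
    the quarter before the one currently in progress.
    """
    quarter_in_progress = (today.month - 1) // 3 + 1
    if quarter_in_progress == 1:
        return today.year - 1, 4
    return today.year, quarter_in_progress - 1


print(most_recent_completed_quarter(datetime.date(2025, 2, 10)))
print(most_recent_completed_quarter(datetime.date(2025, 4, 1)))
```

A completed quarter is not necessarily a published one. As the zero-row warning in the fetch loop notes, MOPS may not have released the data yet, so a caller may still need to fall back to an earlier period.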

"""
MOPS HTML table parser.

MOPS returns multiple tables per page — one per industry type (banks,
securities firms, insurance, general manufacturers, etc.). Each table
has slightly different column headers because accounting standards
differ by industry. This parser handles all tables and normalises
them to a common output schema.

Columns that exist in all industry types (always present):
    stock_id, company_name, net_income, eps_basic, eps_diluted

Columns present only in general manufacturing companies (Table 2, ~1,000 firms):
    revenue, gross_profit, operating_income, pretax_income

Financial-sector companies (banks, securities, insurance) use different
terminology; those fields are set to None for non-applicable rows.

Pure functions — no network calls, no side effects.
"""

from __future__ import annotations

import math
from bs4 import BeautifulSoup


# ---------------------------------------------------------------------------
# Column translation — maps every Chinese header we know about → English key
# ---------------------------------------------------------------------------

# Shared columns present in every table type
COMMON_COLUMNS: dict[str, str] = {
    "公司代號": "stock_id",
    "公司名稱": "company_name",
    # Net income variants across industry types
    "本期淨利（淨損）": "net_income",
    "繼續營業單位本期稅後淨利（淨損）": "net_income",
    "繼續營業單位本期淨利（淨損）": "net_income",
    "本期稅後淨利（淨損）": "net_income",
    # EPS
    "基本每股盈餘（元）": "eps_basic",
    "基本每股盈餘": "eps_basic",
    "稀釋每股盈餘（元）": "eps_diluted",
    "稀釋每股盈餘": "eps_diluted",
    # Tax
    "所得稅費用（利益）": "income_tax",
    "所得稅（費用）利益": "income_tax",
    # Pretax income variants
    "稅前淨利（淨損）": "pretax_income",
    "繼續營業單位稅前淨利（淨損）": "pretax_income",
    "繼續營業單位稅前損益": "pretax_income",
    # Comprehensive income
    "本期綜合損益總額": "total_comprehensive_income",
    "本期其他綜合損益（稅後）": "other_comprehensive_income",
}

# General manufacturing / standard IFRS columns
GENERAL_COLUMNS: dict[str, str] = {
    "營業收入": "revenue",
    "營業成本": "cost_of_revenue",
    "營業毛利（毛損）": "gross_profit",
    "營業毛利（毛損）淨額": "gross_profit",
    "營業利益（損失）": "operating_income",
    "營業外收入及支出": "non_operating_income",
}

# Bank-specific columns
BANK_COLUMNS: dict[str, str] = {
    "利息淨收益": "interest_net_income",
    "利息以外淨損益": "non_interest_net_income",
    "利息以外淨收益": "non_interest_net_income",
    "呆帳費用、承諾及保證責任準備提存": "provision_for_bad_debt",
    "營業費用": "operating_expenses",
    "淨收益": "net_revenue",
    "收益": "revenue_financial",
    "支出及費用": "expenses",
}

# Balance sheet columns — maps Chinese headers → English keys
# Two variants exist: financial-sector firms use 總額, general manufacturers use 總計
BALANCE_COLUMNS: dict[str, str] = {
    "流動資產": "current_assets",
    "非流動資產": "non_current_assets",
    "資產總計": "total_assets",
    "資產總額": "total_assets",
    "流動負債": "current_liabilities",
    "非流動負債": "non_current_liabilities",
    "負債總計": "total_liabilities",
    "負債總額": "total_liabilities",
    "權益總計": "total_equity",
    "權益總額": "total_equity",
    # Some sub-tables report parent-company equity as the top-level equity line
    "歸屬於母公司業主之權益合計": "total_equity",
    "歸屬於母公司業主權益合計": "total_equity",
}

# Cash flow columns
CASHFLOW_COLUMNS: dict[str, str] = {
    "營業活動之淨現金流入（流出）": "operating_cash_flow",
    "投資活動之淨現金流入（流出）": "investing_cash_flow",
    "籌資活動之淨現金流入（流出）": "financing_cash_flow",
    "匯率變動對現金及約當現金之影響": "fx_effect_on_cash",
    "本期現金及約當現金增加（減少）數": "net_cash_change",
    "期初現金及約當現金餘額": "beginning_cash",
    "期末現金及約當現金餘額": "ending_cash",
}

# Merge all column maps into one lookup
ALL_COLUMNS: dict[str, str] = {
    **COMMON_COLUMNS,
    **GENERAL_COLUMNS,
    **BANK_COLUMNS,
    **BALANCE_COLUMNS,
    **CASHFLOW_COLUMNS,
}

# Canonical output fields (always present, None if not applicable for that statement type)
OUTPUT_FIELDS = [
    # Identity
    "stock_id", "company_name", "period", "year", "season", "exchange",
    # Income statement
    "revenue", "gross_profit", "operating_income", "pretax_income",
    "net_income", "eps_basic", "eps_diluted", "income_tax",
    "total_comprehensive_income",
    # Balance sheet
    "current_assets", "non_current_assets", "total_assets",
    "current_liabilities", "non_current_liabilities", "total_liabilities",
    "total_equity",
    # Cash flow
    "operating_cash_flow", "investing_cash_flow", "financing_cash_flow",
    "fx_effect_on_cash", "net_cash_change", "beginning_cash", "ending_cash",
]


# ---------------------------------------------------------------------------
# Registry / constants
# ---------------------------------------------------------------------------

TYPEK_MAP: dict[str, str] = {
    "listed": "sii",  # TWSE main board
    "otc": "otc",  # OTC / TPEX
}

API_NAME_MAP: dict[str, str] = {
    "income": "ajax_t163sb04",
    "balance": "ajax_t163sb05",
    "cashflow": "ajax_t163sb20",
}

VALID_STATEMENT_TYPES = list(API_NAME_MAP.keys())


# ---------------------------------------------------------------------------
# ROC year helpers
# ---------------------------------------------------------------------------

def gregorian_to_roc(year: int) -> int:
    """Convert Gregorian year to Republic of China year (e.g. 2024 → 113)."""
    roc = year - 1911
    if roc < 1:
        raise ValueError(f"Year {year} is before the ROC calendar starts (1912)")
    return roc


def roc_to_gregorian(roc_year: int) -> int:
    """Convert ROC year to Gregorian year (e.g. 113 → 2024)."""
    return roc_year + 1911


# ---------------------------------------------------------------------------
# Number parsing
# ---------------------------------------------------------------------------

def _clean_number(value: str) -> float | None:
    """
    Convert a MOPS number string to float.

    MOPS uses commas as thousand separators and parentheses for negatives.
    Examples: "1,234,567" → 1234567.0 | "(56,789)" → -56789.0 | "--" → None
    """
    v = value.strip()
    if not v or v in ("--", "－", "N/A", ""):
        return None
    negative = v.startswith("(") and v.endswith(")")
    v = v.strip("()").replace(",", "").strip()
    try:
        num = float(v)
        if not math.isfinite(num):
            return None
        return -num if negative else num
    except ValueError:
        return None


# ---------------------------------------------------------------------------
# Header translation
# ---------------------------------------------------------------------------

def _translate_header(raw: str) -> str:
    """Return the English key for a Chinese header, or a safe fallback."""
    stripped = raw.strip()
    if stripped in ALL_COLUMNS:
        return ALL_COLUMNS[stripped]
    # Fallback: safe snake_case
    safe = stripped.replace(" ", "_").replace("（", "_").replace("）", "").replace("／", "_")
    return f"col_{safe[:30]}" if safe else "col_unknown"


# ---------------------------------------------------------------------------
# Single table parser
# ---------------------------------------------------------------------------

def _parse_single_table(
    table,
    period_label: str,
    year: int,
    season: int,
    exchange: str,
) -> list[dict]:
    """Parse one <table> element into a list of row dicts."""
    rows = table.find_all("tr")
    if len(rows) < 2:
        return []

    # Build header list from first row
    header_row = rows[0]
    headers = [
        _translate_header(th.get_text(strip=True))
        for th in header_row.find_all(["th", "td"])
    ]

    results = []
    for row in rows[1:]:
        cells = row.find_all(["td", "th"])
        if not cells:
            continue
        values = [c.get_text(strip=True) for c in cells]
        if len(values) != len(headers):
            continue

        record: dict = {field: None for field in OUTPUT_FIELDS}
        record["period"] = period_label
        record["year"] = year
        record["season"] = season
        record["exchange"] = exchange

        for key, raw_val in zip(headers, values):
            if key in ("stock_id", "company_name"):
                record[key] = raw_val.strip()
            elif key in record:
                record[key] = _clean_number(raw_val)
            # Unknown/extra columns are silently dropped

        # Skip non-data rows: stock_id must be all digits and at least 4 characters long
        stock_id = record.get("stock_id", "") or ""
        if not stock_id or not stock_id.isdigit() or len(stock_id) < 4:
            continue

        results.append(record)

    return results


# ---------------------------------------------------------------------------
# Main entry point
# ---------------------------------------------------------------------------

def parse_table(
    html: str,
    statement_type: str,
    year: int,
    season: int,
    exchange: str,
) -> list[dict]:
    """
    Parse a MOPS financial statement HTML page into a list of row dicts.

    MOPS returns multiple <table class='hasBorder'> elements on one page,
    one per industry type. This function parses ALL of them and merges
    the results into a single flat list.

    Args:
        html: Raw HTML string from MOPS response.
        statement_type: One of "income", "balance", "cashflow".
        year: Gregorian year (e.g. 2024).
        season: Quarter number 1–4.
        exchange: "listed" or "otc".

    Returns:
        List of dicts, one per company row, with English keys and typed values.
    """
    if statement_type not in VALID_STATEMENT_TYPES:
        raise ValueError(
            f"Unknown statement_type '{statement_type}'. "
            f"Valid options: {VALID_STATEMENT_TYPES}"
        )

    soup = BeautifulSoup(html, "lxml")
    tables = soup.find_all("table", class_="hasBorder")

    if not tables:
        # Fallback: try any table
        tables = soup.find_all("table")

    if not tables:
        return []

    period_label = f"{year}Q{season}"
    all_rows: list[dict] = []

    for table in tables:
        rows = _parse_single_table(table, period_label, year, season, exchange)
        all_rows.extend(rows)

    # Deduplicate by stock_id (same company can appear in multiple tables
    # during transition periods — keep first occurrence)
    seen: set[str] = set()
    deduped = []
    for row in all_rows:
        sid = row.get("stock_id", "")
        if sid not in seen:
            seen.add(sid)
            deduped.append(row)

    return deduped

import sys
import os

# Add src/ to path so tests can import fetcher/parser directly,
# matching how Apify runs them (python src/main.py with src/ on path).
sys.path.insert(0, os.path.join(os.path.dirname(__file__), "..", "src"))

"""
Tests for fetcher.py and PPE charge safety.

fetcher.py makes real network calls via httpx. All tests mock the HTTP
client so no network is needed.
"""

import pytest
from unittest.mock import MagicMock, patch

import httpx

from fetcher import (
    _build_payload,
    fetch_statement_html,
    MOPS_BASE,
    REDIRECT_ENDPOINT,
)
from parser import gregorian_to_roc


# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------

def _make_mock_client(redirect_url: str, html_body: str) -> MagicMock:
    """Return a mock httpx.Client that simulates the 2-step MOPS flow."""
    client = MagicMock()

    # Step 1 response: POST returns JSON with redirect URL
    r1 = MagicMock()
    r1.json.return_value = {"url": redirect_url}
    r1.raise_for_status = MagicMock()

    # Step 2 response: GET returns HTML
    r2 = MagicMock()
    r2.text = html_body
    r2.raise_for_status = MagicMock()

    client.post.return_value = r1
    client.get.return_value = r2
    return client


# ---------------------------------------------------------------------------
# Tests: _build_payload
# ---------------------------------------------------------------------------

class TestBuildPayload:
    def test_income_statement_payload(self):
        payload = _build_payload("income", 2024, 4, "listed")
        assert payload["apiName"] == "ajax_t163sb04"
        assert payload["parameters"]["TYPEK"] == "sii"
        assert payload["parameters"]["year"] == str(gregorian_to_roc(2024))
        assert payload["parameters"]["season"] == "04"

    def test_balance_sheet_payload(self):
        payload = _build_payload("balance", 2024, 1, "otc")
        assert payload["apiName"] == "ajax_t163sb05"
        assert payload["parameters"]["TYPEK"] == "otc"
        assert payload["parameters"]["season"] == "01"

    def test_cashflow_payload(self):
        payload = _build_payload("cashflow", 2023, 3, "listed")
        assert payload["apiName"] == "ajax_t163sb20"
        assert payload["parameters"]["year"] == str(gregorian_to_roc(2023))

    def test_season_zero_padded(self):
        payload = _build_payload("income", 2024, 1, "listed")
        assert payload["parameters"]["season"] == "01"

    def test_season_two_digits_preserved(self):
        payload = _build_payload("income", 2024, 4, "listed")
        assert payload["parameters"]["season"] == "04"

    def test_roc_year_conversion(self):
        payload = _build_payload("income", 2024, 4, "listed")
        assert payload["parameters"]["year"] == "113"  # 2024 - 1911 = 113

    def test_payload_has_required_keys(self):
        payload = _build_payload("income", 2024, 4, "listed")
        params = payload["parameters"]
        for key in ("encodeURIComponent", "firstin", "off", "step", "isQuery", "TYPEK", "year", "season"):
            assert key in params, f"Missing parameter key: {key}"


# ---------------------------------------------------------------------------
# Tests: fetch_statement_html
# ---------------------------------------------------------------------------

class TestFetchStatementHtml:
    def test_returns_html_from_step2(self):
        mock_client = _make_mock_client(
            redirect_url="https://mops.twse.com.tw/mops/web/ajax_t163sb04?signed=abc",
            html_body="<html><body><table>...</table></body></html>",
        )
        result = fetch_statement_html("income", 2024, 4, "listed", client=mock_client)
        assert "<table>" in result

    def test_posts_to_redirect_endpoint(self):
        mock_client = _make_mock_client(
            redirect_url="https://mops.twse.com.tw/data",
            html_body="<html></html>",
        )
        fetch_statement_html("income", 2024, 4, "listed", client=mock_client)
        mock_client.post.assert_called_once()
        call_url = mock_client.post.call_args[0][0]
        assert call_url == REDIRECT_ENDPOINT

    def test_gets_redirect_url(self):
        redirect_url = "https://mops.twse.com.tw/mops/web/ajax_t163sb04?signed=xyz"
        mock_client = _make_mock_client(redirect_url=redirect_url, html_body="<html></html>")
        fetch_statement_html("income", 2024, 4, "listed", client=mock_client)
        mock_client.get.assert_called_once_with(redirect_url)

    def test_relative_redirect_url_gets_base_prepended(self):
        mock_client = _make_mock_client(
            redirect_url="/mops/web/ajax_t163sb04?signed=xyz",
            html_body="<html></html>",
        )
        fetch_statement_html("income", 2024, 4, "listed", client=mock_client)
        called_url = mock_client.get.call_args[0][0]
        assert called_url.startswith(MOPS_BASE)

    def test_raises_on_missing_redirect_url(self):
        client = MagicMock()
        r1 = MagicMock()
        r1.json.return_value = {"status": "ok"}  # no "url" key
        r1.raise_for_status = MagicMock()
        client.post.return_value = r1
        with pytest.raises(ValueError, match="redirect URL"):
            fetch_statement_html("income", 2024, 4, "listed", client=client)

    def test_raises_on_http_error_step1(self):
        client = MagicMock()
        r1 = MagicMock()
        r1.raise_for_status.side_effect = httpx.HTTPStatusError(
            "503", request=MagicMock(), response=MagicMock()
        )
        client.post.return_value = r1
        with pytest.raises(httpx.HTTPStatusError):
            fetch_statement_html("income", 2024, 4, "listed", client=client)

    def test_raises_on_http_error_step2(self):
        client = MagicMock()
        r1 = MagicMock()
        r1.json.return_value = {"url": "https://mops.twse.com.tw/data"}
        r1.raise_for_status = MagicMock()
        r2 = MagicMock()
        r2.raise_for_status.side_effect = httpx.HTTPStatusError(
            "404", request=MagicMock(), response=MagicMock()
        )
        client.post.return_value = r1
        client.get.return_value = r2
        with pytest.raises(httpx.HTTPStatusError):
            fetch_statement_html("income", 2024, 4, "listed", client=client)

    def test_accepts_redirecturl_key_variant(self):
        """MOPS might return 'redirectUrl' instead of 'url'."""
        client = MagicMock()
        r1 = MagicMock()
        r1.json.return_value = {"redirectUrl": "https://mops.twse.com.tw/data"}
        r1.raise_for_status = MagicMock()
        r2 = MagicMock()
        r2.text = "<html>data</html>"
        r2.raise_for_status = MagicMock()
        client.post.return_value = r1
        client.get.return_value = r2
        result = fetch_statement_html("income", 2024, 4, "listed", client=client)
        assert "data" in result


# ---------------------------------------------------------------------------
# Tests: PPE charge safety — inspect main.py source directly
# ---------------------------------------------------------------------------

class TestPPEChargeSafety:
    def _load_main_source(self) -> str:
        with open("src/main.py") as f:
            return f.read()

    def test_exactly_one_ppe_charge_call(self):
        source = self._load_main_source()
        charge_calls = [
            line.strip()
            for line in source.splitlines()
            if "push_data" in line and "data-fetched" in line
        ]
        assert len(charge_calls) == 1, (
            f"Expected exactly 1 PPE charge call, found {len(charge_calls)}: {charge_calls}"
        )

    def test_charge_fires_only_when_successful(self):
        source = self._load_main_source()
        assert "successful_types" in source, (
            "main.py must guard PPE charge with a successful count/list"
        )

    def test_free_push_exists_for_all_errors_case(self):
        source = self._load_main_source()
        lines = source.splitlines()
        free_push_lines = [
            l.strip() for l in lines
            if "push_data" in l and "data-fetched" not in l
        ]
        assert len(free_push_lines) >= 1, (
            "main.py must have a push_data(output) call without PPE charge"
        )

    def test_event_name_is_data_fetched(self):
        source = self._load_main_source()
        assert '"data-fetched"' in source, (
            "PPE event name must be 'data-fetched'"
        )

    def test_two_push_data_calls_total(self):
        source = self._load_main_source()
        push_calls = [l.strip() for l in source.splitlines() if "push_data" in l]
        assert len(push_calls) == 2, (
            f"Expected exactly 2 push_data calls, found {len(push_calls)}: {push_calls}"
        )

    def test_actor_fail_called_on_invalid_input(self):
        source = self._load_main_source()
        assert "Actor.fail" in source, "main.py must call Actor.fail() for invalid inputs"

    def test_multiple_actor_fail_guards(self):
        source = self._load_main_source()
        fail_calls = [l.strip() for l in source.splitlines() if "Actor.fail" in l]
        assert len(fail_calls) >= 3, (
            f"Expected at least 3 Actor.fail() calls (one per validation), "
            f"found {len(fail_calls)}: {fail_calls}"
        )

    def test_exchange_validated(self):
        source = self._load_main_source()
        assert "TYPEK_MAP" in source, "main.py must validate exchange against TYPEK_MAP"

    def test_season_validated(self):
        source = self._load_main_source()
        assert "season not in (1, 2, 3, 4)" in source or "1, 2, 3, 4" in source, (
            "main.py must validate season is 1–4"
        )

    def test_year_validated(self):
        source = self._load_main_source()
        assert "1912" in source, "main.py must validate year >= 1912"

    def test_statement_types_validated_against_valid_list(self):
        source = self._load_main_source()
        assert "VALID_STATEMENT_TYPES" in source, (
            "main.py must validate statement_types against VALID_STATEMENT_TYPES"
        )

    def test_no_charge_in_else_branch(self):
        source = self._load_main_source()
        lines = source.splitlines()
        in_else = False
        for line in lines:
            stripped = line.strip()
            if stripped == "else:":
                in_else = True
                continue
            if in_else:
                if stripped and not line.startswith(" "):
                    break
                assert "data-fetched" not in stripped, (
                    f"'data-fetched' found inside else branch: {stripped}"
                )

"""
Unit tests for parser.py: pure functions, no network calls.
"""

import pytest
from parser import (
    gregorian_to_roc,
    roc_to_gregorian,
    _clean_number,
    _translate_header,
    parse_table,
    ALL_COLUMNS,
    BALANCE_COLUMNS,
    CASHFLOW_COLUMNS,
    OUTPUT_FIELDS,
    VALID_STATEMENT_TYPES,
    API_NAME_MAP,
    TYPEK_MAP,
)


# ---------------------------------------------------------------------------
# Sample HTML fixture: minimal MOPS income statement table
# ---------------------------------------------------------------------------

SAMPLE_INCOME_HTML = """
<html><body>
<table class="hasBorder">
<tr>
  <th>公司代號</th>
  <th>公司名稱</th>
  <th>營業收入</th>
  <th>營業毛利（毛損）</th>
  <th>本期淨利（淨損）</th>
  <th>基本每股盈餘（元）</th>
</tr>
<tr>
  <td>2330</td><td>台積電</td><td>2,169,846,000</td><td>1,235,477,000</td><td>878,662,000</td><td>33.86</td>
</tr>
<tr>
  <td>2317</td><td>鴻海</td><td>1,500,000,000</td><td>(50,000,000)</td><td>80,000,000</td><td>5.50</td>
</tr>
<tr>
  <td>說明</td><td>合計</td><td>--</td><td>--</td><td>--</td><td>--</td>
</tr>
</table>
</body></html>
"""

SAMPLE_MULTI_TABLE_HTML = """
<html><body>
<table class="hasBorder">
<tr><th>公司代號</th><th>公司名稱</th><th>利息淨收益</th><th>本期淨利（淨損）</th><th>基本每股盈餘（元）</th></tr>
<tr><td>2801</td><td>彰銀</td><td>5,000,000</td><td>11,000,000</td><td>1.00</td></tr>
</table>
<table class="hasBorder">
<tr><th>公司代號</th><th>公司名稱</th><th>營業收入</th><th>本期淨利（淨損）</th><th>基本每股盈餘（元）</th></tr>
<tr><td>2330</td><td>台積電</td><td>2,000,000,000</td><td>800,000,000</td><td>30.80</td></tr>
<tr><td>2317</td><td>鴻海</td><td>1,500,000,000</td><td>80,000,000</td><td>5.50</td></tr>
</table>
</body></html>
"""

SAMPLE_NO_TABLE_HTML = "<html><body><p>No data available</p></body></html>"


# ---------------------------------------------------------------------------
# Tests: gregorian_to_roc / roc_to_gregorian
# ---------------------------------------------------------------------------

class TestYearConversion:
    def test_2024_to_113(self):
        assert gregorian_to_roc(2024) == 113

    def test_2025_to_114(self):
        assert gregorian_to_roc(2025) == 114

    def test_1912_to_1(self):
        assert gregorian_to_roc(1912) == 1

    def test_pre_roc_raises(self):
        with pytest.raises(ValueError):
            gregorian_to_roc(1911)

    def test_roc_to_gregorian_113(self):
        assert roc_to_gregorian(113) == 2024

    def test_roundtrip(self):
        assert roc_to_gregorian(gregorian_to_roc(2024)) == 2024


# ---------------------------------------------------------------------------
# Tests: _clean_number
# ---------------------------------------------------------------------------

class TestCleanNumber:
    def test_plain_integer(self):
        assert _clean_number("1234567") == 1234567.0

    def test_comma_separated(self):
        assert _clean_number("1,234,567") == 1234567.0

    def test_negative_parentheses(self):
        assert _clean_number("(56,789)") == -56789.0

    def test_double_dash_is_none(self):
        assert _clean_number("--") is None

    def test_empty_string_is_none(self):
        assert _clean_number("") is None

    def test_whitespace_only_is_none(self):
        assert _clean_number(" ") is None

    def test_decimal_value(self):
        assert _clean_number("33.86") == pytest.approx(33.86)

    def test_negative_decimal(self):
        assert _clean_number("(3.75)") == pytest.approx(-3.75)

    def test_na_is_none(self):
        assert _clean_number("N/A") is None

    def test_full_width_dash_is_none(self):
        assert _clean_number("－") is None

    def test_zero_string(self):
        assert _clean_number("0") == 0.0


# ---------------------------------------------------------------------------
# Tests: _translate_header
# ---------------------------------------------------------------------------

class TestTranslateHeader:
    def test_known_chinese_header_stock_id(self):
        assert _translate_header("公司代號") == "stock_id"

    def test_known_chinese_header_revenue(self):
        assert _translate_header("營業收入") == "revenue"

    def test_known_chinese_header_net_income(self):
        assert _translate_header("本期淨利（淨損）") == "net_income"

    def test_bank_column_translated(self):
        assert _translate_header("利息淨收益") == "interest_net_income"

    def test_unknown_header_gets_fallback(self):
        result = _translate_header("未知欄位")
        assert result.startswith("col_")

    def test_strips_whitespace(self):
        assert _translate_header(" 公司名稱 ") == "company_name"

    def test_eps_basic(self):
        assert _translate_header("基本每股盈餘（元）") == "eps_basic"


# ---------------------------------------------------------------------------
# Tests: parse_table - single table
# ---------------------------------------------------------------------------

class TestParseTable:
    def test_parses_two_data_rows(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert len(rows) == 2

    def test_stock_ids_correct(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[0]["stock_id"] == "2330"
        assert rows[1]["stock_id"] == "2317"

    def test_revenue_parsed_as_float(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[0]["revenue"] == pytest.approx(2169846000.0)

    def test_negative_value_parsed_correctly(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[1]["gross_profit"] == pytest.approx(-50000000.0)

    def test_eps_parsed_as_float(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[0]["eps_basic"] == pytest.approx(33.86)

    def test_period_field_set(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[0]["period"] == "2024Q4"

    def test_year_and_season_fields(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[0]["year"] == 2024
        assert rows[0]["season"] == 4

    def test_exchange_field(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        assert rows[0]["exchange"] == "listed"

    def test_summary_row_skipped(self):
        rows = parse_table(SAMPLE_INCOME_HTML, "income", 2024, 4, "listed")
        ids = [r["stock_id"] for r in rows]
        assert "說明" not in ids

    def test_no_table_returns_empty(self):
        rows = parse_table(SAMPLE_NO_TABLE_HTML, "income", 2024, 4, "listed")
        assert rows == []

    def test_unknown_statement_type_raises(self):
        with pytest.raises(ValueError, match="Unknown statement_type"):
            parse_table(SAMPLE_INCOME_HTML, "banana", 2024, 4, "listed")

    def test_all_valid_statement_types_accepted(self):
        for stype in VALID_STATEMENT_TYPES:
            parse_table(SAMPLE_INCOME_HTML, stype, 2024, 4, "listed")


# ---------------------------------------------------------------------------
# Tests: parse_table - multi-table (the real MOPS structure)
# ---------------------------------------------------------------------------

class TestParseMultiTable:
    def test_parses_all_tables(self):
        rows = parse_table(SAMPLE_MULTI_TABLE_HTML, "income", 2024, 3, "listed")
        assert len(rows) == 3

    def test_bank_row_has_net_income(self):
        rows = parse_table(SAMPLE_MULTI_TABLE_HTML, "income", 2024, 3, "listed")
        bank = next(r for r in rows if r["stock_id"] == "2801")
        assert bank["net_income"] == pytest.approx(11000000.0)

    def test_bank_row_revenue_is_none(self):
        # Banks don't report 營業收入, so revenue should be None
        rows = parse_table(SAMPLE_MULTI_TABLE_HTML, "income", 2024, 3, "listed")
        bank = next(r for r in rows if r["stock_id"] == "2801")
        assert bank["revenue"] is None

    def test_manufacturer_has_revenue(self):
        rows = parse_table(SAMPLE_MULTI_TABLE_HTML, "income", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["revenue"] == pytest.approx(2000000000.0)

    def test_deduplication_keeps_first(self):
        # If same stock_id appears in two tables, keep first occurrence only
        dup_html = SAMPLE_MULTI_TABLE_HTML.replace("2317", "2801")
        rows = parse_table(dup_html, "income", 2024, 3, "listed")
        ids = [r["stock_id"] for r in rows]
        assert ids.count("2801") == 1

    def test_all_rows_have_output_fields(self):
        from parser import OUTPUT_FIELDS
        rows = parse_table(SAMPLE_MULTI_TABLE_HTML, "income", 2024, 3, "listed")
        for row in rows:
            for field in OUTPUT_FIELDS:
                assert field in row, f"Missing field '{field}' in row {row['stock_id']}"


# ---------------------------------------------------------------------------
# Tests: Registry / constants integrity
# ---------------------------------------------------------------------------

class TestConstants:
    def test_all_statement_types_have_api_name(self):
        for stype in VALID_STATEMENT_TYPES:
            assert stype in API_NAME_MAP

    def test_all_api_names_start_with_ajax(self):
        for name in API_NAME_MAP.values():
            assert name.startswith("ajax_")

    def test_typek_map_has_listed_and_otc(self):
        assert "listed" in TYPEK_MAP
        assert "otc" in TYPEK_MAP

    def test_typek_values(self):
        assert TYPEK_MAP["listed"] == "sii"
        assert TYPEK_MAP["otc"] == "otc"

    def test_all_columns_has_stock_id(self):
        assert "公司代號" in ALL_COLUMNS
        assert ALL_COLUMNS["公司代號"] == "stock_id"

    def test_valid_statement_types_count(self):
        assert len(VALID_STATEMENT_TYPES) == 3
        assert set(VALID_STATEMENT_TYPES) == {"income", "balance", "cashflow"}

    def test_balance_columns_in_all_columns(self):
        for zh, en in BALANCE_COLUMNS.items():
            assert zh in ALL_COLUMNS, f"{zh!r} missing from ALL_COLUMNS"
            assert ALL_COLUMNS[zh] == en

    def test_cashflow_columns_in_all_columns(self):
        for zh, en in CASHFLOW_COLUMNS.items():
            assert zh in ALL_COLUMNS, f"{zh!r} missing from ALL_COLUMNS"
            assert ALL_COLUMNS[zh] == en

    def test_output_fields_contains_balance_fields(self):
        for field in ("current_assets", "non_current_assets", "total_assets",
                      "current_liabilities", "non_current_liabilities",
                      "total_liabilities", "total_equity"):
            assert field in OUTPUT_FIELDS, f"{field!r} missing from OUTPUT_FIELDS"

    def test_output_fields_contains_cashflow_fields(self):
        for field in ("operating_cash_flow", "investing_cash_flow",
                      "financing_cash_flow", "net_cash_change", "ending_cash"):
            assert field in OUTPUT_FIELDS, f"{field!r} missing from OUTPUT_FIELDS"

    def test_total_assets_variants_map_to_same_key(self):
        assert ALL_COLUMNS["資產總計"] == "total_assets"
        assert ALL_COLUMNS["資產總額"] == "total_assets"

    def test_total_liabilities_variants_map_to_same_key(self):
        assert ALL_COLUMNS["負債總計"] == "total_liabilities"
        assert ALL_COLUMNS["負債總額"] == "total_liabilities"

    def test_total_equity_variants_map_to_same_key(self):
        assert ALL_COLUMNS["權益總計"] == "total_equity"
        assert ALL_COLUMNS["權益總額"] == "total_equity"


# ---------------------------------------------------------------------------
# Sample HTML fixtures: balance sheet and cash flow
# ---------------------------------------------------------------------------

SAMPLE_BALANCE_HTML = """
<html><body>
<table class="hasBorder">
<tr>
  <th>公司代號</th>
  <th>公司名稱</th>
  <th>流動資產</th>
  <th>非流動資產</th>
  <th>資產總計</th>
  <th>流動負債</th>
  <th>非流動負債</th>
  <th>負債總計</th>
  <th>權益總計</th>
</tr>
<tr>
  <td>2330</td><td>台積電</td>
  <td>1,500,000,000</td><td>3,000,000,000</td><td>4,500,000,000</td>
  <td>500,000,000</td><td>200,000,000</td><td>700,000,000</td><td>3,800,000,000</td>
</tr>
<tr>
  <td>2317</td><td>鴻海</td>
  <td>800,000,000</td><td>1,200,000,000</td><td>2,000,000,000</td>
  <td>300,000,000</td><td>100,000,000</td><td>400,000,000</td><td>1,600,000,000</td>
</tr>
</table>
</body></html>
"""

# Financial-sector balance sheet uses 總額 variants instead of 總計
SAMPLE_BALANCE_BANK_HTML = """
<html><body>
<table class="hasBorder">
<tr>
  <th>公司代號</th>
  <th>公司名稱</th>
  <th>資產總額</th>
  <th>負債總額</th>
  <th>權益總額</th>
</tr>
<tr>
  <td>2801</td><td>彰銀</td>
  <td>3,000,000,000</td><td>2,700,000,000</td><td>300,000,000</td>
</tr>
</table>
</body></html>
"""

SAMPLE_CASHFLOW_HTML = """
<html><body>
<table class="hasBorder">
<tr>
  <th>公司代號</th>
  <th>公司名稱</th>
  <th>營業活動之淨現金流入（流出）</th>
  <th>投資活動之淨現金流入（流出）</th>
  <th>籌資活動之淨現金流入（流出）</th>
  <th>本期現金及約當現金增加（減少）數</th>
  <th>期末現金及約當現金餘額</th>
</tr>
<tr>
  <td>2330</td><td>台積電</td>
  <td>1,000,000,000</td><td>(500,000,000)</td><td>(200,000,000)</td>
  <td>300,000,000</td><td>400,000,000</td>
</tr>
</table>
</body></html>
"""


# ---------------------------------------------------------------------------
# Tests: parse_table - balance sheet
# ---------------------------------------------------------------------------

class TestParseBalanceSheet:
    def test_parses_two_rows(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        assert len(rows) == 2

    def test_total_assets_general_mfg(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["total_assets"] == pytest.approx(4_500_000_000.0)

    def test_current_assets(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["current_assets"] == pytest.approx(1_500_000_000.0)

    def test_non_current_assets(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["non_current_assets"] == pytest.approx(3_000_000_000.0)

    def test_total_liabilities(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["total_liabilities"] == pytest.approx(700_000_000.0)

    def test_total_equity(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["total_equity"] == pytest.approx(3_800_000_000.0)

    def test_income_fields_are_none_for_balance(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        tsmc = next(r for r in rows if r["stock_id"] == "2330")
        assert tsmc["revenue"] is None
        assert tsmc["net_income"] is None
        assert tsmc["eps_basic"] is None

    def test_bank_total_assets_variant(self):
        rows = parse_table(SAMPLE_BALANCE_BANK_HTML, "balance", 2024, 3, "listed")
        bank = next(r for r in rows if r["stock_id"] == "2801")
        assert bank["total_assets"] == pytest.approx(3_000_000_000.0)

    def test_bank_total_liabilities_variant(self):
        rows = parse_table(SAMPLE_BALANCE_BANK_HTML, "balance", 2024, 3, "listed")
        bank = next(r for r in rows if r["stock_id"] == "2801")
        assert bank["total_liabilities"] == pytest.approx(2_700_000_000.0)

    def test_bank_total_equity_variant(self):
        rows = parse_table(SAMPLE_BALANCE_BANK_HTML, "balance", 2024, 3, "listed")
        bank = next(r for r in rows if r["stock_id"] == "2801")
        assert bank["total_equity"] == pytest.approx(300_000_000.0)

    def test_all_output_fields_present(self):
        rows = parse_table(SAMPLE_BALANCE_HTML, "balance", 2024, 3, "listed")
        for row in rows:
            for field in OUTPUT_FIELDS:
                assert field in row, f"Field {field!r} missing from balance row"


# ---------------------------------------------------------------------------
# Tests: parse_table - cash flow
# ---------------------------------------------------------------------------

class TestParseCashFlow:
    def test_parses_one_row(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert len(rows) == 1

    def test_operating_cash_flow(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["operating_cash_flow"] == pytest.approx(1_000_000_000.0)

    def test_investing_cash_flow_negative(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["investing_cash_flow"] == pytest.approx(-500_000_000.0)

    def test_financing_cash_flow_negative(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["financing_cash_flow"] == pytest.approx(-200_000_000.0)

    def test_net_cash_change(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["net_cash_change"] == pytest.approx(300_000_000.0)

    def test_ending_cash(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["ending_cash"] == pytest.approx(400_000_000.0)

    def test_income_fields_are_none_for_cashflow(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["revenue"] is None
        assert rows[0]["gross_profit"] is None

    def test_balance_fields_are_none_for_cashflow(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        assert rows[0]["total_assets"] is None
        assert rows[0]["total_equity"] is None

    def test_all_output_fields_present(self):
        rows = parse_table(SAMPLE_CASHFLOW_HTML, "cashflow", 2024, 3, "listed")
        for row in rows:
            for field in OUTPUT_FIELDS:
                assert field in row, f"Field {field!r} missing from cashflow row"