# File Operations

SyncDB can export query results to local files and import files into database tables. It also exposes `FileTransfer` for file-to-file conversions with no database involved.

## Supported formats

| Format | Extension | Extra required | Notes |
|--------|-----------|---------------|-------|
| CSV | `.csv` | none | Values are strings; no type inference |
| Parquet | `.parquet` | `pandas`, `pyarrow` | Preserves types; best for large data |
| Excel | `.xlsx`, `.xls` | `pandas`, `openpyxl` | Human-readable; slow for large files |
| Pickle | `.pickle` | none | Python-only; not portable. **Security warning: loading a Pickle file executes arbitrary Python bytecode — never load files from untrusted sources.** |

Install file extras:

```bash
pip install "Qubdi-SyncDB[files]"   # pandas + pyarrow + openpyxl
```

## Export: Database → File

```python
# SQL string
sync.export_query_to_file(
    query="SELECT * FROM dbo.orders WHERE status = 'shipped'",
    output_path="exports/shipped_orders.parquet",
)

# Path to a .sql file — its contents are read and executed
sync.export_query_to_file(
    query="queries/shipped_orders.sql",
    output_path="exports/shipped_orders.parquet",
)

# Parameterized query (prevents SQL injection)
sync.export_query_to_file(
    query="SELECT * FROM dbo.orders WHERE region = ? AND year = ?",
    params=["US", 2024],
    output_path="us_orders_2024.parquet",
)

# Explicit format override (ignores extension)
sync.export_query_to_file(
    query="SELECT * FROM dbo.summary",
    output_path="summary.dat",
    file_format="excel",
)
```

Output parent directories are created automatically — no need to `mkdir` beforehand.

Returns the number of rows written.

## Import: File → Database

```python
# Append rows from a Parquet file into a PostgreSQL table
sync.import_file_to_table(
    input_path="exports/shipped_orders.parquet",
    destination="public.shipped_orders",
)

# Truncate first, then load
sync.import_file_to_table(
    input_path="customers.csv",
    destination="public.customers",
    fresh_insert=True,
)
```

If the target table does not exist, SyncDB creates it using column types inferred from the first row of data.

Returns the number of rows inserted.

## File-to-file conversion

Use `FileTransfer` directly for format conversion with no database:

```python
from syncdb import FileTransfer

ft = FileTransfer()

# Read any supported format — returns list[dict]
rows = ft.read("data.csv")
rows = ft.read("data.parquet")
rows = ft.read("data.xlsx")

# Write any supported format
ft.write(rows, "output.parquet")
ft.write(rows, "output.csv")
ft.write(rows, "output.xlsx")
```

### Common conversions

```python
ft = FileTransfer()

# CSV → Parquet (preserves column types from the first read)
rows = ft.read("data.csv")
ft.write(rows, "data.parquet")

# Parquet → Excel
rows = ft.read("report.parquet")
ft.write(rows, "report.xlsx")

# Filter rows in Python before writing
rows = ft.read("orders.parquet")
active = [r for r in rows if r["status"] == "active"]
ft.write(active, "active_orders.csv")
```

## Pipeline pattern: export → transform → import

```python
from syncdb import DatabaseConfig, FileTransfer, SyncDB

mssql_cfg = DatabaseConfig(engine="mssql", connection_string="...")
pg_cfg     = DatabaseConfig(engine="postgresql", connection_string="...")

# Step 1: export from MSSQL
src_sync = SyncDB(source=mssql_cfg)
src_sync.export_query_to_file(
    query="SELECT * FROM dbo.daily_report WHERE report_date = ?",
    params=["2024-12-31"],
    output_path="daily_report.parquet",
)

# Step 2: transform in Python
ft = FileTransfer()
rows = ft.read("daily_report.parquet")
rows = [r for r in rows if r["revenue"] > 0]   # drop zero-revenue rows
ft.write(rows, "daily_report_clean.parquet")

# Step 3: load into PostgreSQL
dst_sync = SyncDB(target=pg_cfg)
dst_sync.import_file_to_table(
    input_path="daily_report_clean.parquet",
    destination="public.daily_report",
    fresh_insert=False,
)
```